PDF OCR
Optical Character Recognition (OCR) is commonly used to recognize text in scanned documents. You can use the OCR API to recognize text in scanned documents, images, etc. PDF files are often created from scanned images and pictures of text, with little difference in content quality from the source image. This PDF4me feature applies OCR to such files so that their text can be recognized.
Code sample
Try the API in the language you prefer: C#, Java, JavaScript, PHP, Python, or Ruby. The sample below uses C#.
// set up the RecognizeDocument request object
var recognizeDocument = new RecognizeDocument()
{
    // document to run OCR on
    Document = new Document()
    {
        DocData = File.ReadAllBytes("myPdf.pdf"),
        Name = "myPdf.pdf",
    },
    // OCR action: request a searchable PDF as the output type
    OcrAction = new OcrAction()
    {
        OutputType = OcrActionOutputType.PdfSearchable
    },
};
// run the recognition; the call is asynchronous, so await its result
// (this must run inside an async method)
var res = await Pdf4me.Instance.OcrClient.RecognizeDocumentAsync(recognizeDocument);
// extract the JSON from the response and write it to disk
File.WriteAllText("generatedPdf.pdf", res.StructuredDataJson);