PDF OCR X




Question

How do I perform OCR on documents?

How do I convert image-based documents into text-searchable documents?

Answer


The OCR software takes JPG, PNG, or GIF images or PDF documents as input. PDF OCR supports multi-page documents and multi-column text. The only restriction of the free online OCR is that images and PDFs must not be larger than 5 MB. If you need to automate your OCR and process many documents, do not web-scrape this page.

PDF OCR X Community Edition is a simple drag-and-drop utility that converts single-page PDFs and images into text documents or searchable PDF files. It uses OCR (optical character recognition) technology to extract the text of a PDF even if that text is contained in an image.

Acrobat can also turn scanned documents into editable PDFs. When you open a scanned document for editing, Acrobat automatically runs OCR in the background and converts the document into an editable image and text.

The following versions of PDF OCR X Community Edition are listed:

  • 2.0.25 for Windows: Windows XP or newer (includes Vista, 7, 8, 10 etc.)

  • 1.9.36 for Mac: OS X 10.5 or higher (Leopard, Snow Leopard, Mountain Lion, Lion, Mavericks, Yosemite, El Capitan)

  • 1.9.32 for Windows: Windows XP or newer (includes Vista.
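The input restrictions of the free online OCR can be checked before a file is submitted. The following is a minimal sketch, not part of any product mentioned here: the function name is hypothetical, "5MB" is assumed to mean 5 × 1024 × 1024 bytes (the text does not say which megabyte is meant), and `.jpeg` is assumed to be accepted as an alias of `.jpg`.

```python
from pathlib import Path

# Assumption: "5MB" is read as 5 * 1024 * 1024 bytes.
MAX_BYTES = 5 * 1024 * 1024

# Input types named in the text; ".jpeg" is an assumed alias of ".jpg".
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".pdf"}


def check_ocr_input(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported file type: {suffix or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is larger than {MAX_BYTES} bytes")
    return problems
```

For a file on disk, the size can be obtained with `Path(filename).stat().st_size` and passed in as `size_bytes`.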

Please note that OCR (optical character recognition) scans image-based documents, recognizes text and then inserts an invisible text layer over the text. The text layer contains the same text as that recognized in the document. This means that the original, image-based text can be searched and selected via the invisible text layer, which is the main benefit of OCR. However, the document text cannot be edited in the same manner as normal, text-based documents, because the document remains image-based despite the invisible text layer. Follow the steps below to perform OCR:

1. Click Convert in the Ribbon Toolbar, then click OCR Page(s) in the submenu. The OCR Pages dialog box will open:

The Page Range options are as follows:

  • Select All to OCR all the pages of the document.

  • Select Current Page to OCR only the current page.

  • Use Selected Pages to OCR only the pages pre-selected from the Thumbnails pane.

  • Use the Pages box to determine specific pages of the document on which to perform the OCR process. Page range settings are detailed here.

  • Use the Subset option to select All Pages, Odd Pages Only or Even Pages Only.
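The way a Pages entry and the Subset option combine can be sketched as follows. This is an illustration only, under stated assumptions: the `1-3,7` range syntax is assumed (the actual syntax is covered in the page range settings referenced above), the function names are hypothetical, and "odd" and "even" are assumed to refer to page numbers rather than to positions within the selection.

```python
def parse_pages(spec: str, page_count: int) -> list[int]:
    """Expand a spec such as "1-3,7" into sorted 1-based page numbers.

    The comma-and-hyphen syntax is an assumption made for this sketch.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        first, sep, last = part.partition("-")
        start = int(first)
        end = int(last) if sep else start
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


def apply_subset(pages: list[int], subset: str = "all") -> list[int]:
    """Filter pages by the Subset choice: "all", "odd" or "even"."""
    if subset == "all":
        return list(pages)
    if subset == "odd":
        return [p for p in pages if p % 2 == 1]
    if subset == "even":
        return [p for p in pages if p % 2 == 0]
    raise ValueError(f"unknown subset: {subset!r}")
```

For example, `apply_subset(parse_pages("1-3,7", 10), "odd")` selects pages 1, 3 and 7.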

The Recognition options determine the language and accuracy of the OCR process:

  • If the desired language is not available in the dropdown menu, then click Add/Remove Languages for further options. Increasing the accuracy increases the time that the process takes and vice versa. Additionally, it should be noted that setting the accuracy to high may result in unusual output if the document on which the operation is carried out features imperfections. This is because the software will search to a greater depth and may attempt to recognize imperfections as text.

The Output options determine the format of the output information from the OCR process:

  • Select one of Searchable Image, Editable Text and Images, or Fine Page Content, as desired.

    • These three options are explained in greater detail in the dropdown itself, as well as in the Manual.

  • Select the Auto Deskew option to deskew documents automatically. (Deskewing straightens images that were photographed or scanned crookedly.)


2. Click OK to OCR documents. Please note that it is also possible to OCR documents when scanned content or images are used to create PDF documents, as described in the steps that follow.

1. Click File in the Ribbon Toolbar, then click New Document and click From Image File(s):


The Images to PDF dialog box will open:

2. Add files and determine settings as detailed here.

3. Click Options for further options. The Image to PDF Options dialog box will open. Click Image Post-Processing to view OCR options when images are converted to PDF:

4. Select the Run OCR box to OCR images when they are converted to PDF. Click OCR Settings to determine language and accuracy options, as detailed above.

1. Click File, then click New Document.

2. Click From Scanner, then click Custom Scan:

3. The Scan Properties dialog box will open:

4. Determine settings as detailed here.

5. Click Images Insertion Options to determine options for inserted images. The Image to PDF Options dialog box will open. Click Image Post-Processing to view OCR options when scanned content is converted to PDF:

6. Select the Run OCR box to OCR images when they are converted to PDF. Click OCR Settings to determine language and accuracy options, as detailed above.

Note that you can create custom tools, including the OCR or Scan actions, by following the steps in this article.


1. Open PDF-Tools, locate the OCR Pages tool (or your custom tool), then double-click it to run it:

2. Select the file(s)/folder(s) to be processed by this tool. (You can skip this step by dragging and dropping the desired files directly onto the tool mentioned in step 1.)

3. The OCR Pages dialog box will open (unless your custom tool is preconfigured and set to skip this step):

The Page Range options are as follows:

  • Select All to OCR all the pages of the document.

  • Select Current Page to OCR only the current page.

  • Use Selected Pages to OCR only the pages pre-selected from the Thumbnails pane.

  • Use the Pages box to determine specific pages of the document on which to perform the OCR process. Page range settings are detailed here.

  • Use the Subset option to select All Pages, Odd Pages Only or Even Pages Only.


The Recognition options determine the language and accuracy of the OCR process:

  • If the desired language is not available in the dropdown menu, then click Add/Remove Languages for further options. Increasing the accuracy increases the time that the process takes and vice versa. Additionally, it should be noted that setting the accuracy to high may result in unusual output if the document on which the operation is carried out features imperfections. This is because the software will search to a greater depth and may attempt to recognize imperfections as text.

The Output options determine the format of the output information from the OCR process:

  • Select one of Searchable Image, Editable Text and Images, or Fine Page Content, as desired.

    • These three options are explained in greater detail in the dropdown itself, as well as in the Manual.

  • Select the Auto Deskew option to deskew documents automatically. (Deskewing straightens images that were photographed or scanned crookedly.)

4. Click OK to OCR documents. Please note that it is also possible to OCR documents when scanned content or images are used to create PDF documents. You can either create a custom tool to perform both scanning and OCR, or you can perform that step in PDF-XChange Editor, as detailed in the section above.

1. Click Document in the Menu Toolbar, then click OCR Pages in the submenu (or press Ctrl+Shift+C). The OCR Pages dialog box will open:

  • The Page Range options are as follows:

  • Select All to OCR all the pages of the document.

  • Select Selected Pages to OCR only the pages currently selected in the document.

  • Select Current Page to OCR only the current page.

  • Select Pages to determine specific pages of the document on which to perform the OCR process. Enter the desired page range(s) in the text box.

  • The Recognition options determine the language and accuracy of the OCR process. If the desired language is not available in the dropdown menu, then click More Languages for further options. Increasing the accuracy increases the time that the process takes and vice versa. Additionally, setting the accuracy to high may result in unusual output if the document contains imperfections, because the software will search to a greater depth and may attempt to recognize imperfections as text.

  • The Output options determine the format of the output information from the OCR process:

  • Select Preserve Original Content & Add Text Layer to have PDF-XChange Viewer analyze the document, recognize text and then insert an invisible text layer over the text. N.b. The text layer contains the same text as that recognized in the document. This means that the original, image-based text can be searched and selected via the invisible text layer, which is the main benefit of OCR. However, the document text cannot be edited in the same manner as normal, text-based documents, because the document remains image-based despite the invisible text layer.

  • Select Convert Page Content to Image only - Add Text As a Layer to convert documents that contain both images and text into a single, consolidated image. If this option is selected, use the Images Quality dropdown menu to determine the resolution in dpi (dots per inch) of the created image. N.b. If this mode is used for image-only documents, the only change will be the resolution of the image (when the initial dpi differs from the dpi specified in the Images Quality dropdown menu; otherwise no changes will occur). Please note that output documents from this process will replace input documents. If the input documents will be needed later in their original format, make a copy before performing this process.
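As a rough illustration of what a change in dpi means for an image-only page, the sketch below computes the pixel dimensions after resampling. It assumes the physical page size stays the same, so pixel dimensions scale with the ratio of the two dpi values; the function name is hypothetical and the exact resampling behavior of the software is not described in this article.

```python
def pixels_at_dpi(width_px: int, height_px: int,
                  source_dpi: float, target_dpi: float) -> tuple[int, int]:
    """Return the pixel size of the same physical page at a different dpi.

    Assumes the physical size (inches) is preserved: pixels = inches * dpi.
    """
    if source_dpi <= 0 or target_dpi <= 0:
        raise ValueError("dpi values must be positive")
    scale = target_dpi / source_dpi
    return round(width_px * scale), round(height_px * scale)
```

An 8.5 × 11 inch page scanned at 300 dpi is 2550 × 3300 pixels. Passing the same dpi for source and target returns the dimensions unchanged, which matches the note above that no change occurs when the resolutions are equal.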

2. Click OK to OCR documents.
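The idea of an invisible text layer over an unchanged page image can be pictured with a small conceptual model. This is a sketch for illustration only: the class and field names are hypothetical and do not reflect how PDF-XChange or the PDF format actually stores OCR results.

```python
from dataclasses import dataclass, field

# A bounding box as (x0, y0, x1, y1) in page units; the units are an assumption.
Box = tuple[float, float, float, float]


@dataclass
class Word:
    text: str  # text recognized by OCR
    box: Box   # where that text sits on the page image


@dataclass
class OcrPage:
    image_name: str  # the original page image, left untouched by OCR
    text_layer: list[Word] = field(default_factory=list)

    def search(self, term: str) -> list[Box]:
        """Return the boxes of words containing the term (case-insensitive)."""
        needle = term.lower()
        return [w.box for w in self.text_layer if needle in w.text.lower()]
```

Searching works on the text layer and returns positions on the image, while the image itself never changes. This is why the result is searchable and selectable but not editable like a text-based document.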
