OCR Pages




The OCR Pages action performs optical character recognition (OCR) on documents. Two OCR engines are available in PDF-Tools: the Default OCR engine and the Enhanced OCR engine. The Enhanced OCR engine is available when PDF-Tools is purchased via the PDF-XChange PRO bundle. It is faster, more accurate and more dynamic than the Default OCR engine, and it also contains some extra features. Further information about the Enhanced OCR engine is available here. If you have purchased PDF-XChange PRO, you can use the OCR preferences detailed here to switch between the Default and Enhanced OCR engines.


This action contains the following customizable parameters:



Figure 1. OCR Pages Action Options


Select an option in the dropdown menu to determine the action taken when input documents contain text:

Select OCR Document to perform OCR on input documents.

Select Do not OCR but continue processing to skip OCR for the document and continue with the remaining actions.

Select Skip processing the document to exclude the document from processing.

Click More Options to view/edit all options. The OCR Pages dialog box will open, as detailed below.

Select the Show setup dialog while running box to launch the OCR Pages dialog box and customize settings each time this action is used. Clear this box to prevent the OCR Pages dialog box from opening each time the action is used, which is useful when the same settings are used consistently.
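The three dropdown options above can be summarized as a small decision sketch. This is an illustration only, not PDF-Tools code: the enum `OnExistingText` and the function `plan_for_document` are hypothetical names, and the sketch assumes that documents without any text are always sent to OCR, which the text above implies but does not state.

```python
from enum import Enum
from typing import Tuple


class OnExistingText(Enum):
    """Hypothetical names for the three dropdown options."""
    OCR_DOCUMENT = "OCR Document"
    CONTINUE_WITHOUT_OCR = "Do not OCR but continue processing"
    SKIP_DOCUMENT = "Skip processing the document"


def plan_for_document(has_text: bool, option: OnExistingText) -> Tuple[bool, bool]:
    """Return (run_ocr, continue_with_remaining_actions) for one document."""
    if not has_text:
        # Assumption: the dropdown only matters when the document contains text.
        return (True, True)
    if option is OnExistingText.OCR_DOCUMENT:
        return (True, True)
    if option is OnExistingText.CONTINUE_WITHOUT_OCR:
        return (False, True)
    # SKIP_DOCUMENT: the document is excluded from processing entirely.
    return (False, False)
```

For example, under these assumptions `plan_for_document(True, OnExistingText.CONTINUE_WITHOUT_OCR)` returns `(False, True)`: OCR is skipped, but later actions in the tool still run on the document.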


Note that the options in the OCR Pages dialog box depend on the OCR engine being used:


Default OCR Engine



Figure 2. OCR Pages Dialog Box


Use the Page Range settings to determine the page range for OCR:

Select All to specify all pages.

Select Custom to specify a custom page range. Further information on how to specify page ranges is available here.

Use the Subset option to select All Pages, Odd Pages or Even Pages as desired.

Select the Skip pages that already contain text content items box to omit pages that contain text content from the OCR process.
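The way the Page Range settings combine can be sketched as follows. This is an illustration under stated assumptions, not the PDF-Tools implementation: the function `select_pages_for_ocr` and its parameters are hypothetical, pages are numbered from 1, Odd Pages and Even Pages are assumed to refer to the page number within the document, and the custom range is passed as a list of page numbers rather than parsed from the page range syntax.

```python
from typing import Iterable, List, Optional, Set


def select_pages_for_ocr(
    page_count: int,
    custom_range: Optional[Iterable[int]] = None,  # None stands for "All"
    subset: str = "all",                           # "all", "odd" or "even"
    skip_pages_with_text: bool = False,
    pages_with_text: Optional[Set[int]] = None,    # pages known to contain text
) -> List[int]:
    """Return the 1-based page numbers that would be sent to OCR."""
    if subset not in ("all", "odd", "even"):
        raise ValueError("subset must be 'all', 'odd' or 'even'")

    # Step 1: All pages, or only the pages named in the custom range.
    if custom_range is None:
        candidates = list(range(1, page_count + 1))
    else:
        candidates = sorted({p for p in custom_range if 1 <= p <= page_count})

    # Step 2: apply the Subset option (assumed to use the document page number).
    if subset == "odd":
        candidates = [p for p in candidates if p % 2 == 1]
    elif subset == "even":
        candidates = [p for p in candidates if p % 2 == 0]

    # Step 3: optionally drop pages that already contain text content items.
    if skip_pages_with_text:
        has_text = pages_with_text or set()
        candidates = [p for p in candidates if p not in has_text]

    return candidates
```

With a six-page document, a custom range of pages 2 to 5, the Even Pages subset, and page 4 already containing text, this sketch selects only page 2 when the skip box is selected.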


Use the Recognition settings to determine the language and accuracy of the OCR process. Increasing the accuracy also increases the time that the process takes, and vice versa. Setting the accuracy to high may also result in unusual output if the document contains imperfections, because the software searches to a greater depth and may attempt to recognize imperfections as text. Click More Languages to view available language packs.


Click OK to save settings.


Enhanced OCR Engine



Figure 3. OCR Pages (Enhanced) Dialog Box


The options in this dialog box are the same as those detailed for Figure 2, with the addition of Output Options, which are available in the dropdown menu:


Select Searchable Image to retain the image-based content on which OCR is performed and place an invisible text layer over the text recognized during the operation. This makes the source text selectable and searchable in the same manner as ordinary text.

Select Editable Text and Images to replace image-based text in source documents with the text recognized during OCR. This converts image-based text into editable text.

Select Fine Page Content to replace the content of source documents with new content that contains only the text and images recognized during optical character recognition.


Click OK to OCR documents.