PDF to Text OCR
Version 1.0 released 09/28/2015
PDF-to-Text OCR is a program that converts scanned Adobe PDF documents into plain text (*.txt) format with minimal loss of formatting information. The product implements an optical character recognition (OCR) algorithm, so it can extract text from any kind of graphics embedded in PDF documents (photos, pictures, charts, etc.). Command-line support lets you script, automate, and schedule the conversion process.
OCR, or Optical Character Recognition, is a technique for extracting text from graphics. It lets you convert scanned paper documents or images taken with a digital camera into editable and searchable data.
OCR is based on fairly sophisticated algorithms. First, the input document is analyzed and divided into progressively smaller blocks: lines, then words, and finally individual symbols. Each symbol is then compared with a set of graphic patterns to determine which letter or digit it is. This is where problems can arise: sometimes it is hard to distinguish one symbol from another. For example, the letter 'O' looks similar to the digit zero. Statistical analysis helps resolve such ambiguities: the algorithm consults frequency tables of numerous symbol combinations. Suppose the ambiguous symbol from the example above follows a symbol recognized as 'N'. The combination of 'N' and the letter 'O' occurs more frequently than 'N' followed by the digit zero, so the symbol is recognized as the letter 'O'.
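The pattern-matching and frequency-table steps described above can be illustrated with a small sketch. This is not the product's actual implementation: it assumes segmentation has already produced isolated symbol bitmaps, uses a simple pixel-mismatch count as a stand-in for pattern comparison, and uses a tiny bigram table in place of real frequency statistics. All names (`TEMPLATES`, `BIGRAM_COUNTS`, `recognize`) and all numbers are hypothetical.

```python
from typing import Dict, List, Tuple

Glyph = List[str]  # rows of '0'/'1' pixels, '1' = ink

# Hypothetical 5x3 bitmap patterns. At this tiny resolution the letter 'O'
# and the digit '0' are identical, mirroring the ambiguity in the text.
TEMPLATES: Dict[str, Glyph] = {
    "N": ["101", "111", "111", "111", "101"],
    "O": ["111", "101", "101", "101", "111"],
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
}

# Hypothetical frequencies of (previous symbol, current symbol) pairs.
BIGRAM_COUNTS: Dict[Tuple[str, str], int] = {
    ("N", "O"): 900,
    ("N", "0"): 5,
    ("1", "0"): 700,
    ("1", "O"): 10,
}


def mismatch(a: Glyph, b: Glyph) -> int:
    """Count pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def candidates(glyph: Glyph, tolerance: int = 0) -> List[str]:
    """Return every pattern whose mismatch is within `tolerance` of the best."""
    scores = {ch: mismatch(glyph, tpl) for ch, tpl in TEMPLATES.items()}
    best = min(scores.values())
    return [ch for ch, s in scores.items() if s - best <= tolerance]


def recognize(glyphs: List[Glyph]) -> str:
    """Recognize a sequence of glyphs, breaking ties with bigram frequencies."""
    text = ""
    for glyph in glyphs:
        options = candidates(glyph)
        if len(options) == 1 or not text:
            # Unambiguous match, or no preceding symbol to use as context:
            # take the first candidate.
            text += options[0]
        else:
            prev = text[-1]
            # Pick the candidate that most often follows the previous symbol.
            text += max(options, key=lambda ch: BIGRAM_COUNTS.get((prev, ch), 0))
    return text


ring = TEMPLATES["O"]  # the ambiguous round symbol
print(recognize([TEMPLATES["N"], ring]))  # NO
print(recognize([TEMPLATES["1"], ring]))  # 10
```

The same bitmap is read as the letter 'O' after 'N' but as the digit zero after '1', because only the surrounding context differs. A real recognizer would use much larger frequency tables and longer symbol combinations than pairs.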
Try before you buy
Still not sure the program fits your needs? Try the free demo version, which has limited features: it replaces random characters in the output document with asterisks. Test the quality of the demo and come back to place an order if you are satisfied with the results.
Intelligent Converters software is distributed only by download from our server. To purchase online by credit card, select the desired software package in the table below and click the corresponding ORDER NOW link. To learn about alternative payment options, please visit our Ordering Page.