Versatile, Reliable OCR Pre-Processing for Pain-Free Digitization

Optical Character Recognition (OCR) technology has revolutionized every field that deals with written archives of any kind. OCR has long been in use in one form or another, but recent technical advances have put

OCR works by scanning a document, recognizing each character, and converting the text into a digital form that can be stored and, just as importantly, quickly searched and indexed. The process also preserves the original as a high-resolution digital image. Valuable as it is, however, OCR can be frustrating, time-consuming, and error-prone. These problems usually stem from the quality of the image the software is asked to digitize: documents often suffer from ink spatter, smeared letters, degraded paper, faded ink, spotty backgrounds, and many other defects that confuse character-recognition algorithms.

Multilingual text can also present a problem. Software solutions tend to be calibrated for one particular alphabet or language (for example, they may use a language's lexicon to guess how an ambiguous character should be read). This can be a very serious issue in multilingual documents such as foreign-language dictionaries, the many academic works that contain Latin, French, Greek, or other languages, and a whole host of other texts written in more than one language.

Much of this trouble can be mitigated or avoided entirely with good pre-processing software. Pre-processing "cleans up" the image before the OCR software works on it, greatly improving the chances of successful recognition. It turns text that is spattered, spotted, faded, damaged, or degraded by time into clean, high-contrast text that a computer can work with far more easily. It also produces a cleaned-up copy of the original that is often much easier for the human eye to read directly.
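Novatext's own algorithms are not described in this release. Purely as an illustration of what turning faded or spotted text into "high-contrast text" can mean, the sketch below shows one classic, generic pre-processing step: global binarization using Otsu's method, which picks the gray level that best separates dark ink from light paper. It assumes an 8-bit grayscale image stored as a list of rows; the function names `otsu_threshold` and `binarize` are hypothetical and are not part of any Novatext product.

```python
def otsu_threshold(pixels):
    """Return the gray level (0-255) that maximizes between-class variance
    when pixels are split into 'ink' (<= t) and 'paper' (> t). Otsu's method."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight_ink = 0      # number of pixels at or below the candidate threshold
    sum_ink = 0         # sum of gray levels at or below the candidate threshold
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_ink += hist[t]
        sum_ink += t * hist[t]
        if weight_ink == 0:
            continue    # no ink pixels yet; split is undefined
        weight_paper = total - weight_ink
        if weight_paper == 0:
            break       # every pixel is now on the ink side; stop
        mean_ink = sum_ink / weight_ink
        mean_paper = (sum_all - sum_ink) / weight_paper
        between = weight_ink * weight_paper * (mean_ink - mean_paper) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(image):
    """Map every pixel to pure black (0, ink) or pure white (255, paper)."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[255 if p > t else 0 for p in row] for row in image]


if __name__ == "__main__":
    # A tiny, made-up 'faded' scan: dark strokes (50-70) on a blotchy light background (190-220).
    scan = [
        [200, 210, 60, 205],
        [190, 50, 55, 220],
        [215, 198, 70, 202],
    ]
    for row in binarize(scan):
        print(row)
```

A single global threshold can struggle when the background brightness varies across the page; adaptive (local) thresholding, which computes a separate threshold for each neighborhood, is a common refinement for that case.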

Pre-processing software is generally specialized for a particular type of task, such as cleaning up text in a particular language, clarifying handwriting, or handling printed letters or certain fonts. This lack of flexibility is one of the major weaknesses of most such applications.

Novatext LTD's new Novatext pre-processing software, however, can perform crystal-clear text cleanup for any language, printed or handwritten. This includes highly cursive scripts such as Arabic and Farsi, ideographic writing such as Japanese and Chinese, and both the cursive and block forms of the various Roman alphabets. Cyrillic? No problem. Modern alphabetic Korean? Ancient or modern Greek? It doesn't matter; the software is language-independent.

And because OCR users often have very large volumes of text to process, Novatext offers automated batch processing and archiving. This feature alone can eliminate countless tedious man-hours spent going through images one at a time, streamlining the process and making it much more reliable. There is no fixed batch size, either, so huge quantities of material can be cleaned up and readied without any further input once the process has started.

These advances in technology have given us the opportunity to preserve the collected knowledge and history of the human race more easily and accessibly than ever before. Don't let the stuff of History fall victim to Time.


Dimitriou Mitropoulou 17
Strovolos 2013, Nicosia, Cyprus

