Using post processing to improve OCR text recognition results

Head Owl

February 25, 2021

OCR engines generally do a pretty good job these days, but unfortunately none of them are perfect. There is always a case where some word or character is repeatedly misrecognized and one has to go back and fix it.

OwlOCR introduced post-processing rules to help with such scenarios. In the current version, a post-processing rule defines a simple find/replace relation: enter the text to look for and what it should be replaced with in the results.

We try to keep things simple, but there are some occasions where this feature can lead to confusion. It helps to understand when the rules get applied.

The OwlOCR text recognition flow is:

1. The user chooses an input file or grabs a screenshot.
2. The user clicks the OCR page or OCR all pages button to start OCR.
3. OwlOCR runs text recognition on the images, noting the locations of objects that look like text and what is most probably written there. These are called observations.
4. OwlOCR applies the post-processing rules to each observation separately. Typically the observations are rows of text as they appear in the input image.
5. OwlOCR combines the observations into the output, adding line breaks or spaces between them as determined by the user's settings.

Note: the post-processing rules are applied to each observation separately, before the lines are combined. This means that if you have a rule that looks for "steaming hot sauna" and that phrase is split across two lines in your input image, the rule won't replace it: when the rule is applied, the lines have not yet been combined, so the full phrase can't be matched. Future versions of OwlOCR may support changing the sequence in which the results are transformed.
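To make the ordering concrete, here is a minimal Python sketch of the flow described above. It is not OwlOCR's actual implementation: the names `Rule`, `apply_rules` and `post_process` are hypothetical, and it assumes rules are plain, case-sensitive substring replacements and that observations are joined with a line break.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    find: str      # text to look for in an observation (hypothetical field name)
    replace: str   # text to put in its place


def apply_rules(observation: str, rules: list[Rule]) -> str:
    """Apply every find/replace rule to a single observation."""
    for rule in rules:
        observation = observation.replace(rule.find, rule.replace)
    return observation


def post_process(observations: list[str], rules: list[Rule], joiner: str = "\n") -> str:
    """Apply the rules to each observation first, then combine the lines."""
    corrected = [apply_rules(obs, rules) for obs in observations]
    return joiner.join(corrected)


rules = [Rule("steaming hot sauna", "sauna"), Rule("0wl", "Owl")]

# The whole phrase sits in one observation, so both rules match.
one_line = post_process(["0wlOCR loves a steaming hot sauna"], rules)

# The phrase is split across two observations, so only the "0wl" rule matches.
split = post_process(["0wlOCR loves a steaming hot", "sauna"], rules)

print(one_line)
print(split)
```

In the second call the phrase rule never fires because neither observation contains the full text on its own. With the current ordering, a practical workaround is to write rules that match text likely to appear within a single line.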
