Automatically correct skew in images containing text



I wanted to crop the same rectangle out of a few dozen scanned documents with ImageMagick, like this:
convert -crop 1600x1880+100+420 image.nrm.png cropped.png
Scanning often introduces a slight skew, so the rectangle would end up at a slightly different location in every image. Automatically correcting this skew appears to be a common preprocessing step for OCR. The Python toolset ocropy (https://github.com/tmbdev/ocropy), for example, contains a tool for this: ./ocropus-nlbin image.jpg creates the file image.nrm.png, which is optimized for OCR and has the skew corrected.
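For a few dozen scans, the two steps can be chained in a small script. The following is only a minimal sketch: it assumes that ocropus-nlbin writes image.nrm.png next to the input file, as described above, and the helper names build_commands and process as well as the .cropped.png output suffix are made up for illustration.

```python
import subprocess
from pathlib import Path

# Crop geometry from the convert command above: width x height + x offset + y offset.
CROP_GEOMETRY = "1600x1880+100+420"


def build_commands(scan, geometry=CROP_GEOMETRY):
    """Return the two commands (deskew, then crop) for one scanned image."""
    scan = Path(scan)
    # Assumption taken from the post: ocropus-nlbin writes <name>.nrm.png
    # next to the input file.
    normalized = scan.with_suffix(".nrm.png")
    # Hypothetical output name, so that every scan gets its own cropped file.
    cropped = scan.with_suffix(".cropped.png")
    deskew = ["./ocropus-nlbin", str(scan)]
    crop = ["convert", "-crop", geometry, str(normalized), str(cropped)]
    return deskew, crop


def process(scans):
    """Run both steps for every scan; raises on the first failing command."""
    for scan in scans:
        for command in build_commands(scan):
            subprocess.run(command, check=True)
```

Calling process(sorted(Path(".").glob("*.jpg"))) from the ocropy directory would then handle every JPEG in the current folder. Building the commands as lists instead of shell strings avoids quoting problems with file names that contain spaces.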
