Optical character recognition on the N900


This week I decided to spend some time playing with something a little different on my N900: optical character recognition.
This was inspired by a demo by Cybercomchannel called phototranslator. It looks cool and I’m looking forward to them making it available for people to try. However, I am not a patient man… So, given that they mentioned they simply used Tesseract, I figured I could just have a go myself.

I didn’t need any particularly special skills to do this. I already had a Fremantle scratchbox environment set up, even though I don’t really need it for Witter. So I downloaded Tesseract into scratchbox, ran ./configure, make and make install, and presto, it built with no problems.
Then I realised it only works on TIFF images, but the N900 camera produces JPEGs. After a short search I found convert from the ImageMagick tools.

Another simple download and compile, and I was able to convert JPG to TIFF. I copied the files across to the device and quickly found the libraries that also needed copying, as they weren’t found at first.
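Rather than discovering missing libraries one loader error at a time, you can ask for the whole list up front. A minimal sketch, assuming ldd is available (it is in a glibc environment such as scratchbox, but may not be on the device itself); missing_libs is a hypothetical helper name, not part of any tool:

```shell
#!/bin/sh
# missing_libs (hypothetical helper): print the shared libraries that a binary
# links against but the dynamic loader cannot currently locate.
# Assumes ldd is available, e.g. inside a glibc environment such as scratchbox.
missing_libs() {
    # ldd prints lines like "libfoo.so.1 => not found" for unresolved
    # libraries; keep only the library name from those lines.
    ldd "$1" | awk '/not found/ { print $1 }'
}
```

For example, running missing_libs /usr/local/bin/convert before and after setting LD_LIBRARY_PATH shows whether the copied libraries are being picked up.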

Tools in hand I knocked up a simple script to tie them together.

ocr.sh:
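#!/bin/sh
# Usage: ocr.sh <image name without the .jpg extension>
# Make sure the ImageMagick libraries in /usr/local/lib are found.
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib"
echo "converting image"
convert "$1.jpg" "/tmp/$1.tif"
echo "recognising text"
# Tesseract appends .txt to the output base name given here.
tesseract "/tmp/$1.tif" "/home/user/MyDocs/$1"
echo "text written to /home/user/MyDocs/"
rm "/tmp/$1.tif"
leafpad "/home/user/MyDocs/$1.txt"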

The script first exports the library path so that it picks up where I put the ImageMagick libraries. It then converts the image to a temporary TIFF, runs Tesseract on it, and finally launches Leafpad with the output.

This isn’t a slick script, but it does mean I can just keep a terminal open in the /home/user/MyDocs/DCIM folder and run
ocr.sh 20100307_001
Note there is no .jpg extension; leaving it off makes the name easier for the script to handle without any messing around.
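One thing to be aware of: the script carries on even when a step fails, so a failed convert still leads to tesseract and Leafpad being run on files that don’t exist. A minimal sketch of the same convert-then-tesseract steps that stops at the first failure and works in a private temporary directory; ocr_image and OUT_DIR are hypothetical names, and it assumes your mktemp supports -d:

```shell
#!/bin/sh
# Sketch of ocr.sh's convert -> tesseract steps that stops at the first failure.
# ocr_image and OUT_DIR are hypothetical names; OUT_DIR defaults to MyDocs.
OUT_DIR=${OUT_DIR:-/home/user/MyDocs}

ocr_image() {
    base=$1    # image name without the .jpg extension, as in ocr.sh
    # A private temporary directory avoids reusing a fixed /tmp/<name>.tif path.
    tmpdir=$(mktemp -d) || return 1
    status=0
    # On success, Tesseract writes its text to "$OUT_DIR/$base.txt".
    if ! convert "$base.jpg" "$tmpdir/$base.tif"; then
        echo "convert failed on $base.jpg" >&2
        status=1
    elif ! tesseract "$tmpdir/$base.tif" "$OUT_DIR/$base"; then
        echo "tesseract failed on $base" >&2
        status=1
    fi
    # Remove the temporary TIFF whether or not the steps succeeded.
    rm -rf "$tmpdir"
    return "$status"
}
```

Used as ocr_image 20100307_001 && leafpad /home/user/MyDocs/20100307_001.txt, Leafpad only opens when recognition actually produced a file.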

I’ve had a lot of problems with convert crashing or failing to perform the conversion. I’m not sure why, but modifying the brightness/contrast in the source image is normally enough to make it work. Sometimes I have to specify the -monochrome option on convert. So far I haven’t found an image I couldn’t convert eventually; it just sometimes takes more tries than I’d like.
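The retry-with-monochrome workaround can be automated so it doesn’t need a second manual attempt. A minimal sketch, assuming the same convert binary as ocr.sh; it doesn’t fix the underlying crash, and the brightness/contrast tweak still has to be done separately. to_tiff is a hypothetical helper name:

```shell
#!/bin/sh
# to_tiff (hypothetical helper): convert an image to TIFF, and if the plain
# conversion fails, retry once with -monochrome, the workaround described above.
to_tiff() {
    src=$1
    dest=$2
    if convert "$src" "$dest"; then
        return 0
    fi
    echo "plain conversion of $src failed, retrying with -monochrome" >&2
    # -monochrome reduces the image to black and white before writing it.
    convert "$src" -monochrome "$dest"
}
```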

Some examples of it in action.
Test1 – glossy magazine text
Source image:
Note the flash reflection; I was careful to keep it away from the text I wanted to OCR.
Cropped image:

It’s important to keep the image as cropped as possible to the text to be recognised.

Adjusted for brightness & contrast

I found that on images like this it helps to turn the contrast up and the brightness down.
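The cropping and brightness/contrast changes above were applied to the source image; they could also be folded into the convert step. A minimal sketch, assuming an ImageMagick build new enough to support -brightness-contrast (older versions don’t have it). preprocess is a hypothetical helper name, and the -10x30 setting (brightness down 10, contrast up 30) and the example crop geometry are illustrative guesses, not values used in this post:

```shell
#!/bin/sh
# preprocess (hypothetical helper): crop to the text and lower brightness /
# raise contrast in one convert call before handing the result to tesseract.
# Assumes an ImageMagick version that supports -brightness-contrast.
preprocess() {
    src=$1
    dest=$2
    geometry=$3    # crop region as WIDTHxHEIGHT+X+Y, e.g. 800x300+120+400
    # +repage discards the page offset left behind by -crop.
    # -10x30 means brightness -10, contrast +30 (illustrative values only).
    convert "$src" -crop "$geometry" +repage \
        -brightness-contrast -10x30 "$dest"
}
```

For example, preprocess 20100307_001.jpg /tmp/20100307_001.tif 800x300+120+400 would produce a TIFF ready for tesseract; the right values will vary from image to image.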

Resulting OCR text:
“A Each will accept a 5/8″ shanl< tool. But Sovereign is not
juétra lighdle it is a total system. It comes complete with 3/8″
fand l/2″ collet adaptors allowing tools with those shank .
diameters to be fitted securely. That means it will take an array
of spindle and bowl gouges; To add to the versatility we have
also adapted a couple of highly popular hollowing tools- the
hollowmaster and multi tip hollowing t0ol.»Ihese are now _ _
available in three lengths and without handle to make the i
Sovereign System one very practical and
it”

As you can see, it’s not perfect, but it’s really pretty good. The process isn’t lengthy either: convert and Tesseract take perhaps 20 seconds between them.

Then I tried some plain black text on a white background.

I also adjusted this one for brightness/contrast, which gave these results:
“The Championships Wimbledon 20l0
The Wingheld Restaurant is the only bookable, waiter-served restaurant offering a 3·course lunch with wine and mineral water
for £6O per person, including service. To make a reservation for a maximum of six guests per table, piease visit our website
www.fmccatcrlng.c¤.uk and click on “Food and Drink atWimbledon”. The Reservations page will own from Monday l5th
February 20lO.
Your reservation will be confirmed once you have completed the on·line booking and payment form and operates on a striittiy
first come first served basis.After the transaction is complete, you will receive a confirmation email which you shouid keep safe
and bring it with you on the day of your reservation.
if you do not have access to the internet, we will still accept reservations by fax on O20 8944 2253 or by letter toThe Reservations
Manages; Facilities Management Catering Ltd., Church Road,Wimbledon, London SW l 9 SAE. Please remember to include aii your
contact details, the date you would like a reservation and for how many people. Cheques should be made payable to Compass
Services UK Ltd. Confirmation of non-internet reservations will be sent out during the third week of May.
Reservations may not be made by telephone but if you have a query on a confirmed booking, you may telephone 020 S24? liu;}
from Monday 26th April. The dress code is smart casual (no jeans) and we are unable to accept msemuens for
s The restaurant opens at l l.l Sam and we do not allocate individual seating arrangements prior m your awmio yi _ y T r .ve i
~ E _y°. s ; .i i C; _ liiv y T s Official caterers to The Championships, Wimbledon C s yl _ if yyes v it Qi _‘ .. e yipyr y lsly g

These are pretty much best-case conditions, and I think it does a really good job.

I also tried a handwritten test.

But it only managed to detect “over #06 lazy”, which, considering my handwriting, I guess is pretty good too 🙂

Sadly I lack the time to turn all this into a consumable download for others. This brings me to a realisation: some people misunderstand what is meant when the N900 is referred to as a great developer phone. I’ve seen people complain that there aren’t great apps available, so it’s not great at all. But the point is that this is a fantastic phone for *me* and others like me. The fact that I could just grab some open source software and put this together is awesome. But I don’t have time to wrap it up in a nice package for other people, and I have no great motivation to do so. Anyone prepared to invest a little time and effort can do amazing things with this device, but if you are just sitting and waiting for someone else to put in that effort and hand it to you on a plate, you might be waiting for some time.


13 responses to “Optical character recognition on the N900”

  1. A trick for your future shell scripts: stripping away filename extensions
    #!/bin/sh
    # assuming you know the extension will be jpg
    filename_with_no_extension=${1%.jpg}
    echo "$filename_with_no_extension"

    # assuming the extension can be anything
    filename_with_no_extension=${1%.*}
    echo "$filename_with_no_extension"

  2. Pixelpipe supplies a pipe into Google Docs OCR.
    I take a photo of the text, share it to pixelpipe with the tag @ocr and a few minutes later it’s (hopefully) appeared in my Google Docs account.

    Works pretty well for English stuff. I haven’t tried it on any other language, so YMMV.

    • That’s cool, I’ll have to give that a go. Do you have to pre-crop the image to get good results?

      • I guess it depends on how good the source is, but it seems to work pretty well just taking the photo and sending it off. That said, a lot of the stuff I’ve done is on plain white paper; I found newspapers produce more errors. Best to experiment and see how it goes.
        I did a bit of investigation and it only supports Latin characters, which would explain why it didn’t work when I tried it on some Chinese text I had to translate 🙂

  3. I tried to download Tesseract and ImageMagick. For Tesseract I only found .tar files, and I don’t know how to install them.
    For ImageMagick I didn’t find any .deb files, only .tar and .zip files, and as I said I don’t know how to install those.
    I hope you can help me here.
    It would be great if you posted the links for the N900 downloads.
    Thanks in advance.

    • Kind of the point of the post is that there are no N900 downloads for either of those projects. But there are source downloads, which can be compiled in a Fremantle development environment, which is what I did. This is not something I’d expect the average person to have a clue how to do.

  4. MICR and Associated Technology…
    The Sort-A-Matic system included 100 metal or leather dividers numbered 00 through 99. Each check was placed in the corresponding divider by the first two numbers of the account. When the process was complete, the checks were grouped by account number….

  5. Any idea which version of Tesseract you used? I’m assuming you downloaded the source and produced a Debian armel target?
    I’m asking because I’ve been working on a project using the tesseract-ocr package from extras-devel on Maemo.org, which I think was compiled from the Tesseract 2.04 source. I’ve been having problems left, right and centre with it, and from the sound of your post it seems like it would be quite easy to port Tesseract 3.00 to Fremantle using the same method you used, which would be a lot easier to use and to add training data to.

    David.

    • Sorry, I don’t really recall. I suspect I just pulled the latest source code and compiled it in my development environment. I don’t think I even built a package, just copied the binaries across to my device. Sadly the device has since been wiped clean and is now being used by my wife, so I have no way to double-check. Good luck with your project.
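The parameter expansions from the first comment can be combined so a script accepts the image name in whatever form it’s typed. A minimal sketch; image_base is a hypothetical helper name. Note that ${name%.*} removes whatever follows the last dot, so a name like my.photo with no real extension would lose .photo:

```shell
#!/bin/sh
# image_base (hypothetical helper): reduce "20100307_001", "20100307_001.jpg"
# or "DCIM/20100307_001.jpg" to the bare name "20100307_001".
image_base() {
    name=${1##*/}              # drop any leading directories
    printf '%s\n' "${name%.*}" # drop the last extension, if there is one
}
```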
