Google’s indexing of scanned pages should help books

Google has announced that it is now indexing pages stored as scanned images rather than as text. For many older books, whose only electronic form may be a set of scanned page images, this could be a boon. It will allow material inside a book to be found through a Google search where, until now, the contents have been largely hidden.

Google uses OCR (Optical Character Recognition) technology to do this. The resulting text will not be a perfect record of what is on the page, but Google says it is now confident enough in its accuracy to launch this new service.

One limitation: at this stage, Google appears to be indexing only images that have been scanned into PDF format. There is no mention of JPEG or other common image formats.

