Seems we're now at a point in time where OCR works so well that printing text out and letting computers literally read it is suggested to be superior to processing the encoded text directly.
Neural networks have essentially solved perception. It doesn't matter what format your data comes in, as long as you have enough of it to learn the patterns.