
101 points by kozmonaut | 10 comments
1. spwa4 No.45393748
Luckily, one thing LLMs with image input are ridiculously good at is piracy. Want to get a book off a Kindle? It's easier than with a real book, by far.

What Amazon could block is getting books from other sources onto a Kindle. But there are plenty of other devices. I use an iPad.

replies: No.45393904, No.45394196
2. duskwuff No.45393904
An LLM? Just what I always wanted - an OCR tool that hallucinates.
replies: No.45394155, No.45394243
3. spookie No.45394155
Dictionaries are often reliable enough to recover the correct words when some glyphs are misrecognised, but I wonder whether some type of LLM would help in certain cases. That's not a worry with modern digital-first documents, though.
4. qmr No.45394196
I've tried to coax ChatGPT into doing this but haven't had any success beyond cover shots and random page views.
replies: No.45394706
5. spwa4 No.45394243
You don't? Think about it. If your picture/source data is not perfectly clear... what do you want? We all want perfection, but if you can't have that...

Would you prefer what current OCR does and just suddenly sentences go 2#!@%7Q&*@3 ladfk !@$?

Or would you rather have a reasonable completion of a sentence that is nearly always (but not quite always) correct, and that actually takes the context into account?

replies: No.45394259, No.45394549
6. duskwuff No.45394259
> Would you prefer what current OCR does and just suddenly sentences go 2#!@%7Q&*@3 ladfk !@$?

Yes, actually. I'd rather be aware that the OCR tool failed somewhere than have the tool silently fabricate part of the text, or "correct" perceived errors which were present in the source document.

replies: No.45394417
7. boredhedgehog No.45394417
But you aren't aware, because the OCR doesn't know that it failed. You would have to go through the entire text by hand to fix the corruptions, but that's too much work, so you won't, and the corruptions stay in.

In practice and at scale, the guesses of the LLM are the superior outcome.

replies: No.45394712
8. akho No.45394549
Your picture of an ebook is perfectly clear.
9. spwa4 No.45394706
I have. Hell, these days both ChatGPT-5 and GPT-OSS are very good at reading my writing (as in, writing on paper) and, as long as you specify step by step what to do, working through it: either discussing it with me in voice mode or correcting assignments I do on paper. I use it to practice languages and math.

Oh, and to pirate textbooks. The thing is, part of a textbook loaded into an LLM (as in, into its context) becomes something I can talk to, write to, and have judge my skills. Normally I'd have to find someone willing to spend time talking to me about a subject and correcting me, and also willing to spend hours correcting my assignments. Even when paying, that's essentially unavailable.

Now I take a few pages (up to a chapter, but usually less), load them into ChatGPT-5, and tell it to ask me progressively harder questions when I activate voice mode. Or I take one of those for-teachers "how to grade X" notes, write an assignment, scan the whole thing into ChatGPT, and tell it to correct my assignment, justify everything against the teacher's note, and deliver a final grade. I tell it to be way too strict, and among other things this has helped me get very good scores on language certifications, including one perfect score. I can prove that I am fluent in 4 languages (en, fr, de, and my mother tongue). As long as we're not talking about specialized language, it's even true.

10. thaumasiotes No.45394712
> But you aren't aware, because the OCR doesn't know that it failed. You would have to go through the entire text by hand to fix the corruptions, but that's too much work, so you won't, and the corruptions stay in.

Well, if you assume that you're never going to read the book, then sure. But in that case it's even more efficient not to OCR the book at all. You'll never know the difference.

If you do read the book, you'll know where the failures are. And they're easy to correct if you can edit the document. I usually file reports of printing errors in Kindle books when I encounter them.

(Do the errors get corrected? No.)