The Amazon Kindle War Against Piracy

(goodereader.com)

101 points kozmonaut | 5 comments | 27 Sep 25 06:02 UTC

spwa4 ◴[27 Sep 25 07:07 UTC] No.45393748[source]▶

Luckily, one thing LLMs with image input are ridiculously good at is piracy. You want to get a book off a Kindle? It's easier than with a real book.

What Amazon could block is getting books from other sources onto a Kindle. But there are plenty of devices. I use an iPad.

replies(2): >>45393904 #>>45394196 #

duskwuff ◴[27 Sep 25 07:47 UTC] No.45393904[source]▶

>>45393748 #

An LLM? Just what I always wanted - an OCR tool that hallucinates.

replies(2): >>45394155 #>>45394243 #

1. spwa4 ◴[27 Sep 25 09:01 UTC] No.45394243[source]▶

>>45393904 #

You don't? Think about it. If your picture/source data is not perfectly clear ... what do you want? We all want perfection, but if you can't have that ...

Would you prefer what current OCR does and just suddenly sentences go 2#!@%7Q&*@3 ladfk !@$?

Or would you rather have a reasonable completion of a sentence that is nearly always (but not quite always) correct, that even actually takes the context into account?

replies(2): >>45394259 #>>45394549 #

2. duskwuff ◴[27 Sep 25 09:03 UTC] No.45394259[source]▶

>>45394243 (TP) #

> Would you prefer what current OCR does and just suddenly sentences go 2#!@%7Q&*@3 ladfk !@$?

Yes, actually. I'd rather be aware that the OCR tool failed somewhere than have the tool silently fabricate part of the text, or "correct" perceived errors which were present in the source document.

replies(1): >>45394417 #

3. boredhedgehog ◴[27 Sep 25 09:43 UTC] No.45394417[source]▶

>>45394259 #

But you aren't aware, because the OCR doesn't know that it failed. You would have to go through the entire text by hand to fix the corruptions, but that's too much work, so you won't, and the corruptions stay in.

In practice and at scale, the guesses of the LLM are the superior outcome.

replies(1): >>45394712 #

4. akho ◴[27 Sep 25 10:16 UTC] No.45394549[source]▶

>>45394243 (TP) #

Your picture of an ebook is perfectly clear.

5. thaumasiotes ◴[27 Sep 25 10:57 UTC] No.45394712{3}[source]▶

>>45394417 #

> But you aren't aware, because the OCR doesn't know that it failed. You would have to go through the entire text by hand to fix the corruptions, but that's too much work, so you won't, and the corruptions stay in.

Well, if you assume that you're never going to read the book, then sure. But in that case it's even more efficient to not OCR the book either. You'll never know the difference.

If you do read the book, you'll know where the failures are. And they're easy to correct if you can edit the document. I usually file reports of printing errors in Kindle books when I encounter them.

(Do the errors get corrected? No.)
