We outsmarted CSGO cheaters with IdentityLogger

1. DanielHB ◴[17 Oct 24 13:31 UTC] No.41869510[source]▶

>>41862028 (OP) #

I want to share a story in a somewhat related topic:

anti web-scraping techniques

The most devious version of this I have ever seen left me baffled, astonished and completely helpless:

This website I was trying to scrape generated a new font (as in a .woff file) on every request. The font had the positions of the letters randomly moved around (for example, the 'J' glyph would sit in the slot of the 'F' character in the .woff, and so on), and the text produced by the website would be encoded to match that specific font.

So every time you loaded the website you got a completely different font with a completely different text, but for the user the text would look fine because the font mapped it to the original characters. If you tried to copy-and-paste the text from the website you would get some random garbled text.

The only way I could think of to scrape that would have been to OCR the .woff font files, but needing OCR could easily prevent mass-scraping due to sheer processing costs.
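The per-request font scheme described above can be modeled roughly as follows. This is only a sketch under stated assumptions: the real site served a .woff file, whereas here the "font" is a plain dictionary, and the names `make_obfuscated_font`, `obfuscate` and `render` are hypothetical, not anything the site is known to use.

```python
import random
import string

def make_obfuscated_font(rng):
    """Model one request: return (encode, cmap).

    encode: real char -> char the server writes into the HTML.
    cmap:   served char -> glyph the font actually draws (the real char).
    """
    alphabet = string.ascii_letters
    shuffled = list(alphabet)
    rng.shuffle(shuffled)  # a fresh random permutation per request
    encode = dict(zip(alphabet, shuffled))
    cmap = {served: real for real, served in encode.items()}
    return encode, cmap

def obfuscate(text, encode):
    # Characters outside the alphabet (spaces, digits) pass through unchanged.
    return "".join(encode.get(ch, ch) for ch in text)

def render(served_text, cmap):
    # What the reader sees once the browser applies the font.
    return "".join(cmap.get(ch, ch) for ch in served_text)

rng = random.Random(41869510)  # arbitrary seed, for a repeatable demo
encode, cmap = make_obfuscated_font(rng)
served = obfuscate("Hello scraper", encode)
print(served)                # garbled: what copy-and-paste would yield
print(render(served, cmap))  # Hello scraper
```

The point of the model is that the HTML alone never contains the real text; only the pairing of the HTML with that request's font does.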

replies(7): >>41869674 #>>41869684 #>>41869775 #>>41869796 #>>41869877 #>>41870330 #>>41871277 #

2. DaiPlusPlus ◴[17 Oct 24 13:51 UTC] No.41869674[source]▶

>>41869510 (TP) #

> easily prevent mass-scraping due to sheer processing costs.

My 2018 iPad Pro does OCR on images in Safari instantly. People only think OCR is slow because Adobe Acrobat still uses the same single-threaded OCR algo it’s had for decades now; then consider how blazing a GPU-based impl would be…

replies(2): >>41870062 #>>41870521 #

3. teraflop ◴[17 Oct 24 13:52 UTC] No.41869684[source]▶

>>41869510 (TP) #

That seems like it ought to be straightforward to defeat without OCR. If you know that a particular glyph looks like the letter J, then you just need to parse the WOFF file, find that glyph's data, and find the character that maps to it. It's definitely annoying enough to deter a casual scraper, but there's nothing conceptually difficult about it.

You do need to determine the "correct" character code for each glyph, but there are lots of ways to do that, on a spectrum from manual to automated. And you only need to do it once.
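A minimal sketch of that glyph-matching idea, with loud assumptions: the font is modeled as a dictionary from served character to outline data (a real scraper would need a font-parsing library to pull outlines out of the WOFF), outlines are assumed to be identical from one request to the next, and `build_decoder` and `decode` are hypothetical helper names.

```python
def build_decoder(obfuscated_cmap, reference_glyphs):
    """Map served characters back to real ones by matching glyph outlines.

    obfuscated_cmap:  served char -> outline data from this request's font.
    reference_glyphs: real char -> outline data, labelled once (e.g. by hand).
    """
    outline_to_char = {outline: ch for ch, outline in reference_glyphs.items()}
    return {
        served: outline_to_char[outline]
        for served, outline in obfuscated_cmap.items()
        if outline in outline_to_char  # skip glyphs we never labelled
    }

def decode(text, decoder):
    return "".join(decoder.get(ch, ch) for ch in text)

# Toy outlines: tuples of (x, y) points standing in for real glyph contours.
reference_glyphs = {
    "J": ((0, 0), (2, 0), (2, 5)),
    "F": ((0, 0), (0, 5), (3, 5)),
    "A": ((0, 0), (2, 5), (4, 0)),
}
# One request's font: the "F" slot draws a J, "A" draws an F, "J" draws an A.
obfuscated_cmap = {
    "F": reference_glyphs["J"],
    "A": reference_glyphs["F"],
    "J": reference_glyphs["A"],
}
decoder = build_decoder(obfuscated_cmap, reference_glyphs)
print(decode("FAJ", decoder))  # JFA
```

If the site also perturbed the outlines slightly on each request, exact matching would fail and some fuzzy comparison would be needed instead.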

4. sebstefan ◴[17 Oct 24 14:02 UTC] No.41869775[source]▶

>>41869510 (TP) #

If it's just swapping letters then rather than trying to dive into the WOFF you could just get the garbled data and treat it as a Caesar cipher, I guess. A few dozen rotations and you're through.

It's kind of annoying and prone to break but I'd rather have that than whatever Facebook is doing, where every class name, ID and identifiable tag in the markup gets randomly regenerated every once in a while.
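The rotation idea above could be sketched like this. It assumes the mapping really is a fixed alphabet rotation, which the original post does not confirm; the `COMMON` word list and the scoring rule are illustrative choices, not a tuned method.

```python
import string

# A tiny illustrative word list; a real attempt would use a proper dictionary.
COMMON = {"the", "and", "of", "to", "is", "in", "a"}

def rotate(text, shift):
    """Shift ASCII letters by `shift` places, leaving other characters alone."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - 97 + shift) % 26 + 97))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - 65 + shift) % 26 + 65))
        else:
            out.append(ch)
    return "".join(out)

def crack_rotation(ciphertext):
    """Try all 26 shifts and keep the one containing the most common words."""
    def score(candidate):
        return sum(word in COMMON for word in candidate.lower().split())
    return max((rotate(ciphertext, s) for s in range(26)), key=score)

secret = rotate("the price of the item is in the table", 7)
print(crack_rotation(secret))  # the price of the item is in the table
```

If the letters are shuffled arbitrarily instead of rotated, none of the 26 candidates will read as English and this approach gives nothing useful.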

replies(2): >>41870310 #>>41872578 #

5. flerchin ◴[17 Oct 24 14:04 UTC] No.41869796[source]▶

>>41869510 (TP) #

LOL the replies are hilarious. You've sniped several nerds today. Neat story.

replies(1): >>41870094 #

6. wildpeaks ◴[17 Oct 24 14:13 UTC] No.41869877[source]▶

>>41869510 (TP) #

A downside is it makes the site unusable for screen readers and SEO, plus it adds backend costs (compared to a plain backend that serves static files) if it's generated dynamically, although one can pre-generate a bunch of variants and randomly pick one at runtime (which could be handled by the load balancer) to minimize the costs.

replies(1): >>41870860 #

7. DanielHB ◴[17 Oct 24 14:34 UTC] No.41870062[source]▶

>>41869674 #

I dunno, I never measured it. If you are scraping billions of small social media posts I would expect it to add up and make it unviable.

8. DanielHB ◴[17 Oct 24 14:37 UTC] No.41870094[source]▶

>>41869796 #

I know right? I just scraped another website instead.

I am actually surprised no one went: "actually that technique is called 'chicken ostrich sandwich' and was first employed in Babylon in 2000 BC"

replies(1): >>41871100 #

9. wbl ◴[17 Oct 24 15:00 UTC] No.41870310[source]▶

>>41869775 #

It could be an arbitrary permutation or, worse, have multiple equivalent characters mapped to the same glyph. Fonts can do a lot.

10. voldacar ◴[17 Oct 24 15:01 UTC] No.41870330[source]▶

>>41869510 (TP) #

So it's a Caesar cipher, which is trivial to break. You don't need OCR or any computationally intensive solution.

replies(1): >>41870365 #

11. NoMoreNicksLeft ◴[17 Oct 24 15:05 UTC] No.41870365[source]▶

>>41870330 #

You need OCR unless you're going to personally sit there and break it by hand so you can feed the tr/// translation yourself every time you need to scrape. And it's a bit more tedious than the puzzles we did as kids; likely the punctuation and lowercase/uppercase were mixed into the slop.

replies(1): >>41870387 #

12. connicpu ◴[17 Oct 24 15:08 UTC] No.41870387{3}[source]▶

>>41870365 #

If there's a part that doesn't change, e.g. a footer or something, you can get a head start and have it figure out the rest by deduction with a spellchecker.
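A rough sketch of that head start, assuming the footer's plaintext is known and lines up character by character with what the page serves. The substitution key and the helper names `mapping_from_crib` and `partial_decode` are made up for the demo; the spellchecker step for filling the remaining gaps is not implemented here.

```python
import string

def mapping_from_crib(cipher_crib, plain_crib):
    """Derive served-char -> real-char pairs from text whose plaintext is known."""
    if len(cipher_crib) != len(plain_crib):
        raise ValueError("crib and ciphertext must align character by character")
    mapping = {}
    for c, p in zip(cipher_crib, plain_crib):
        if mapping.setdefault(c, p) != p:
            raise ValueError(f"inconsistent mapping for {c!r}")
    return mapping

def partial_decode(text, mapping, unknown="?"):
    """Decode what the crib covers; mark still-unknown letters with `unknown`."""
    return "".join(
        mapping.get(ch, unknown if ch.isalpha() else ch) for ch in text
    )

# Hypothetical per-request substitution, used only to fabricate demo input.
enc = str.maketrans(string.ascii_lowercase, "qwertyuiopasdfghjklzxcvbnm")

footer_plain = "all rights reserved"
mapping = mapping_from_crib(footer_plain.translate(enc), footer_plain)

print(partial_decode("the deal is great".translate(enc), mapping))  # the deal is great
print(partial_decode("hot deals".translate(enc), mapping))          # h?t deals
```

The "?" placeholders are where a dictionary lookup could propose candidates, each of which adds one more pair to the mapping.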

replies(1): >>41870416 #

13. NoMoreNicksLeft ◴[17 Oct 24 15:12 UTC] No.41870416{4}[source]▶

>>41870387 #

You might manage to cobble together frequency analysis too, but that would be challenging. If the ciphertext is very small, or is marketspeak without any sense to its message, then that's going to fall flat. And all this assumes just ASCII rather than, say, an (even limited) Unicode font. These assholes could be doing that just to have curly quotes or whatever.
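For reference, the naive form of frequency analysis looks something like the sketch below. `ENGLISH_ORDER` is an approximate, commonly quoted ranking of English letters by frequency, and `frequency_guess` is a hypothetical helper. As the comment above warns, the result is only a first guess and is unreliable on short or unusual text.

```python
from collections import Counter

# Approximate English letter ranking, most to least frequent (an assumption;
# published rankings differ slightly in the middle and tail).
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"

def frequency_guess(ciphertext):
    """Pair cipher letters with English letters by descending frequency."""
    counts = Counter(ch for ch in ciphertext.lower() if ch.isalpha())
    ranked = [ch for ch, _ in counts.most_common()]
    return dict(zip(ranked, ENGLISH_ORDER))

guess = frequency_guess("xxxx yyy z")
print(guess)  # {'x': 'e', 'y': 't', 'z': 'a'}
```

On a real page the guess would then have to be refined, for example by swapping pairs until the output contains more dictionary words.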

14. jakjak123 ◴[17 Oct 24 15:24 UTC] No.41870521[source]▶

>>41869674 #

It preprocesses your photo library while charging.

replies(1): >>41870659 #

15. ChadNauseam ◴[17 Oct 24 15:37 UTC] No.41870659{3}[source]▶

>>41870521 #

The GP mentioned it working for pictures viewed in Safari.

replies(2): >>41871218 #>>41877658 #

16. ksp-atlas ◴[17 Oct 24 15:58 UTC] No.41870860[source]▶

>>41869877 #

Yeah, my immediate thought was this would be bad for screen readers, plus OCR could easily defeat this.

17. viciousvoxel ◴[17 Oct 24 16:22 UTC] No.41871100{3}[source]▶

>>41870094 #

Actually that technique is called a "Caesar cipher" and it has been employed since at least the 1st c. BCE.

18. hhh ◴[17 Oct 24 16:38 UTC] No.41871218{4}[source]▶

>>41870659 #

It works for any photo anywhere in the OS, same for macOS.

19. ◴[17 Oct 24 16:43 UTC] No.41871277[source]▶

>>41869510 (TP) #

20. Apofis ◴[17 Oct 24 18:49 UTC] No.41872578[source]▶

>>41869775 #

That likely wouldn't work; swapped letters doesn't mean the letters were simply rotated. The mapping would probably be just random.

21. jakjak123 ◴[18 Oct 24 09:22 UTC] No.41877658{4}[source]▶

>>41870659 #

Yeah, I was thinking more about why it looks like it works so fast when you browse your photo library.