Reverse engineering the obfuscated TikTok VM

1. kleiba ◴[21 Apr 25 07:38 UTC] No.43749288[source]▶

I've been using a shitty streaming website whose player interrupts the playback of a video at irregular intervals and presents a cryptic error message. I've started looking into the JavaScript code to see if I can't code up a work-around mechanism (basically debugging their garbage implementation), and of course (why actually?) their player code is also obfuscated.

And I've gotta say, employing an AI assistant has proven to be an invaluable help in trying to understand obfuscated code. It's actually really cool to take a function of gobbledegook JavaScript and ask the AI to rewrite it in a more canonical and easily understandable way, with inline comments. Of course, there are flaws every now and then, but the ability to do this has been such a game changer for reverse engineering, IMO.

I can even ask it to take a guess at finding better variable/function names, and the AI can infer from the code (maybe it has seen the unobfuscated libraries during training?) what this code is actually doing at a high level and turn something like e.g(e.g) into player.initialize(player.state), which is nothing short of amazing.

So for anyone doing similar work, I cannot recommend highly enough to have an AI agent as another tool in your tool belt.

replies(4): >>43749332 #>>43750153 #>>43750771 #>>43758666 #

2. lukan ◴[21 Apr 25 07:46 UTC] No.43749332[source]▶

>>43749288 (TP) #

Which AI agents did you use?

replies(1): >>43749368 #

3. kleiba ◴[21 Apr 25 07:51 UTC] No.43749368[source]▶

>>43749332 #

I've tried different ones, they all seem to do a great job.

replies(4): >>43749490 #>>43749664 #>>43752083 #>>43757415 #

4. klabetron ◴[21 Apr 25 08:15 UTC] No.43749490{3}[source]▶

>>43749368 #

Out of curiosity (as someone disappointingly new to prompt engineering), what’s an example prompt you used with some success?

replies(3): >>43750158 #>>43750239 #>>43753794 #

5. sureIy ◴[21 Apr 25 08:45 UTC] No.43749664{3}[source]▶

>>43749368 #

Could you name a couple?

6. saagarjha ◴[21 Apr 25 10:03 UTC] No.43750153[source]▶

>>43749288 (TP) #

Is it truly obfuscated, or just minified?

replies(1): >>43751235 #

7. esseph ◴[21 Apr 25 10:04 UTC] No.43750158{4}[source]▶

>>43749490 #

Ask questions. Be disappointed in the outcomes.

Ask more questions. Get some right answers. Repeat.

Make question asking muscle get swole.

8. nurettin ◴[21 Apr 25 10:20 UTC] No.43750239{4}[source]▶

>>43749490 #

Actually knowing the subject and presenting insights gives me much better results than simply asking it to do what I mean.

9. poincaredisk ◴[21 Apr 25 11:38 UTC] No.43750771[source]▶

>>43749288 (TP) #

I'm surprised by this. As a professional reverse engineer I've actually found LLMs to be terrible at deobfuscation of JS (especially in the context of JS malware). But maybe my requirements are higher and it's actually OK for occasional use against weak packers?

replies(2): >>43751627 #>>43754398 #

10. johann8384 ◴[21 Apr 25 12:32 UTC] No.43751235[source]▶

>>43750153 #

Well, the code in the article was obfuscated, and the article shows several specific examples of it.

replies(1): >>43754973 #

11. Bilal_io ◴[21 Apr 25 13:10 UTC] No.43751627[source]▶

>>43750771 #

I've used it for small files and it did very well prettifying, naming the variables and adding comments for context. But I can imagine it doing a bad job with large files.

12. ImPostingOnHN ◴[21 Apr 25 13:58 UTC] No.43752083{3}[source]▶

>>43749368 #

next up is using AI to obfuscate it better in the first place, and then the terrible code gets scraped and used in further training, with an arms race ensuing, until all code on the internet is unintelligible but somehow works and can only be maintained by a specific AI that has a particularly encoded form of insanity

13. Loughla ◴[21 Apr 25 16:31 UTC] No.43753794{4}[source]▶

>>43749490 #

For help with prompt engineering, take a graduate-level grant writing course. It teaches you how to ask the right questions to get answers from humans and how to break down complicated processes into bite-size pieces; really usable for LLMs.

replies(1): >>43755274 #

14. ctoth ◴[21 Apr 25 17:34 UTC] No.43754398[source]▶

>>43750771 #

Have you seen this?

https://github.com/jehna/humanify

What they do is ground the LLM to the AST with Babel to ensure you still get the same shape of AST out of your deobfuscation pass. Probably this tool could be cleaned up, made to work with multiple LLM and parser backends, have its prompts improved, &c.
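To illustrate the "same shape of AST" idea, here is a minimal sketch, not humanify's actual code. It assumes Python's standard `ast` module as a stand-in for Babel, and the helper names `ast_shape` and `same_shape` are made up for this example. It blanks out every binding name and then compares the two trees, so a rename-only rewrite passes and anything that changes the code's structure or constants fails.

```python
import ast


def ast_shape(source: str) -> str:
    """Return a dump of the AST with variable, parameter, function and class names blanked."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            node.id = "_"
        elif isinstance(node, ast.arg):
            node.arg = "_"
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            node.name = "_"
    # ast.dump omits line/column attributes by default, so only structure remains.
    return ast.dump(tree)


def same_shape(original: str, rewritten: str) -> bool:
    """True if the rewrite differs from the original only in the names blanked above."""
    return ast_shape(original) == ast_shape(rewritten)


obfuscated = "def a(b, c):\n    return b.d(c) + 1\n"
renamed = "def initialize(player, state):\n    return player.d(state) + 1\n"
tampered = "def initialize(player, state):\n    return player.d(state) + 2\n"

print(same_shape(obfuscated, renamed))   # rename only: shapes match
print(same_shape(obfuscated, tampered))  # a constant changed: shapes differ
```

Attribute names (`.d` here) are deliberately left alone in this sketch, since renaming properties is generally not safe without knowing the object's type.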

replies(1): >>43780986 #

15. saagarjha ◴[21 Apr 25 18:31 UTC] No.43754973{3}[source]▶

>>43751235 #

I mean the JavaScript the LLM reversed for them

16. specialist ◴[21 Apr 25 19:00 UTC] No.43755274{5}[source]▶

>>43753794 #

Heh. Probably also useful should a djinn ever grant you three wishes.

17. titaphraz ◴[21 Apr 25 22:57 UTC] No.43757415{3}[source]▶

>>43749368 #

> they all seem to do a great job

Yeah right.

18. pcwalton ◴[22 Apr 25 02:51 UTC] No.43758666[source]▶

>>43749288 (TP) #

I tried ChatGPT 4o to help me reverse engineer some game code with the symbols missing and the results were quite disappointing. To say it had a tendency to hallucinate is an understatement. It didn't have any clue what was going on.

For me, those AI tools are much better at saving me time looking up documentation when doing simple things where they have examples of the exact code pattern I'm looking for in their training set. ChatGPT is great at writing one-off Blender scripts for me to give to artists, for instance.

19. rfoo ◴[24 Apr 25 10:16 UTC] No.43780986{3}[source]▶

>>43754398 #

This is a great idea! But it's more about having LLMs give function & variable names than having an LLM deobfuscate. The (traditional) deobfuscation passes (e.g. unpacking, de-flattening, de-virtualization etc.) are done by 100% precise, human-made Babel plugins and are totally unrelated to an LLM.
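As a rough illustration of what a "100% precise" rule-based pass looks like, here is a small sketch. It assumes Python's standard `ast` module rather than a Babel plugin, and the names `FoldStringConcat` and `fold` are hypothetical. It folds concatenations of string literals, a common trick for hiding identifiers, and leaves everything else untouched. No model is involved, so the output is deterministic.

```python
import ast


class FoldStringConcat(ast.NodeTransformer):
    """Replace "a" + "b" with "ab" wherever both operands are string literals."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # fold nested concatenations first
        left, right = node.left, node.right
        if (
            isinstance(node.op, ast.Add)
            and isinstance(left, ast.Constant)
            and isinstance(right, ast.Constant)
            and isinstance(left.value, str)
            and isinstance(right.value, str)
        ):
            folded = ast.Constant(value=left.value + right.value)
            return ast.copy_location(folded, node)
        return node


def fold(source: str) -> str:
    """Parse, fold string-literal concatenations, and print the code back out."""
    tree = FoldStringConcat().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+


print(fold('name = "ini" + "tia" + "lize"'))
print(fold('other = prefix + "lize"'))  # not all literals: left as-is
```

Naming the result is then a separate, fuzzier step, which is where an LLM can help.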