412 points xfeeefeee | 8 comments
1. Wowfunhappy ◴[] No.43750333[source]
...can I ask a really stupid question? What is a VM in this context?

I've used VMs for years to run Windows on top of macOS, Linux on top of Windows, or macOS on top of macOS when I need an isolated testing environment. I also know that Java works via the "Javascript Virtual Machine" which I've always thought of as "Java code actually runs in its own lightweight operating system on top of the host OS, which makes it OS-agnostic". The JVM can't run on bare metal because it doesn't have hardware drivers, but presumably it could if you wrote those drivers.

But presumably the VM being discussed in TFA isn't that kind of VM, right? Bytedance didn't write an operating system in Javascript?

I've been seeing "VM" used in lots of contexts like this recently and it makes me think I must be missing something, but it's the sort of question I don't know how to Google. AIs have not been helpful either, plus I don't trust them.

replies(6): >>43750437 #>>43750454 #>>43753234 #>>43753422 #>>43754128 #>>43754561 #
2. jacobp100 ◴[] No.43750437[source]
Yes the VM discussed is similar to JVM
3. turtleyacht ◴[] No.43750454[source]
Virtual Machine Decompiling: https://github.com/LukasOgunfeitimi/TikTok-ReverseEngineerin...

And also VM223, with statements that do stuff to an array "stack": https://github.com/LukasOgunfeitimi/TikTok-ReverseEngineerin...

One obvious giveaway for a VM is laying out memory, or processing some intermediate language. In this case, it could be the latter.

In-browser, you have Chrome V8 running Javascript; that Javascript could be running an interpreted environment where abstractions are not purely business logic, but an execution model separate from domain stuff: auth, video, user, etc.

By that observation, this C snippet is a VM:

  #include <stdio.h>

  int main(void) {
    char instruction = 'p'; /* or array */

    if (instruction == 'p') {
      puts("document.appendChild(...)");
    }
    return 0;
  }

If the program outputs to a vm.js file, it's kinda-sorta a "VM." I would call it something else, maybe a generator of sorts (for now). In my opinion, if I were working on a VM, the threshold for calling it that would be much higher than the above.

On the other hand, if I had to add debugging hints to the generated Javascript as comments that refer to an execution stack or stack pointers, that is kind of a VM idea.
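
To make the "execution stack" idea concrete, here is a minimal sketch of a stack-based interpreter in Javascript. It is not taken from TikTok's code: the opcode names (PUSH, ADD, MUL, HALT) and the function name runStackVm are invented for illustration. The point is only that the program is data (an array of numbers) and a loop with a program counter and a stack gives it meaning.

```javascript
// Hypothetical opcodes for an invented stack machine; none of these names
// come from TikTok's code.
const PUSH = 0, ADD = 1, MUL = 2, HALT = 3;

function runStackVm(program) {
  const stack = []; // operand stack
  let pc = 0;       // program counter: index of the next instruction

  while (pc < program.length) {
    const op = program[pc++];
    switch (op) {
      case PUSH:
        stack.push(program[pc++]); // the operand follows the opcode
        break;
      case ADD: {
        const b = stack.pop();
        const a = stack.pop();
        stack.push(a + b);
        break;
      }
      case MUL: {
        const b = stack.pop();
        const a = stack.pop();
        stack.push(a * b);
        break;
      }
      case HALT:
        return stack.pop();
      default:
        throw new Error("unknown opcode " + op + " at " + (pc - 1));
    }
  }
  return stack.pop();
}

// (2 + 3) * 4 expressed as bytecode
const demoProgram = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT];
console.log(runStackVm(demoProgram)); // 20
```

Someone reading only demoProgram sees a list of numbers; the behavior is only visible once you also understand the loop that interprets it.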

4. yjftsjthsd-h ◴[] No.43753234[source]
Nit:

> I also know that Java works via the "Javascript Virtual Machine"

Java Virtual machine. That Java and JavaScript are named the way they are is... basically a historical accident of a cross-promotion gone too far, IMO. They aren't really related (at least, in the way that the name might imply).

Now to your real question. Virtual machines are anything that is one computer pretending to be another computer. Sometimes, that's an x86_64 PC pretending to be another x86_64 PC to run a different OS. Sometimes that's an x86_64 PC pretending to be a 50-year-old mainframe ( https://opensimh.org/ really shines there). Sometimes it's an ARM laptop running macOS pretending to be an x86_64 PC so it can run Windows. And, relevant here, sometimes it's a phone pretending to be a machine that has never actually existed in hardware. You can just make up an imaginary machine that has any old characteristics you want. Maybe it has a built-in high-level network card that magically turns HTTP requests into responses without programs having to implement HTTP themselves. Maybe it has an imaginary graphics card that directly renders buttons. Maybe you imagine a CPU that runs Java opcodes directly. Whatever it is, if you can imagine a system and then write a program that emulates it, you can make a virtual machine and run stuff in it.

replies(1): >>43754852 #
5. ngneer ◴[] No.43753422[source]
This is not a stupid question. I have seen other comments on the thread that confuse the two terms and run with it. Better to ask than assume. Especially since "VM" is the same label for two or three distinct yet related notions in security.

The VM you are familiar with indeed can run an OS, and is indeed not what TikTok does.

#1 VMM - hypervisor runs VMs

#2 JVM/.NET - efficient bytecode

#3 Obfuscation - obscure bytecode

The main thing is that for #2 and #3 the machine language changes.

With "virtualization" as used in most contexts, involving a virtual machine monitor, or hypervisor, one creates zero or more new (virtual) machines on which to execute multiple software recipes. All the recipes are written in the same (machine) language, for all the machines. This can help security by introducing isolation, for example, where one VM cannot read memory belonging to another VM unless the hypervisor allows it.

With the "virtual machine" used for obfuscation, the machine language changes. The system performs the same actions as it would without obfuscation, but now it is performing those actions using a different machine language. Behaviorally, the result is the same. But, the new language makes it harder to reverse engineer the behavior.

Stupid example:

Original instruction: MOV A,B

Under hypervisor virtualization, VM0 and VM1 will perform this same instruction.

Under obfuscation virtualization, software will perform instructions that amount to the same result, but are harder to figure out. So, the MOV instruction is redefined and mapped onto a new (virtual) machine. The new machine does not simply leverage the existing instruction, but rather an obfuscated sequence. For example:

A <- B + C + D * E

A <- A - C

A <- A - D * E

Obviously, the above transformation is easy to understand and undo. Others are harder to understand and undo. Look up MOVfuscator to see how crazy things may get.
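
As a quick sanity check, the three-step sequence above can be written out next to the plain move. This is only a sketch: the register names follow the example, and the function names movPlain and movObfuscated are made up. It assumes small integer values; with very large numbers or floats, Javascript arithmetic could round and the two versions might differ.

```javascript
// Plain version: MOV A,B
function movPlain(regs) {
  regs.A = regs.B;
  return regs;
}

// Obfuscated version: the same three steps as in the example above.
function movObfuscated(regs) {
  regs.A = regs.B + regs.C + regs.D * regs.E; // A <- B + C + D * E
  regs.A = regs.A - regs.C;                   // A <- A - C
  regs.A = regs.A - regs.D * regs.E;          // A <- A - D * E
  return regs;
}

const start = { A: 0, B: 42, C: 7, D: 3, E: 5 };
console.log(movPlain({ ...start }).A);      // 42
console.log(movObfuscated({ ...start }).A); // 42
```

Both versions leave A equal to B and leave the other registers untouched; only the path to get there differs.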

6. fmxsh ◴[] No.43754128[source]
It sounds more advanced than it is.

It's a function that wraps the functionality of its host environment and then provides the caller with its own bytecode language for executing instructions. The virtual machine translates those instructions to the corresponding real functionality of the host environment (Javascript) upon execution.
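
A rough sketch of that idea, under my own assumptions: makeVm, the opcode numbers, and the host methods (upper, concat, emit) are all invented for illustration and have nothing to do with the real implementation. The VM is just a function that closes over host functionality and maps each opcode to a real Javascript call.

```javascript
// makeVm wraps a "host" object and returns a function that executes bytecode.
// The opcode numbers and the host method names are invented for illustration.
function makeVm(host) {
  const handlers = {
    1: (stack, arg) => stack.push(arg),                // push a constant
    2: (stack) => stack.push(host.upper(stack.pop())), // calls host.upper
    3: (stack) => {                                    // calls host.concat
      const right = stack.pop();
      const left = stack.pop();
      stack.push(host.concat(left, right));
    },
    4: (stack) => host.emit(stack.pop()),              // calls host.emit
  };

  return function run(bytecode) {
    const stack = [];
    for (const [opcode, arg] of bytecode) {
      const handler = handlers[opcode];
      if (!handler) throw new Error("unknown opcode " + opcode);
      handler(stack, arg);
    }
  };
}

const emitted = [];
const runVm = makeVm({
  upper: (s) => s.toUpperCase(),
  concat: (a, b) => a + b,
  emit: (value) => emitted.push(value),
});

// Reading this bytecode alone does not reveal which host functions run.
runVm([[1, "hello, "], [1, "vm"], [2], [3], [4]]);
console.log(emitted); // one entry: "hello, VM"
```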

This particular case is sophisticated but the idea is simple.

Correct me if I'm wrong. I'm not knowledgeable in this. This is my current understanding of it.

7. Jasper_ ◴[] No.43754561[source]
The words "virtual machine" and "interpreter" are mostly interchangeable; they both refer to a mechanism to run a computer program not by compiling it to machine code, but by compiling it to some intermediate "virtual" machine code which will then get run. The terminology is new, but the idea is older; "P-code" was the term we used to use before it fell out of favor.

Sun popularized the term "virtual machine" when marketing Java instead of using "interpreter" or "P-code", both for marketing reasons (VMware had just come on the scene and was making tech headlines), but also to get away from the perception of classic interpreters being slower than native code since Java had a JIT compiler. Just-in-time compilers that compiled to the host's machine code at runtime were well-known in research domains at the time, but were much less popular than the more dominant execution models of "AST interpreter" and "bytecode interpreter".

There might be some gatekeepers that suggest that "interpreter" means AST interpreter (not true for the Python interpreter, for instance), or VM always means JIT compiled (not true for Ruby, which calls its bytecode-based MRI "RubyVM" in a few places), but you can ignore them.
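
For anyone unsure what separates an "AST interpreter" from a "bytecode interpreter", here is a small sketch that evaluates the same expression both ways. The node shapes, instruction names, and function names (evalAst, compile, runBytecode) are my own invention for illustration, not how any particular language runtime does it.

```javascript
// AST interpreter: walk the tree directly and evaluate as you go.
function evalAst(node) {
  switch (node.type) {
    case "num": return node.value;
    case "add": return evalAst(node.left) + evalAst(node.right);
    case "mul": return evalAst(node.left) * evalAst(node.right);
    default: throw new Error("unknown node type " + node.type);
  }
}

// Compile the same tree into a flat list of instructions (post-order walk).
function compile(node, out = []) {
  if (node.type === "num") {
    out.push(["push", node.value]);
  } else {
    compile(node.left, out);
    compile(node.right, out);
    out.push([node.type]); // "add" or "mul"
  }
  return out;
}

// Bytecode interpreter: loop over the flat instructions using a stack.
function runBytecode(code) {
  const stack = [];
  for (const [op, arg] of code) {
    if (op === "push") {
      stack.push(arg);
    } else {
      const b = stack.pop();
      const a = stack.pop();
      if (op === "add") stack.push(a + b);
      else if (op === "mul") stack.push(a * b);
      else throw new Error("unknown op " + op);
    }
  }
  return stack.pop();
}

// 1 + 2 * 3
const tree = {
  type: "add",
  left: { type: "num", value: 1 },
  right: {
    type: "mul",
    left: { type: "num", value: 2 },
    right: { type: "num", value: 3 },
  },
};
console.log(evalAst(tree));              // 7
console.log(runBytecode(compile(tree))); // 7
```

Both are interpreters in the everyday sense; the second one just runs a program written for an invented machine, which is where the "VM" label comes from.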

8. Wowfunhappy ◴[] No.43754852[source]
> Java Virtual machine. That Java and JavaScript are named the way they are is... basically a historical accident of a cross-promotion gone too far

Oops, that was a typo! Thank you.