
268 points | aapoalas | 2 comments

We're building a different kind of JavaScript engine, based on data-oriented design and a willingness to try things quite out of left field. This is most concretely visible in our major architectural choices:

1. All data allocated on the JavaScript heap is placed into a type-specific vector. Numbers go into the numbers vector, strings into the strings vector, and so on.

2. All heap references are type-discriminated indexes: a heap number is identified by its discriminant value and the index it points to in the numbers vector (see the sketch after this list).

3. Objects are also split up into object-kind-specific vectors. Ordinary objects go into one vector, Arrays go into another, DataViews into yet another, and so on.

4. Unordinary objects' heap data does not contain ordinary object data; instead, they hold an optional index into the ordinary objects vector.

5. Objects are aggressively split into parts to avoid common use cases having to read parts that are known to be unused.
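
To make points 1 through 4 concrete, here's a rough Rust sketch. The names and types are illustrative only, not our actual internals:

    // Illustrative only: a heap with one vector per value kind.
    struct Heap {
        numbers: Vec<f64>,
        strings: Vec<String>,
        arrays: Vec<ArrayData>,
        objects: Vec<OrdinaryObjectData>,
    }

    // A heap reference is a discriminant plus an index, not a pointer.
    #[derive(Clone, Copy, Debug, PartialEq)]
    enum Value {
        Number(u32), // index into Heap::numbers
        String(u32), // index into Heap::strings
        Array(u32),  // index into Heap::arrays
        Object(u32), // index into Heap::objects
    }

    // Arrays do not embed ordinary object data (point 4); they carry an
    // optional index into the ordinary objects vector instead.
    struct ArrayData {
        elements: Vec<Value>,
        backing_object: Option<u32>, // index into Heap::objects
    }

    struct OrdinaryObjectData {
        // prototype, property keys and values, ...
    }

    impl Heap {
        fn get_number(&self, value: Value) -> Option<f64> {
            match value {
                Value::Number(index) => self.numbers.get(index as usize).copied(),
                _ => None,
            }
        }
    }

    fn main() {
        let mut heap = Heap {
            numbers: Vec::new(),
            strings: Vec::new(),
            arrays: Vec::new(),
            objects: Vec::new(),
        };
        heap.numbers.push(42.0);
        let reference = Value::Number(0);
        assert_eq!(heap.get_number(reference), Some(42.0));
    }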

If this sounds interesting, I've written a few blog posts on the internals of Nova over on our blog; you can jump in here: https://trynova.dev/blog/what-is-the-nova-javascript-engine

Etheryte No.42171237
Architectural choices are interesting to talk about, but I think most people reading this, me included, won't have any context to compare against. How does this compare to, e.g., the architecture of V8? What benefits do these choices give when compared against other engines? Reading through the list, it's easy to nod along, but it's hard to actually form an intuition about whether these are good choices or not.
VPenkov No.42171285
They seem to have a blog post on that: https://trynova.dev/blog/why-build-a-js-engine

It reads like an experimental approach that someone decided to will into existence, and a way to see if they can achieve better performance because of the architectural choices.

> Luckily, we do have an idea, a new spin on the ECMAScript specification. The starting point is data-oriented design (...)

> So, when you read a cache line you should aim for the entire cache line to be used. The best data structure in the world, bar none, is the humble vector (...)

> So what we want to explore is then: What sort of an engine do you get when almost everything is a vector or an index into a vector, and data structures are optimised for cache line usage? Join us in finding out (...)
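
The cache-line remark is essentially the classic array-of-structs vs. struct-of-arrays distinction from data-oriented design. A minimal sketch of the difference (my own illustration, not code from their blog):

    // Array-of-structs: a pass that reads only `x` still pulls every
    // field of each struct through the cache.
    struct ParticleAos {
        x: f64,
        y: f64,
        z: f64,
        mass: f64,
    }

    // Struct-of-arrays: each field gets its own vector, so a pass over
    // `x` touches nothing but x values and uses whole cache lines.
    struct ParticlesSoa {
        x: Vec<f64>,
        y: Vec<f64>,
        z: Vec<f64>,
        mass: Vec<f64>,
    }

    fn sum_x_aos(particles: &[ParticleAos]) -> f64 {
        particles.iter().map(|p| p.x).sum()
    }

    fn sum_x_soa(particles: &ParticlesSoa) -> f64 {
        particles.x.iter().sum()
    }

    fn main() {
        let aos = vec![
            ParticleAos { x: 1.0, y: 0.0, z: 0.0, mass: 1.0 },
            ParticleAos { x: 2.0, y: 0.0, z: 0.0, mass: 1.0 },
        ];
        let soa = ParticlesSoa {
            x: vec![1.0, 2.0],
            y: vec![0.0, 0.0],
            z: vec![0.0, 0.0],
            mass: vec![1.0, 1.0],
        };
        assert_eq!(sum_x_aos(&aos), sum_x_soa(&soa));
    }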

lucms_ No.42175089
Isn't data-oriented design about organizing the data in a way that reflects the most common access patterns of the program? The approach of placing all numbers in a big number vector, all Arrays into a big Array vector, and so on, would be "data-oriented design" if it actually reflects the most common access patterns. So, is it the case that when you read a number you also want all those other numbers that come together with it in the cache line? Is that the case for Arrays? For DataViews? In other words, does this approach to allocating memory reflect the most common data access patterns in JavaScript programs? I'm not saying it's a bad approach, and I'm not even trying to imply that it's not DOD, I'm genuinely asking.

Edit: maybe a better question is: does it reflect the most common data access patterns of a JavaScript Engine?

aapoalas No.42175513
Excellent question: in a theoretical sense the answer would be that we cannot know, since it depends on the JavaScript being run. But in practice I think that is indeed the case: the more common an object kind is, the likelier it is that instances are used in conjunction with others around them, and the more important their memory placement becomes.

e.g. Say you have a JS program that has about 100 DataViews: I'd say it's unlikely these are used in conjunction with one another very often, but they're also only a small part of the program, so their placement mostly doesn't matter.

Now what if that number is a million instead? Now I'm betting they're mostly all created together, used together, and that their placement is critical to the program's performance.

So, I'm betting that trading worse random memory access performance for a guarantee that data created together stays together, and for better linear memory performance, will be an overall win.
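
As a toy illustration of that bet (made-up types, not our actual DataView layout): summing over a dense vector of records is a linear scan, while the traditional one-allocation-per-object model forces a pointer chase per record:

    // A made-up record standing in for a DataView's heap data.
    #[derive(Clone)]
    struct DataViewRecord {
        byte_offset: u32,
        byte_length: u32,
        buffer_index: u32,
    }

    // A million records created together sit contiguously in one vector,
    // so walking them is a linear scan over full cache lines...
    fn total_length_dense(views: &[DataViewRecord]) -> u64 {
        views.iter().map(|view| view.byte_length as u64).sum()
    }

    // ...whereas one heap allocation per record (the pointer-per-object
    // model) costs a dereference into scattered memory each time.
    fn total_length_boxed(views: &[Box<DataViewRecord>]) -> u64 {
        views.iter().map(|view| view.byte_length as u64).sum()
    }

    fn main() {
        let dense: Vec<DataViewRecord> = (0..1_000_000u32)
            .map(|i| DataViewRecord {
                byte_offset: 0,
                byte_length: i % 64,
                buffer_index: i,
            })
            .collect();
        let boxed: Vec<Box<DataViewRecord>> =
            dense.iter().cloned().map(Box::new).collect();
        assert_eq!(total_length_dense(&dense), total_length_boxed(&boxed));
    }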

Whether this is true data-oriented design is then in the eye of the beholder: Maybe someone will think I'm wrong, my assumptions are wrong, and I'm thus not doing things in a data-oriented way.