268 points · aapoalas · 14 comments

We're building a different kind of JavaScript engine, based on data-oriented design and a willingness to try things quite out of left field. This is most concretely visible in our major architectural choices:

1. All data allocated on the JavaScript heap is placed into a type-specific vector. Numbers go into the numbers vector, strings into the strings vector, and so on.

2. All heap references are type-discriminated indexes: a heap number is identified by its discriminant value and the index it points to in the numbers vector.

3. Objects are also split up into object-kind-specific vectors. Ordinary objects go into one vector, Arrays into another, DataViews into yet another, and so on.

4. Unordinary objects' heap data does not contain ordinary object data; instead, it contains an optional index into the ordinary objects vector.

5. Objects are aggressively split into parts so that common use cases don't have to read parts that are known to be unused.
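To make points 1 through 4 concrete, here is a minimal Rust sketch under stated assumptions. `Heap`, `Value`, `ObjectData`, `ArrayData`, and all method names are hypothetical and are not Nova's actual types. Each heap type gets its own vector, a `Value` is an enum whose variant says which vector its `u32` index points into, and an Array only gets an entry in the ordinary objects vector when it actually needs one.

```rust
/// Hypothetical type-discriminated reference: the variant says which
/// vector to look in, and the u32 is the index into that vector.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Value {
    Number(u32), // index into Heap::numbers
    String(u32), // index into Heap::strings
    Object(u32), // index into Heap::objects (ordinary objects)
    Array(u32),  // index into Heap::arrays
}

/// Ordinary object data: here just a list of (key, value) pairs.
#[derive(Default)]
struct ObjectData {
    properties: Vec<(String, Value)>,
}

/// Array heap data holds its elements plus an *optional* index into the
/// ordinary objects vector, used only when named properties are added.
struct ArrayData {
    elements: Vec<Value>,
    backing_object: Option<u32>,
}

/// One homogeneous vector per heap type (hypothetical layout).
#[derive(Default)]
struct Heap {
    numbers: Vec<f64>,
    strings: Vec<String>,
    objects: Vec<ObjectData>,
    arrays: Vec<ArrayData>,
}

impl Heap {
    fn alloc_number(&mut self, n: f64) -> Value {
        self.numbers.push(n);
        Value::Number((self.numbers.len() - 1) as u32)
    }

    fn alloc_string(&mut self, s: &str) -> Value {
        self.strings.push(s.to_string());
        Value::String((self.strings.len() - 1) as u32)
    }

    fn alloc_array(&mut self, elements: Vec<Value>) -> Value {
        // No ordinary-object part is allocated up front.
        self.arrays.push(ArrayData { elements, backing_object: None });
        Value::Array((self.arrays.len() - 1) as u32)
    }

    /// Set a named property on an array, lazily creating its ordinary-object part.
    fn set_array_property(&mut self, array: u32, key: &str, value: Value) {
        let existing = self.arrays[array as usize].backing_object;
        let idx = match existing {
            Some(i) => i,
            None => {
                self.objects.push(ObjectData::default());
                let i = (self.objects.len() - 1) as u32;
                self.arrays[array as usize].backing_object = Some(i);
                i
            }
        };
        self.objects[idx as usize]
            .properties
            .push((key.to_string(), value));
    }

    /// Read a number back through its typed index.
    fn number(&self, v: Value) -> Option<f64> {
        match v {
            Value::Number(i) => Some(self.numbers[i as usize]),
            _ => None,
        }
    }
}
```

In this sketch, allocating two numbers places them side by side in `numbers`, and a plain array never touches `objects`; setting a named property on the array lazily creates the optional ordinary-object part.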

If this sounds interesting, I've written a few blog posts on Nova's internals over on our blog. You can jump in here: https://trynova.dev/blog/what-is-the-nova-javascript-engine

1. liontwist · No.42173207
This is a great idea! I had thought about doing this with a lisp interpreter. I had identified a few key advantages:

- homogeneous allocation means no alignment gaps
- linear access is a win in garbage collection
- indices are smaller than pointers
- a type-discriminated index can save some size
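A quick way to check the first and third points is to measure sizes. The Rust sketch below is entirely hypothetical (`MixedCell` and the helper functions are illustrative names). On a typical 64-bit target, a struct holding a 1-byte tag and an 8-byte `f64` occupies 16 bytes because of alignment padding, while storing tags and payloads in two homogeneous vectors costs 9 bytes per element, and a `u32` index is half the size of a native pointer.

```rust
use std::mem::size_of;

/// Hypothetical cell in a mixed heap: a 1-byte type tag stored next to an
/// 8-byte number. The compiler pads the struct so `payload` stays aligned.
#[allow(dead_code)]
struct MixedCell {
    tag: u8,
    payload: f64,
}

/// Bytes per element if tags and payloads live in two homogeneous vectors
/// (`Vec<u8>` and `Vec<f64>`): neither vector needs padding between elements.
fn split_bytes_per_element() -> usize {
    size_of::<u8>() + size_of::<f64>()
}

/// Padding bytes wasted per element by the mixed layout.
fn mixed_padding() -> usize {
    size_of::<MixedCell>() - split_bytes_per_element()
}

/// Size of a 32-bit index versus a native reference (8 bytes on 64-bit targets).
fn index_vs_pointer() -> (usize, usize) {
    (size_of::<u32>(), size_of::<&f64>())
}
```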

I haven’t verified whether those actually work out in the details. I’ll read your blog article.

Don’t bother with these comments immediately comparing it to V8 (a multi-billion-dollar venture). I don’t know how many creative projects they’ve done before.

You may be interested in looking at Fabrice Bellard’s JS engine for ideas.

replies(2): >>42173504 >>42176218
2. aapoalas · No.42173504
Thank you for the encouragement! Avoiding alignment gaps is indeed pretty great: I have a vision of packing Arrays into 9 bytes split over two or three cache lines.

On typed indexes: if we accept only about 2^24 possible index values, then we could use a 32-bit integer for our Values, or at least for Objects (if we want to keep 7 bytes' worth of stack data, which is pretty hard to pass on).
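The 2^24 figure leaves 8 bits of a 32-bit integer for the type discriminant. A minimal Rust sketch, assuming a layout with the tag in the high byte and the index in the low 24 bits (`PackedValue` and its methods are hypothetical, not Nova's `Value`):

```rust
/// Hypothetical packed value: the high 8 bits hold a type tag and the low
/// 24 bits hold an index, so each heap vector can address up to 2^24 entries.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct PackedValue(u32);

const INDEX_BITS: u32 = 24;
const INDEX_MASK: u32 = (1 << INDEX_BITS) - 1; // 0x00FF_FFFF

impl PackedValue {
    /// Returns None if the index does not fit in 24 bits.
    fn new(tag: u8, index: u32) -> Option<Self> {
        if index > INDEX_MASK {
            return None;
        }
        Some(PackedValue(((tag as u32) << INDEX_BITS) | index))
    }

    /// Type discriminant stored in the high byte.
    fn tag(self) -> u8 {
        (self.0 >> INDEX_BITS) as u8
    }

    /// Index into the tag's heap vector, stored in the low 24 bits.
    fn index(self) -> u32 {
        self.0 & INDEX_MASK
    }
}
```

The trade-off is the one named in the comment: a 4-byte value in exchange for capping each vector at roughly 16.7 million entries.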

I love the comments comparing Nova to V8: That's what I want to aim for after all :) I'm not sure I've heard of Fabrice Bellard's JS engine, thanks, I'll take a look!

replies(1): >>42174861
3. NoahKAndrews · No.42174861
Your blog mentions QuickJS, which I believe is the engine by Fabrice mentioned above.
replies(1): >>42175070
4. aapoalas · No.42175070
Oops :D
5. mbrock · No.42176218
I actually made a Lisp interpreter in Zig a couple of years ago that has each object type in a separate heap array. In fact each field of each object type has its own array: every CDR is in one contiguous array. This was mostly for fun and to experiment with data-driven techniques using Zig metaprogramming. The code turned out relatively clean and simple.

https://github.com/mbrock/wisp

GC is stop-and-copy, which as a side effect compacts each of those arrays and improves locality. I think most lists should end up with their CDRs next to each other in memory, making iteration very cache-friendly. But I didn't verify any performance characteristics beyond making it efficient enough for basic use.
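Here is a minimal Rust sketch of that idea; it is not wisp's Zig code. A hypothetical `ConsHeap` keeps every CAR in one vector and every CDR in another, and `collect` is a Cheney-style (breadth-first) stop-and-copy from a single root. For a list whose CARs are immediate values, Cheney's scan copies each cell's CDR target right after the cell itself, so the list's spine comes out contiguous in the new `cdr` vector, which is the locality effect described above. The `Word` type and all function names are assumptions made for illustration.

```rust
/// Hypothetical tagged word: nil, a small int, or an index into the cons heap.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Word {
    Nil,
    Int(i32),
    Cons(u32),
}

/// Struct-of-arrays cons heap: every CAR lives in one vector and every CDR
/// in another, so walking a list's spine only touches `cdr`.
#[derive(Default)]
struct ConsHeap {
    car: Vec<Word>,
    cdr: Vec<Word>,
}

impl ConsHeap {
    fn cons(&mut self, car: Word, cdr: Word) -> Word {
        self.car.push(car);
        self.cdr.push(cdr);
        Word::Cons((self.car.len() - 1) as u32)
    }

    /// Collect the integer CARs of a list by following the CDR array.
    fn to_vec(&self, mut w: Word) -> Vec<i32> {
        let mut out = Vec::new();
        while let Word::Cons(i) = w {
            if let Word::Int(n) = self.car[i as usize] {
                out.push(n);
            }
            w = self.cdr[i as usize];
        }
        out
    }
}

/// Copy one word into to-space. Cells are copied with their old-space fields
/// intact; `collect`'s scan loop fixes those fields up afterwards.
fn copy_word(old: &ConsHeap, new: &mut ConsHeap, forward: &mut [Option<u32>], w: Word) -> Word {
    match w {
        Word::Cons(i) => {
            if let Some(j) = forward[i as usize] {
                return Word::Cons(j); // already copied: follow the forwarding entry
            }
            let j = new.car.len() as u32;
            new.car.push(old.car[i as usize]);
            new.cdr.push(old.cdr[i as usize]);
            forward[i as usize] = Some(j);
            Word::Cons(j)
        }
        other => other, // immediates are copied by value
    }
}

/// Cheney-style stop-and-copy from a single root into fresh, compact vectors.
fn collect(old: &ConsHeap, root: Word) -> (ConsHeap, Word) {
    let mut new = ConsHeap::default();
    let mut forward: Vec<Option<u32>> = vec![None; old.car.len()];
    let new_root = copy_word(old, &mut new, &mut forward, root);
    let mut scan = 0;
    while scan < new.car.len() {
        let (car, cdr) = (new.car[scan], new.cdr[scan]);
        let new_car = copy_word(old, &mut new, &mut forward, car);
        let new_cdr = copy_word(old, &mut new, &mut forward, cdr);
        new.car[scan] = new_car;
        new.cdr[scan] = new_cdr;
        scan += 1;
    }
    (new, new_root)
}
```

Garbage cells interleaved with a list in from-space are simply never copied, so after `collect` the surviving spine occupies indices 0, 1, 2, ... in order.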

It also has delimited continuation control, compiles to WebAssembly, and hooks promises into the continuation system, among some other pretty cool features!

replies(3): >>42176308 >>42176372 >>42176767
6. mbrock · No.42176308
Oh yeah, continuation pointers also have their own array like every other field kind, which should have similar benefits as list traversal but for continuation copying... It's a really interesting design area, I think. Zig makes it easy!
replies(1): >>42176386
7. aapoalas · No.42176372
Well I'll be damned! That sounds very much like what I want Nova to eventually be :) We don't have fields split apart at present, mostly because Rust doesn't make that quite as easy as I would want to. Otherwise it sounds like it's very much all the same, in a good way.

I'll definitely be taking a look at wisp, thank you very much for the link! If you ever have the time, I'd love seeing a comparison of this sort of engine design against a more traditional one.

Sorry, what is "CDR" in this context though?

replies(1): >>42176472
8. aapoalas · No.42176386
Yeah, I'm quite envious of the MultiArrayList or whatever it was that Zig has: If only Rust had that sort of a type built-in <3
replies(1): >>42176557
9. mbrock · No.42176472
Quick reply to the cdr thing: car/cdr are old Lisp names for the head/tail fields of linked list cells! :)
replies(1): >>42176599
10. mbrock · No.42176557
That's how I got interested in this kind of memory layout in the first place. I wanted a nice Lisp for WebAssembly and had recently gotten into Zig. When I started defining the word structure I remembered Andy Kelley's talk about using data-oriented design to make the Zig compiler fast, so I thought I'd try it, and the more I thought about it the more reasonable it seemed.

There are about a dozen object types, each with its own growing multiarray. Words are 32 bits: 1 bit for GC state, 27 bits for the index, and the remaining 4 bits for the type tag. Ints are 28 bits. Byte arrays have their own heap too, as do general 32-bit vectors.
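Those numbers add up to 32 bits (1 + 27 + 4 tag bits, or 28 int bits + 4 tag bits). The comment doesn't say where each field sits, so the Rust sketch below assumes the tag occupies the top 4 bits, the GC bit is bit 27, and ints are sign-extended from the low 28 bits; the constants, tag values, and function names are all hypothetical.

```rust
// Hypothetical 32-bit word layout (bit positions are an assumption):
//   bits 28..32  type tag (4 bits)
//   pointer words: bit 27 = GC mark, bits 0..27 = heap index (27 bits)
//   int words:     bits 0..28 = signed 28-bit integer
const TAG_SHIFT: u32 = 28;
const GC_BIT: u32 = 1 << 27;
const INDEX_MASK: u32 = (1 << 27) - 1;
const PAYLOAD_MASK: u32 = (1 << 28) - 1;
const TAG_INT: u32 = 0; // hypothetical tag values
const TAG_CONS: u32 = 1;

fn tag(word: u32) -> u32 {
    word >> TAG_SHIFT
}

/// Build a pointer word with the GC bit clear.
fn make_ptr(t: u32, idx: u32) -> u32 {
    assert!(t < 16 && idx <= INDEX_MASK);
    (t << TAG_SHIFT) | idx
}

fn index(word: u32) -> u32 {
    word & INDEX_MASK
}

fn is_marked(word: u32) -> bool {
    (word & GC_BIT) != 0
}

fn mark(word: u32) -> u32 {
    word | GC_BIT
}

/// Encode an int; returns None if it doesn't fit in 28 signed bits.
fn make_int(n: i32) -> Option<u32> {
    const MIN: i32 = -(1 << 27);
    const MAX: i32 = (1 << 27) - 1;
    if n < MIN || n > MAX {
        return None;
    }
    Some((TAG_INT << TAG_SHIFT) | ((n as u32) & PAYLOAD_MASK))
}

/// Decode by shifting the tag out and arithmetic-shifting back (sign extension).
fn int_value(word: u32) -> i32 {
    ((word << 4) as i32) >> 4
}
```

Under this layout, ints reuse the GC bit as payload, which is consistent with immediates needing no GC state.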

11. aapoalas · No.42176599
Ah, of course!
12. liontwist · No.42176767
Yes. The right thing to do is to treat a list as the general case and other uses of cons as the special case.
replies(1): >>42177938
13. codr7 · No.42177938
I've flipped that idea around in a few of my own language designs, where pairs are the central feature and lists are just pairs with pair cdrs. Works fine from what I can see.
replies(1): >>42178815
14. liontwist · No.42178815
Yes, pairs are the 1980s Lisp design, but that design isn’t good for modern caches. Both obviously work.