←back to thread

268 points aapoalas | 4 comments | | HN request time: 1.046s | source

We're building a different kind of JavaScript engine, based on data-oriented design and willingness to try something quite out of left field. This is most concretely visible in our major architectural choices:

1. All data allocated on the JavaScript heap is placed into a type-specific vector. Numbers go into the numbers vector, strings into the strings vector, and so on.

2. All heap references are type-discriminated indexes: A heap number is identified by its discriminant value and the index to which it points to in the numbers vector.

3. Objects are also split up into object kind -specific vectors. Ordinary objects go into one vector, Arrays go into another, DataViews into yet another, and so on.

4. Unordinary objects' heap data does not contain ordinary object data but instead they contain an optional index to the ordinary objects vector.

5. Objects are aggressively split into parts to avoid common use-cases having to reading parts that are known to be unused.

If this sounds interesting, I've written a few blog posts on the internals of Nova over in our blog, you can jump into that here: https://trynova.dev/blog/what-is-the-nova-javascript-engine

1. aapoalas ◴[] No.42173413[source]
Thank you for your interest and fascination! To answer in brief:

1. We have async support but are still lacking some important parts (mainly interleaved GC) before complex, long-running performance benchmarks can be performed. I expect the performance to initially be relatively bad as we're missing important optimisations like shapes and inline caching.

2. The vector compacting is done so as to ensure that the heap allocated data does not fragment in memory. That being said, it's definitely possible that the heap vectors themselves slowly fragment to span disjoint memory areas instead of being all right next to each other. I don't think this will be a big problem though, as the vectors are still themselves densely packed.

I expect the heap design will definitely suffer some penalties in cases where old data is slowly trickling out from underneath a mass of newer but still live data: During a major GC in these circumstances, the majority of data gets copied to densely pack the vector again. That being said, this isn't too different from a half-space copying garbage collector, and I don't think those are particularly terrible.

replies(1): >>42173531 #
2. whizzter ◴[] No.42173531[source]
I've built some prototypes and is there any particular reason you didn't go for NaN-tagged indexes instead unless it's for 32bit? Numeric cases would've had less loads since it's a number directly and indexes in the mantissa should be enough for at least about 48bit of ref space.

I did consider a similar system ages ages ago for more easily embedding a JS engine into a C/C++ codebase, type-shapes would be allocated on a per page-basis so the runtime/GC wouldn't need any V-table pointers,etc on top of regular plain C object shapes to locate the type info but instead rely on an indirection per-page for those types shared with the C world. Ultimately felt a bit too complicated for something meant for embedding.

replies(1): >>42173881 #
3. aapoalas ◴[] No.42173881{3}[source]
I didn't really consider NaN tagged indexes: Rust made using an enum basically a given. That being said, I probably wouldn't change even if could now. A NaN tagging scheme blocks out at least 11 bits out of your from your useful payload, leaving you with at most 46 bits to split between your discriminant and payload, while giving you freedom to express arbitrary doubles on the stack.

A tagged index gives you 7 bytes to use for payload: This for instance gives us the possibility of representing all but the most decimal heavy doubles on the stack (we drop the bottom byte from a double if it is all zeroes, and save the remaining data on the stack), but also allowing up to 7 byte strings on the stack! And all safe integers! And up to 56 bits worth of Bigints!

So, a tagged enum is pretty powerful :)

replies(1): >>42187433 #
4. whizzter ◴[] No.42187433{4}[source]
Some pros definetly, my target was mainly games so coherent and/or low cost handling of primitive values (numbers) was a priority. With NaN tagging you can do all operations sans addition with the regular floating point instructions and if the CPU has a canonical NaN representaation that doesn't collide with the chosen tag pattern then there is basically no cost for the dynamic typing when it comes to numeric operations.