←back to thread

218 points signa11 | 2 comments | | HN request time: 0.506s | source
Show context
siev ◴[] No.43703045[source]
I like the sentiment, I love C. But this book seems riddled with errors and baffling decisions.

First of all, the fixed points are LITERALLY NOT FIXED POINTS. They're decimal floats. Fixed points are just integers that re-scale when multiplied or divided. There is no exponent field, no nothing. The author seems to have confused the notion "fixed points allow for precise calculations of monetary values" to mean that they're decimal. They're not. That section of the book contradicts itself constantly and also the code is wrong.

Also an ordered vector is used to implement a map/set. Because:

> Most people would likely instinctively reach for hash tables, and typically spend the next few months researching optimal hash algorithms and table designs.

> A binary searched vector is as simple as it gets and performs pretty well while being more predictable.

A basic hash table or hash set[1] is both simpler and faster than this solution. And I don't see what's stopping someone from spending the next few months researching optimal dynamic array growth and searching algorithms instead. This line of reasoning just doesn't make any sense.

And "Once nice advantage is that since they don't need any infrastructure, they're comparably cheap to create." What? It needs a dynamic array!

[1] https://github.com/skeeto/scratch/tree/master/set32

replies(3): >>43703760 #>>43704820 #>>43705016 #
tialaramex ◴[] No.43705016[source]
Oh wow, yeah, that's not a fixed point type, it's a bad decimal type, code which uses this type is more likely to be faulty. If you can afford 64-bit integers, just work in pennies (cents, whatever) everywhere and don't sweat it.

The growable array type ("vector" following C++ parlance) lacks the bifurcated reservation API meaning it has the same problem as Bjarne's std::vector - but it's 2025 people, just because C++ made this mistake last century doesn't mean you need to copy them.

And finally yes you want a really good general purpose hash table, this is one of the places where generics shine most brightly, don't "spend the next few months researching" pick a language which does a decent job of this out of the box, but since you're in C, your utility library should likewise provide a decent hash table out of the box.

Swiss Tables are literally just a single growable allocation, this idea that you've somehow made your thing cheaper than a hash table by using the growable array type underneath it means you're at best four decades behind the state of the art, which is a bad sign.

This is a Young Discipline. David Musser's "Introspective sorting" paper was written after I learned sorting at University. Literally the class where they taught me about sorting was held before that paper was even written, let alone widely disseminated. The whole terminology of "Lock free" versus "Wait free" again, that's newer than my undergraduate final project on distributed systems. Because this is a Young Discipline it's crucial to go check, hey, the stuff I learned in class years ago, is that actually still correct, and does my understanding match reality - or am I about to recite a known falsehood because I forgot how time works and/or I didn't pay attention in class?

replies(1): >>43705799 #
1. badmintonbaseba ◴[] No.43705799[source]
> The growable array type ("vector" following C++ parlance) lacks the bifurcated reservation API meaning it has the same problem as Bjarne's std::vector - but it's 2025 people, just because C++ made this mistake last century doesn't mean you need to copy them.

What's "the bifurcated reservation API"?

replies(1): >>43707140 #
2. tialaramex ◴[] No.43707140[source]
So, these types [the growable array, C++ and this library call them "vector", Rust calls them Vec, a lot of the GC languages call this ArrayList or even plain List] have the amortized constant time append operation which they achieve via a growth factor, often doubling. However, as the programmer we might well know useful things about how big our growable array will be, either immediately or in its eventual future, this won't influence correctness but responding appropriately can have a large impact on performance. To use this knowledge, the type should provide a reservation API - a way to tell it what you know.

There are several ways you could arrange this, but some of them can't optimize certain scenarios practically. I call Rust's choice here a "bifurcated" API because it has two functions named `reserve` and `reserve_exact` where many provide only one (typically named `reserve` but analogous to `reserve_exact`)

Because we know the circumstance, we can use the amortized growth strategy where appropriate in `reserve` even though we don't use it for `reserve_exact`.

Suppose I'm receiving bundles of Doodads, to form a Shipment, I can see how many Doodads are in the bundle I received, but I only know it's the last bundle of the shipment or it's not, I don't have advance notice of the full size of the Shipment.

If I receive bundles of 10, 15, 11, 20, 9, 14 and finally 13 Doodads. Total shipment size was 92 Doodads.

With just naive doubling, we grow to 1, 2, 4, 8, 16, 32, 64 and finally 128 Doodads space, we perform 127 copies + 92 new writes = 219 Doodad writes, 8 heap allocations. That's our base case, it's what Bjarne Stroustrup recommends and what you'd get in many languages out of the box or if you've never used a reservation API.

If we abuse Vec::reserve_exact - as might be tempting if it's the only API - we grow 10, 25, 36, 56, 65, 79, 92, we perform 271 copies + 92 new writes = 363 Doodad writes, 7 heap allocations, one fewer allocation but many more copies, probably slightly worse. Ouch.

If we use the bifurcated API we grow 10, 25, 50, 100, we perform 85 copies + 92 new writes = 177 Doodad writes, 4 allocations, we're doing markedly better.