new_capacity *= 2;
A better value is to increase the size by 1.5x: https://stackoverflow.com/questions/1100311/what-is-the-idea...
At the small end, on a real machine tiny allocations are inefficient, so rather than 1, 2, 3, 5, 8, 12, 18, 27, 41 it turns out we should probably start at 16 or even 64 bytes.
Then at the large end the "clever" reuse argument (that previously freed blocks eventually sum to enough space for the next growth) is nonsense because of virtual memory, so 1598, 2397, 3596 doesn't work out as cleanly as 1024, 2048, 4096 would.
Folly has a growable array type which talks a big game on this 1.5 factor, but as I indicated above they quietly disable this "optimisation" for both small and large sizes in the code itself.
Of course YMMV, it is entirely possible that some particular application gets a noticeable speedup from a hand rolled 1.8x growth factor, or from starting at size 18 or whatever.
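To make those two caveats concrete, here's a minimal sketch of a clamped constant-factor policy - the 64-byte floor, the 4 KiB threshold, the page rounding, and the 1.5x factor are all illustrative assumptions, not Folly's actual constants:

#include <stdint.h>

// Illustrative only: a growth policy clamped at both ends.
static inline int64_t grow_capacity(int64_t cap_bytes) {
    if (cap_bytes < 64)
        return 64;                      // small end: skip tiny allocations entirely
    if (cap_bytes < 4096)
        return cap_bytes * 2;           // doubling keeps small sizes round
    // Large end: 1.5x, rounded up to a 4 KiB page multiple so we don't
    // ask virtual memory for awkward partial pages.
    int64_t grown = cap_bytes + cap_bytes / 2;
    return (grown + 4095) & ~(int64_t)4095;
}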
Clearly we don't just want a blanket constant, but the growth factor should be a function of the current length - decreasing as the size of the array grows.
For space optimization an ideal growth factor is 1 + 1/√(length). In the above example, where we have a 2 Gi array, we would grow it by only 64 Ki elements. Obviously this results in many more allocations, so we would only use this technique where we're optimizing for space rather than time.
We don't want to be messing around with square roots, and ideally we want array lengths to always be a multiple of some power of 2, so the trick is to approximate the square root:
inline int64_t approx_sqrt(int64_t length) {
    // A power-of-2 approximation of sqrt: shift 1 by half the bit width.
    // (64 - clz is the bit width; in C23, stdc_bit_width_ull() from
    // <stdbit.h> computes it directly.)
    return (int64_t)1 << ((64 - __builtin_clzll(length)) / 2);
}

inline int64_t new_length(int64_t length) {
    if (length == 0) return 1;  // clz is undefined for 0, so handle it first
    return length + approx_sqrt(length);
}
Some examples - for all powers of 2 between UINT16_MAX and UINT32_MAX, the old and new lengths:

old length: 2^16 -> new length: 0x00010100 (growth: 2^8)
old length: 2^17 -> new length: 0x00020200 (growth: 2^9)
old length: 2^18 -> new length: 0x00040200 (growth: 2^9)
old length: 2^19 -> new length: 0x00080400 (growth: 2^10)
old length: 2^20 -> new length: 0x00100400 (growth: 2^10)
old length: 2^21 -> new length: 0x00200800 (growth: 2^11)
old length: 2^22 -> new length: 0x00400800 (growth: 2^11)
old length: 2^23 -> new length: 0x00801000 (growth: 2^12)
old length: 2^24 -> new length: 0x01001000 (growth: 2^12)
old length: 2^25 -> new length: 0x02002000 (growth: 2^13)
old length: 2^26 -> new length: 0x04002000 (growth: 2^13)
old length: 2^27 -> new length: 0x08004000 (growth: 2^14)
old length: 2^28 -> new length: 0x10004000 (growth: 2^14)
old length: 2^29 -> new length: 0x20008000 (growth: 2^15)
old length: 2^30 -> new length: 0x40008000 (growth: 2^15)
old length: 2^31 -> new length: 0x80010000 (growth: 2^16)
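(The table can be reproduced with a quick loop over the `new_length` above - `main` here is just a throwaway harness:)

#include <stdio.h>
#include <stdint.h>

int main(void) {
    for (int p = 16; p < 32; p++) {
        int64_t old_len = (int64_t)1 << p;
        int64_t grown = new_length(old_len);
        int64_t growth = grown - old_len;   // always a power of 2 here
        printf("old length: 2^%d -> new length: 0x%08llx (growth: 2^%d)\n",
               p, (unsigned long long)grown,
               63 - __builtin_clzll((unsigned long long)growth));
    }
    return 0;
}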
This is the growth rate used in Resizable Arrays in Optimal Time and Space[1], but they don't use a single array with reallocation - instead they have an array of arrays, where growing the array appends a pointer to an index block, pointing at a fresh data block of `approx_sqrt(length)` size, and the existing data blocks are all reused. Only the index block may require reallocation.

[1]: https://cs.uwaterloo.ca/research/tr/1999/09/CS-99-09.pdf
length * (1 + 1/√(length)) = length + length/√(length) = length + √(length)
Addition is cheaper than the multiplication, and since we're approximating the sqrt we probably want to avoid calculating its reciprocal and multiplying. So our actual implementation is an approximation of 1 + 1/√(length). Maybe I could've worded it a bit better.
More detail is in the RAOTS paper. It organizes the data blocks into logical "superblocks" (which have no material representation): each superblock is ~n data blocks, each of length n. In practice it's not exactly that, because when the superblock's size has an odd number of bits, say 5, we have to split it either 2:3 or 3:2 (for the log2 of num_blocks:block_size). The paper chooses the former (same as above). We could potentially do it the other way, with more blocks of smaller average size, but indexing into the data blocks would be more awkward - you'd have to test for odd/even, which we avoid above because of the /2 integer division - and it would make the index block larger, which we want to avoid since the index block is the part we have to reallocate.
So in the approach taken, avg_block_size > √(length) and num_blocks < √(length), but avg_block_size * num_blocks == length still holds, and avg_block_size ≈ √(length) ≈ num_blocks.
If we're reallocating a single array rather than using an index block, replace `num_blocks` with the number of times we have to call the allocator.
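For illustration, here's a sketch of how the locate step could work under the split above (superblock k holds 2^(k/2) data blocks of 2^(k - k/2) elements each, integer division) - the names are mine, not the paper's, and the paper derives a closed form where I've used a loop:

#include <stdint.h>

typedef struct {
    uint64_t block;   // index into the index block
    uint64_t offset;  // element offset within that data block
} location;

static inline location locate(uint64_t i) {
    uint64_t r = i + 1;                        // 1-based: the leading bit marks the superblock
    int k = 63 - __builtin_clzll(r);           // superblock number = floor(log2(r))
    int block_bits = k / 2;                    // log2(num_blocks) in superblock k
    int elem_bits = k - block_bits;            // log2(block_size) in superblock k
    uint64_t within = r - ((uint64_t)1 << k);  // strip the leading 1
    uint64_t prior = 0;                        // data blocks in superblocks 0..k-1
    for (int j = 0; j < k; j++)
        prior += (uint64_t)1 << (j / 2);
    return (location){
        .block  = prior + (within >> elem_bits),             // high bits pick the block
        .offset = within & (((uint64_t)1 << elem_bits) - 1), // low bits pick the element
    };
}

For example, locate(4) gives block 2, offset 1 - i.e. the first data block of superblock 2, which holds two blocks of two elements each.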