
42 points coneonthefloor | 9 comments
1. WalterBright ◴[] No.44611357[source]

    new_capacity *= 2;
A better growth factor is 1.5:

https://stackoverflow.com/questions/1100311/what-is-the-idea...
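For illustration, a minimal sketch of a 1.5x growth step in C (the function name and the floor of 8 elements are mine, not from the linked answer):

    #include <stddef.h>

    // Grow by ~1.5x rather than 2x; capacity + capacity / 2 avoids the
    // intermediate overflow of capacity * 3 / 2. The floor of 8 is arbitrary.
    static size_t grow_capacity_1_5x(size_t capacity) {
        if (capacity < 8) return 8;
        return capacity + capacity / 2;
    }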

replies(3): >>44611645 #>>44612459 #>>44613415 #
2. emmelaich ◴[] No.44611645[source]
I remember reading (decades ago) an extensive article in Software: Practice and Experience reaching the same conclusion.
replies(1): >>44611788 #
3. manwe150 ◴[] No.44611788[source]
Or, as the Python example there shows, 1.25x + k, which can be better (faster growth and less memory wasted) than both
replies(1): >>44612494 #
4. burnt-resistor ◴[] No.44612459[source]
Yep. And probably use tcmalloc or jemalloc (deprecated?) too. Most OS sbrk/libc malloc implementations are better than they used to be, but certain profiled programs can gain performance by tuning one of the nonstandard allocators. YMMV. Test, profile, and experiment.
5. burnt-resistor ◴[] No.44612494{3}[source]
1.25 of what? Do you mean 2.25*k == 9*k/4?
replies(1): >>44613446 #
6. tialaramex ◴[] No.44613415[source]
This seems clever if we forget that it runs on a real machine. Once we remember there's a real machine we run into discontinuities and suddenly "Naive doubling" works best for a lot of cases.

At the small end, on the real machine tiny allocations are inefficient, so rather than 1, 2, 3, 5, 8, 12, 18, 27, 41 it turns out we should probably start with 16 or even 64 bytes.

Then at the large end the "clever" reuse is a nonsense because of virtual memory, so 1598, 2397, 3596 doesn't work out as cleanly as 1024, 2048, 4096.

Folly has a growable array type which talks a big game about this 1.5 factor, but as I indicated above they quietly disable this "optimisation" for both small and large sizes in the code itself.

Of course YMMV, it is entirely possible that some particular application gets a noticeable speedup from a hand rolled 1.8x growth factor, or from starting at size 18 or whatever.
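A rough sketch of what such a clamped policy can look like (illustrative only - the 64 byte and 4 KiB thresholds and the function name are mine, not Folly's code):

    #include <stddef.h>

    // Double while small, grow by ~1.5x in the middle, and round large sizes
    // up to a 4 KiB multiple so growth lands on page-friendly boundaries.
    static size_t grow_capacity(size_t bytes) {
        if (bytes < 64) return 64;                  // avoid tiny allocations
        if (bytes < 4096) return bytes * 2;         // naive doubling is fine here
        size_t grown = bytes + bytes / 2;           // ~1.5x in the mid range
        return (grown + 4095) & ~(size_t)4095;      // round up to a page multiple
    }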

replies(1): >>44614130 #
7. tialaramex ◴[] No.44613446{4}[source]
They say 1.25x + k; in this context k will be some constant - suppose it's 16 bytes.

Thus you might see a curve like 13, 32, 56, 86 - aggressive to start but then much gentler. Because it's so gentle it gets the re-use upside for medium-sized allocations but incurs a lot more copying; I can imagine that in Python that might be a good trade.
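For reference, a minimal sketch of the arithmetic behind that curve, assuming k = 16 (not CPython's actual resize logic):

    #include <stdio.h>

    // n * 1.25 + k with k = 16, using integer arithmetic.
    static long next_size(long n) {
        return n + n / 4 + 16;
    }

    int main(void) {
        for (long n = 13; n < 100; n = next_size(n))
            printf("%ld ", n);    // prints 13 32 56 86
        return 0;
    }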

8. sparkie ◴[] No.44614130[source]
It also depends on what we're trying to optimize. If we're trying to optimize for space, then using a constant growth factor is a bad idea. Consider an array with 2 Gi elements: with a 1.5x factor we have to grow it to 3 Gi, but we may only need to add a few additional elements. That's pretty much a whole Gi of wasted space.

Clearly we don't want a blanket constant; the growth factor should be a function of the current length, decreasing as the array grows.

For space optimization an ideal growth factor is 1 + 1/√(length). In the above example, where we have a 2 Gi array, we would grow it by only about 64 Ki elements (using the power-of-two approximation below). Obviously this results in many more allocations, so we would only use this technique when optimizing for space rather than time.

We don't want to be messing around with real square roots, and ideally we want array sizes to always be a multiple of some power of 2, so the trick is to approximate the square root:

    #include <stdint.h>

    inline int64_t approx_sqrt(int64_t length) {
        // requires length > 0; power-of-two approximation of the square root
        return (int64_t)1 << (64 - __builtin_clzll(length)) / 2;
        // with C23, stdc_bit_width_ull() from <stdbit.h> gives 64 - clz portably
    }

    inline int64_t new_length(int64_t length) {
        if (length == 0) return 1;
        return length + approx_sqrt(length);
    }
Some examples - for each power of 2 from 2^16 up to 2^31, the old and new lengths:

    old length: 2^16 -> new length: 0x00010100 (growth: 2^8)
    old length: 2^17 -> new length: 0x00020200 (growth: 2^9)
    old length: 2^18 -> new length: 0x00040200 (growth: 2^9)
    old length: 2^19 -> new length: 0x00080400 (growth: 2^10)
    old length: 2^20 -> new length: 0x00100400 (growth: 2^10)
    old length: 2^21 -> new length: 0x00200800 (growth: 2^11)
    old length: 2^22 -> new length: 0x00400800 (growth: 2^11)
    old length: 2^23 -> new length: 0x00801000 (growth: 2^12)
    old length: 2^24 -> new length: 0x01001000 (growth: 2^12)
    old length: 2^25 -> new length: 0x02002000 (growth: 2^13)
    old length: 2^26 -> new length: 0x04002000 (growth: 2^13)
    old length: 2^27 -> new length: 0x08004000 (growth: 2^14)
    old length: 2^28 -> new length: 0x10004000 (growth: 2^14)
    old length: 2^29 -> new length: 0x20008000 (growth: 2^15)
    old length: 2^30 -> new length: 0x40008000 (growth: 2^15)
    old length: 2^31 -> new length: 0x80010000 (growth: 2^16)
This is the growth rate used in Resizable Arrays in Optimal Time and Space[1], but they don't use a single array with reallocation. Instead they have an array of arrays: growing the array appends an element to an index block, which points to a data block of `approx_sqrt(length)` size, and the existing data blocks are all reused. The index block may require reallocation.

[1]: https://cs.uwaterloo.ca/research/tr/1999/09/CS-99-09.pdf
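A heavily simplified toy of that layout might look like the following (names are mine; the paper locates blocks with O(1) arithmetic, whereas this sketch just scans the index block, and error handling is omitted):

    #include <stdint.h>
    #include <stdlib.h>

    typedef struct {
        int64_t **blocks;     // index block: pointers to data blocks
        int64_t  *block_len;  // size of each data block
        int64_t   num_blocks;
        int64_t   length;     // elements stored
        int64_t   capacity;   // sum of all data block sizes
    } seg_array;

    static void seg_push(seg_array *a, int64_t value) {
        if (a->length == a->capacity) {            // append a new data block;
            int64_t grow = a->length               // existing blocks never move
                ? (int64_t)1 << (64 - __builtin_clzll(a->length)) / 2
                : 1;
            a->blocks    = realloc(a->blocks,    (a->num_blocks + 1) * sizeof *a->blocks);
            a->block_len = realloc(a->block_len, (a->num_blocks + 1) * sizeof *a->block_len);
            a->blocks[a->num_blocks]    = malloc(grow * sizeof(int64_t));
            a->block_len[a->num_blocks] = grow;
            a->num_blocks++;
            a->capacity += grow;
        }
        int64_t i = a->length++, b = 0;
        while (i >= a->block_len[b]) i -= a->block_len[b++];   // find the block
        a->blocks[b][i] = value;
    }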

replies(1): >>44616317 #
9. teo_zero ◴[] No.44616317{3}[source]
> For space optimization an ideal growth factor is 1+1/√(length)

> return length + approx_sqrt(length);

They don't seem the same. Should one take the root or its reciprocal?