new_capacity *= 2;
A better value is to increase the size by 1.5x: https://stackoverflow.com/questions/1100311/what-is-the-idea...
At the small end, on a real machine tiny allocations are inefficient, so rather than 1, 2, 3, 5, 8, 12, 18, 27, 41 it turns out we should probably start at 16 or even 64 bytes.
Then at the large end the "clever" reuse argument (that previously freed blocks eventually sum to enough space for the next growth) is nonsense because of virtual memory, so 1598, 2397, 3596 doesn't work out as cleanly as 1024, 2048, 4096 would.
Folly has a growable array type which talks a big game on this 1.5 factor, but as I indicated above they quietly disable this "optimisation" for both small and large sizes in the code itself.
Of course YMMV, it is entirely possible that some particular application gets a noticeable speedup from a hand rolled 1.8x growth factor, or from starting at size 18 or whatever.
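To make those two caveats concrete, here's a minimal sketch of a clamped constant-factor policy - the 64-byte floor, the 4 KiB threshold, the page rounding, and the 1.5x factor are all illustrative assumptions, not Folly's actual constants:

#include <stdint.h>

// Illustrative only: a growth policy clamped at both ends.
static inline int64_t grow_capacity(int64_t cap_bytes) {
    if (cap_bytes < 64)
        return 64;                      // small end: skip tiny allocations entirely
    if (cap_bytes < 4096)
        return cap_bytes * 2;           // doubling keeps small sizes round
    // Large end: 1.5x, rounded up to a 4 KiB page multiple so we don't
    // ask virtual memory for awkward partial pages.
    int64_t grown = cap_bytes + cap_bytes / 2;
    return (grown + 4095) & ~(int64_t)4095;
}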
Clearly we don't just want a blanket constant, but the growth factor should be a function of the current length - decreasing as the size of the array grows.
For space optimization an ideal growth factor is 1 + 1/√(length). In the above example, where we have a 2 Gi array, we would grow it by only 64 Ki elements. Obviously this results in many more allocations, so we would only use this technique where we're optimizing for space rather than time.
We don't want to be messing around with square roots, and ideally we want array lengths to always be a multiple of some power of 2, so the trick is to approximate the square root:
inline int64_t approx_sqrt(int64_t length) {
    // A power-of-2 approximation of sqrt: shift 1 by half the bit width.
    // (64 - clz is the bit width; in C23, stdc_bit_width_ull() from
    // <stdbit.h> computes it directly.)
    return (int64_t)1 << ((64 - __builtin_clzll(length)) / 2);
}

inline int64_t new_length(int64_t length) {
    if (length == 0) return 1;  // clz is undefined for 0, so handle it first
    return length + approx_sqrt(length);
}
Some examples - for all powers of 2 between UINT16_MAX and UINT32_MAX, the old and new lengths:

old length: 2^16 -> new length: 0x00010100 (growth: 2^8)
old length: 2^17 -> new length: 0x00020200 (growth: 2^9)
old length: 2^18 -> new length: 0x00040200 (growth: 2^9)
old length: 2^19 -> new length: 0x00080400 (growth: 2^10)
old length: 2^20 -> new length: 0x00100400 (growth: 2^10)
old length: 2^21 -> new length: 0x00200800 (growth: 2^11)
old length: 2^22 -> new length: 0x00400800 (growth: 2^11)
old length: 2^23 -> new length: 0x00801000 (growth: 2^12)
old length: 2^24 -> new length: 0x01001000 (growth: 2^12)
old length: 2^25 -> new length: 0x02002000 (growth: 2^13)
old length: 2^26 -> new length: 0x04002000 (growth: 2^13)
old length: 2^27 -> new length: 0x08004000 (growth: 2^14)
old length: 2^28 -> new length: 0x10004000 (growth: 2^14)
old length: 2^29 -> new length: 0x20008000 (growth: 2^15)
old length: 2^30 -> new length: 0x40008000 (growth: 2^15)
old length: 2^31 -> new length: 0x80010000 (growth: 2^16)
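(The table can be reproduced with a quick loop over the `new_length` above - `main` here is just a throwaway harness:)

#include <stdio.h>
#include <stdint.h>

int main(void) {
    for (int p = 16; p < 32; p++) {
        int64_t old_len = (int64_t)1 << p;
        int64_t grown = new_length(old_len);
        int64_t growth = grown - old_len;   // always a power of 2 here
        printf("old length: 2^%d -> new length: 0x%08llx (growth: 2^%d)\n",
               p, (unsigned long long)grown,
               63 - __builtin_clzll((unsigned long long)growth));
    }
    return 0;
}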
This is the growth rate used in Resizable Arrays in Optimal Time and Space[1], but they don't use a single array with reallocation - instead they have an array of arrays, where growing the array appends a pointer to an index block, pointing at a fresh data block of `approx_sqrt(length)` size, and the existing data blocks are all reused. Only the index block may require reallocation.

[1]: https://cs.uwaterloo.ca/research/tr/1999/09/CS-99-09.pdf
length * (1 + 1/√(length)) = length + length/√(length) = length + √(length)
Addition is cheaper than the multiplication, and since we're approximating the sqrt we probably want to avoid calculating its reciprocal and multiplying. So our actual implementation is an approximation of 1 + 1/√(length). Maybe I could've worded it a bit better.
More detail is in the RAOTS paper. It organizes the data blocks into logical "superblocks" (which have no material representation): each superblock is ~n data blocks, each of length n. In practice it's not exactly that, because when the superblock's size has an odd number of bits, say 5, we have to split it either 2:3 or 3:2 (for the log2 of num_blocks:block_size). The paper chooses the former (same as above). We could potentially do it the other way, with more blocks of smaller average size, but indexing into the data blocks would be more awkward - you'd have to test for odd/even, which we avoid above because of the /2 integer division - and it would make the index block larger, which we want to avoid since the index block is the part we have to reallocate.
So in the approach taken, avg_block_size > √(length) and num_blocks < √(length), but avg_block_size * num_blocks == length still holds, and avg_block_size ≈ √(length) ≈ num_blocks.
If we're reallocating a single array rather than using an index block, replace `num_blocks` with the number of times we have to call the allocator.
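For illustration, here's a sketch of how the locate step could work under the split above (superblock k holds 2^(k/2) data blocks of 2^(k - k/2) elements each, integer division) - the names are mine, not the paper's, and the paper derives a closed form where I've used a loop:

#include <stdint.h>

typedef struct {
    uint64_t block;   // index into the index block
    uint64_t offset;  // element offset within that data block
} location;

static inline location locate(uint64_t i) {
    uint64_t r = i + 1;                        // 1-based: the leading bit marks the superblock
    int k = 63 - __builtin_clzll(r);           // superblock number = floor(log2(r))
    int block_bits = k / 2;                    // log2(num_blocks) in superblock k
    int elem_bits = k - block_bits;            // log2(block_size) in superblock k
    uint64_t within = r - ((uint64_t)1 << k);  // strip the leading 1
    uint64_t prior = 0;                        // data blocks in superblocks 0..k-1
    for (int j = 0; j < k; j++)
        prior += (uint64_t)1 << (j / 2);
    return (location){
        .block  = prior + (within >> elem_bits),             // high bits pick the block
        .offset = within & (((uint64_t)1 << elem_bits) - 1), // low bits pick the element
    };
}

For example, locate(4) gives block 2, offset 1 - i.e. the first data block of superblock 2, which holds two blocks of two elements each.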