Indices, not Pointers | slacker news

1. zahlman ◴[02 Sep 25 23:26 UTC] No.45110423[source]▶

> There is a pattern I’ve learned while using Zig which I’ve never seen used in any other language.

I've done this in small applications in C (where nodes were already being statically allocated) and/or assembly (hacking on an existing binary).

No idea about the effect on speed in general; I was trying to save a few bytes of storage in a place where that mattered.

replies(5): >>45111351 #>>45111431 #>>45111876 #>>45112740 #>>45112993 #

2. throwawaymaths ◴[03 Sep 25 01:38 UTC] No.45111351[source]▶

>>45110423 (TP) #

> There is a pattern I’ve learned while using Zig which I’ve never seen used in any other language.

yeah, i feel like it's low key ECS (minus object/slot polymorphism)

3. anonymousiam ◴[03 Sep 25 01:50 UTC] No.45111431[source]▶

>>45110423 (TP) #

But in C, there's not really any difference between pointers and indices. You can access array elements using either method, and they're both equally efficient.

replies(4): >>45111456 #>>45111634 #>>45112113 #>>45112598 #

4. LegionMammal978 ◴[03 Sep 25 01:54 UTC] No.45111456[source]▶

>>45111431 #

Depending on how many elements you have, you can save some space using 32-bit or even 16-bit indices in place of 64-bit pointers. (Just make sure there isn't any route to overflow the index type.)

5. kazinator ◴[03 Sep 25 02:22 UTC] No.45111634[source]▶

>>45111431 #

There are some differences.

- You can check indices that are out of bounds without running into formal undefined behavior. ISO C does not require pointers to distinct objects to be comparable via inequality, only exact equality. (In practice it works fine in any flat-address-space implementation and may be regarded as a common extension.)

- Indices are implicitlyl scaled. If you have done a range check that index is valid, then it refers to an entry in your array. At worst it is some unoccupied/free entry that the caller shouldn't be using. If you have checked that a pointer points into the array, you don't know that it's valid; you also have to check that its displacement from the base of the array is a multiple of the element size; i.e. it is aligned.

replies(1): >>45112142 #

6. octoberfranklin ◴[03 Sep 25 03:07 UTC] No.45111876[source]▶

>>45110423 (TP) #

It also is/was common in Java when you need to reduce the pressure on the JVM garbage collector.

7. smadge ◴[03 Sep 25 03:52 UTC] No.45112113[source]▶

>>45111431 #

The distinction in the article really is between calling malloc for every added node in your data structure (“pointer”) or using the pre-allocated memory in the next element of an array (“index”).

replies(1): >>45112652 #

8. cyber_kinetist ◴[03 Sep 25 03:59 UTC] No.45112142{3}[source]▶

>>45111634 #

You do have to take care of the ABA problem - if you access memory using an index that became invalid before and another object is using instead, you will have some weird hard-to-debug logic errors (worse than use-after-free, since even Valgrind can't save you). To prevent this you need another generational counter to store along with your id (which is either incremented for every usage or assigned a random hash)

replies(1): >>45112628 #

9. adrian_b ◴[03 Sep 25 05:50 UTC] No.45112598[source]▶

>>45111431 #

After being translated literally in machine language, they are not equally efficient.

However, a C compiler may choose to use in machine language whichever of indices or pointers is more efficient on the target machine, regardless of whether the source program uses indices or pointers.

10. adrian_b ◴[03 Sep 25 05:55 UTC] No.45112628{4}[source]▶

>>45112142 #

This matters only for shared data structures. It is irrelevant for thread-local data.

For shared data structures, you have more to worry about, so regardless if you use indices or pointers you must use either atomic operations or means to ensure exclusive access to the entire data structure or means to detect the need for retries when using optimistic accesses.

replies(1): >>45112672 #

11. adrian_b ◴[03 Sep 25 06:00 UTC] No.45112652{3}[source]▶

>>45112113 #

This is only one of the advantages discussed in TFA. The others are those due to using indices instead of pointers (like smaller size, cache locality, range checking, possibility of data exchange between systems with distinct address spaces).

12. kazinator ◴[03 Sep 25 06:04 UTC] No.45112672{5}[source]▶

>>45112628 #

Well, solving/mitigating the ABA ambiguity can debug use-after-free errors in single-threaded programs also. Because when a pointer A is freed to B, and then recycled again for a new object, we can make it into a different pointer A' (e.g. with a tagging scheme). So then when the old A pointer copies are lingering around, we can tell they are invalid due to having the wrong tag.

Solving ABA is probably a point in favor of indices (if we are working in a higher level language) because their type supports the bit operations for tagging. However, some hardware has support for hardware tagging for pointers. E.g. ARM; Android uses it.

replies(1): >>45114096 #

13. adrian_b ◴[03 Sep 25 06:17 UTC] No.45112740[source]▶

>>45110423 (TP) #

While the author has seen this pattern in Zig, this pattern was the normal way of writing programs in FORTRAN, decades before the appearance of the C language.

The early versions of FORTRAN did not have dynamic memory allocation. Therefore the main program pre-allocated one or more work arrays, which were either known globally or they were passed as arguments to all procedures.

Then wherever a C program might use malloc, an item would be allocated in a work array and the references between data structures would use the indices of the allocated items. Items could be freed as described in TFA, by putting them in a free list.

The use of the data items allocated in work arrays in FORTRAN was made easier by the fact that the language allowed the aliasing of any chunk of memory to a variable of any type, either a scalar or an array of any rank and dimensions.

So this suggestion just recommends the return to the old ways. Despite its limitations, when maximum speed is desired, FORTRAN remains unbeatable by any of its successors.

replies(2): >>45113564 #>>45115238 #

14. anonnon ◴[03 Sep 25 06:58 UTC] No.45112993[source]▶

>>45110423 (TP) #

> No idea about the effect on speed in general; I was trying to save a few bytes of storage in a place where that mattered.

I had a decent sized C library that I could conditionally compile (via macros and ifdefs) to use pointers (64-bit) or indexes (32-bit), and I saw no performance improvement, at least for static allocation.

15. versteegen ◴[03 Sep 25 08:46 UTC] No.45113564[source]▶

>>45112740 #

> the language allowed the aliasing of any chunk of memory to a variable of any type, either a scalar or an array of any rank and dimensions.

Wait a minute, I've seen it stated many times that a primary reason FORTRAN can be better optimised than C is that it doesn't allow aliasing memory as easily as C does (what that means, maybe you can say), and that's why 'restrict' was added to C. On the other hand, C's "strict aliasing rule" allows compilers to assume that pointers of different types don't alias the same memory, which allows optimisations.

replies(2): >>45113995 #>>45114020 #

16. shakow ◴[03 Sep 25 10:00 UTC] No.45113995{3}[source]▶

>>45113564 #

Not the same aliasing.

GP is using aliasing as a synonym for casting; the aliasing you're thinking of is the one where, in C, pointer function arguments can refer to identical or overlapping memory spans.

17. adrian_b ◴[03 Sep 25 10:05 UTC] No.45114020{3}[source]▶

>>45113564 #

FORTRAN does not allow implicit aliasing between distinct function/procedure arguments, which helps optimization.

It allows explicit aliasing using the EQUIVALENCE statement, which declares that 2 variables of arbitrary types and names are allocated at the same memory address.

The C language has taken the keyword "union" from the language ALGOL 68, but instead of implementing true unions like in the original language (i.e. with tags handled by the compiler and with type safety) it has implemented a version of the FORTRAN EQUIVALENCE, which however is also weaker and less convenient than the FORTRAN declaration (unlike C's union, FORTRAN's EQUIVALENCE also worked for aliasing parts of bigger data structures, e.g. sub-arrays).

replies(1): >>45117193 #

18. adrian_b ◴[03 Sep 25 10:17 UTC] No.45114096{6}[source]▶

>>45112672 #

With indices what you say can be implemented trivially, much simpler than with pointers, by always incrementing a reallocated index (i.e. an index extracted from the free list) with the array size and always addressing the array with the indices modulo the array size.

With the array size chosen to be a power of two, this adds negligible overhead in time and no overhead in space.

replies(1): >>45114681 #

19. duped ◴[03 Sep 25 11:53 UTC] No.45114681{7}[source]▶

>>45114096 #

That's equivalent to packing two counters into one.

replies(1): >>45115063 #

20. adrian_b ◴[03 Sep 25 12:46 UTC] No.45115063{8}[source]▶

>>45114681 #

True.

21. physicsguy ◴[03 Sep 25 13:05 UTC] No.45115238[source]▶

>>45112740 #

I just came here to say exactly the same. I've also seen it used in C/C++ for Fast Multipole Method / Barnes Hut methods codes, I don't think this is a forgotten trick at all.

22. pklausler ◴[03 Sep 25 15:51 UTC] No.45117193{4}[source]▶

>>45114020 #

Fortran doesn't disallow aliasing per se of dummy arguments to functions or subroutines, but we do restrict what can be done with aliases in many cases. The details matter.