One of my favorites is "you don't need a linked list".
Although I'm still genuinely interested when a linked list is best suited. I'm also curious why they were invented and why they're taught, maybe just for teaching purposes...
Specifically I needed the guarantee that iterators weren't invalidated by other parts of the list changing. I could've used a vector and worked around it but it would've taken more effort and more code, and like I said the performance difference wouldn't have made an overall difference to the program.
They might be good for traversed lists with the possibility of partway insertion too, or performing insertion sort. Hell they might even be good for traversed lists in the general case, but I'm not a low level dev so I don't know without testing it.
Fun exercise though!
I'm guessing they're taught because they're taught, in that special inertial way that education propagates. Originally, I bet they were introduced when implementing your own data structures rather than pulling a library was normal industry practice.
Of course there is tons of value in knowing about data structures and their characteristics, but probably not in knowing how to implement LLs.
It's not hard to come up with plenty of valid use cases for linked lists. Any time you have an algorithm that does a lot of unpredictable insertions/deletions on a collection that is not regularly iterated in full or used to find elements (so pointer-chasing/cache effects are irrelevant). And/or when the elements are prohibitively expensive to reorder/move on insertions/deletions, or some other piece of code is known to keep direct references to the elements that must not be invalidated when some unrelated element is inserted/removed. Maintaining validity of iterators while an algorithm is operating on the list is another important benefit.
Some people will come up with all kinds of clever workarounds or more complicated datastructures/bookkeeping so you can achieve the same thing without a linked list, but in most cases I don't see the point of it. Especially not in cases that are not performance-sensitive at all. In such cases you're just increasing code complexity and obfuscating intent because of some ideological objection to using linked lists.
You still should at least consider a vector here. If lookups are infrequent, you can just sort it and binary search it on demand (and just add new items at the end). There are cases when this is not good enough, but they really are surprisingly rare.
> And/or when the elements are prohibitively expensive reorder/move on insertions/deletions, or some other piece of code is known to keep direct references to the elements that should not be invalidated
In that case, why not store the elements in a unique_ptr (and then everything in a vector)?
> Especially not in cases that are not performance-sensitive at all. In such cases you're just increasing code complexity and obfuscating intent because of some ideological objection to using linked lists.
Fully agreed - if you are sure that it cannot possibly become performance relevant and a list has the right interface, use it. For example, when the number of items is fundamentally tiny (e.g. one item per dimension of your 3D data) and the thing you'll do with the list item will be complex (or infrequent) anyway so that a few pointer indirections are negligible.
But with all things in computer science, it really depends on your use-case.
* Lazy linked lists are the same thing as iterators, a fact which is used pervasively in Haskell.
* Intrusive or otherwise inline linked lists are used everywhere, e.g. the DOM has a doubly linked list of children for each node, simple memory pools are often implemented as free lists, and tons of resources inside kernels are members of many linked lists.
* Linked lists are guaranteed O(1) append, but dynamic arrays are only amortized O(1) append. Intrusive linked lists also never need to allocate. That makes them useful inside resource-constrained places.
I can think of a few use-cases where I've opted for linked-lists over resizable arrays:
1. When you have enough memory for the set of all elements you want to store but not enough contiguous memory for the set of all elements you want to store (i.e. You have the available memory, but it's all fragmented).
2. When you have multiple threads adding to/removing from the container, a linked list lets each thread (in theory) insert/remove nodes at the same time. For example, one thread can be adding to the head of a queue while another thread is removing from the tail of the same queue; they don't need to both wait for the same lock to resize the container.
3. When you have large insertions/removals. For example, concatenating two lists into one is just reassigning two/four pointers in linked lists. With arrays or similar you need enough free memory to make a copy of at least the smaller list.
Another example is "zipping" two lists together - you simply go through the lists and reassign pointers. With arrays you have to allocate enough memory for the final list before copying elements over.
Same goes for splitting lists into two: for example, applying a predicate function to remove and return all elements that satisfy the predicate takes no extra memory using lists, but with arrays the destination needs to be allocated first.
4. Finally, you may want to store cycles in your container (perhaps the traversal maintains state so you can use the cycle to make traversal of some nodes repeat, or maybe you're simply insane and like to confuse readers of your code :-). Okay, fair enough, that last one is me stretching things a bit :-)
Maintaining an intrusive list of objects that you have around anyway is sometimes the simplest option, especially because objects can easily remove themselves from the list upon destruction.
Re 2. (threading): What happens when the list becomes empty? How do you keep track of the size to determine whether the list is empty? How do you know the size is still up-to-date when you go on to make a modification? I don't think this is as easy (even "in theory") as you make it sound.
Re 3. (merging overhead): You say "takes no extra memory", but you already pay significant extra memory: an additional two pointers of overhead per element (and possibly per-allocation overhead on top). For the entire lifetime of the list, not just when you do some operations.
If your elements are small, then your list might have double memory consumption the entire time instead of only when merging. And if they are large, you could store unique_ptrs to those elements in a vector. In which case the total memory overhead of merging two vectors (2 x total number of elements x pointer size) is exactly the same as a linked list has the entire time. And all the operations then turn into shuffling around pointers too, but probably fewer: You just have to move the unique_ptrs around, which is one pointer each instead of constantly unlinking and relinking lists.
Sure, sometimes iterator invalidation is a thing, and maybe keeping pointers to (separately allocated) elements doesn't cut it. But even then you wouldn't want to use a list if performance is even remotely a concern.
That's only relevant if you don't have an MMU, though. So not on any desktop, or on embedded systems where RAM is measured in gigabytes.
Otherwise it's up to the OS to handle that - only individual memory pages are contiguous, but not sets of pages.
This is the most useful case of removing from the middle of a container, which linked lists are good at. Intrusive lists are common for this type of thing, instead of actually storing a list iterator.
> In that case, why not store the elements in a unique_ptr (and then everything in a vector)?
I recently ended up using std::list<Object*> because of pointer invalidation on vector resize. I see now that using smart pointers would have worked just as well. Thanks.
No, it isn't easy. I used a single lock to lock both head and tail pointers, and did the pointer-swapping and pointer-comparisons for the pathological cases in that lock[1]. However, that being said, locking for the comparison and head/tail reassignment is *still* preferable to locking while reallocating your array.
For counting items in the list, I post to a semaphore when an item is inserted, wait on the semaphore when an item is removed and simply return the semaphore count when a count of items is required. Because it is a linked list, there is no problem with having the count change in between retrieving it and adding/removing items - that'd only be a problem in array-like structures.
If you want to code-review my implementation then be my guest (it's on github, under my name, a repo called libcmq). I welcome criticism :-)
In any case, I've been considering reimplementing that library as a resizable array precisely because of the downsides of lists. I'm keeping it as it is (for now) because there are other things I want to do that aren't boring.
[1] Empty list and list of exactly one element are problematic.
Sadly, std::deque is pretty much useless. It uses something like page-sized blocks in one implementation, a few cachelines in another one (forgot which is gcc/clang), and on MSVC it allocates every single item separately (!!!) for element size > 1 word. And this will not be changed due to ABI compatibility. It might as well not exist in the standard.
If you don't need persistence, you can often use a dynamic array to do basically everything a linked list can do for you, but most of it faster.
Also, if you have a relocating garbage collector, or otherwise a smart enough compiler (or interpreter), linked lists can become sufficiently efficient.
E.g. when working on hard disks (think a file system or database with raw storage access). Or if you are working in a medium where you can only write once and can't change what's already written, etc.
1. Multiple independent (not necessarily concurrent, but not coordinated) mutators.
2. Insertion and removal happen commonly and should be cheap.
3. Operations on the whole list typically happen in list order or close to it.
Bonus points if the size of the relevant data structures is often such that you blow out caches anyway (e.g. each thing in your list is larger than a cache line) and hence there are no strong locality benefits from an array, though prefetching _can_ sometimes help with arrays there.
The above features _can_ be implemented on top of an array, but feature 1 means you can't use indices as long-lived handles to items, so you need either handles that get updated on mutations or some concept of identity and "find the current index of this thing" calls.
For context here, Gecko used to store the children of a DOM node as an array of pointers and switched to a doubly-linked list because the DOM does have the above properties and it was faster and mostly simpler to have a linked list in practice. There is a bit of complexity to make node.childNodes iteration (as opposed to .nextSibling/.previousSibling iteration) not be O(N^2), but overall the switch was a win.
Well most of the time linked lists are used for things that have to be kept in some order that cannot (easily) be determined by a sorting predicate, and/or when re-sorting is not allowed.
>> In that case, why not store the elements in a unique_ptr (and then everything in a vector)?
But why? What's the benefit of doing that? The objects will still be scattered around the heap so iterating and dereferencing still trashes the cache, you lose benefits like not invalidating iterators on inserts/removals, and you complicate the code.
I'm not saying linked lists should be your go-to data structure, but IMO the "linked lists are bad, don't use them" meme is a little overused.