One of my favorites is "you don't need a linked list".
Although I'm still genuinely interested when a linked list is best suited. I'm also curious why they were invented and why they're taught, maybe just for teaching purposes...
Specifically I needed the guarantee that iterators weren't invalidated by other parts of the list changing. I could've used a vector and worked around it but it would've taken more effort and more code, and like I said the performance difference wouldn't have made an overall difference to the program.
They might be good for traversed lists with the possibility of partway insertion too, or performing insertion sort. Hell they might even be good for traversed lists in the general case, but I'm not a low level dev so I don't know without testing it.
Fun exercise though!
I'm guessing they're taught because they're taught, in that special inertial way that education propagates. Originally, I bet they were introduced when implementing your own data structures rather than pulling a library was normal industry practice.
Of course there is tons of value in knowing about data structures and their characteristics, but probably not in knowing how to implement LLs.
It's not hard to come up with plenty of valid use cases for linked lists. Any time you have an algorithm that does a lot of unpredictable insertions/deletions on a collection that is not regularly iterated in full or used to find elements (so pointer-chasing/cache effects are irrelevant). And/or when the elements are prohibitively expensive to reorder/move on insertions/deletions, or some other piece of code is known to keep direct references to the elements that must not be invalidated when some unrelated element is inserted/removed. Maintaining validity of iterators while an algorithm is operating on the list is another important benefit.
Some people will come up with all kinds of clever workarounds or more complicated datastructures/bookkeeping so you can achieve the same thing without a linked list, but in most cases I don't see the point of it. Especially not in cases that are not performance-sensitive at all. In such cases you're just increasing code complexity and obfuscating intent because of some ideological objection to using linked lists.
You still should at least consider a vector here. If lookups are infrequent, you can just sort it and binary search it on demand (and just add new items at the end). There are cases when this is not good enough, but they really are surprisingly rare.
> And/or when the elements are prohibitively expensive reorder/move on insertions/deletions, or some other piece of code is known to keep direct references to the elements that should not be invalidated
In that case, why not store the elements in a unique_ptr (and then everything in a vector)?
> Especially not in cases that are not performance-sensitive at all. In such cases you're just increasing code complexity and obfuscating intent because of some ideological objection to using linked lists.
Fully agreed - if you are sure that it cannot possibly become performance relevant and a list has the right interface, use it. For example, when the number of items is fundamentally tiny (e.g. one item per dimension of your 3D data) and the thing you'll do with the list item will be complex (or infrequent) anyway so that a few pointer indirections are negligible.
But with all things in computer science, it really depends on your use-case.
* Lazy linked lists are the same thing as iterators, a fact which is used pervasively in Haskell.
* Intrusive or otherwise inline linked lists are used everywhere, e.g. the DOM has a doubly linked list of children for each node, simple memory pools are often implemented as free lists, and tons of resources inside kernels are members of many linked lists.
* Linked lists are guaranteed O(1) append, but dynamic arrays are only amortized O(1) append. Intrusive linked lists also never need to allocate. That makes them useful inside resource-constrained places.
I can think of a few use-cases where I've opted for linked-lists over resizable arrays:
1. When you have enough memory for the set of all elements you want to store but not enough contiguous memory for the set of all elements you want to store (i.e. You have the available memory, but it's all fragmented).
2. When you have multiple threads adding to/removing from the container, a linked list lets each thread (in theory) insert/remove nodes at the same time. For example, one thread can be adding to the head of a queue while another thread is removing from the tail of the same queue; they don't need to both wait for the same lock to resize the container.
3. When you have large insertions/removals. For example, concatenating two lists into one is just reassigning two/four pointers in linked lists. With arrays or similar you need enough free memory to make a copy of at least the smaller list.
Another example is "zipping" two lists together - you simply go through the lists and reassign pointers. With arrays you have to allocate enough memory for the final list before copying elements over.
Same goes for splitting lists into two: for example, applying a predicate function to remove and return all elements that satisfy the predicate takes no extra memory using lists, but with arrays the destination needs to be allocated first.
4. Finally, you may want to store cycles in your container (perhaps the traversal maintains state so you can use the cycle to make traversal of some nodes repeat, or maybe you're simply insane and like to confuse readers of your code :-). Okay, fair enough, that last one is me stretching things a bit :-)
Maintaining an intrusive list of objects that you have around anyway is sometimes the simplest option, especially because objects can easily remove themselves from the list upon destruction.
Re 2. (threading): What happens when the list becomes empty? How do you keep track of the size to determine whether the list is empty? How do you know the size is still up-to-date when you go on to make a modification? I don't think this is as easy (even "in theory") as you make it sound.
Re 3. (merging overhead): You say "takes no extra memory", but you already pay significant extra memory: an additional two pointers of overhead per element (and possibly per-allocation overhead on top). For the entire lifetime of the list, not just when you do some operations.
If your elements are small, then your list might have double memory consumption the entire time instead of only when merging. And if they are large, you could store unique_ptrs to those elements in a vector. In which case the total memory overhead of merging two vectors (2 x total number of elements x pointer size) is exactly the same as a linked list has the entire time. And all the operations then turn into shuffling around pointers too, but probably fewer: You just have to move the unique_ptrs around, which is one pointer each instead of constantly unlinking and relinking lists.
Sure, sometimes iterator invalidation is a thing, and maybe keeping pointers to (separately allocated) elements doesn't cut it. But even then you wouldn't want to use a list if performance is even remotely a concern.
That's only relevant if you don't have an MMU, though. So not on any desktop, or on embedded systems where RAM is measured in gigabytes.
Otherwise it's up to the OS to handle that - only individual memory pages are contiguous, but not sets of pages.
This is the most useful case of removing from the middle of a container, which linked lists are good at. Intrusive lists are common for this type of thing, instead of actually storing a list iterator.
> In that case, why not store the elements in a unique_ptr (and then everything in a vector)?
I recently ended up using std::list<Object*> because of pointer invalidation on vector resize. I see now that using smart pointers would have worked just as well. Thanks.
No, it isn't easy. I used a single lock to lock both head and tail pointers, and did the pointer-swapping and pointer-comparisons for the pathological cases in that lock[1]. However, that being said, locking for the comparison and head/tail reassignment is *still* preferable to locking while reallocating your array.
For counting items in the list, I post to a semaphore when an item is inserted, wait on the semaphore when an item is removed and simply return the semaphore count when a count of items is required. Because it is a linked list, there is no problem with having the count change in between retrieving it and adding/removing items - that'd only be a problem in array-like structures.
If you want to code-review my implementation then be my guest (it's on github, under my name, a repo called libcmq). I welcome criticism :-)
In any case, I've been considering reimplementing that library as a resizable array precisely because of the downsides of lists. I'm keeping it as it is (for now) because there are other things I want to do that aren't boring.
[1] Empty list and list of exactly one element are problematic.
Sadly, std::deque is pretty much useless. It uses something like page-sized blocks in one implementation, a few cachelines in another one (forgot which is gcc/clang), and on MSVC it allocates every single item separately (!!!) for element size > 1 word. And this will not be changed due to ABI compatibility. It might as well not exist in the standard.
If you don't need persistence, you can often use a dynamic array to do basically everything a linked list can do for you, but most of it faster.
Also, if you have a relocating garbage collector, or otherwise a smart enough compiler (or interpreter), linked lists can become sufficiently efficient.
E.g. when working on hard disks (think a file system or database with raw storage access). Or if you are working in a medium where you can only write once and can't change what's already written, etc.
1. Multiple independent (not necessarily concurrent, but not coordinated) mutators.
2. Insertion and removal happen commonly and should be cheap.
3. Operations on the whole list typically happen in list order or close to it.
Bonus points if the size of the relevant data structures is often such that you blow out caches anyway (e.g. each thing in your list is larger than a cache line) and hence there are no strong locality benefits from an array, though prefetching _can_ sometimes help with arrays there.
The above features _can_ be implemented on top of an array, but feature 1 means you can't use indices as long-lived handles to items, so you need either handles that get updated on mutations or some concept of identity and "find the current index of this thing" calls.
For context here, Gecko used to store the children of a DOM node as an array of pointers and switched to a doubly-linked list because the DOM does have the above properties and it was faster and mostly simpler to have a linked list in practice. There is a bit of complexity to make node.childNodes iteration (as opposed to .nextSibling/.previousSibling iteration) not be O(N^2), but overall the switch was a win.
Well most of the time linked lists are used for things that have to be kept in some order that cannot (easily) be determined by a sorting predicate, and/or when re-sorting is not allowed.
>> In that case, why not store the elements in a unique_ptr (and then everything in a vector)?
But why? What's the benefit of doing that? The objects will still be scattered around the heap so iterating and dereferencing still trashes the cache, you lose benefits like not invalidating iterators on inserts/removals, and you complicate the code.
I'm not saying linked lists should be your go-to data structure, but IMO the "linked lists are bad, don't use them" meme is a little overused.