Also, in mitigation, it's not a special-purpose word for appending to vectors. They introduced "emplace" to other STL data structures at the same time, with the equivalent semantics, so it's a word you only have to learn once.
Especially since many of the standard containers have many more constructors than you'd think - hello std::vector(size_type count).
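A small illustration of why that particular constructor surprises people (a minimal sketch, nothing more):

    #include <vector>

    std::vector<int> a(3);  // size_type constructor: three value-initialized ints, {0, 0, 0}
    std::vector<int> b{3};  // initializer_list constructor: a single int, {3}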
*when using it incorrectly. The premise of emplace_back is that you use it for calling the constructor of the object in place. Obviously it won't help you if you call a copy constructor instead. I find this article a bit pointless. Clang's suggestion was spot-on, and emplace_back would have (potentially) helped if the suggestion had actually been followed correctly.
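To make that concrete, a rough sketch (assuming a Widget type with a three-argument constructor, as in the article's examples):

    #include <vector>

    void demo(std::vector<Widget>& widgets) {
        Widget w(1, 2, 3);
        widgets.push_back(w);           // copy-constructs the Widget into the vector
        widgets.emplace_back(w);        // still copy-constructs: the argument is an lvalue Widget
        widgets.emplace_back(1, 2, 3);  // constructs the Widget in place from its arguments
    }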
The author then proceeds to compare widgets.push_back(std::move(w)); versus widgets.emplace_back(std::move(w));. Wat? Why is he comparing against the thing you aren’t supposed to write? That makes no sense.
In the final section, I can't tell if he's comparing compile time or run time. The comments make me think compile time, but I'm not sure.
Good topic for a blog post. But it feels like this post is sloppily mixing three different ideas. Very strange.
But on the other hand, my instinctive reaction is: Come on, is this really something that confuses people? emplace_back is all about constructing something in-place, which is exactly the opposite of moving it there! How can any professional programmer be 180 degrees wrong about the purpose of a function they use regularly?
What I want to say here is that C++ is really hard, and that there are a million subtle things that one simply has to look up instead of making educated guesses. I don’t think people appreciate this difficulty enough.
Though I think the abseil article does a better job of explaining why: https://abseil.io/tips/112
[Googler; unaffiliated with Abseil (sadly)]
But okay, maybe it's more obvious when having worked with pre-C++11 code, where you would have killed for something like emplace_back. If you are new to the language and only know modern C++, you probably don't think much about why emplace_back exists in addition to push_back, and why it can be faster.
- dangling pointers
- integer under/overflow
- macro expansion (not via debugger but VSCode helps with this)
- double values not being the same when you expect them to be the same
- Seeing if something is copied or moved [1]
[1] Not sure if this is actually true. But I think it can be true by simply stepping inside a function like emplace_back.
The only actual argument against blindly using emplace_back everywhere is that it has worse compile time in a deliberately pathological microbenchmark. Not convincing, though it is interesting.
Though place_back still has the connotation of 'take this thing and put it over there' that emplace_back is actually the opposite of.
Naming things really is hard.
And that is basically the premise of the Abseil Tip of the week series, which are an incredible resource for learning how to work with modern C++.
If you already know from previous versions that push_back makes a copy and that C++11 added move semantics and emplace_back, it's not a huge leap to connect the two. Especially if you don't notice push_back's rvalue overload (or understand rvalues).
And if you're new to C++ and are having all of this thrown at you at once it'd definitely be easy to get some crossed wires.
One of my favorites is "you don't need a linked list".
Although I'm still genuinely interested in when a linked list is best suited. I'm also curious why they were invented and why they're taught; maybe just for teaching purposes...
Specifically I needed the guarantee that iterators weren't invalidated by other parts of the list changing. I could've used a vector and worked around it but it would've taken more effort and more code, and like I said the performance difference wouldn't have made an overall difference to the program.
They might be good for traversed lists with the possibility of partway insertion too, or performing insertion sort. Hell they might even be good for traversed lists in the general case, but I'm not a low level dev so I don't know without testing it.
Fun exercise though!
I'm guessing they're taught because they're taught, in that special inertial way that education propagates. Originally, I bet they were introduced when implementing your own data structures rather than pulling a library was normal industry practice.
Of course there is tons of value in knowing about data structures and their characteristics, but probably not in knowing how to implement LLs.
It's not hard to come up with plenty of valid use cases for linked lists. Any time you need some algorithm that does a lot of non-predictable insertions/deletions, on a collection that is not regularly iterated in full or used to find elements (so pointer-chasing/cache effects are irrelevant). And/or when the elements are prohibitively expensive to reorder/move on insertions/deletions, or some other piece of code is known to keep direct references to the elements that should not be invalidated when some unrelated element is inserted/removed. Maintaining validity of iterators while an algorithm is operating on the list is another important benefit.
Some people will come up with all kinds of clever workarounds or more complicated datastructures/bookkeeping so you can achieve the same thing without a linked list, but in most cases I don't see the point of it. Especially not in cases that are not performance-sensitive at all. In such cases you're just increasing code complexity and obfuscating intent because of some ideological objection to using linked lists.
There's nothing wrong with longer function names. They make code a lot easier to read.
You still should at least consider a vector here. If lookups are infrequent, you can just sort it and binary search it on demand (and just add new items at the end). There are cases when this is not good enough, but they really are surprisingly rare.
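A minimal sketch of that pattern (append cheaply, sort and binary-search only when a lookup is actually needed):

    #include <algorithm>
    #include <vector>

    void demo() {
        std::vector<int> items;
        items.push_back(42);                    // cheap appends at the end, order not maintained
        items.push_back(7);
        std::sort(items.begin(), items.end());  // sort lazily, right before a batch of lookups
        bool found = std::binary_search(items.begin(), items.end(), 42);
    }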
> And/or when the elements are prohibitively expensive reorder/move on insertions/deletions, or some other piece of code is known to keep direct references to the elements that should not be invalidated
In that case, why not store the elements in a unique_ptr (and then everything in a vector)?
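Something along these lines, assuming the same hypothetical Widget type as in the earlier examples. The element addresses stay stable even when the vector reallocates:

    #include <memory>
    #include <vector>

    void demo() {
        std::vector<std::unique_ptr<Widget>> widgets;
        widgets.push_back(std::make_unique<Widget>(1, 2, 3));
        Widget* stable = widgets.front().get();                // element address is stable
        widgets.push_back(std::make_unique<Widget>(4, 5, 6));  // may reallocate; stable is still valid
    }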
> Especially not in cases that are not performance-sensitive at all. In such cases you're just increasing code complexity and obfuscating intent because of some ideological objection to using linked lists.
Fully agreed - if you are sure that it cannot possibly become performance relevant and a list has the right interface, use it. For example, when the number of items is fundamentally tiny (e.g. one item per dimension of your 3D data) and the thing you'll do with the list item will be complex (or infrequent) anyway so that a few pointer indirections are negligible.
It's a tradeoff, of course, but if the user code becomes unreadable because the function names are verbose (so you constantly have to wrap lines and cut through the visual noise to see the actually meaningful part of the code), just to make sure people don't misunderstand basic functionality... that's not the tradeoff I would like.
But with all things in computer science, it really depends on your use-case.
* Lazy linked lists are the same thing as iterators, a fact which is used pervasively in Haskell.
* Intrusive or otherwise inline linked lists are used everywhere, e.g. the DOM has a doubly linked list of children for each node, simple memory pools are often implemented as free lists (a minimal sketch follows below), and tons of resources inside kernels are members of many linked lists.
* Linked lists are guaranteed O(1) append, but dynamic arrays are only amortized O(1) append. Intrusive linked lists also never need to allocate. That makes them useful inside resource-constrained places.
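For illustration, a minimal sketch of an intrusive free list (hypothetical Buffer type; the link lives inside the object itself, so acquire/release never allocate once the pool is warm):

    struct Buffer {
        Buffer* next_free = nullptr;  // intrusive link, stored inside the object itself
        char data[4096];
    };

    Buffer* free_list = nullptr;

    void release(Buffer* b) {         // O(1): just relink the node onto the list head
        b->next_free = free_list;
        free_list = b;
    }

    Buffer* acquire() {               // O(1) pop when the free list is non-empty
        if (!free_list) return new Buffer;
        Buffer* b = free_list;
        free_list = b->next_free;
        return b;
    }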
I can think of a few use-cases where I've opted for linked-lists over resizable arrays:
1. When you have enough memory for the set of all elements you want to store but not enough contiguous memory for the set of all elements you want to store (i.e. You have the available memory, but it's all fragmented).
2. When you have multiple threads adding to/removing from the container, a linked list lets each thread (in theory) insert/remove nodes at the same time. For example, one thread can be adding to the head of a queue while another thread is removing from the tail of the same queue; they don't need to both wait for the same lock to resize the container.
3. When you have large insertions/removals. For example, concatenating two lists into one is just reassigning two/four pointers in linked lists (see the splice sketch after this list). With arrays or similar you have to have enough free memory to make a copy of the smaller list, at least.
Another example is "zipping" two lists together - you simply go through the lists and reassign pointers. With arrays you have to allocate enough memory for the final list before copying elements over.
Same goes for splitting lists into two: for example, applying a predicate function to remove and return all elements that satisfy the predicate takes no extra memory using lists, but with arrays the destination will need to be allocated first.
4. Finally, you may want to store cycles in your container (perhaps the traversal maintains state so you can use the cycle to make traversal of some nodes repeat, or maybe you're simply insane and like to confuse readers of your code :-).

Okay, fair enough, that last one is me stretching things a bit :-)
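A quick sketch of the concatenation case from point 3, using std::list::splice (nodes are relinked, no copies or allocations):

    #include <list>

    void demo() {
        std::list<int> a{1, 2, 3};
        std::list<int> b{4, 5, 6};
        a.splice(a.end(), b);  // O(1): b's nodes are relinked onto the end of a
        // a is now {1, 2, 3, 4, 5, 6} and b is empty
    }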
If you came up with the same name: Congratulations, you probably picked a good name!
If not: Think about how the two names you picked differ. What made you sway towards one or the other in the respective contexts? Is there some subtlety that you were not aware of earlier? Is the intention still the same? Maybe you even realize that what you wrote earlier doesn't fully do what you needed.
This also works on longer time scales and in reverse, of course. Occasionally I look at some older code I wrote and realize that, with a shifted frame of mind, I read/understand a function name the wrong way, even though it technically means the right thing too. Try to learn from these experiences as you make them and consider them when you're coming up with a name ("can this be misunderstood from the perspective of a user that is trying to do something that's only tangentially related?").
(Is this what it feels like to become a dinosaur?)
There is some confusion on this issue: https://stackoverflow.com/questions/4303513/push-back-vs-emp...
The function void emplace_back(Type&& _Val) provided by MSVC10 is non-conforming and redundant, because as you noted it is strictly equivalent to push_back(Type&& _Val).
Looks like (as usual) MSVC's lackluster implementation drives real-world usage. Is there a rule of thumb?
Maintaining an intrusive list of objects that you have around anyway is sometimes the simplest option, especially because objects can easily remove themselves from the list upon destruction.
Re 2. (threading): What happens when the list becomes empty? How do you keep track of the size to determine whether the list is empty? How do you know the size is still up-to-date when you go on to make a modification? I don't think this is as easy (even "in theory") as you make it sound.
Re 3. (merging overhead): You say "takes no extra memory", but you already use significant extra memory: you incur an additional two pointers of overhead per element (and possibly per-allocation overhead), for the entire lifetime of the list, not just when you do some operations.
If your elements are small, then your list might have double memory consumption the entire time instead of only when merging. And if they are large, you could store unique_ptrs to those elements in a vector. In which case the total memory overhead of merging two vectors (2 x total number of elements x pointer size) is exactly the same as a linked list has the entire time. And all the operations then turn into shuffling around pointers too, but probably fewer: You just have to move the unique_ptrs around, which is one pointer each instead of constantly unlinking and relinking lists.
Sure, sometimes iterator invalidation is a thing, and maybe keeping pointers to (separately allocated) elements doesn't cut it. But even then you wouldn't want to use a list if performance is even remotely a concern.
That's only relevant if you don't have an MMU, though. So not on any desktop or embedded system where RAM is measured in gigabytes.
Otherwise it's up to the OS to handle that - only individual memory pages are contiguous, but not sets of pages.
This is the most useful case of removing from the middle of a container, which linked lists are good at. Intrusive lists are common for this type of thing, instead of actually storing a list iterator.
> In that case, why not store the elements in a unique_ptr (and then everything in a vector)?
I recently ended up using std::list<Object*> because of pointer invalidation on vector resize. I see now that using smart pointers would have worked just as well. Thanks.
See https://godbolt.org/z/6qW7q3
struct SimpleData
{
int a;
double b;
};
SimpleData someData = {1, 2.0}; // ok
std::vector<SimpleData> data;
// Why can I not do this?
data.emplace_back(1, 2.0);
The author seems to be an advanced C++ programmer, but doesn't seem to properly acknowledge that, as you say, the premise of emplace_back is that you use it for calling the constructor of the object in place. The only instance of proper use of emplace_back is in
widgets.emplace_back(foo, bar, baz);
I was surprised to see no follow-up for this example: auto w = Widget(1,2,3);
widgets.emplace_back(w); // Fixed? Nope!
No mention of the obvious fix: widgets.emplace_back(1,2,3);
I don't mean to 'pile on', but, a pet peeve of mine: emplace_back is not magic C++11 pixie dust. This isn't saying anything. You could say garbage collection isn't magic, or error-correcting codes aren't magic, or sound type systems aren't magic. To say that something isn't magic is an empty statement, especially when it's something like emplace_back which introduces an important new ability.

No, it isn't easy. I used a single lock to lock both head and tail pointers, and did the pointer-swapping and pointer-comparisons for the pathological cases in that lock [1]. However, that being said, locking for the comparison and head/tail reassignment is *still* preferable to locking while reallocating your array.
For counting items in the list, I post to a semaphore when an item is inserted, wait on the semaphore when an item is removed and simply return the semaphore count when a count of items is required. Because it is a linked list, there is no problem with having the count change in between retrieving it and adding/removing items - that'd only be a problem in array-like structures.
If you want to code-review my implementation then be my guest (it's on github, under my name, a repo called libcmq). I welcome criticism :-)
In any case, I've been considering reimplementing that library as a resizable array precisely because of the downsides of lists. I'm keeping it as it is (for now) because there are other things I want to do that aren't boring.
[1] Empty list and list of exactly one element are problematic.
I found these two articles really helpful when trying to understand moves:
https://abseil.io/tips/55 https://abseil.io/tips/77
Though I'd also say don't worry about it too much, especially at first. If you're copying a lot of temporary objects, moving can get you some performance wins, but that's something profiling should be telling you.
Sadly, std::deque is pretty much useless. It uses something like page-sized blocks in one implementation, a few cachelines in another one (forgot which is gcc/clang), and on MSVC it allocates every single item separately (!!!) for element size > 1 word. And this will not be changed due to ABI compatibility. It might as well not exist in the standard.
I think this is said to express "simply replacing that function call/method/... without changing your calls/methods/coding style is not going to magically solve your problems". Which is true in the push_back/emplace_back case (you need to adjust your call).
GC, on the other hand, is magic pixie dust, if my definition is correct ;)
I agree, but you're doing all the work in conjuring this interpretation. Saying it's not magic doesn't say anything substantial.
If it were possible to mechanically replace all references to push_back by emplace_back for an easy performance boost, then we wouldn't need emplace_back to be its own member function, it would just be a compiler optimisation. You have to make shallow and localised changes to make proper use of emplace_back, but isn't that the best we can hope for? Far easier than parallelism, say, where sweeping architectural changes may be necessary.
1. Be unique.
2. Don't be misleading.
3. Be leading.
So in this case you are nailing 2 but missing out on 3, which I would consider a "pretty good" name. Sure, it would be nice if it were a bit more obvious (maybe construct_in_place_back?), but at least it doesn't sound like it does something else, so people know to look it up.
data.push_back({1, 2.0});
That does not work with emplace_back (not that it would be useful) because the template arguments can't be deduced from the initializer list (whereas for push_back the type is known so you can initialize it with the initializer list).

If you don't need persistence, you can often use a dynamic array to do basically everything a linked list can do for you, but most of it faster.
Also, if you have a relocating garbage collector, or otherwise a smart enough compiler (or interpreter), linked lists can become sufficiently efficient.
I've been at my current job for a little over two years. Before then I never touched C++; now it's the main (basically only) language used here.
I'm still getting trucked by this...
E.g. when working on hard disks (e.g. think a file system or database with raw storage access). Or if you are working in a medium where you can only write once and can't change what's already written, etc.
Widget w(1,2,3);
The part I agree with is the method selection semantics. A good rule of thumb (though don't rely on it) is that the "more specific" prototype generally wins.
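A toy illustration of that rule of thumb (hypothetical overloads; the real overload-resolution rules are of course more involved):

    #include <string>

    void f(const std::string&) {}  // (1) more general: anything convertible to string
    void f(const char*) {}         // (2) more specific: matches a string literal directly

    void demo() {
        f("hello");  // calls (2): the more specific match wins over the user-defined conversion
    }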
I take the author's point about the cost of compiling a template, but the benchmark is odd: a thousand functions with a single call to emplace_back(). C++ programmers are (in)famously willing to trade compile-time performance for run-time performance. Complement the compile benchmark with a realistic program benchmark. In a typical program, I would expect emplace_back() to be called many more times than it's compiled.
In particular: #5 on that page: "we see that the right-hand style is not only more robust and maintainable for the reasons already given, but also arguably cleaner and more regular with the type consistently on the right when it is mentioned"
[1] https://herbsutter.com/2013/08/12/gotw-94-solution-aaa-style...
The student accidentally changed the semantics by adding the extra call to the copy constructor, and therefore Clang didn't know anymore that it could be removed.
My question is, after having incorrectly done only one of those things, how could it have known that the user was only halfway through a two-part operation and didn't actually mean for it to be that way? Clang doesn't know how the code was in the past.
Very much so. I tell everyone not to bother learning C++, there are so many other better languages out there. Hell, I'm only still using it because of legacy projects
See https://godbolt.org/z/Tj5baM and https://en.cppreference.com/w/cpp/language/aggregate_initial... (case 5).
He gives an example using aggregate initialization, but that's not the same thing:
// Classic C++ declaration order // Modern C++ style
employee e{ empid }; auto e = employee{ empid };
widget w{ 12, 34 }; auto w = widget{ 12, 34 };
He makes the case that it's nicer to use auto pretty much wherever we can, to avoid repeating ourselves regarding type, and to avoid unexpected conversions. The auto keyword doesn't apply in our case, there's no way to use auto to simplify a statement like: Widget w(1,2,3);
Probably, but using the canonical syntax removes all doubt, following the C++ language specification. That's a small but real disadvantage in robustness when using the unnecessary assignment, and I'm not aware of any reason to favour using it.
If you have a strong understanding of move semantics, then I agree it would be pretty weird to misunderstand the relationship between move semantics and emplace_back. But how common is it to have a strong understanding of move semantics, really? Can we confidently assume that even 50% of professional C++ programmers have a strong understanding of move semantics? I've known competent C++ devs, who write template metaprograms for fun, who were still surprised to learn basic facts about move semantics like "the destructor of a moved-from object will still be called." I know for certain that I don't have a strong understanding of move semantics, but I wouldn't call myself a C++ programmer either, so at least I'm not dragging down the statistics :)
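A tiny sketch of that last point, since it does surprise people: the moved-from object is left in a valid but unspecified state and its destructor still runs.

    #include <cstdio>
    #include <string>
    #include <utility>

    struct Noisy {
        std::string s = "payload";
        Noisy() = default;
        Noisy(Noisy&&) = default;                   // move constructor
        ~Noisy() { std::puts("destructor runs"); }  // runs for moved-from objects too
    };

    int main() {
        Noisy a;
        Noisy b = std::move(a);  // a is now in a valid but unspecified state
        // At the end of main, BOTH a and b are destroyed: "destructor runs" prints twice.
    }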
This is a great observation! There's probably a ton of this that feeds the "$lang/api/framework is garbage"/"no it's not" schisms. "Well do you prefer documentation or a REPL?"
As if the documentation/compiler implementations are otherwise perfect. Quick, does `std::condition_variable::wait_for` use a monotonic clock under the hood?
Here's the docs: https://en.cppreference.com/w/cpp/thread/condition_variable/...
You might think so based on the part that says:
> Note that rel_time must be small enough not to overflow when added to std::chrono::steady_clock::now()."
But I happen to know from actually using it, that not all implementations use a monotonic clock: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41861
This led to a rather dramatic time bomb bug in our code that only flared up when GPS (and therefore system time) was being spoofed to replay a specific scenario in the past.
foo(someValue);
someValue will never be implicitly moved. Either you need an explicit cast (i.e. std::move) or someValue is actually an expression returning a temporary or an rvalue.

Of course, if foo takes someValue by non-const reference it might still modify someValue as it wishes (and moving out of it is just one possible mutation). But this is not a specific issue with moves but with functions taking non-const references in general, which should be used only sparingly, and hopefully the name makes the mutation clear.
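Roughly, assuming a hypothetical foo that takes its parameter by value:

    #include <string>
    #include <utility>

    void foo(std::string s) {}            // hypothetical: takes its argument by value

    void demo() {
        std::string someValue = "hello";
        foo(someValue);                   // copies: an lvalue is never implicitly moved
        foo(std::move(someValue));        // moves: explicit cast to an rvalue
        foo(std::string("temporary"));    // already an rvalue: moved from (or elided entirely)
    }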
Despite my tendencies towards almost-always auto, I can't say that I consistently use auto x = type{init}, but there is certainly nothing wrong with it.
Just a hint that this is a very sound suggestion.
1. Multiple independent (not necessarily concurrent, but not coordinated) mutators.
2. Insertion and removal happen commonly and should be cheap.
3. Operations on the whole list typically happen in list order or close to it.
Bonus points if the size of the relevant data structures is often such that you blow out caches anyway (e.g. each thing in your list is larger than a cache line) and hence there are no strong locality benefits from an array. Though prefetching _can_ sometimes help with arrays there.
The above features _can_ be implemented on top of an array, but feature 1 means you can't use indices as long-lived handles to items, so you need either handles that get updated on mutations or some concept of identity and "find the current index of this thing" calls.
For context here, Gecko used to store the children of a DOM node as an array of pointers and switched to a doubly-linked list because the DOM does have the above properties and it was faster and mostly simpler to have a linked list in practice. There is a bit of complexity to make node.childNodes iteration (as opposed to .nextSibling/.previousSibling iteration) not be O(N^2), but overall the switch was a win.
Actually, these days it does. Copy/move elision is mandated by the language in this case and no valid copy/move constructor is required. Similarly, it is now possible to return non-movable, non-copyable types from functions, which opens up the possibility of interesting code patterns.
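A small sketch of what that enables since C++17: a type with no copy or move constructor can still be returned by value.

    struct Pinned {
        Pinned() = default;
        Pinned(const Pinned&) = delete;
        Pinned(Pinned&&) = delete;
    };

    Pinned make() { return Pinned{}; }  // OK since C++17: guaranteed elision, no copy/move needed

    void demo() {
        Pinned p = make();              // p is constructed directly in place
    }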
But there is something I noticed: changing to a new name on the spot is more work.
I have to find all the previous occurrences and rename them.
I can use my editor to help, some plugins for it, or I can get the code to some parsable step and then do a refactor rename.
Notice how I had to get sidetracked here while I am trying to keep my original context in mind.
I use vim, so maybe I am missing out on fancier ways to do this quick on the spot rename? I would love to optimize this problem away.
I still don't like it though. I'd rather express what I want to happen than express something I don't want to happen in the knowledge that the standard requires my compiler to be sufficiently smart to repair my statement. Consistent behaviour across different language versions is another plus.
If you'll permit me a moment's snark:
Most OOP languages: You can't copy objects. You may copy references to objects. Copying objects is one road to madness. Move semantics is another.
Old C++: You can copy objects, and specify your own copy constructors (which are permitted to side-effect), but the compiler is permitted to optimise away copy operations under certain circumstances. You shouldn't assume your copy constructor will be invoked whenever you see what looks like a copy.
New C++: You can copy objects, and specify your own copy constructors (which are permitted to side-effect), but the compiler is permitted to optimise away copy operations under certain circumstances, and beyond that, is now required to optimise away copy operations in some specific circumstances. You shouldn't assume your copy constructor will be invoked whenever you see what looks like a copy. When you see a copy operation in certain specific contexts, you are permitted to assume your copy constructor will not be invoked.
As with so many discussions of C++, I'm reminded of two quotes:
> Within C++, there is a much smaller and cleaner language struggling to get out.
- Bjarne Stroustrup [0]
> Do you know how the Orcs first came into being? They were elves once, taken by the dark powers, tortured and mutilated. A ruined and terrible form of life. And now... perfected.
- Saruman
It is sometimes right to hold your nose and use std::piecewise_construct.
But most usually it doesn't matter, and worrying about it will distract you from what does matter. So, push_back() unless you know you have a good reason not to. And, you might never encounter one.
Good C++ and naïve C++ are not so different. Both are much better than fake-smart C++.
Well most of the time linked lists are used for things that have to be kept in some order that cannot (easily) be determined by a sorting predicate, and/or when re-sorting is not allowed.
>> In that case, why not store the elements in a unique_ptr (and then everything in a vector)?
But why? What's the benefit of doing that? The objects will still be scattered around the heap so iterating and dereferencing still trashes the cache, you lose benefits like not invalidating iterators on inserts/removals, and you complicate the code.
I'm not saying linked lists should be your go-to data structure, but IMO the 'linked lists are bad, don't use them' meme is a little overused.