It'd be expensive though; every context switch would require it's own stack and pushing / restoring one more register. There's GOOD reason programs don't work that way and are supposed to not rely on values outside of properly initialized (and not later clobbered) memory.
As an additional protection, new stack frames are implicitly zeroed as they are created. I assume this is done by filling the CPU cache with zeros for those addresses before continuing to execute the called function. No need to wait for actual zeros to be written to main memory.
For reference, the actual proposal that was accepted into C++26 is [2]. It discusses performance only in general, and it refers to an earlier analysis [3] for more details. This last reference describes regressions of around 0.5% in time and in code size. Earlier prototypes suggested larger regressions (perhaps even "horrendous") but more emphasis on compiler optimizations has brought the regression down considerably.
Of course one's mileage may vary, and one might also consider a 0.5% regression unacceptable. However, the C++ committee seems to have considered this to be an acceptable tradeoff to remove a frequent cause of undefined behavior from C++.
[1]: https://herbsutter.com/2024/08/07/reader-qa-what-does-it-mea...
[2]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p27...
[3]: https://open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2723r1...
This compiler option causes the compiler to emit a call to a stack probe function to ensure that a sufficient amount of stack space is available.
Rather than just probe once for each stack page used, you can substitute a function that *FILLS* the stack frame with a particular value - something like 0xBAADF00D - one could set the value to anything you wanted at runtime.
This would get you similar behaviour to gcc/clang's -ftrivial-auto-var-init
Windows has started to auto-initialize most stack variables in the Windows kernel and several other areas.
The following types are automatically initialized:
Scalars (arrays, pointers, floats)
Arrays of pointers
Structures (plain-old-data structures)
The following are not automatically initialized:
Volatile variables
Arrays of anything other than pointers (i.e. array of int, array of structures, etc.)
Classes that are not plain-old-data
During initial testing where we forcibly initialized all types of data on the stack we saw performance regressions of over 10% in several key scenarios.
With POD structures only, performance was more reasonable. Compiler optimizations to eliminate redundant stores (both inside basic blocks and between basic blocks) were able to further drop the regression caused by POD structures from observable to noise-level for most tests.
We plan on revisiting zero initializing all types (especially now that our optimizer has more powerful optimizations), we just haven’t gotten to it yet.
see https://web.archive.org/web/20200518153645/https://msrc-blog...Expensive is the (very slow for modern CPUs) operation of _writing_ that change in value out to memory at it's distant and slow speed compared to that which the CPU operates at, as well as the overhead of synchronizing that write to any other caches of those memory locations.
Maybe you're thinking of the trick of a band new page of memory mapped memory that is 'zeroed' but is in reality just a special 'all zeros' page in the virtual to physical memory lookup table? Those still need to be zeroed by real writes at some point, if they're ever used.
Consider the case of a system call, such as `read`. You’re in user space and you have some stack frames on the stack as usual. You allocate a buffer on the stack (there’s a cpu instruction for that; it basically just extends your “turf¹” to include more of the stack page, and zeros it as mentioned) to hold the data you want to read. You then call `read` with the `call` instruction, including the address of the buffer and the buffer size as arguments. So far everything is very straight–forward.
But `read` is actually in a different protection domain; it’s part of the kernel. The CPU uses metadata previously set up by the kernel to turn this into a “portal call”. After the portal call your thread will be given a different protection domain. In principle this is the kernel’s protection domain, but in reality the kernel might split that up in many complicated ways. What is relevant here is that the turf of this protection domain has been modified to include this new stack frame. From the perspective of `read`, the stack has just started; there are no prior frames. The reality is that this stack frame is still part of the stack of the caller, it’s only the turf that has changed. Those prior stack frames still exist, but they are unreadable. Worse, the buffer is also unreadable; it’s located at an address that is not part of the kernel’s turf.
So obviously there needs to be another set of instructions for modifying turfs. The full set of obvious modifications are available, but the relevant one here is a temporary grant of read and/or write permissions to a function you are about to call. You would insert a `pass` instruction to pass along access to the buffer for the duration of the call. This access is automatically revoked after the call returns. (Ideally you wouldn’t actually have to do this manually for every portal call; instead you would call a non–portal `read` function in libc. This function’s job is to make the portal call, and whoever wrote it makes sure to include the `pass` instruction.)
¹ A turf is the set of addresses that a given thread running in a given protection domain can read and/or write.