The motivation and the explanation of why it works are also wrong anyway. Like, i get it's a gentle intro, but i think there are ways to accomplish that without egregiously rewriting history ;)
Ken Zadeck was my office mate for years, so this is my recollection, but it's also been a few decades, so sorry for any errors :)
The explanation of why it works is definitely wrong - they weren't even using rewriting forms of SSA, and didn't for a long time. Even the first "fully SSA" compiler (generally considered to be Open64) did not rewrite the IR into SSA.
The reason it works so well is that it enables you to perform effective per-variable dataflow. In fact, this is the only problem it solves - the ability to perform unrestricted per-variable dataflow quickly. This was obvious given the historical context at the time, but less obvious now that it is history :)
In the simplest essence, SSA enables you to follow chains of dataflow for a variable very easily and simply, and that's probably what i would have said instead.
It's true that for simple scalar programs, the variable name reuse that breaks these dataflow chains mostly occurs at explicit mutation points, but this is not always correct depending on the compiler IR, and definitely not correct once you extend it to memory, among other things. It's also not correct at all as you extend the thing SSA enables you to do (per-variable dataflow quickly) to other classes of problems (SSU, SSI, etc).
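To make that concrete, here's a tiny made-up example (mine, not from the post or the papers) of the same scalar code before and after SSA renaming. Once every name has exactly one definition, following a variable's dataflow chain is just walking from each use back to its single def:

    def f(a, b):
        x = a + 1        # SSA: x1 = a + 1
        if b > 0:
            x = x * 2    # SSA: x2 = x1 * 2
                         # SSA: x3 = phi(x1, x2) at the join point
        return x + b     # SSA: return x3 + b -- this use has exactly one reaching def

The left column is ordinary code; the comments show the renamed SSA form, where the only merging of information happens at the explicit phi.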
History-wise, this post also sort of implies SSA came out of nowhere and was some revolutionary thing nobody had ever thought about, just sort of developed in the '80s.
In fact, it was a formalization of attempts at per-variable dataflow they had been working on for a while.
I'd probably just say that as a gentle intro, but if you want the rest of history, here you go:
Well before SSA, it was already known that the lower bounds on bitvector dataflow (the dominant approach at the time of SSA) were not great. Over the years, it turned out they were much better than initially expected, but in the end, worse than anyone wanted as programs got bigger and bigger: N^2 or N^3 bitvector operations for most problems[1]. Incrementality was basically impossible[2]. They were also hard to understand and debug because of dependencies between related variables, etc.
Attempts at faster/better dataflow existed in two rough forms, neither of which is bound by the same lower bound:
1. Structured dataflow/Interval analysis algorithms/Elimination dataflow - reduce the CFG into various types of regions with a known system of equations, solve the equations, distribute the results to the regions. Only works well on reducible graphs. Can be done incrementally with care. This was mostly studied in parallel to bitvectors, and was thought heavily about before it became known that there was a class of rapid dataflow problems (IE before the late 70's). Understanding the region equations well enough to debug them requires a very deep understanding of the basis of dataflow - lattices, etc.
In that sense it was harder for most people to understand than bitvectors. Graph reducibility restrictions were annoying but tractable on forward graphs through node splitting and whatnot (studies showed 90+% of programs at the time already had reducible flowgraphs), but almost all backwards CFGs are not reducible, making backwards dataflow problems quite annoying. In the end, as iterative dataflow became provably faster/etc, and compilers became the province of more than just theory experts, this sort of died[3]. If you ever become a compiler theory nerd, it's actually really interesting to look at IMHO.
2. Per-variable dataflow approaches. This is basically "solve a dataflow problem for a single variable or cluster of variables at a time instead of for all variables at once" (there's a sketch of what that looks like below).
There were not actually a ton of people who thought this would ever turn into a practical approach:
a. The idea of solving reaching definitions (for example) one variable at a time seemed like it would be much slower than solving it for all variables at once.
b. It was very non-obvious how to effectively partition variables so that you could compute a problem on one variable (or a cluster of variables) at a time without it devolving into either bitvector-like time bounds or structured-dataflow-like math complexity.
It was fairly obvious at the time that if you could make it work, you could probably get a much faster incremental dataflow solution.
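To give a flavor of what "one variable at a time" means in (a), here's a minimal iterative sketch of reaching definitions for a single variable at block granularity. This is my own illustration with hypothetical names, not anything from the era's papers:

    def reaching_defs_for(var, blocks, preds, defs):
        """For each block, which blocks' definitions of var reach its entry?"""
        IN = {b: set() for b in blocks}
        changed = True
        while changed:
            changed = False
            for b in blocks:
                new_in = set()
                for p in preds[b]:
                    if var in defs[p]:
                        new_in.add(p)      # p's own def of var kills anything above it
                    else:
                        new_in |= IN[p]    # otherwise defs just flow through p
                if new_in != IN[b]:
                    IN[b] = new_in
                    changed = True
        return IN

    # Diamond CFG: "entry" and "left" both define x; the arms meet at "join".
    blocks = ["entry", "left", "right", "join"]
    preds  = {"entry": [], "left": ["entry"], "right": ["entry"], "join": ["left", "right"]}
    defs   = {"entry": {"x"}, "left": {"x"}, "right": set(), "join": set()}
    print(reaching_defs_for("x", blocks, preds, defs))
    # {'entry': set(), 'left': {'entry'}, 'right': {'entry'}, 'join': {'left', 'entry'}}

Note that "join" is the one block where two different defs of x meet - exactly the spot where SSA ends up placing a phi for x.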
SSA came out of this approach. Kenny's thesis dealt with incremental dataflow and partitioned variable problems, and proved time bounds on various kinds of partitioned/clustered variable problems. You can even see the basis of SSA in how it thinks about things. Here, i'll quote from a few parts:
" As the density of Def sites per variable increases, the average size of the affected area for each change will decrease...".
His thesis is (among other things) on a method for doing this that is typical for the time (IE not SSA):
"The mechanism used to enhance performance is raising the density of Def sites for each variable. The most obvious way to increase the Use and Def density is to attempt a reduction on the program flow graph. This reduction would replace groups of nodes by a single new node. This new node would then be labeled with infcrmation that summarized the effects of execution through that group of nodes."
This is because, as i mentioned, figuring out how to do it by partitioning the variables based on dominance relationships was not yet known (that is literally SSA), and also because Kenny wanted to finish his thesis before the heat death of the sun.
In this sense, they were already aware that if they got "the right number of names" for a variable, the amount of dataflow computation needed for most problems (reaching defs, etc) would become very small, and that changes would be easy to handle. They knew fairly quickly that for the dataflow problems they wanted to solve, they needed each variable to have a single reaching definition (and reaching definitions was well known), etc.
SSA was the incremental culmination of going from there to figuring out a clustering/partitioning of variables that was not based on formally structured control flow regions (which are all-variables things), but instead on the local dataflow for a variable, incorporating only the part of the CFG structure that could actually affect the local result for that variable.
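One concrete way to see "the part of the CFG structure that could actually affect the local result": in the classic construction, the join points for a variable are the iterated dominance frontier of its def sites, and that's where its phi nodes go. Here's a minimal sketch of that placement step (again mine, with hypothetical names, and with the dominance frontiers df assumed to be precomputed):

    def place_phis(def_blocks, df):
        """Blocks that need a phi for a variable defined in def_blocks,
        given each block's dominance frontier df."""
        work = list(def_blocks)
        has_phi = set()
        while work:
            b = work.pop()
            for y in df[b]:
                if y not in has_phi:
                    has_phi.add(y)
                    work.append(y)   # a phi is itself a new def, so keep iterating
        return has_phi

    # Same diamond as above: x is defined in "entry" and "left".
    df = {"entry": set(), "left": {"join"}, "right": {"join"}, "join": set()}
    print(place_phis({"entry", "left"}, df))   # {'join'}

The iterated dominance frontier is exactly where distinct defs of the variable can first meet, which is why the per-variable problem only ever has to touch that slice of the CFG.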
It was approached systematically - understanding the properties they needed, figuring out algorithms that solved them, etc.
Like a lot of things that turn out to be unexpectedly useful and take the world by storm, history later often tries to represent it as a eureka moment.
[1] Bitvectors are assumed to be fixed size, and thus constant cost, but feel free to add another factor of N here if you want.
[2] This is not strictly true - we knew how to recompute the result incrementally and end up with a correct result, but doing so provably faster than solving the problem from scratch was not possible.
[3] They actually saw somewhat of a resurgence in the world of GPUs and more general computation graphs, because it all becomes heavily applicable again there. However, we have almost always eventually developed easier-to-understand (even if potentially slower theory-wise) global algorithms and used those instead, because we have the compute power to do it, and this tradeoff is IMHO worth it.