Although any random bathroom-wall graffiti is better than the SWEBOK, I don't know what to recommend that's actually
good. Part of the problem is that people still suck at programming.
“How to report bugs effectively”
<https://www.chiark.greenend.org.uk/~sgtatham/bugs.html> is probably
the highest-bang-for-buck reading on software engineering.
I haven't read The Pragmatic Programmer, but I hear it's pretty good. Code Complete was pretty great at the time. The Practice of Programming covers most of the same material but is much more compact and higher in quality; The C Programming Language, by one of the same authors, also teaches significant things. The Architecture of Open-Source Applications series isn't a handbook, but offers some pretty good ideas: https://aosabook.org/en/
Here are some key topics such a handbook or compendium ought to cover:
- How to think logically. This is crucial not only for debugging but also for formulating problems in such a way that you can program them into a computer. Programming problems that are small enough to fit into a programming interview can usually be solved, though badly, simply by rephrasing them in predicate logic (with some math, but usually not much) and mechanically transforming the result into structured control flow. Real-world programming problems usually can't, but do have numerous such subproblems. I don't know how to teach this, but that's just my own incompetence at teaching.
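To make that concrete, here's a toy example of my own (any interview-sized problem would do): the question "does the list contain two distinct elements summing to k?" is just ∃i&lt;j: xs[i]+xs[j]=k, and the quantifiers transcribe mechanically into loops:

```python
def has_pair_summing_to(xs, k):
    """Mechanical transcription of: exists i < j with xs[i] + xs[j] == k."""
    for i in range(len(xs)):                # "exists i" becomes a loop over i
        for j in range(i + 1, len(xs)):     # "exists j > i" becomes an inner loop
            if xs[i] + xs[j] == k:          # the predicate itself
                return True
    return False                            # no witnesses found: the formula is false
```

It's a bad solution in the sense that it's quadratic, but it's a correct one, obtained without any cleverness at all.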
- Debugging. You'll spend a lot of your time debugging, and there's more to debugging than just thinking logically. You also need to formulate good hypotheses (out of the whole set of logically possible ones) and run controlled experiments to validate them. There's a whole panoply of techniques available here, including testing, logging, input recording and replay, delta debugging, stack trace analysis, breakpoint debuggers, metrics anomaly detection, and membrane interposition with things like strace.
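As an illustration of one of these, here's a bare-bones sketch of delta debugging: a greedy input minimizer in the spirit of Zeller's ddmin, though much cruder than the real algorithm. Given a failing input and a test for the failure, it repeatedly tries deleting chunks:

```python
def shrink(failing_input, still_fails):
    """Greedily minimize a failing input: try deleting chunks of decreasing
    size, keeping any deletion after which still_fails() still returns True."""
    chunk = len(failing_input) // 2
    while chunk >= 1:
        i = 0
        while i < len(failing_input):
            candidate = failing_input[:i] + failing_input[i + chunk:]
            if still_fails(candidate):
                failing_input = candidate   # deletion preserved the bug: keep it
            else:
                i += chunk                  # this chunk is needed: move past it
        chunk //= 2                         # retry with finer granularity
    return failing_input
```

Run on a 10,000-line input that crashes your parser, something like this often hands you a few-line reproduction, which is worth hours of staring at stack traces.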
- Testing. Though I mentioned this as a debugging technique, testing has a lot more applications than just debugging. Automated tests are crucial for finding and diagnosing bugs, and can also be used for design, performance profiling, and interface documentation. Manual tests are also crucial for finding and diagnosing bugs, and can also tell you about usability and reliability. There are a lot of techniques to learn here too, including unit testing, fuzzing, property-based testing, various kinds of test doubles (including mock objects), etc.
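As a sketch of property-based testing, here's a hand-rolled version using only the standard library (real tools like Hypothesis also shrink failing cases for you, which connects back to delta debugging). Instead of asserting example outputs, it asserts properties that must hold for all inputs, here of `sorted`:

```python
import random
from collections import Counter

def is_sorted(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))

def test_sorted_properties(trials=200):
    rng = random.Random(0)   # fixed seed, so any failure is reproducible
    for _ in range(trials):
        xs = [rng.randint(-10, 10) for _ in range(rng.randint(0, 8))]
        ys = sorted(xs)
        assert is_sorted(ys)               # property 1: output is ordered
        assert Counter(ys) == Counter(xs)  # property 2: output is a permutation
```

Two short properties cover cases (empty lists, duplicates, all-equal elements) that a handful of hand-picked examples would likely miss.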
- Version tracking. Git is a huge improvement over CVS, but CVS is a huge improvement over Jupyter notebooks. Version control facilitates delta debugging, of course, but also protects against accidental typo insertion, overwriting new code with old code, losing your source code without backups, not being able to tell what your coworkers did, etc. And GitLab, Gitea, GitHub, etc., are useful in lots of ways.
- Reproducibility more generally. Debugging irreproducible problems is much more difficult, and source-code version tracking is only the start. It's very helpful to be able to reproduce your deployment environment(s), whether with Docker or with something else. When you can reproduce computational results, you can cache them safely, which is important for optimization.
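Here's a sketch of that kind of result caching, keyed on a hash of the arguments; the names and the JSON-files-on-disk format are my own invention, and the whole scheme is safe only if the wrapped function is deterministic, which is exactly the reproducibility point:

```python
import hashlib
import json
import os

def cached(func, cache_dir="cache"):
    """Memoize func's results to disk, keyed by a hash of its arguments.
    ONLY safe if func is deterministic and depends on nothing but its args."""
    os.makedirs(cache_dir, exist_ok=True)

    def wrapper(*args):
        key = hashlib.sha256(json.dumps(args, sort_keys=True).encode()).hexdigest()
        path = os.path.join(cache_dir, key + ".json")
        if os.path.exists(path):            # cache hit: skip the computation
            with open(path) as f:
                return json.load(f)
        result = func(*args)
        with open(path, "w") as f:          # cache miss: compute and store
            json.dump(result, f)
        return result

    return wrapper
```

The moment the function reads a global, a file, or the clock, this silently returns stale answers; irreproducibility breaks caching before it breaks anything else.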
- Stack Overflow. It's pretty common that you can find solutions to your problems easily on Stack Overflow and similar fora; twin pitfalls are blindly copying and pasting code from it without understanding it, and failing to take advantage of it even when it would greatly accelerate your progress.
- ChatGPT. We're still figuring out how to use large language models. Some promising approaches seem to be asking ChatGPT what some code does, how to use an unfamiliar API to accomplish some task that requires several calls, or how to implement an unfamiliar algorithm; and using ChatGPT as a simulated user for user testing. This has twin pitfalls similar to Stack Overflow. Asking it to write production-quality code for you tends to waste more time debugging its many carefully concealed bugs than it would take you to just write the code, but sometimes it may come up with a fresh approach you wouldn't have thought of.
- Using documentation in general. It's common for novice programmers to use poor-quality sites like w3schools instead of authoritative sites like python.org or MDN, and to be unfamiliar with the text of the standards they're nominally programming to. It's as if they think that any website that ranks well on Google is trustworthy! I've often found it very helpful to be able to look up the official definitions of things, and often official documentation has better ways to do things than outdated third-party answers. Writing documentation is actually a key part of this skill.
- Databases. There are a lot of times when storing your data in a transactional SQL database will save you an enormous amount of development effort, for several reasons: normalization makes invalid states unrepresentable; SQL, though verbose, can commonly express things in a fairly readable line or two that would take a page or more of nested loops, and many ORMs are about as good as SQL for many queries; transactions greatly simplify concurrency; and often it's easier to horizontally scale a SQL database than simpler alternatives. Not every application benefits from SQL, but applications that suffer from not using it are commonplace. Lacking data normalization, they suffer many easily avoidable bugs, and using procedural code where they could use SQL, they suffer not only more bugs but also difficulty in understanding and modification.
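For instance, with Python's built-in sqlite3 module (a toy example of my own), a grouping-and-summing task that would otherwise take nested loops and a dictionary is one readable line of SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES ('alice', 10), ('bob', 25), ('alice', 5);
""")
# One declarative line instead of a loop building up per-customer totals:
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
# rows == [('alice', 15.0), ('bob', 25.0)]
```

And the transaction and normalization benefits come along for free, which the procedural version never gives you.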
- Algorithms and data structures. SQL doesn't solve all your data storage and querying problems. As Zachary Vance said, "Usually you should do everything the simplest possible way, and if that fails, by brute force." But sometimes that doesn't work either. Writing a ray tracer, a Sudoku solver, a maze generator, or an NPC pathfinding algorithm doesn't get especially easier when you add SQL to the equation, and brute force will get you only so far. The study of algorithms can convert impossible programming problems into easy programming problems, and I think it may also be helpful for learning to think logically. The pitfall here is that it's easy to confuse the study of existing data structures and algorithms with software engineering as a whole.
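As a sketch of the NPC-pathfinding case: breadth-first search finds shortest paths in an unweighted grid maze, something neither SQL nor brute force helps with. (My own example; real game code would more likely use A* with a heuristic, but the skeleton is the same.)

```python
from collections import deque

def shortest_path_length(grid, start, goal):
    """BFS over a grid of '.' (open) and '#' (wall) cells.
    Returns the number of steps from start to goal, or None if unreachable."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                    and grid[nr][nc] == '.' and (nr, nc) not in seen):
                seen.add((nr, nc))          # mark on enqueue, not dequeue
                queue.append(((nr, nc), dist + 1))
    return None
```

Knowing that BFS exists turns this from an impossible problem into a twenty-line one, which is the conversion the study of algorithms buys you.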
- Design. It's always easy to add functionality to a small program, but hard to add functionality to a large program; the order of growth of this difficulty depends on something we call "design". Well-designed large software can't be as easy to add functionality to as small software, but it can be much, much easier than poorly designed large software. This, more than manpower or anything else, is what ultimately limits the functionality of software. Design has more to do with how the pieces of the software are connected together than with how each one of them is written, though ultimately it has a profound impact on how each one of them is written too. It's a self-similar or fractal concern, applying at every level of composition bigger than a statement, and it's easy to have good high-level design and bad low-level design or vice versa. The best design is simple, but simplicity is not sufficient; hierarchical decomposition is a central feature of good designs, but a hierarchical design is not necessarily a good design.
- Optimization. Sometimes the simplest possible way is too slow, and faster software is always better. So sometimes it's worthwhile to spend effort making software faster, though the result will never be actually optimal. Picking a better algorithm is generally the highest-impact thing you can do here when you can, but once you've done that, there are still a lot of other things you can do to make your software faster, at many different levels of composition.
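A sketch of the algorithm-first point, using duplicate detection as a stand-in problem of my own choosing: no amount of micro-tuning the quadratic version competes with switching data structures.

```python
def has_duplicates_slow(xs):
    """O(n^2): perfectly fine for small inputs, ruinous for large ones."""
    return any(x in xs[i + 1:] for i, x in enumerate(xs))

def has_duplicates_fast(xs):
    """O(n): the better algorithm, via a set with O(1) membership tests."""
    seen = set()
    for x in xs:
        if x in seen:
            return True
        seen.add(x)
    return False
```

On a million elements the first is hopeless and the second is instantaneous; only after that switch is it worth reaching for the lower-level tricks.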
- Code reviews. Two people can build software much more than twice as fast as one person. One of the reasons is that many bugs that are subtle to their author and hard to find by testing are obvious to someone else. Another is that often they can improve each other's designs.
- Regular expressions. Leaving aside the merits of understanding the automata-theory background, regular expressions are, like SQL, in the category of things that can reduce a complicated page of code to a simple line of code, even if the most common syntax isn't very readable.
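For example (a toy of my own, picking apart a Common-Log-Format-style line): one regular expression replaces a page of hand-rolled scanning and slicing:

```python
import re

line = '127.0.0.1 - - [10/Oct/2000:13:55:36] "GET /index.html HTTP/1.0" 200 2326'
# One (admittedly unreadable) line instead of a page of index arithmetic:
m = re.match(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\w+) (\S+) [^"]*" (\d+) (\d+)', line)
host, when, method, path, status, size = m.groups()
```

The write-only syntax is a real cost, but the alternative is a loop full of `find` and off-by-one slicing bugs.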
- Compilers, interpreters, and domain-specific languages. Regular expressions are a domain-specific language, and it's very common to have a problem domain that could be similarly simplified if you had a good domain-specific language for it, but you don't. Writing a compiler or interpreter for such a domain-specific language is one of the most powerful techniques for improving your system's design. Often you can use a so-called "embedded domain-specific language" that's really just a library for whatever language you're already using; this has advantages and disadvantages.
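As a sketch of how small such an interpreter can be, here's one for a postfix-arithmetic DSL of my own invention; the same shape (tokenize, dispatch, evaluate) scales up to more useful domains:

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def run_rpn(source):
    """Interpreter for a tiny postfix-arithmetic DSL, e.g. '3 4 + 2 *'."""
    stack = []
    for token in source.split():
        if token in OPS:
            b, a = stack.pop(), stack.pop()   # note operand order
            stack.append(OPS[token](a, b))
        else:
            stack.append(float(token))        # anything else is a literal
    return stack.pop()
```

Because the OPS table is just a Python dict, this is halfway to an embedded DSL already: adding a new operation to the language is one line.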
- Free-software licensing. If it works, using code somebody else wrote is very, very often faster than writing the code yourself. Unfortunately we have to concern ourselves with copyright law here; free-software licensing is what makes it legal to use other people's code most of the time, but you need to understand what the common licenses permit and how they can and cannot be combined.
- Specific software recommendations. There are certain pieces of software that are so commonly useful that you should just know about them, though this information has a shorter shelf life and is somewhat more domain-specific than the stuff above. But the handbook should list the currently popular libraries and analogous tools applicable to building software.