We were doing all sorts of things wrong and unidiomatically, but things turned out OK for the most part.
The fun thing with restart strategies is that if your process fails quickly, you get into restart escalation, where your supervisor gets restarted because you restarted too many times, then its supervisor does, and so on until BEAM shuts down. But that happens once or twice and you figure out how to avoid it (I usually put a 1 second sleep at startup in my crashy processes, lol).
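For anyone who hasn't run into this, here's a rough sketch of the sleep trick. The module name crashy_sup, the child id, and the intensity/period numbers are all made up for illustration, not what we actually ran. The idea is that a 1 second sleep caps the restart rate at about one per second, so a limit of 5 restarts per 3 seconds should never be hit, and the supervisor above never gets involved.

```erlang
-module(crashy_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0, worker_init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% Illustrative limits (an assumption, tune for your system): more than
    %% 5 restarts within 3 seconds makes this supervisor give up and exit,
    %% which is where escalation to the parent supervisor starts.
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 3},
    Child = #{id => crashy_worker,
              start => {?MODULE, start_worker, []},
              restart => permanent},
    {ok, {SupFlags, [Child]}}.

%% Called by the supervisor; blocks until the worker acknowledges startup.
start_worker() ->
    proc_lib:start_link(?MODULE, worker_init, [self()]).

worker_init(Parent) ->
    %% The trick: sleep before reporting a successful start, so a worker
    %% that crashes right away can only be restarted about once a second.
    timer:sleep(1000),
    proc_lib:init_ack(Parent, {ok, self()}),
    worker_loop().

worker_loop() ->
    receive
        crash -> exit(boom);       %% simulate a failure on demand
        _Other -> worker_loop()
    end.
```

The cost is that the supervisor blocks for that second on every (re)start, so this is a blunt tool. It just happens to be a very cheap one.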
Ghost processes are easy-ish to find. erlang:processes() lists all the pids, and then you can use erlang:process_info() to get information about them... We would dump stats on processes to a log once a minute or so, with some filtering to avoid massive log spew. Those kinds of things can be built up over time... the nice thing is the debug shell can see everything, but you do need to learn what to look for.
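Something along these lines is enough to get started. This is a sketch, not our actual code: the proc_stats module and its function names are hypothetical, and "top N by some numeric item" is just one possible filter for keeping the log small.

```erlang
-module(proc_stats).
-export([top/2, log_top/1]).

%% Return up to N {Pid, Value} pairs with the largest Value for Key, where
%% Key is a numeric process_info item such as message_queue_len, memory
%% or reductions.
top(Key, N) ->
    %% process_info/2 returns undefined for a process that died after
    %% processes/0 ran; the tuple pattern silently skips those.
    Infos = [{Pid, V} || Pid <- erlang:processes(),
                         {_, V} <- [erlang:process_info(Pid, Key)]],
    lists:sublist(lists:reverse(lists:keysort(2, Infos)), N).

%% Print the N processes with the longest message queues, with enough
%% detail to guess what each one is.
log_top(N) ->
    lists:foreach(
      fun({Pid, Len}) ->
              Extra = erlang:process_info(Pid, [registered_name,
                                                current_function]),
              io:format("~p queue_len=~p ~p~n", [Pid, Len, Extra])
      end,
      top(message_queue_len, N)).
```

From a shell, timer:apply_interval(60000, proc_stats, log_top, [10]) would run it once a minute. Sorting by memory or reductions instead of queue length finds different kinds of trouble.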
What's so cool about BEAM is that you can connect a REPL and debug the program as it's running. It's probably the best possible system for discovering what's happening as things are happening.