
140 points Tomte | 1 comment
kragen ◴[] No.26289195[source]
(BTW, Norman's server seems to be suffering under the load; https://web.archive.org/web/20210223015500/https://www.cs.tu... has your (Way)Back if you're suffering problems in accessing it.)

I've been interested in literate programming for a long time; for my self-bootstrapping PEG parser https://github.com/kragen/peg-bootstrap/blob/master/peg.md I wrote my own noweb-like system called HandAxeWeb in Lua (5.x) https://github.com/kragen/peg-bootstrap/blob/master/handaxew.... It accepts input in Markdown, HTML, ReStructuredText, etc., and it's only a couple hundred lines of Lua.

For HandAxeWeb (named following the convention of StoneKnifeForth; the intent was to make it simple enough that even fairly early stages of a bootstrap could be literate programs), I wanted to be able to include multiple versions of a program in the same document, because I think it's often helpful to see the temporal development of a program from a simpler version to a more complex version. The simpler version is easier to understand and helps you focus on the most fundamental aspects of the program. https://www.youtube.com/watch?v=KoWqdEACyLI is a 5'30" screencast (not by me) explaining the development of a Pong program in this fashion. I think it's usually easier to understand a program this way than by trying to understand the final complete version bit by bit, the way something like "TeX: The Program" forces you to do.
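
As a rough illustration of what a noweb-like tool does when it extracts ("tangles") a program from a document, here is a minimal sketch in Python. It assumes a simplified form of noweb's chunk notation (`<<name>>=` opens a code chunk, a line containing only `@` returns to prose, and `<<name>>` on a line by itself refers to another chunk). It is not HandAxeWeb's actual syntax or implementation, and the names `parse_chunks`, `expand`, and `tangle` are made up for this example.

```python
import re

# Simplified noweb-style markers (an assumption of this sketch).
CHUNK_DEF = re.compile(r"^<<(.+)>>=\s*$")
CHUNK_REF = re.compile(r"^(\s*)<<(.+)>>\s*$")


def parse_chunks(text):
    """Collect named code chunks; repeated definitions are concatenated."""
    chunks = {}
    current = None  # list receiving code lines, or None while in prose
    for line in text.splitlines():
        match = CHUNK_DEF.match(line)
        if match:
            current = chunks.setdefault(match.group(1), [])
        elif line.strip() == "@":
            current = None
        elif current is not None:
            current.append(line)
    return chunks


def expand(name, chunks, indent=""):
    """Recursively replace chunk references, preserving indentation."""
    out = []
    for line in chunks[name]:
        match = CHUNK_REF.match(line)
        if match:
            out.extend(expand(match.group(2), chunks, indent + match.group(1)))
        else:
            out.append(indent + line)
    return out


def tangle(text, root):
    """Return the source code for the chunk named `root`."""
    return "\n".join(expand(root, parse_chunks(text))) + "\n"


DOC = """\
A greeting program. The main chunk refers to its body by name.

<<hello.py>>=
def main():
    <<print greeting>>
main()
@

The body is explained separately, after the overview.

<<print greeting>>=
print("hello, world")
@
"""

if __name__ == "__main__":
    print(tangle(DOC, "hello.py"), end="")
```

Tangling the `hello.py` chunk yields a three-line program with the greeting indented inside `main()`. The sketch does not detect cyclic chunk references, and it does nothing about presenting several versions of the same program, which is the part I actually cared about.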

Still, soi-disant literate programming tools, including my own, generally fail to take advantage of the most compelling aspect of the computer as a communication medium: its ability to simulate. When I dive into a new code base, it's never entirely by reading it, whether top-down, bottom-up, or in any other order. The cross-reference links added by things like CWEB (or, you know, ctags) are helpful, of course, but invariably I want to see the output of the program, which CWEB doesn't support at all! (Although Knuth's TeX: The Program does manage to include TeX output despite being written in WEB, CWEB's Pascal-based predecessor, that's in a sense sort of a coincidence; this is not a feature WEB or CWEB can provide for any programs other than TeX and METAFONT.)

Books like Algorithms, by Knuth's student Sedgewick, are full of graphical representations of the outputs of the algorithms being discussed, and this is enormously helpful—perhaps even more so than the source code; see https://algs4.cs.princeton.edu/22mergesort/ for some examples from the current version of the book, which is lamentably in Java. It's better still, though, when you can edit the code and see the results—when diving into a new code base, I tend to execute modified versions of the code a lot, whether in the debugger or with extra logging or what. Paper books can't do this, but that's no excuse for not doing it when we're writing for readers who have computers.

Philip Guo's Python Tutor http://pythontutor.com/ provides dynamic visualization of the memory contents of, in theory, arbitrary code (in the supported languages, including of course Python, but also C, C++, Java, JS, and Ruby). There are things you can display with animation that you can't display in a static page, but Algorithms gets quite far with static printed images, and I think static visualization is better when you can make it work, for reasons explored at length in Bret Victor's http://worrydream.com/MagicInk/.

Python Tutor doesn't scale to large programs, but Dorothea Lütkehaus and Andreas Zeller's DDD can control GDB to create such visualizations for anything you can run under GDB (or JDB, Ladebug, pydb, or perl -d). Unfortunately there's no way to share the output of either DDD or Python Tutor, except maybe a screencast, and despite having been around since 01995, DDD has never been popular, I suspect because its Motif UI is clumsy to use. https://edoras.sdsu.edu/doc/ddd/article20.html shows what it looked like in 02002 and https://youtu.be/cKQ1qdo79As?t=106 shows what it looked like in 02015.

Of course, spreadsheets are by far the most popular programming environment, and they have always displayed the program's output when you open it—even to the exclusion of the code, mostly. I've experimented with this sort of thing in the past, with things like http://canonical.org/~kragen/sw/bwt an interactive visualization of the Burrows–Wheeler transform, and so it's been heartening to see modern software development moving in this direction.
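
For readers who haven't met the Burrows–Wheeler transform, here is a minimal textbook sketch in Python (not the code behind the visualization linked above): sort all rotations of the input and read off the last column. The `$` sentinel and the function names are assumptions of this example. Printing the sorted rotations is the kind of intermediate output that makes the algorithm easier to follow than the code alone.

```python
def bwt_rotations(text, sentinel="$"):
    """Return all rotations of text + sentinel, sorted lexicographically."""
    s = text + sentinel
    return sorted(s[i:] + s[:i] for i in range(len(s)))


def bwt(text, sentinel="$"):
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    return "".join(row[-1] for row in bwt_rotations(text, sentinel))


if __name__ == "__main__":
    for row in bwt_rotations("banana"):
        print(row)          # the sorted rotation matrix, one row per line
    print(bwt("banana"))    # prints "annb$aa"
```

This version is quadratic in memory and only suitable for short strings; real implementations use suffix arrays instead.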

The simplest version of this is things like Python's doctest, where you manually paste textual snippets of output in the code itself, and a testing tool automatically verifies that they're still up-to-date; Darius Bacon's Halp https://github.com/darius/halp is a more advanced version of this, where the example output updates automatically, so you can make changes to the program and see how they affect the results.
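
A minimal doctest sketch, for concreteness (the function `run_lengths` is a made-up example, not from any of the projects mentioned): the expected output is pasted into the docstring, and `doctest.testmod()` re-runs each example and reports any whose output no longer matches.

```python
import doctest


def run_lengths(s):
    """Return (character, count) pairs for each run of equal characters.

    >>> run_lengths("aaabcc")
    [('a', 3), ('b', 1), ('c', 2)]
    >>> run_lengths("")
    []
    """
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            # Extend the current run.
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            # Start a new run.
            runs.append((ch, 1))
    return runs


if __name__ == "__main__":
    # Re-run every ">>>" example in this module's docstrings.
    print(doctest.testmod())
```

Note that doctest only checks the pasted output; keeping it current after a deliberate change is still manual, which is the gap Halp fills.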

The most polished versions of this approach seem to have adopted the name "explorable explanations", and many of the best examples are Amit Patel's, which he has at different times termed "interactive illustrations" https://simblob.blogspot.com/2007/07/interactive-illustratio..., "active essays" (I think? Maybe I'm misremembering and that term was current in Squeak around 02003: http://wiki.squeak.org/squeak/1125), and "interactive visual explanations". I wrote a previous comment about this on here in 02019: https://news.ycombinator.com/item?id=20954056.

However, Amit's explorables, like many other versions of the genre, de-emphasize the underlying code to the point where they neither display the actual code nor let you edit it. They're intended to visualize an algorithm, not a codebase.

Mike Bostock, d3.js's original author, has created https://bl.ocks.org/ for sharing explorable explanations made with d3, and is doing a startup called ObservableHQ which makes things like this a lot easier to build: https://beta.observablehq.com/d/e639659056145e88 but at the expense of a certain amount of polish and presentational freedom. Also, unfortunately, ObservableHQ programs seem to be tied to the company's website—you can download their output, but very much unlike TeX, the programs will only be runnable until the company goes out of business. So if you aspire to make a lasting contribution to human intellectual heritage, like TeX, GCC, or d3.js itself, ObservableHQ is not for you.

R Markdown (by JJ Allaire, who is, yes, the ColdFusion dude, and Yihui Xie, among others) is one of the more interesting developments here; as with noweb or HandAxeWeb, you edit something very close to the "woven" version of the source code (in a dialect of Markdown); but, in a separate file alongside, RStudio maintains the results of executing the code, which are included in the "woven" output, and may be textual or graphical. Moreover, as with Halp or ObservableHQ, these results are displayed in a notebook-style interface as you're editing the code. https://bookdown.org/yihui/rmarkdown/notebook.html has a variety of examples, and Xie is rightly focused on reproducibility, which is very challenging to achieve with the existing tooling. https://bookdown.org/ lists a number of books that have been written with R Markdown, and https://github.com/rstudio/rmarkdown explains the overall project.

Of course the much more common notebook-style interface, and the one that popularized the interaction style, is Jupyter (influenced by SageMath), which mixes input and output indiscriminately in the same file and peremptorily makes backward-incompatible changes in file formats; the result is a lot of friction with version-control systems. Nevertheless, it supports inline LaTeX, it's easy to use and compatible with a huge variety of existing software, and it can include publication-quality visualizations, so there's a lot of code out there in Jupyter notebooks now, far more than in any system that purports to be a "literate programming" system. Notable examples include Peter Norvig's œuvre (there's a list at https://github.com/norvig/pytudes#pytudes-index-of-jupyter-i...). I find this a very comfortable and powerful medium for this form of literate programming; recent examples include https://nbviewer.jupyter.org/url/canonical.org/~kragen/sw/de..., https://nbviewer.jupyter.org/url/canonical.org/~kragen/sw/de..., and https://nbviewer.jupyter.org/url/canonical.org/~kragen/sw/de..., which are maybe sort of embarrassingly bad but I think demonstrate the potential of the medium for vernacular expression of vulgar software, as well as lofty Norvig-type things.
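
Much of the version-control friction comes from outputs and execution counters living in the same JSON file as the source. A minimal sketch of one common workaround, stripping them before committing, using only the standard library: it assumes the nbformat 4 layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`), and `strip_outputs` and `EXAMPLE` are names invented for this example. Existing tools such as nbstripout do this more thoroughly.

```python
import json


def strip_outputs(notebook):
    """Return a copy of an nbformat-4 notebook dict without cell outputs."""
    stripped = json.loads(json.dumps(notebook))  # deep copy via JSON round-trip
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


# A tiny hand-written notebook, standing in for a real .ipynb file.
EXAMPLE = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 7,
            "source": ["1 + 1"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 7,
                    "data": {"text/plain": ["2"]},
                    "metadata": {},
                }
            ],
        },
    ],
}

if __name__ == "__main__":
    # sort_keys gives a stable key order, which keeps diffs small.
    print(json.dumps(strip_outputs(EXAMPLE), indent=1, sort_keys=True))
```

Of course this throws away exactly the thing that makes the notebook worth reading, so it's a tradeoff rather than a fix; R Markdown's approach of keeping results in a separate file avoids the dilemma.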

Konrad Hinsen has written about the reproducibility and lock-in problems introduced by Jupyter, for example in https://khinsen.wordpress.com/2015/09/03/beyond-jupyter-what..., and has been using Tudor Gîrba's Glamorous Toolkit https://gtoolkit.com/ to explore what comes next. He's been hitting the reproducibility problem pretty hard in http://www.activepapers.org/ but the primary intent there is, as with the explorable-explanations stuff, code as a means to producing research ("How should we package and publish the outcomes of computer-aided research"), rather than maintainability and understandability of code itself. I think this is a promising direction for literate programming as such, too.

1. kragen ◴[] No.26290050[source]
About spreadsheets, I missed the editing window on this, but I wanted to point out that in addition to the plotting capabilities spreadsheets have included since at least Lotus 1-2-3 1.0A in 01983 https://www.pcjs.org/software/pcx86/app/lotus/123/1a/ you can use conditional formatting and the like to get useful algorithmic visualizations; as an example, consider http://canonical.org/~kragen/sw/dev3/minskyplot.gnumeric, which also uses a slider to allow you to alter algorithm parameters dynamically in an ObservableHQ-like way. Even with Lotus 1-2-3 on a 4.77-MHz IBM PC 5150, you could get a much quicker feedback loop for that kind of thing than you can get from reading a printed program, but it was considerably harder to share with other people.

If you want to get that kind of historical end-user programming perspective, you can load the disk image at http://canonical.org/~kragen/sw/dev3/lotus-123-1a-plotsin.im... into the PCjs emulator running Lotus 1-2-3 linked above (mount it as drive B:), use /FR (File Retrieve) to load PLOTSIN.WKS, and type /GV to view the graph. You can also load the .wks file from http://canonical.org/~kragen/sw/dev3/plotsin.wks into modern Gnumeric or LibreOffice Calc, but they won't display the graph. (I was also able to mount a directory containing the files from that disk image on drive B: in Dosbox and load the spreadsheet into 1-2-3, but Dosbox's CGA emulation seems to screw up on actually displaying the graph, and I think PCjs is also emulating the speed of the machine, which is an important aspect of the user experience.)

Of course spreadsheets are a pretty limited programming environment, and like modern explorable explanations, they're focused on presenting the results of the computation, or enabling you to apply it to new inputs, rather than focused on explaining the inner workings of the computation itself. But they do expose the inner workings, even if only by necessity, and for problems they can solve at all, they're often a much more convenient way to understand some algorithm than a static pile of source code.