Noweb – A Simple, Extensible Tool for Literate Programming

1. kragen ◴[27 Feb 21 23:38 UTC] No.26289195[source]▶

(BTW, Norman's server seems to be suffering under the load; https://web.archive.org/web/20210223015500/https://www.cs.tu... has your (Way)Back if you're suffering problems in accessing it.)

I've been interested in literate programming for a long time; for my self-bootstrapping PEG parser https://github.com/kragen/peg-bootstrap/blob/master/peg.md I wrote my own noweb-like system called HandAxeWeb in Lua (5.x) https://github.com/kragen/peg-bootstrap/blob/master/handaxew.... It accepts input in Markdown, HTML, ReStructuredText, etc., and it's only a couple hundred lines of Lua.

For HandAxeWeb (named following the convention of StoneKnifeForth; the intent was to make it simple enough that even fairly early stages of a bootstrap could be literate programs), I wanted to be able to include multiple versions of a program in the same document, because I think it's often helpful to see the temporal development of a program from a simpler version to a more complex version. The simpler version is easier to understand and helps you focus on the most fundamental aspects of the program. https://www.youtube.com/watch?v=KoWqdEACyLI is a 5'30" screencast (not by me) explaining the development of a Pong program in this fashion. I think it's usually easier to understand a program in this fashion than by trying to understand the final complete version bit by bit, the way something like "TeX: The Program" forces you to do.

๛

Still, soi-disant literate programming tools (my own included) generally fail to take advantage of the most compelling aspect of the computer as a communication medium: its ability to simulate. When I dive into a new code base, it's never entirely by reading it, whether top-down, bottom-up, or in any other order. The cross-reference links added by things like CWEB (or, you know, ctags) are helpful, of course, but invariably I want to see the output of the program, which CWEB doesn't support at all! (Although Knuth's TeX: The Program does manage to include TeX output despite being written in WEB, CWEB's Pascal-based predecessor, that's in a sense sort of a coincidence; this is not a feature WEB or CWEB can provide for any programs other than TeX and METAFONT.)

Books like Algorithms, by Knuth's student Sedgewick, are full of graphical representations of the outputs of the algorithms being discussed, and this is enormously helpful—perhaps even more so than the source code; see https://algs4.cs.princeton.edu/22mergesort/ for some examples from the current version of the book, which is lamentably in Java. It's better still, though, when you can edit the code and see the results—when diving into a new code base, I tend to execute modified versions of the code a lot, whether in the debugger or with extra logging or what. Paper books can't do this, but that's no excuse for not doing it when we're writing for readers who have computers.

Philip Guo's Python Tutor http://pythontutor.com/ provides dynamic visualization of the memory contents of, in theory, arbitrary code (in the supported languages, including of course Python, but also C, C++, Java, JS, and Ruby). There are things you can display with animation that you can't display in a static page, but Algorithms gets quite far with static printed images, and I think static visualization is better when you can make it work, for reasons explored at length in Bret Victor's http://worrydream.com/MagicInk/.

Python Tutor doesn't scale to large programs, but Dorothea Lütkehaus and Andreas Zeller's DDD can control GDB to create such visualizations for anything you can run under GDB (or JDB, Ladebug, pydb, or perl -d). Unfortunately there's no way to share the output of either DDD or Python Tutor, except maybe a screencast, and despite having been around since 01995, DDD has never been popular, I suspect because its Motif UI is clumsy to use. https://edoras.sdsu.edu/doc/ddd/article20.html shows what it looked like in 02002 and https://youtu.be/cKQ1qdo79As?t=106 shows what it looked like in 02015.

๛

Of course, spreadsheets are by far the most popular programming environment, and they have always displayed the program's output when you open them, even to the exclusion of the code, mostly. I've experimented with this sort of thing in the past, with things like http://canonical.org/~kragen/sw/bwt, an interactive visualization of the Burrows–Wheeler transform, and so it's been heartening to see modern software development moving in this direction.

The simplest version of this is things like Python's doctest, where you manually paste textual snippets of output in the code itself, and a testing tool automatically verifies that they're still up-to-date; Darius Bacon's Halp https://github.com/darius/halp is a more advanced version of this, where the example output updates automatically, so you can make changes to the program and see how they affect the results.
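
As a minimal sketch of the doctest style (the `merge` function and the `check_examples` helper are made up for illustration, not taken from any project mentioned here), the pasted output lives in the docstring, and the standard-library `doctest` module re-runs it to check that it is still accurate:

```python
import doctest


def merge(left, right):
    """Merge two sorted lists into one sorted list.

    >>> merge([1, 4], [2, 3])
    [1, 2, 3, 4]
    >>> merge([], [5])
    [5]
    """
    merged = []
    i = j = 0
    # Repeatedly take the smaller head element of the two lists.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One of the lists is exhausted; append whatever remains of the other.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def check_examples():
    """Re-run the examples pasted into merge's docstring.

    Returns a (failed, attempted) pair of counts.
    """
    parser = doctest.DocTestParser()
    test = parser.get_doctest(merge.__doc__, {"merge": merge}, "merge", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted
```

If someone later changes `merge` so that its output no longer matches the pasted snippets, `check_examples()` reports a nonzero failure count; what it will not do is rewrite the stale snippet for you, which is the gap a tool like Halp is described as filling.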

๛

The most polished versions of this approach seem to have adopted the name "explorable explanations", and many of the best examples are Amit Patel's, which he has at different times termed "interactive illustrations" https://simblob.blogspot.com/2007/07/interactive-illustratio..., "active essays" (I think? Maybe I'm misremembering and that term was current in Squeak around 02003: http://wiki.squeak.org/squeak/1125), and "interactive visual explanations". I wrote a previous comment about this on here in 02019: https://news.ycombinator.com/item?id=20954056.

However, Amit's explorables, like many other versions of the genre, de-emphasize the underlying code to the point where they both don't display the actual code and don't let you edit it. They're intended to visualize an algorithm, not a codebase.

Mike Bostock, d3.js's original author, has created https://bl.ocks.org/ for sharing explorable explanations made with d3, and is doing a startup called ObservableHQ which makes things like this a lot easier to build: https://beta.observablehq.com/d/e639659056145e88 but at the expense of a certain amount of polish and presentational freedom. Also, unfortunately, ObservableHQ programs seem to be tied to the company's website—you can download their output, but very much unlike TeX, the programs will only be runnable until the company goes out of business. So if you aspire to make a lasting contribution to human intellectual heritage, like TeX, GCC, or d3.js itself, ObservableHQ is not for you.

๛

R Markdown (by JJ Allaire—yes, the Cold Fusion dude—and Yihui Xie, among others) is one of the more interesting developments here; as with noweb or HandAxeWeb, you edit something very close to the "woven" version of the source code (in a dialect of Markdown); but, in a separate file alongside, RStudio maintains the results of executing the code, which are included in the "woven" output, and may be textual or graphical. Moreover, as with Halp or ObservableHQ, these results are displayed in a notebook-style interface as you're editing the code. https://bookdown.org/yihui/rmarkdown/notebook.html has a variety of examples, and Xie is rightly focused on reproducibility, which is very challenging to achieve with the existing tooling. https://bookdown.org/ lists a number of books that have been written with R Markdown, and https://github.com/rstudio/rmarkdown explains the overall project.

Of course the much more common notebook-style interface, and the one that popularized the interaction style, is Jupyter (influenced by SageMath), which mixes input and output indiscriminately in the same file and peremptorily makes backward-incompatible changes in file formats; the result is a lot of friction with version-control systems. Nevertheless, it supports inline LaTeX, it's easy to use and compatible with a huge variety of existing software, and it can include publication-quality visualizations, so there's a lot of code out there in Jupyter notebooks now, far more than in any system that purports to be a "literate programming" system. Notable examples include Peter Norvig's œuvre (there's a list at https://github.com/norvig/pytudes#pytudes-index-of-jupyter-i...). I find this a very comfortable and powerful medium for this form of literate programming; recent examples include https://nbviewer.jupyter.org/url/canonical.org/~kragen/sw/de..., https://nbviewer.jupyter.org/url/canonical.org/~kragen/sw/de..., and https://nbviewer.jupyter.org/url/canonical.org/~kragen/sw/de..., which are maybe sort of embarrassingly bad but I think demonstrate the potential of the medium for vernacular expression of vulgar software, as well as lofty Norvig-type things.
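
To make the version-control friction concrete, here is a rough sketch of the usual workaround: stripping the outputs before committing. It assumes the nbformat 4 JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`); the `strip_outputs` name is made up for this example.

```python
import json


def strip_outputs(notebook):
    """Return a copy of an nbformat-4 notebook dict with code-cell outputs removed.

    Assumes the nbformat 4 layout: a top-level "cells" list in which code
    cells have "outputs" and "execution_count" fields.
    """
    # Round-tripping through JSON is a cheap deep copy for JSON-only data.
    stripped = json.loads(json.dumps(notebook))
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped
```

The catch is that this throws away exactly the part a reader wants to see, which is why keeping the results in a separate file alongside the source, as described above for R Markdown, is the more appealing design.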

๛

Konrad Hinsen has written about the reproducibility and lock-in problems introduced by Jupyter, for example in https://khinsen.wordpress.com/2015/09/03/beyond-jupyter-what..., and has been using Tudor Gîrba's Glamorous Toolkit https://gtoolkit.com/ to explore what comes next. He's been hitting the reproducibility problem pretty hard in http://www.activepapers.org/ but the primary intent there is, as with the explorable-explanations stuff, code as a means to producing research ("How should we package and publish the outcomes of computer-aided research"), rather than maintainability and understandability of code itself. I think this is a promising direction for literate programming as such, too.

replies(3): >>26289911 #>>26290050 #>>26290328 #

2. someguydave ◴[28 Feb 21 01:43 UTC] No.26289911[source]▶

>>26289195 (TP) #

Thanks for your comment. Are you aware of a literate programming tool that primarily uses tags in source files to link with documentation in txt files? I guess I am thinking one could weave the source with docs to produce the documentation, while the source is passed to the compiler unchanged.

replies(3): >>26289987 #>>26290144 #>>26293209 #

3. kragen ◴[28 Feb 21 01:59 UTC] No.26289987[source]▶

>>26289911 #

That's a really interesting idea! The closest things I've seen along those lines are Javadoc and its numerous bastard progeny (most notably Doxygen), which omit the "documentation in txt files" part entirely, and "shadow blocks" in Forth systems, where if I understand correctly you'd put the textual documentation a fixed number of blocks away from the source code on disk. So, if that offset were 50, code block 42 would correspond to shadow block 92, and there was a short command in the editor to jump back and forth between displaying the code and the comments (screens were too small at the time to display both at once). But I never used these systems.
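
The block arithmetic is simple enough to sketch. This is only an illustration of the offset scheme as described above, not the behavior of any particular Forth system; the offset of 50 is the example value from this comment, and the layout (code blocks below the offset, shadows at or above it) is an assumption made so that the toggle is well defined.

```python
SHADOW_OFFSET = 50  # example value from the comment above; real systems varied


def shadow_block(code_block, offset=SHADOW_OFFSET):
    """Return the documentation block paired with a code block."""
    return code_block + offset


def toggle(block, offset=SHADOW_OFFSET):
    """Jump between a code block and its shadow, like the editor command.

    Assumes blocks numbered below `offset` hold code and blocks at or
    above `offset` hold the matching documentation.
    """
    if block < offset:
        return block + offset
    return block - offset
```

So `toggle(42)` gives 92 and `toggle(92)` gives 42, which is the back-and-forth jump described above.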

4. kragen ◴[28 Feb 21 02:11 UTC] No.26290050[source]▶

>>26289195 (TP) #

About spreadsheets, I missed the editing window on this, but I wanted to point out that in addition to the plotting capabilities spreadsheets have included since at least Lotus 1-2-3 1.0A in 01983 https://www.pcjs.org/software/pcx86/app/lotus/123/1a/ you can use conditional formatting and the like to get useful algorithmic visualizations; as an example, consider http://canonical.org/~kragen/sw/dev3/minskyplot.gnumeric, which also uses a slider to allow you to alter algorithm parameters dynamically in an ObservableHQ-like way. Even with Lotus 1-2-3 on a 4.7-MHz IBM PC 5150, you could get a much quicker feedback loop for that kind of thing than you can get from reading a printed program, but it was considerably harder to share with other people.

If you want to get that kind of historical end-user programming perspective, you can load the disk image at http://canonical.org/~kragen/sw/dev3/lotus-123-1a-plotsin.im... into the PCjs emulator running Lotus 1-2-3 linked above (mount it as drive B:), /FR Retrieve PLOTSIN.WKS, and type /GV to view the graph. You can also load the .wks file from http://canonical.org/~kragen/sw/dev3/plotsin.wks into modern Gnumeric or LibreOffice Calc, but they won't display the graph. (I was also able to mount a directory containing the files from that disk image on drive B: in Dosbox and load the spreadsheet into 1-2-3, but Dosbox's CGA emulation seems to screw up on actually displaying the graph, and I think PCjs is also emulating the speed of the machine, which is an important aspect of the user experience.)

Of course spreadsheets are a pretty limited programming environment, and like modern explorable explanations, they're focused on presenting the results of the computation, or enabling you to apply it to new inputs, rather than focused on explaining the inner workings of the computation itself. But they do expose the inner workings, even if only by necessity, and for problems they can solve at all, they're often a much more convenient way to understand some algorithm than a static pile of source code.

5. akkartik ◴[28 Feb 21 02:36 UTC] No.26290144[source]▶

>>26289911 #

https://github.com/nickpascucci/verso works like this. There's a syntax for creating tags in source files, and exposition for tags lives in a separate file.

replies(1): >>26291344 #

6. akkartik ◴[28 Feb 21 03:09 UTC] No.26290328[source]▶

>>26289195 (TP) #

Here are a couple more projects that may or may not seem like Literate Programming, but are motivated squarely by its ethos: to order code for exposition, independent of what the compiler wants.

* https://github.com/snaptoken, the engine behind https://viewsourcecode.org/snaptoken/kilo. The key new feature here seems to be that fragments are always shown in context that can be dynamically expanded by the reader.

* https://github.com/jbyuki/ntangle.vim -- a literate system that tangles your code behind the scenes every time you :wq in Vim or Neovim.

* My system of layers deemphasizes typesetting and is designed to work within a programmer's editor (though IDEs will find it confusing): http://akkartik.name/post/wart-layers. I don't have a single repo for it, mostly[1] because it's tiny enough to get bundled with each of my projects. Perhaps the most developed place to check out is the layered organization for a text editor I built in a statement-oriented language with built-in support for layers: https://github.com/akkartik/mu1/tree/master/edit#readme. It's also in my most recent project, though it's only used in a tiny bootstrapping shim before I wormhole solipsistically into my own universe: https://github.com/akkartik/mu/blob/main/tools/tangle.readme.... Maybe one day I'll have layers in this universe.

[1] And also because I think example repos are under-explored compared to constant attempts at reusable components: http://akkartik.name/post/four-repos

replies(1): >>26290596 #

7. kragen ◴[28 Feb 21 04:08 UTC] No.26290596[source]▶

>>26290328 #

These are great, thank you!

8. someguydave ◴[28 Feb 21 07:19 UTC] No.26291344{3}[source]▶

>>26290144 #

yeah exactly, thank you.

I found that the Leo editor does this too, but I believe you must use the GUI to tangle/weave; I would prefer a CLI for automation.

9. kmstout ◴[28 Feb 21 13:16 UTC] No.26293209[source]▶

>>26289911 #

There's a variation called "elucidative programming" [1] wherein the source is marked with "anchors" that the documentation can reference. Since source code lives in traditional source files, all the regular development infrastructure continues to work. When the source/documentation bundle is processed, the output is a two-pane coordinated view of code and discussion.

[1] http://people.cs.aau.dk/~normark/elucidative-programming/

replies(1): >>26300293 #

10. someguydave ◴[01 Mar 21 06:47 UTC] No.26300293{3}[source]▶

>>26293209 #

yeah I like it but the view of the code is a little lame if you wanted a printout. I think I would prefer comment anchors in the code which would weave into code quotes in a document.
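
A rough sketch of what that could look like, with everything about the syntax invented for illustration: the `# @anchor NAME` / `# @end` comment markers and the `@quote NAME` directive are hypothetical, and are not the syntax of verso, Leo, or the elucidative-programming tools. Because the markers are ordinary comments, the source still goes to the compiler unchanged.

```python
import re

# Hypothetical marker syntax, invented for this sketch:
#   "# @anchor NAME" opens a named region in the source, "# @end" closes it.
ANCHOR_RE = re.compile(r"^\s*#\s*@anchor\s+(\w+)\s*$")
END_RE = re.compile(r"^\s*#\s*@end\s*$")
# Hypothetical directive in the documentation file: "@quote NAME" on its own line.
QUOTE_RE = re.compile(r"^@quote\s+(\w+)\s*$")


def extract_anchors(source):
    """Map each anchor name to the source lines between its markers."""
    anchors = {}
    current = None
    for line in source.splitlines():
        opened = ANCHOR_RE.match(line)
        if opened:
            current = opened.group(1)
            anchors[current] = []
        elif END_RE.match(line):
            current = None
        elif current is not None:
            anchors[current].append(line)
    return anchors


def weave(doc, anchors):
    """Replace each @quote line in the doc with the anchored code, indented.

    An unknown anchor name raises KeyError, so stale references fail loudly.
    """
    out = []
    for line in doc.splitlines():
        quoted = QUOTE_RE.match(line)
        if quoted:
            out.extend("    " + code for code in anchors[quoted.group(1)])
        else:
            out.append(line)
    return "\n".join(out)
```

Since this is just two small functions over text, it would be easy to drive from a command line or a build step rather than a GUI.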