That said, I've never loved the LaTeX-centric nature of most tools. I don't like heavier markup systems while I am writing prose, which is why I wrote SpiralWeb (https://github.com/michaeljmcd/spiralweb) as a Pandoc/Markdown centric tool.
I find that the order of diffs given by git is not optimized for helping a reviewer understand the change. Sometimes the files are not ordered in the most logical way; sometimes unrelated changes (e.g., a text editor removing blanks at the ends of lines) create noise; etc.
I've been thinking that it would be interesting to have a tool where the author can take the diff of their commit(s), order them in a way that is conducive to understanding and explain each part of the diff. That'd be similar to having the author do a code walkthrough, but at the pace of the reader rather than the author.
https://martinfowler.com/bliki/SemanticDiff.html
Tools in this space date back to the 1990s. There has been a recent upsurge of interest, and a number of capable tools for different languages are currently available.
http://ross.net/funnelweb/tutorial/index.html
Unfortunately, the only known implementation was last updated over two decades ago, and is written in C that is pretty hard to understand.
I asked for permission and started a repository here: https://github.com/loa-in-/fw-utf8
I currently have it there unmodified, except for a disabled ASCII-range check (this modification is included in the initial commit; sorry, my bad). Otherwise the code is the same.
Edit: I'm not saying that this is a solved problem. I think the parent's point is valid. I am just saying that there are some tools that make this possible, and I agree that there is a definite need for improvements in this area.
(An obvious next step is Coccinelle-style semantic patches, but let's start with sed!)
But if one is into literate programming, the Leo Editor (http://leoeditor.com) is definitely a must to check out.
I've been interested in literate programming for a long time; for my self-bootstrapping PEG parser https://github.com/kragen/peg-bootstrap/blob/master/peg.md I wrote my own noweb-like system called HandAxeWeb in Lua (5.x) https://github.com/kragen/peg-bootstrap/blob/master/handaxew.... It accepts input in Markdown, HTML, ReStructuredText, etc., and it's only a couple hundred lines of Lua.
For HandAxeWeb (named following the convention of StoneKnifeForth—the intent was to make it simple enough that even fairly early stages of a bootstrap could be literate programs), I wanted to be able to include multiple versions of a program in the same document, because I think it's often helpful to see the temporal development of a program from a simpler version to a more complex version. The simpler version is easier to understand and helps you focus on the most fundamental aspects of the program. https://www.youtube.com/watch?v=KoWqdEACyLI is a 5'30" screencast (not by me) of explaining the development of a Pong program in this fashion. I think it's usually easier to understand a program in this fashion than by trying to understand the final complete version bit by bit, the way something like "TeX: The Program" forces you to do.
๛
Still, generally speaking, soi-disant literate programming tools—including my own—generally fail to take advantage of the most compelling aspect of the computer as a communication medium: its ability to simulate. When I dive into a new code base, it's never entirely by reading it—whether top-down, bottom-up, or in any other order. The cross-reference links added by things like CWEB (or, you know, ctags) are helpful, of course, but invariably I want to see the output of the program, which CWEB doesn't support at all! (Although Knuth's TeX: The Program does manage to include TeX output despite being written in CWEB, that's in a sense sort of a coincidence; this is not a feature CWEB can provide for any programs other than TeX and METAFONT.)
Books like Algorithms, by Knuth's student Sedgewick, are full of graphical representations of the outputs of the algorithms being discussed, and this is enormously helpful—perhaps even more so than the source code; see https://algs4.cs.princeton.edu/22mergesort/ for some examples from the current version of the book, which is lamentably in Java. It's better still, though, when you can edit the code and see the results—when diving into a new code base, I tend to execute modified versions of the code a lot, whether in the debugger or with extra logging or what. Paper books can't do this, but that's no excuse for not doing it when we're writing for readers who have computers.
Philip Guo's Python Tutor http://pythontutor.com/ provides dynamic visualization of the memory contents of, in theory, arbitrary code (in the supported languages, including of course Python, but also C, C++, Java, JS, and Ruby). There are things you can display with animation that you can't display in a static page, but Algorithms gets quite far with static printed images, and I think static visualization is better when you can make it work, for reasons explored at length in Bret Victor's http://worrydream.com/MagicInk/.
Python Tutor doesn't scale to large programs, but Dorothea Lütkehaus and Andreas Zeller's DDD can control GDB to create such visualizations for anything you can run under GDB (or JDB, Ladebug, pydb, or perl -d). Unfortunately there's no way to share the output of either DDD or Python Tutor, except maybe a screencast, and despite having been around since 01995, DDD has never been popular, I suspect because its Motif UI is clumsy to use. https://edoras.sdsu.edu/doc/ddd/article20.html shows what it looked like in 02002 and https://youtu.be/cKQ1qdo79As?t=106 shows what it looked like in 02015.
๛
Of course, spreadsheets are by far the most popular programming environment, and they have always displayed the program's output when you open it—even to the exclusion of the code, mostly. I've experimented with this sort of thing in the past, with things like http://canonical.org/~kragen/sw/bwt an interactive visualization of the Burrows–Wheeler transform, and so it's been heartening to see modern software development moving in this direction.
The simplest version of this is things like Python's doctest, where you manually paste textual snippets of output in the code itself, and a testing tool automatically verifies that they're still up-to-date; Darius Bacon's Halp https://github.com/darius/halp is a more advanced version of this, where the example output updates automatically, so you can make changes to the program and see how they affect the results.
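For readers who haven't used doctest, here is a minimal sketch of the idea (the function and its sample data are just an illustration, chosen to echo the Burrows–Wheeler example above):

```python
def bwt_rotations(s):
    """Return the sorted rotations of a string, as used in the
    Burrows-Wheeler transform.

    The interpreter-session lines below serve as both documentation
    and test: running ``python -m doctest thisfile.py`` re-executes
    them and complains if the pasted output has drifted out of date.

    >>> bwt_rotations("ban")
    ['anb', 'ban', 'nba']
    """
    return sorted(s[i:] + s[:i] for i in range(len(s)))

if __name__ == "__main__":
    import doctest
    doctest.testmod()
```

Halp's improvement over this is that the pasted output is rewritten in place when it changes, instead of merely being flagged as stale.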
๛
The most polished versions of this approach seem to have adopted the name "explorable explanations", and many of the best examples are Amit Patel's, which he has at different times termed "interactive illustrations" https://simblob.blogspot.com/2007/07/interactive-illustratio..., "active essays" (I think? Maybe I'm misremembering and that term was current in Squeak around 02003: http://wiki.squeak.org/squeak/1125), and "interactive visual explanations". I wrote a previous comment about this on here in 02019: https://news.ycombinator.com/item?id=20954056.
However, Amit's explorables, like many other versions of the genre, de-emphasize the underlying code to the point where they both don't display the actual code and don't let you edit it. They're intended to visualize an algorithm, not a codebase.
Mike Bostock, d3.js's original author, has created https://bl.ocks.org/ for sharing explorable explanations made with d3, and is doing a startup called ObservableHQ which makes things like this a lot easier to build: https://beta.observablehq.com/d/e639659056145e88 but at the expense of a certain amount of polish and presentational freedom. Also, unfortunately, ObservableHQ programs seem to be tied to the company's website—you can download their output, but very much unlike TeX, the programs will only be runnable until the company goes out of business. So if you aspire to make a lasting contribution to human intellectual heritage, like TeX, GCC, or d3.js itself, ObservableHQ is not for you.
๛
R Markdown (by JJ Allaire—yes, the Cold Fusion dude—and Yihui Xie, among others) is one of the more interesting developments here; as with noweb or HandAxeWeb, you edit something very close to the "woven" version of the source code (in a dialect of Markdown); but, in a separate file alongside, RStudio maintains the results of executing the code, which are included in the "woven" output, and may be textual or graphical. Moreover, as with Halp or ObservableHQ, these results are displayed in a notebook-style interface as you're editing the code. https://bookdown.org/yihui/rmarkdown/notebook.html has a variety of examples, and Xie is rightly focused on reproducibility, which is very challenging to achieve with the existing tooling. https://bookdown.org/ lists a number of books that have been written with R Markdown, and https://github.com/rstudio/rmarkdown explains the overall project.
Of course the much more common notebook-style interface, and the one that popularized the interaction style, is Jupyter (influenced by SageMath), which mixes input and output indiscriminately in the same file and peremptorily makes backward-incompatible changes in file formats; the result is a lot of friction with version-control systems. Nevertheless, it supports inline LaTeX, it's easy to use and compatible with a huge variety of existing software, and it can include publication-quality visualizations, so there's a lot of code out there in Jupyter notebooks now, far more than in any system that purports to be a "literate programming" system. Notable examples include Peter Norvig's œuvre (there's a list at https://github.com/norvig/pytudes#pytudes-index-of-jupyter-i...). I find this a very comfortable and powerful medium for this form of literate programming; recent examples include https://nbviewer.jupyter.org/url/canonical.org/~kragen/sw/de..., https://nbviewer.jupyter.org/url/canonical.org/~kragen/sw/de..., and https://nbviewer.jupyter.org/url/canonical.org/~kragen/sw/de..., which are maybe sort of embarrassingly bad but I think demonstrate the potential of the medium for vernacular expression of vulgar software, as well as lofty Norvig-type things.
๛
Konrad Hinsen has written about the reproducibility and lock-in problems introduced by Jupyter, for example in https://khinsen.wordpress.com/2015/09/03/beyond-jupyter-what..., and has been using Tudor Gîrba's Glamorous Toolkit https://gtoolkit.com/ to explore what comes next. He's been hitting the reproducibility problem pretty hard in http://www.activepapers.org/ but the primary intent there is, as with the explorable-explanations stuff, code as a means to producing research ("How should we package and publish the outcomes of computer-aided research"), rather than maintainability and understandability of code itself. I think this is a promising direction for literate programming as such, too.
My current understanding is that if I write a paragraph-sized comment explaining each and every part of my code and its intent, it will be called literate programming.
> Unfortunately, no one has yet volunteered to write a program using another’s system for literate programming. A fair conclusion from my mail would be that one must write one’s own system before one can write a literate program, and that makes me wonder how widespread literate programming is or will ever become. This column will continue only if I hear from people who use literate-programming systems that they have not designed themselves.
And it did not continue. Since then though, it appears that Noweb (and more recently, org-babel, and somewhere in between the Leo editor) is among the literate-programming systems that have been the most successful at getting others to use them!
Separately, something amusing:
When Donald Knuth came up with "literate programming" (partly because it had been suggested to him, by Tony Hoare IIRC, that he ought to publish as a book the source of the TeX program he was rewriting, so he was led to solve the problem of exposition) and the idea of programs as literature, he made a joke (or maybe he was half-serious, hard to say):
> Perhaps we will even one day find Pulitzer prizes awarded to computer programs. (http://literateprogramming.com/knuthweb.pdf)
That does not seem likely, but reality is stranger than one can imagine: a literate computer program won an Oscar! In 2014, an Academy Award (Scientific and Technical) was given to the authors of the book Physically Based Rendering (http://www.pbr-book.org/), itself a literate program. So we have this video of the award presentation, where actors Kristen Bell and Michael B. Jordan read out the citation and one of the awardees (Matt Pharr) thanks Knuth for inventing literate programming: https://www.youtube.com/watch?v=7d9juPsv1QU
The patch format explicitly allows it to ignore "junk" information at certain points, so you can edit in comments all over the place. The format also lets you break up a diff, rearranging it semantically, and it'll get rebuilt later.
Edit, to expand on the above:
> patch tries to skip any leading garbage, apply the diff, and then skip any trailing garbage. Thus you could feed an article or message containing a diff listing to patch, and it should work..... After removing indenting or encapsulation, lines beginning with # are ignored, as they are considered to be comments.
> With context diffs, and to a lesser extent with normal diffs, patch can detect when the line numbers mentioned in the patch are incorrect, and attempts to find the correct place to apply each hunk of the patch. As a first guess, it takes the line number mentioned for the hunk, plus or minus any offset used in applying the previous hunk. If that is not the correct place, patch scans both forwards and backwards for a set of lines matching the context given in the hunk.
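The scanning strategy described in that second paragraph can be sketched roughly as follows (a toy model of the behavior, not GNU patch's actual code):

```python
def locate_hunk(file_lines, context, guess):
    """Find where a hunk's context lines match in file_lines.

    Mimics the strategy quoted above: try the guessed position first
    (the hunk's stated line number plus any accumulated offset), then
    scan outward from it, alternating forward and backward, until the
    context matches. Returns the matching index, or None on failure.
    """
    n = len(file_lines)

    def matches_at(i):
        return (0 <= i and i + len(context) <= n
                and file_lines[i:i + len(context)] == context)

    if matches_at(guess):
        return guess
    # Scan outward from the guess, nearer offsets first.
    for offset in range(1, n):
        if matches_at(guess + offset):
            return guess + offset
        if matches_at(guess - offset):
            return guess - offset
    return None

# Example: the hunk expected its context at index 1, but two lines
# were inserted above it, so it actually matches at index 3.
lines = ["new1", "new2", "a", "b", "c", "d"]
print(locate_hunk(lines, ["b", "c"], 1))  # -> 3
```

Real patch adds "fuzz" on top of this (tolerating partial context matches), but the outward scan is the core of why commented-up, rearranged diffs still apply.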
When I (used to) write literate programs, the document I produced would be some kind of top-down view of the functionality. I would begin by explaining the kind of problem to be solved and include motivating examples. Then I would explain the structure of the solution and start writing each piece. At the end (perhaps an appendix) I would have the parts where the pieces assembled into the structure required by the compiler.
One of the essential points of literate programming is that it lets you structure your explanation in a way that makes sense, while the literate programming tool outputs "chunks" restructured in a way that makes sense for the compiler.
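As a concrete (if toy) illustration of that tangling step, here is a minimal chunk expander with noweb-like `<<name>>` references (a sketch of the concept, not any real tool's implementation):

```python
import re

# A chunk reference: optional indentation, then <<name>>, alone on a line.
REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")

def tangle(chunks, name="*"):
    """Expand <<name>> references recursively, applying the indentation
    of the reference line to every line of the expanded chunk."""
    out = []
    for line in chunks[name].splitlines():
        m = REF.match(line)
        if m:
            indent, ref = m.groups()
            out.extend(indent + l for l in tangle(chunks, ref).splitlines())
        else:
            out.append(line)
    return "\n".join(out)

# The author presents "the interesting part" wherever it best serves
# the exposition; tangling reassembles compiler order.
chunks = {
    "*": 'def main():\n    <<the interesting part>>\nmain()',
    "the interesting part": 'print("hello, literate world")',
}
print(tangle(chunks))
```

Note that this sketch also preserves the relative indentation of expanded chunks, which matters for languages like Python and is a pain point with some older tools, as discussed further down the thread.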
Perhaps your idea of paragraph-sized comments seems silly because you're not imagining something complex enough to warrant commenting that way? Imagine a physics simulation. A numerical linear algebra library. Perhaps a game where there are some complex interactions between certain entities that need to be spelled out so that the next person knows what the heck is going on.
Of course there is a level of organization where people write separate design docs for everything, and some level of management has signed off on this or that... I don't think literate programming is for that level of coordination. I think it's for a smaller team, a more personal level of organization and exposition.
BTW, I am pretty sure Norman Ramsey himself has said that with many modern programming languages, literate programming is no longer essential. The order of presentation of functions, for example, is not constrained in Java. In the olden days (ummm, yes, I know C and its derivatives are alive and well today...), you would need to generate header files and source code, so the signature in the `*.h` file had to match the function in the `.c` file. Better to keep them adjacent in the documentation, at least. But that isn't really the way things look today, at least in my world.
If you want to get that kind of historical end-user programming perspective, you can load the disk image at http://canonical.org/~kragen/sw/dev3/lotus-123-1a-plotsin.im... into the PCjs emulator running Lotus 1-2-3 linked above (mount it as drive B:), /FR Retrieve PLOTSIN.WKS, and type /GV to view the graph. You can also load the .wks file from http://canonical.org/~kragen/sw/dev3/plotsin.wks into modern Gnumeric or LibreOffice Calc—but they won't display the graph. (I was also able to mount a directory containing the files from that disk image on drive B: in DOSBox and load the spreadsheet into 1-2-3—but DOSBox's CGA emulation seems to screw up on actually displaying the graph, and I think PCjs is also emulating the speed of the machine, which is an important aspect of the user experience.)
Of course spreadsheets are a pretty limited programming environment, and like modern explorable explanations, they're focused on presenting the results of the computation, or enabling you to apply it to new inputs, rather than focused on explaining the inner workings of the computation itself. But they do expose the inner workings, even if only by necessity, and for problems they can solve at all, they're often a much more convenient way to understand some algorithm than a static pile of source code.
Here is a video showing a literate form of Clojure:
https://www.youtube.com/watch?v=mDlzE9yy1mk
The literate program creates a new PDF and a working version of Clojure, including running a test suite. If you change the literate code and type 'make' it re-makes the PDF with the new changes and rebuilds/retests Clojure.
And here is the source:
* https://github.com/snaptoken, the engine behind https://viewsourcecode.org/snaptoken/kilo. The key new feature here seems to be that fragments are always shown in context that can be dynamically expanded by the reader.
* https://github.com/jbyuki/ntangle.vim -- a literate system that tangles your code behind the scenes every time you :wq in Vim or Neovim.
* My system of layers deemphasizes typesetting and is designed to work within a programmer's editor (though IDEs will find it confusing): http://akkartik.name/post/wart-layers. I don't have a single repo for it, mostly[1] because it's tiny enough to get bundled with each of my projects. Perhaps the most developed place to check out is the layered organization for a text editor I built in a statement-oriented language with built-in support for layers: https://github.com/akkartik/mu1/tree/master/edit#readme. It's also in my most recent project, though it's only used in a tiny bootstrapping shim before I wormhole solipsistically into my own universe: https://github.com/akkartik/mu/blob/main/tools/tangle.readme.... Maybe one day I'll have layers in this universe.
[1] And also because I think example repos are under-explored compared to constant attempts at reusable components: http://akkartik.name/post/four-repos
At the same time, it is a decent argument against the practice. Most programs are not linear in the "why"; instead, there are many competing priorities behind why something was done the way it was. More so if you consider codebases with more than a few contributors. Especially so if they are all contributing conceptually.
Which makes sense if you think of most creative books. You will have many contributors, but the narrative is usually split between a very small number of authors. Most contributions are in supporting art, editing, or general feedback. To move programming to a similar space, would require working with contributions in a similar way. (Last is clearly an assertion.)
I have made better progress in the MP3 book, which I have enjoyed. Same for the Stanford GraphBase.
It is frustrating, as I am not a fan of C, all told. And I have not found any Lisp literate programs. If you know of any, I'd be very interested.
It gets laughable when you have codebases that have "gone all in (functional|object oriented|any other style)" where they seem to mistake the style for the goal, which should be to solve a problem. (I say this as someone that is pretty sure I have made those mistakes.)
Knuth has an LP web page (https://www-cs-faculty.stanford.edu/~knuth/lp.html), but it looks like the examples are out of date.
Probably more useful is http://www.literateprogramming.com/; the CWEB Tool page has some examples and the PDF Articles page has ... articles.
Here's an intro from Knuth: http://www.literateprogramming.com/knuthweb.pdf
And then there's Physically Based Rendering at http://www.pbr-book.org/.
Lately I've built a faster, mostly drop-in replacement for org-babel-tangle (that doesn't unnecessarily clobber files that haven't changed); and I'm finishing up a more complete chunk formatter for HTML export, along with usable chunk index generation. Once that's done, I'll quit nerd sniping myself on literate programming systems for a while and finish up a missive on programming a Turing machine to solve the Towers of Hanoi.
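The no-clobber behavior mentioned here (don't rewrite a tangled file whose content hasn't changed, so make and file watchers don't see a spurious new mtime) can be sketched like this (an illustration of the idea in Python, not the Elisp in question):

```python
import os

def write_if_changed(path, content):
    """Write content to path only if it differs from what's on disk,
    preserving the mtime of unchanged files so downstream tools
    (make, file watchers) don't rebuild needlessly.

    Returns True if the file was written, False if left untouched."""
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            if f.read() == content:
                return False  # unchanged: leave the mtime alone
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return True
```

A tangler that routes every output chunk through a check like this stays friendly to incremental builds even when you re-tangle the whole document.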
http://www.literateprogramming.com/noweb_hacker.pdf
This came in handy when I wanted syntax highlighting in a woven document.
I found that the Leo editor does this too, but I believe you must use the GUI to tangle/weave; I would prefer a CLI for automation.
I set up a simple literate configuration of my init file via Markdown, which worked out really well, but doing it "properly" in org-mode would be a nice evolution.
With Markdown I just search for code blocks, write them all sequentially to a temporary buffer, and evaluate once done. So it is very simplistic, but being able to write and group things is useful:
https://github.com/skx/dotfiles/blob/master/.emacs.d/init.md
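The approach described above (collect the fenced code blocks in order and evaluate them) can be sketched like this (a minimal Python version for illustration; the original works on Emacs buffers rather than files):

```python
FENCE = "`" * 3  # spelled this way so the example survives Markdown rendering

def extract_blocks(markdown_text, lang="emacs-lisp"):
    """Collect the bodies of all fenced code blocks, in document order.

    Concatenating them yields the 'tangled' program, the same way the
    init.md setup evaluates its blocks sequentially."""
    blocks, current, state = [], [], "text"  # text | keep | skip
    for line in markdown_text.splitlines():
        stripped = line.strip()
        if state == "text" and stripped.startswith(FENCE):
            tag = stripped[len(FENCE):].strip()
            state, current = ("keep", []) if tag in ("", lang) else ("skip", [])
        elif state != "text" and stripped == FENCE:
            if state == "keep":
                blocks.append("\n".join(current))
            state = "text"
        elif state == "keep":
            current.append(line)
    return blocks

doc = "\n".join([
    "# My init file",
    "Some prose explaining the setting.",
    FENCE + "emacs-lisp",
    "(setq inhibit-startup-message t)",
    FENCE,
    "More prose, then another block:",
    FENCE + "emacs-lisp",
    "(menu-bar-mode -1)",
    FENCE,
])
print("\n".join(extract_blocks(doc)))
```

Because the blocks are emitted strictly in document order, this is "tangling" without chunk reordering: the prose can group and motivate settings, but the code still runs top to bottom.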
The first plus was, as you point out, the extensibility of noweb: the pipeline architecture, which transforms the literate input into a documented plain-text token stream, then performs token-stream transformations where you can insert your own (indexing, syntax highlighting, macro expansion if you wished), and then reassembles the transformed token stream into output documents.
The other brilliant idea was to go for a minimalist literate syntax and to be language agnostic, for both the markup and the programming language.
This design decision was a focus on the absolute bare minimum, the gist of literate programming, and it still was open to all magic via user plug ins.
This decision also made noweb trivial to learn.
However, why noweb then chose to move to Icon as its scripting and extension language escapes me.
In my book, that was the design decision that killed it. And the Lua-based rewrite, noweb3, remained in eternal 'beta'.
And LP as a whole has always struggled with IDE/editor support.
Literate programming as a discipline could be resurrected with the advent of the Language Server Protocol. That might make literate programming accessible to contemporary IDEs again.
- I had to take care of writing each Python code chunk with the amount of indentation appropriate for where it had to end up, since Noweb does (did?) not respect relative indentation of chunks when tangling.
- Debugging the resulting script was more painful than plain Python sources, as all the debugging info (line numbers, etc.) referred to the tangled code and not to the actual noweb source file I was editing.
[^1]: Looking at the website, it doesn't seem to have changed much since then.
- knot [1]: tangles source code from a text file formatted using plain markdown syntax, can use any markdown converter for weaving into a printable document
- snarl [2]: extends markdown code blocks with syntax used for tangling; its "weave" step just removes the additional syntax and outputs plain markdown
- pylit [3] [4]: a bidirectional converter: code to formatted text and back. Uses reST for formatting, and preserves line numbers which is useful when debugging. Not an LP tool strictly, as it doesn't define/rearrange code blocks so you have to write your script in the order the compiler wants it, not in the order that would make the best exposition.
Both seem to preserve relative indentation of chunks, so would be useful for Python too.
[1]: https://github.com/mqsoh/knot [2]: https://blog.oddbit.com/post/2020-01-15-snarl-a-tool-for-lit... [3]: https://github.com/gmilde/PyLit [4]: https://github.com/slott56/PyLit-3
I originally tried Emacs org-mode babel, but it didn't really fit the 'batch pipeline' flow I wanted.
[1] http://people.cs.aau.dk/~normark/elucidative-programming/
I still use it from time to time, especially for small, well-defined projects, because I find it useful to have to argue with myself when designing software. It's not so much about producing nice documentation or a proper exposition of some idea as it is about having to formulate all the reasoning, the alternatives, and the choices.
[0]: https://github.com/rixed/portia [1]: http://rixed.github.io/portia/