As with most things, don’t be dogmatic.
I imagine the biggest hurdle on the path towards adopting this is writing clear, readable prose about highly technical material. And naming things: using ambiguous human language to describe a complex algorithm without causing conflict in a big team.
https://www.cs.tufts.edu/~nr/cs257/archive/literate-programm...
https://www-cs-faculty.stanford.edu/~knuth/lp.html
Knuth's intention seems clear enough in his own writing:
Literate programming is a methodology that combines a programming language with a documentation language, thereby making programs more robust, more portable, more easily maintained, and arguably more fun to write than programs that are written only in a high-level language. The main idea is to treat a program as a piece of literature, addressed to human beings rather than to a computer.
and
Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.
- A literate program has code and documentation interleaved in one file.
- Weaving means extracting documentation and turning it into e.g. a pdf.
- Tangling means extracting code in a form that is understandable to a compiler.
A crucial feature for actually making this paradigm useful is the ability to rearrange the order of your code snippets, i.e. not letting the compiler dictate order. This lets you code top-down or bottom-up, however you see fit, as the article mentions. My guess as to why people soured on literate programming is that their first introduction involved tools that lack this ability (e.g. Jupyter notebooks). Also, you usually lose a lot of IDE features: no go-to-definition, bad auto-complete, etc.
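As a minimal sketch of what that rearranging looks like in org-babel (block and file names here are invented), the top block assembles named blocks in compiler order while the prose can present them in whatever order reads best:

```org
#+BEGIN_SRC python :noweb yes :tangle main.py
<<helpers>>
<<entry-point>>
#+END_SRC

#+NAME: entry-point
#+BEGIN_SRC python
def main():
    print(greet("world"))
#+END_SRC

#+NAME: helpers
#+BEGIN_SRC python
def greet(name):
    return "hello, " + name
#+END_SRC
```

Here the entry point is discussed before the helper it depends on; tangling reassembles main.py in the order the top block dictates.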
IMO, the best tool that qualifies for proper literate programming is probably org-mode with org-babel. It's programming-language agnostic and supports syntax highlighting and noweb references for rearranging order. Of course it requires getting into the Emacs ecosystem, so it's destined to stay obscure.
However, I don't know what this metalanguage should be. I don't know how to translate typical comments (or a literate program) into some sort of formal language. I think we have a gap in philosophy (epistemology).
I’ve written code for many years, with Doxygen/Jazzy/docc in mind (still do[0]). I feel that it’s a big help.
It depends. If you want to learn faster, you should be dogmatic: "In der Beschränkung zeigt sich erst der Meister" ("it is in limitation that the master first shows himself"). If you want to become a better programmer, do set yourself extra challenges (e.g. pure lazy functional programming only, pure literate programming, ...).
Literate programming is, in my opinion, used only rarely because keeping an accurate big-picture view of a program up to date is a lot of work. It fits a waterfall development process, where everything the program is supposed to do is known beforehand. It also fits education well. I think it is no coincidence that it was brought to prominence by D. E. Knuth, who is also very famous as an educator.
I find it weird not to be able to find the Linux source code with commentary, or even math/physics/science masterpieces, in libraries where you can easily find Finnegans Wake (at least where I live), and not to be able to talk about GHC between two discussions about romance or the weather at the bakery.
Maybe a tool like Rational Rose is more along those lines.
I’ve always been a proponent of writing code in a manner that affords analysis, later. That’s usually more than just adding headerdoc.
I do a similar thing which I call live-sketching: a mostly-no-content Python namespace hierarchy of modules and classes (used just as namespace holders), to which I add (would-do-something) "terminal" methods and, here and there, "procedure" methods that combine those into flows, until the "communication" diagram starts to appear out of it; then, week after week, I fill in the missing parts. It feels like writing an executable spec over imagined/fake stuff and slowly replacing the fakes with reals. Some parts never get filled in. Others are replaced with big external pieces, as long as they match the needed spec. What's left is written by hand. And all this may take multiple cycles.
This approach keeps the knowledge of what the system should do at the spec/hierarchical level, while leaving the freedom to leave things undone, plug in some external monster, or do it yourself as one sees fit. The downside is that the plumbing between pieces can end up bigger/messier than the pieces themselves; think of the spiderweb of wires above a breadboard full of TTL ICs.
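A toy sketch of what such a namespace-holder skeleton might look like (all names and data here are invented for illustration, not from the commenter's actual system):

```python
class Billing:
    """Namespace holder: groups would-do-something 'terminal' methods."""

    @staticmethod
    def fetch_invoices(customer_id):
        # Terminal method: fake data for now; swap in a real backend later.
        return [{"id": 1, "amount": 40}, {"id": 2, "amount": 2}]

    @staticmethod
    def total(invoices):
        return sum(i["amount"] for i in invoices)


class Flows:
    """'Procedure' methods that combine terminals into flows: the executable spec."""

    @staticmethod
    def monthly_statement(customer_id):
        invoices = Billing.fetch_invoices(customer_id)  # fake today, real later
        return {"customer": customer_id, "due": Billing.total(invoices)}


# The call graph of Flows.* over Billing.* is the "communication diagram"
# that slowly emerges as fakes are replaced with real implementations.
print(Flows.monthly_statement("acme"))
```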
E.g. my last project, re-engineering multiple aging variants of a kiosk system into a coherent single codebase that can spawn each (or most) of the previous ones, took me 6 months to turn a zoo of 20x 25 KLoC into a single 20 KLoC (±5 for the specializations), and the code structure still preserves the initial split of concerns (some call it architecture) and the comms "diagram": who talks to whom, when, and why.
But yeah, it's not for the faint-hearted, and there is little visibility into the amount of work in progress or done, as the structure at day 1 is more or less the structure at day 181, and management may decide to see only that.
>> - Weaving means extracting documentation and turning it into e.g. a pdf.
>> - Tangling means extracting code in a form that is understandable to a compiler.
Interesting. I have made a few domain-specific "languages" (for chip-module testing, for HR/payroll stuff) expressed in some general language with an engine underneath, which allowed turning/rendering the DS "code" into various machine-readable outputs (Verilog, LabVIEW, ...) as well as various documentation formats. Essentially a self-contained piece of code with execution(s) and documentation(s), with the ability to "explain" what goes on, change by change.
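A toy illustration of that dual-output idea (the step names and the command format are made up, and this is far simpler than a real chip-testing or payroll DSL): each step carries both a description and a code emitter, so the same spec can be "woven" into documentation or "tangled" into machine output:

```python
# Toy internal DSL: each step is (human description, machine-code emitter).
steps = [
    ("Reset the device", lambda: "RESET;"),
    ("Load test pattern 7", lambda: "LOAD 7;"),
    ("Read back the result register", lambda: "READ R0;"),
]


def render_doc(steps):
    # "Weave": a human-readable numbered procedure.
    return "\n".join(f"{i + 1}. {desc}" for i, (desc, _) in enumerate(steps))


def render_code(steps):
    # "Tangle": the machine-readable output (a made-up command format here).
    return "\n".join(emit() for _, emit in steps)


print(render_doc(steps))
print(render_code(steps))
```

One spec, two renderings: that is essentially the weave/tangle split described above.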
Never knew it might be called literate programming.
TeX was "proven" as a text/typography tool by the fact that its source code, written in WEB (interleaving Pascal and TeX (this is meta (metacircular))), lets you "render" the program as a typeset work explaining how TeX is made, and run the program as a means to create typographic work.
I'm lacking the words for a better explanation of how I feel about the distinction, but in a sense I would say that notebooks are literate scripts, while TeX is a literate program? (The difference is aesthetic.)
However, the final effect is spaghetti code (you can effectively recreate "goto" by injecting code in different locations). And the docs are hard to read.
But it really forces you to explain what you do and how you got there, which is incredibly useful for reconstructing history. (There is also a sort of diff file for it, I think with a .ch extension, to amend files.)
One problem with "literate programming" is that it assumes good coders are also good writers, and that good writers are also good coders.
Another problem is that the source files for the production code have to be "touched" for documentation changes, which IMHO is an absolute no-no for production code. Once the code has been validated, no more edits! If you want to edit docs, go ahead, just don't edit the actual source.
- day 5's solution for example: https://aoc.oppi.li/2.3-day-5.html#day-5
- literate haskell source: https://tangled.org/oppi.li/aoc/blob/main/src/2025/05.lhs
the book/site is "weaved" with pandoc, the code is "tangled" with a custom markdown "unlit" program that is passed to GHC.
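I'd guess such an "unlit" pass can be as simple as extracting the fenced code blocks and dropping the prose; this is a hypothetical sketch of the idea, not the actual custom program:

```python
def unlit(markdown_text, lang="haskell"):
    """Extract fenced code blocks for `lang`; drop the surrounding prose."""
    out, in_block = [], False
    for line in markdown_text.splitlines():
        stripped = line.strip()
        if not in_block and stripped == "```" + lang:
            in_block = True      # opening fence for our language
        elif in_block and stripped == "```":
            in_block = False     # closing fence
        elif in_block:
            out.append(line)     # code line: keep verbatim
    return "\n".join(out)


# Hypothetical literate source: prose interleaved with one Haskell block.
doc = "\n".join([
    "Day 5 is about ordering rules.",
    "```haskell",
    'main = putStrLn "hi"',
    "```",
    "The block above is all GHC sees.",
])
print(unlit(doc))
```

The output of `unlit(doc)` is just the Haskell line, which is what gets handed to the compiler.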
https://github.com/nickpascucci/verso
I actually wish for a tool that would use two things: 1) navigate code like a file system: Class/function/lines [3..5]
2) allow us to use git commit revisions so that we could comment on the evolution of the code
So far the only thing capable has been leoEditor + org-babel
In my opinion this is the most practical approach for real world projects. You get benefits like avoiding outdated documentation without huge upfront costs.
That one statement is a great concise explanation/motivation for "literate programming".
Explanations alongside code that lay out design choices, so the code can be understood better and the ideas involved can be picked up and applied flexibly when reading and writing other code.
Another way to view it is: Developers are "compilers" from ideas to source. Documenting the ideas along with the "generated" source, is being "open source" about the origin and specific implementation of the source.
https://www.goodreads.com/review/list/21394355-william-adams...
Not sure where the author got the contention that there are only a few tools for literate programming --- it's a straightforward enough task that many programmers do this --- heck, even I managed to (w/ a bit of help on tex.stackexchange): https://github.com/WillAdams/gcodepreview/blob/main/literati... --- if it were more complex, and wasn't so implementation-specific (filenames need to be specified in multiple places), I'd write it up as a Literate Program and put it up on CTAN as a package.
One classic bit of advice for writing is, ‘It is perfectly okay to write garbage as long as you edit brilliantly.’ --- the great thing about a Literate Program is that it makes the act of editing far simpler, which has made feasible every program I've ever written which got past the 1K lines mark --- including an AppleScript for InDesign which Olav Martin Kvern, then the "Scripting Evangelist" for Adobe Systems declared to be impossible (my boss had promised a system for creating a four-level deep index from XML embedded in the text of pages in an InDesign document, while OMK averred that it was impossible to create an index entry for more than the main level of the index --- one has to have code which tracks the existence of an entry at each level of the index and where it does not exist, starting at the top-level, insert it, then work down and add the sub-index-entry to the index-entry it is beneath).
Here's Jeremy Howard explaining why he loves doing everything in notebooks: https://www.youtube.com/watch?v=9Q6sLbz37gk
https://leo-editor.github.io/leo-editor/
https://kaleguy.github.io/leovue/#/t/2
https://ganelson.github.io/inweb/inweb/index.html
Inform 7 is arguably one of the largest programs ever written in literate style.
Responding directly to a couple things the author wrote:
> When programming, it’s not uncommon to write a function that’s “good enough for now”, and revise it later. This is impossible to adequately do in literate programming.
It's not impossible in literate programming. There's nothing about LP that impedes this; I do it all the time. I throw in a quick, obvious implementation (perhaps a naive recursive solution) to get things working, and revisit it later when I need to make it faster (memoization, DP, or another algorithm altogether). It's no harder than what I'd do with an ordinary approach to programming.
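As a concrete (if standard) illustration of "naive first, faster later" (the example is mine, not from the comment): a literate chunk can start out as the obvious recursion and be revised in place to a memoized version with an identical interface:

```python
from functools import lru_cache


# First pass: the quick, obvious chunk that gets things working.
def fib_naive(n):
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)


# Later revision of the same chunk: identical interface, faster body
# via memoization; every call site and test stays unchanged.
@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(fib(30), fib_naive(30))  # both 832040
```

In a literate source, the revision simply replaces the body of the named chunk; the surrounding prose documents why the change was made.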
> Unit testing is not supported one bit in WEB, but you can cobble something together in CWEB.
WEB was designed for use with Pascal, and CWEB for C and C++. At the time the tools were developed, "unit testing" as it is understood today was not really a widespread thing. Use other tools if you find that WEB is impeding your use of unit tests in your Pascal programs. With other tools (org-mode and org-babel are what I use), it's easy to do. As with writing good-enough functions, you just do it, and it's done. You write a unit test in a block of code, and when it gets tangled you execute your unit tests. This can be more cumbersome in some languages than in others, but in Python it's as easy as:
#+BEGIN_SRC python :noweb yes :tangle test/test_foo.py
from hypothesis import ...
from pytest import ...
<<name_of_specific_test>>
<<name_of_other_test>>
#+END_SRC

#+NAME: name_of_specific_test
#+BEGIN_SRC python
def test_frob(...):
    ...
#+END_SRC
When I used LP regularly I had a little script that would tangle source from my org files, and because I had the names and paths specified, everything would end up in the right place. This is followed by running `pytest` (or whatever test utility) as normal. I used this in makefiles and other scripts. I added a `tangle` step into my build and test process and it was good to go. This is only slightly harder than the normal approach, but not hard. If your unit test system requires more ceremony then you'll need to include that as well, but you'd have to include that in your conventionally written code too.
As I noted elsethread, the big thing which Literate Programming has netted me is that it makes editing easier/manageable, even for long and complex projects spread across multiple files --- having the single point of control/interaction where I can:
- make the actual change to the code to implement a new feature
- change the optional library which exposes this project to a secondary language
- update the documentation to note the new interface
- update the sample template files (one for the main implementation, the other for the secondary) to reflect the new feature
- update an on-going notes.txt file where the need for the new feature was originally noted
is _huge_ and ensures that no file is missed in the update.
Perhaps you're thinking of mathematics.
If you have to be able to represent arbitrary abstract logical constructs, I don't think you can formalize the whole language ahead of time. I think the best you can do is allow for ad-hoc formalization of notation while trying to keep any newly introduced notation reasonably consistent with previously introduced notation.
it would probably also semi-weave the source into a standard, say, markdown or latex or asciidoc and proxy that LSP server on those woven files.
(Having said that, I firmly hold the opinion that we should all be writing READMEs in HTML[2][3] (instead of Markdown) and more fully exploring/exploiting the capabilities—and ubiquity—of web browsers to enable "smart documentation": self-contained (i.e. single-file) study aids, visualization widgets, etc[4].)
1. <https://www.teamten.com/lawrence/programming/write-code-top-...>
2. <https://hn.algolia.com/?dateRange=all&type=comment&prefix=tr...>
https://backbonejs.org/docs/backbone.html
https://github.com/jashkenas/backbone/blob/master/backbone.j...
https://ashkenas.com/docco/
A good start would be just commenting code! Almost all the code I've looked into recently has been startling - the only comments are the licence boilerplate at the top of each file!
I can think of only one product/library/package that was commented to explain what was happening. Go look at the source for a random package that you depend on. If you're really lucky, there might be something hinting at the meaning of function arguments, but like as not, not even that ;(
From the README "Inform is itself a literate program (written with inweb), one of the largest in the world. This means that a human-readable form of the code is continuously maintained alongside it: see Inform: The Program"
For something as complex as the Linux kernel, there is no single document that is going to explain the entire system to anyone who reads it. For a start, different people need different levels of explanation. Someone fresh out of a JavaScript bootcamp is going to need a very different guide to Linux than someone who's spent years working on the Windows kernel and just needs to know what's different and what's the same. Moreover, the further a person is from understanding how the Linux kernel works, the more iterative the explanation will need to be: first setting up the broad concepts, then explaining these concepts in more detail, then clarifying these details with more precise examples, and so on. If these layers of explanations are bound to code, then the person who needs less of an explanation will end up skipping parts of the codebase (assuming they let themselves be guided by the literate documentation). If the explanation is not bound to the code, then that's not really literate programming, it's just documentation.
The other issue is that even two different people with similar levels of skill will often want things explained in different ways. Partly, that's going to be things like the analogies they're used to, and partly that's going to be a question of what they need from the explanation. A document "The Linux Kernel for the Data Scientist" will probably look very different from "The Linux Kernel for the Systems Engineer", and both will be different again to "The Linux Kernel for Project Managers". A huge part of technical writing is understanding precisely who your audience is, and in literate programming, your audience kind of becomes "everyone", which is too large an audience. The advantage of separating code and documentation is that you can write your code for a much more restricted set of readers, but provide a bunch of different additional guides that are each aimed more precisely at a target audience.
I think literate programming can work for programs that are primarily intended as tools for teaching (because then the whole application is designed to be read by a specific target audience, and can be written from that perspective), but general-purpose applications, particularly more complex ones like the Linux kernel, are better served by separating out the different documentation concerns.