C4 – C in 4 functions

1. abecedarius [04 Nov 14 21:23 UTC] No.8559116

On a first skim, this looks really nice; complaints that it's unreadable are unfounded. The background that makes it readable is Wirth's Compiler Construction http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf plus precedence climbing http://en.wikipedia.org/wiki/Operator-precedence_parser#Prec...

2. dabockster [04 Nov 14 23:53 UTC] No.8559784

>>8559116 (TP)

> complaints that it's unreadable are unfounded

Not exactly. You have to remember that language and compiler design require a LOT of work and experience to understand, and that many programmers will only see this as, frankly, spaghetti.

I think it could have used some more block comments, but that's just me.

3. [05 Nov 14 00:31 UTC] No.8559904

>>8559116 (TP)

4. abecedarius [05 Nov 14 00:32 UTC] No.8559908

>>8559784

I'm writing a chapter for AOSA on something like this: a self-hosting compiler of a subset of Python to Python bytecode. It'll present the full code (about the same length) and try to explain it well for people not into compilers yet, but in the meantime I recommend the Wirth book.

What's your favorite shorter intro? I'd especially like to reference other educational compilers that are self-hosting.

5. metaobject [05 Nov 14 01:47 UTC] No.8560123

>>8559908

So there's another volume of The Architecture of Open Source Applications in the works? I love the first two volumes; they are unique books that really are gems. Are you able to divulge what projects are covered in the new one?

6. anigbrowl [05 Nov 14 02:14 UTC] No.8560200

>>8559784

I had the same first instinct, but given that a) it's very very tidy code and b) if you want to understand the inner workings of a compiler then you really do need to figure this out, I decided on review that it's basically self-documenting.

Of course figuring out what it's doing is one thing - understanding why it is done in this particular way is another, and while I was able to find my way around fairly quickly I'd cry if I had to re-implement it. I do love how small it is though, that gives it great educational value.

7. blinks [05 Nov 14 02:27 UTC] No.8560237

>>8560200

> understanding why it is done in this particular way is another

Isn't that the reason for comments in the first place?

8. anigbrowl [05 Nov 14 02:39 UTC] No.8560267

>>8560237

I look to comments to tell me what a block of code is doing rather than why, e.g. 'Performs a Discrete Cosine Transform on the contents of the buffer' or 'Bubble sort algorithm rearranges the records in at least as much time as required to enjoy a nice cup of tea.'

The 'why' of very low-level tools like this is the sort of thing that needs to be explored at length in a paper or (in this case) a book; otherwise the explanation will swamp the actual code. Sometimes as a learning exercise I'll take something like this and comment the hell out of everything, but the value there is more in writing the comments than in reading them again later. Of course this is very much a matter of personal taste.

9. ezy [05 Nov 14 02:57 UTC] No.8560318

>>8559784

Well, the goal is for it to be minimal, and C doesn't actually do a lot of work for you. On reading it, the algorithm it uses to parse was of less interest than the various tricks it uses to initialize and manage the state of the parser in a compact way.

10. abecedarius [05 Nov 14 03:02 UTC] No.8560333

>>8560123

https://github.com/aosabook/500lines and I'm looking forward to reading it too. :)

I'm tempted to link to my draft chapter, but though the code is essentially done, the text needs a lot of work.

11. aeonsky [05 Nov 14 03:26 UTC] No.8560401

>>8560267

I am a junior dev without a ton of experience, so correct me if I'm wrong, but I strongly disagree. Comments should explain "why" something was written. Wouldn't the function name indicate what you are doing (and comments in the function)? This is especially true in business logic.

    reverseNaturalSortOrder(listOfItems); // case sensitive sort of items by reverse alphabetical order

    reverseNaturalSortOrder(listOfItems); // sort this way because the upper layer expects items in reverse order since user requested it

I think it is usually significantly easier to understand what something is doing than why it is doing it. Answering the former usually requires a narrow scope of focus, but the latter requires a very broad scope.

12. anigbrowl [05 Nov 14 03:40 UTC] No.8560423

>>8560401

Sure, I'm not arguing against ever saying why you'd do something :) I just like descriptive comments because they speed up the business of figuring out the high-level structure of the code. You're right that understanding exactly what's going on can be the most difficult part, but for me that answer usually falls out as I build a mental model of what it's doing. On the other hand, some algorithms need in-depth explanation that's beyond the scope of a comment (hence my example of the transform).

Mind, I mostly do hobby/experimental programming; I've just been doing it for a long time. So I'm not recommending this as a work practice or anything, and your point about business logic made good sense.

13. DSMan195276 [05 Nov 14 03:52 UTC] No.8560448

>>8560401

I agree with you completely: the code explains what you're doing, and comments explain why you did it that way. Ideally, any comments that explain what you're doing would end up being redundant when looking at the code.

I think this code details a special case of the above though, in that it comments what the enums are instead of just naming the enums. I give that a pass strictly because this code needs to be able to compile itself, and I don't think it supports named enums, so the comment was necessary to make up for that.

14. shangxiao [05 Nov 14 07:53 UTC] No.8560891

>>8559116 (TP)

It is unreadable. Lumping code together into big functions just so you can say "Look I've created a compiler in 4 functions" is pointless, unless your goal is to post it to HN to show everyone how clever you are.

This is not how you code when you work in a team or when you know some other poor soul has to come along and maintain it.

I suggest you take a look at [1] then go and read this excellent book by Martin Fowler: "Refactoring: Improving the Design of Existing Code" [2]

[1] https://en.wikipedia.org/wiki/Code_smell

[2] http://www.goodreads.com/book/show/44936.Refactoring

15. RubyPinch [05 Nov 14 08:20 UTC] No.8560936

>>8560891

I have a feeling that the project is a mix of for-fun and to-see-if-its-possible.

Not all programming is enterprise quality, some programming is intentionally not.

Being so dismissive of that seems a little silly to me.

The demoscene doesn't exist because of a bunch of programmers trying to make the lives of every other programmer more difficult, it exists because people like the challenge.

16. tdsamardzhiev [05 Nov 14 08:23 UTC] No.8560941

>>8560401

Agreed - code should explain what code does. Duh.

17. userbinator [05 Nov 14 08:37 UTC] No.8560974

>>8560891

This also is not code that would need to be written by a team.

The functions may be big, but they also don't have all that much duplicate code inside them. The choice of 4 functions isn't arbitrary either - they nicely divide the problem into:

- next() - splits the source code into a series of tokens

- expr() - parses expressions

- stmt() - parses statements

- main() - starts the processing of the source, and also contains the main interpreter VM's execution loop

Code generation is integrated into the parsing, since it's generating code for a stack-based machine and that also very nicely follows the sequence of actions performed when parsing.

In fact I'm of the opinion that the obsession with breaking up code into tiny pieces (usually accompanied by the overuse of OOP) is harmful to the understanding of the program as a whole since it encourages looking at each piece independently and misses "seeing the forest for the trees".

In contrast, this is code that is designed to be easily read and understood by a single person, showing how very simple an entire compiler and interpreter/VM can be. It doesn't attempt to hide anything with thick layers upon layers of abstraction and deep chains of function calls, but instead is the "naked essence" of the solution to the problem.

Someone used to e.g. enterprise Java may find this style of code quite jarring to their senses, but that's only because they've grown accustomed to an environment in which everything is highly-abstracted and indirect, hiding the true nature of the solution. Personally, I think the simplicity and "nakedness" of this code has a great beauty to it --- it's a functional work of art.

18. pvidler [05 Nov 14 08:48 UTC] No.8560993

>>8559116 (TP)

Most of it was readable, but the printf on lines 57--59 made me retch. I see what it's doing, but it's not what I'd call easily maintainable:

  printf("%8.4s", &"LEA ,IMM ,JMP ,JSR ,BZ  ,BNZ ,ENT ,ADJ ,LEV ,LI  ,LC  ,SI  ,SC  ,PSH ,"
         "OR  ,XOR ,AND ,EQ  ,NE  ,LT  ,GT  ,LE  ,GE  ,SHL ,SHR ,ADD ,SUB ,MUL ,DIV ,MOD ,"
         "OPEN,READ,CLOS,PRTF,MALC,MSET,MCMP,EXIT,"[*++le * 5]);

19. Confusion [05 Nov 14 08:55 UTC] No.8561004

>>8560891

I suggest you consider how someone could know of, and understand, your suggestions very well, while still having the opinion he does. Don't underestimate other people.

20. PhasmaFelis [05 Nov 14 09:06 UTC] No.8561018

>>8559116 (TP)

> On a first skim, this looks really nice; complaints that it's unreadable are unfounded.

Man, I can't even tell what this is supposed to be. My confusion is entirely founded. My thought process with articles like this goes something like "C in four functions, huh? Sounds like it could be clever. I'll just click and read the explanation... Oh, there isn't an explanation. Well, maybe this file will explain things! ...Nope, it's 500 lines of mostly-uncommented if-else statements. Maybe it's a compiler? I dunno!"

I'm sure there's a subset of the programming community for whom this is crystal clear on first sight, and that's great; but there's a lot more of us who could probably get the joke with a few hints, so it would be nice if you'd help out instead of declaring that if you understand it, it must be easy.

21. tomp [05 Nov 14 09:24 UTC] No.8561057

>>8559116 (TP)

> complaints that it's unreadable are unfounded

    int *a, *b;
    int t, *d;

I can't really see how anyone can say that code that uses one-letter variable names (with the exception of the "standard ones", whose meaning is defined at the top) is readable.

22. MrTortoise [05 Nov 14 09:40 UTC] No.8561095

>>8560448

It's not that simple though: error fixes and edge cases often obfuscate something that was understandable. A why comment is never bad, but a what comment is often as valuable as a test.

23. qznc [05 Nov 14 10:13 UTC] No.8561169

>>8559784

Compilers are also a very well understood topic. If you have seen (and understood) a recursive descent parser with precedence climbing, the code looks as expected. It is a pretty straightforward implementation.

24. boomlinde [05 Nov 14 10:15 UTC] No.8561173

>>8561018

In the sense that some people won't have an idea of what's going on, this community altogether isn't particularly inclusive at all. Personally, I really don't want the topics this site covers to cater to a lowest common denominator, and I'm sure that isn't what you had in mind either, but that's the effect of taking "more of us" to mean more than you personally.

25. chipsy [05 Nov 14 11:10 UTC] No.8561313

>>8560401

There are three levels to consider:

1. Readability for modification

2. Readability for the "what"

3. Readability for the "why"

All human-readable description in code is there to make the difference between having a piece of documentation pointing you in the right direction, and having to reverse engineer your understanding. It's an optimization problem with definite tradeoffs. Although CS professors and many tutorials will tend to encourage you towards heavy description, over-description creates space for inconsistent, misleading documentation, which is worse than "not knowing what it does."

When you see code that is dense and full of short variables, it's written favorably towards modification. It is relying on a summary comment for "what" it does, and perhaps on namespaces, scope blocks, or equivalent guards to keep names from colliding. Such code lets you reach flow state quickly once you've grokked it and are ready to begin work, because more of it fits onscreen, and you're assured the smallest amount of typing and thus can quickly try and revise ideas. The summary often gets to stay the same even if your implementation is completely different. And if lines of code are small, you enjoy a high comments-to-code ratio, at least visually.

Code that builds in the "what" through more descriptive variable names pays a price through being a little harder to actually work in, with big payoffs coming mostly where the idea cannot be captured and summarized so easily through a comment.

In your example, one might instead rework the whole layout of the code so:

    var urq; /* user request type */

    ... (we build list "l") 
    
    /* adjustments for user request */
    {
      if (urq=="rns") { /* case sensitive sort by reverse alphabetical order */ ... }
    }

If you aren't reusing the algorithm, inline and summarize. And if you're writing comments about "what the upper layer expects", then you (or your team) spread and sliced up the meaning of the code and created that problem; that kind of comment isn't answering a "why," it's excusing something obtuse and unintuitive in the design, and is a code smell: the premise for that comment is hiding something far worse. If the sequence of events is intentionally meaningful and there's no danger of cut-and-paste modifications getting out of sync, it doesn't need to be split into tiny pieces. Big functions can be well-organized and easy to navigate just with scope blocks, comments, and a code-folding editor.

"Why" is a complicated thing. It's not really explainable through any one comment, unless the comment is an excuse like the example you gave. The whole program has to be written towards the purpose of explaining itself (e.g. "literate programming"), yet most code is grown organically in a way that even the original programmers can only partially explain, starting with a simple problem and simple data and then gradually expanding in complexity. Experience (and unfamiliar new problem spaces) eventually turns most programmers towards the organic model, even if they're aware of top-down architecture strategies. Ultimately, a "why" has to ask Big Questions about the overall purpose of the software; software is a form of automation, and the rationale for automating things has to be continuously interrogated.

26. manish_gill [05 Nov 14 11:23 UTC] No.8561345

>>8560993

I'd like to know why this was downvoted. Could the people who downvoted it explain what the code is doing, please?

27. chris_wot [05 Nov 14 11:59 UTC] No.8561408

>>8561173

Actually, a lot of us probably don't understand what this is doing. I sure don't, but I really don't consider myself "lowest common denominator" either. I come here to learn, to be honest!

28. sklogic [05 Nov 14 12:31 UTC] No.8561481

>>8559784

Basically, what you're saying is that any toy compiler example should be accompanied with a copy of the Dragon Book.

29. maffydub [05 Nov 14 12:38 UTC] No.8561499

>>8561345

It's taking an integer (representing an operation) and printing out the name of that operation.

First thing to say is that "* ++le" is the integer representing the operation to perform. This basically walks through the array of instructions returning each integer in turn.

Starting at the beginning of the line, we have "printf" with a format string of "%8.4s". This means print out the first 4 characters of the string that I pass next (padded to 8 characters). There then follows a string containing all of the operation names, in numerical order, padded to 4 characters and separated by commas (so the start of each is 5 apart). Finally, we do a lookup into this string (treating it as an array) at offset "* ++le * 5", i.e. the integer representing the operation multiplied by 5 (5 being the number of characters between the start of each operation name). Doing this lookup gives us a char, but actually we wanted the pointer to this char (as we want printf to print out this char and the following 3 chars), so we take the address of this char (the & at the beginning of the whole expression).

It's concise, but not exactly self-documenting.

Does that make sense?

(I didn't downvote.)

30. WhitneyLand [05 Nov 14 13:17 UTC] No.8561609

>>8561481

You seem to suggest that a background in compiler theory is somehow table stakes for commenting on HN. Since many here are not developers, and many developers don't have a CS degree, a few contextual comments seem appropriate.

31. shangxiao [05 Nov 14 13:19 UTC] No.8561622

>>8560974

Breaking up code is more than just removing redundancy - it's about exactly what you have written: "Encouraging looking at each piece independently". That actually has advantages, the main being that it is easy to understand how each piece operates independently of the other pieces - simply cohesion & coupling which is applicable to any language, not just "Java". I don't believe in my experience that I miss "seeing the forest for the trees" - I encourage you to try it, perhaps by learning some functional programming.

One obsession that I am tired of is people posting "X awesome thing in 120 lines of JavaScript" or "Y in 4 functions". Just because the problem is reduced to as small as possible by some metric doesn't make it good.

PS: Also you mention negatively connotated terms like "OOP", "Java", "thick layers of abstraction" and "deep chains of function calls" as if you've ascertained that I'm some enterprise developer that doesn't have any C experience and wouldn't know simplicity if it bit me in the ass.

32. peterfirefly [05 Nov 14 13:23 UTC] No.8561631

>>8561499

How is that not self-documenting if one knows C?

33. WhitneyLand [05 Nov 14 13:26 UTC] No.8561638

>>8561173

The logical leap from adding a "few hints" to everything becomes "lowest common denominator" is the size of the Grand Canyon.

34. shangxiao [05 Nov 14 13:35 UTC] No.8561661

>>8560936

Of course this occurred to me, my comment was more of an emotional response to "complaints that it's unreadable are unfounded".

If quite a few people are saying that it's unreadable, you don't just dismiss them as making "unfounded" statements.

> The demoscene doesn't exist because of a bunch of programmers trying to make the lives of every other programmer more difficult, it exists because people like the challenge.

I'm pretty sure that this doesn't need to be stated.

35. sklogic [05 Nov 14 14:06 UTC] No.8561761

>>8561609

You seem to suggest that a niche, minimalistic toy example should always be accessible to non-developers.

36. WhitneyLand [05 Nov 14 14:23 UTC] No.8561821

>>8561761

It's a false dichotomy that you either get it or you don't. People will access things according to their ability. Where to draw the line on being inclusive? If someone has genuine curiosity and motivation to ask a question and learn, then providing a few lines of overview doesn't clutter the board much and can be a positive contribution.

37. ctdonath [05 Nov 14 14:34 UTC] No.8561872

>>8561609

If you're commenting about a remarkably clever example of an obscure topic which requires prolonged study to understand, then yes I'd suggest that a background in _______ is somehow table stakes for commenting on a focused discussion of _______ on HN.

"Toy examples" are often the result of long & deep study and practice of a subject, creating something profound which casual observers are not entitled to instantly understand. In this case, it's a very clever compiler: everybody understands this summary, and if you want "a few contextual comments" beyond the source code itself then you know where to get enough information to learn what you need to understand this.

If you don't "get it", and don't want to "get it" on your own, it's not for you.

38. sklogic [05 Nov 14 14:36 UTC] No.8561881

>>8561821

Exactly, where to draw a line? Explaining the concept of abstract and virtual machines may take a few pages of dense text, explaining how to parse expressions with precedence may require dozens of pages, and explaining C types will add a few more.

So, yes, either you're curious enough to dig into the code and find the relevant explanations somewhere else (the said Dragon Book and the like), or you won't get it, regardless of how comprehensive the comments are.

39. maffydub [05 Nov 14 17:06 UTC] No.8562807

>>8561631

I think you and I might disagree on the meaning of self-documenting. ;)

40. DSMan195276 [05 Nov 14 17:40 UTC] No.8563011

>>8561095

Like I noted with my special case, it's not always that simple, but I routinely find the best commented code to be code which was written with the comments explaining why and the code explaining what. There are definitely times where a what comment is warranted, but it's just not the general case.

41. peterfirefly [05 Nov 14 18:10 UTC] No.8563208

>>8562807

I don't think we really do.

It is impenetrable black magick if one "knows" C -- but quite clear if one /actually/ knows C.

42. moron4hire [05 Nov 14 19:06 UTC] No.8563579

>>8561095

That should almost never be the case. If you find this to be a frequent occurrence, then the code base in which you are working is not designed for the problem domain to which it is being applied.

43. moron4hire [05 Nov 14 19:09 UTC] No.8563590

>>8563208

Ah, the No True Scotsman finally arrives to the party.

44. abecedarius [05 Nov 14 19:54 UTC] No.8563882

>>8561057

OK, 'unfounded' was a little too strong. I'd change some things myself, including comments on those declarations; but if this looks like a code-golf game to you, it's not, it's a style you're not used to.

45. abecedarius [05 Nov 14 20:00 UTC] No.8563933

>>8561018

Yes, I'm sorry: I meant to defend this code from the charge of being pointless code golf, and inadvertently disparaged people without the background to enjoy reading it. It really is hard to follow without that background, which lots of good programmers don't have.

I hope someone will write an explanation. I'm still working on one for my own (quite different) little compiler.

46. peterfirefly [05 Nov 14 22:21 UTC] No.8564632

>>8563590

No. The printf() requires that one has read K&R. That's not a high barrier to clear. Pointers are chapter 5.

47. boomlinde [06 Nov 14 15:33 UTC] No.8567437

>>8561638

Personally I don't see anything wrong with suggesting to add a few hints, but the basis of that suggestion in this case was that "there's a lot more of us who could probably get the joke" with the hints. If making it approachable to more people is inherently a good thing, the logical conclusion is to make it approachable to everyone.

With code like this, the readability obviously isn't a high priority consideration and sometimes the exact opposite of the goal, with the impenetrability sometimes being part of its charm. This is Hacker News after all, and if your reaction to a piece of code that describes itself as "an exercise in minimalism" is to leave a snarky comment about the lack of documentation, you should probably check your news elsewhere.

If you have any interest in the subject, the initial comment "just enough features to allow self-compilation and a bit more" should give the purpose of the code away. If not, it ought to have been a clear sign of dragons.

48. boomlinde [06 Nov 14 19:57 UTC] No.8569378

>>8561408

My point isn't that not knowing what this does makes you a lowest common denominator. It's that most stuff that is shared on this site at least borders on being what I'd call esoteric, and if making each individual submission more approachable or catering to a larger general audience is a goal of this community, it isn't really going that way. If it was going that way, I doubt the community would be particularly interested in this site.

This submission in particular has dubious practical use, and the description, "an exercise in minimalism", is telling of a sort of artistic intent. If you don't understand what it does, how it does it, or if you don't like it, it won't lower my opinion of you in any way, but its inclusion on this site is part of why I like to come here every now and then. I get to discuss subjects that relate to my work and hobbies, but I also get to look at weird alien code and think hard in unfamiliar terms. From what you are saying, I think you can relate.

Personally, I could glance over it and get the idea that it is a C compiler, but if you were to show me some code written with the latest JS MVC or FRP framework, don't hold your breath for me to tell you what it does. I can't say that I fully understand this, and that's why I enjoy the rich discussion the submission spawned here.

49. chris_wot [06 Nov 14 20:54 UTC] No.8569700

>>8569378

Fair enough :-) thanks for clarifying.