
Things Zig comptime won't do

(matklad.github.io)
458 points by JadedBlueEyes | 19 comments
ephaeton ◴[] No.43745670[source]
zig's comptime has some (objectively: debatable? subjectively: definite) shortcomings that the zig community then overcomes with zig build, generating code as strings that are later @imported and compiled.

Practically, it's "zig build"-time eval. As such there's another 'comptime' stage with more freedom: unlimited run time (no @setEvalBranchQuota) and the ability to do IO (DB schema, network lookups, etc.), but you lose the freedom to generate zig types as values in the current compilation; instead you have to project from the target compiled semantics back to input syntax, down to strings, to enter your future compilation context again.
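
A minimal illustrative sketch of that pattern (assuming a Zig 0.13-era std.Build API, whose names shift between releases; the module name "generated" and constant "answer" are made up): build.zig produces Zig source as a string and wires it in as a module, which the program then @imports like any other file.

   // build.zig: a sketch against a recent std.Build API
   const std = @import("std");

   pub fn build(b: *std.Build) void {
       const exe = b.addExecutable(.{
           .name = "app",
           .root_source_file = b.path("src/main.zig"),
           .target = b.standardTargetOptions(.{}),
           .optimize = b.standardOptimizeOption(.{}),
       });

       // "zig build"-time eval: any code (IO included) could have produced this string.
       const wf = b.addWriteFiles();
       const generated = wf.add("generated.zig", "pub const answer: u32 = 42;\n");
       exe.root_module.addAnonymousImport("generated", .{ .root_source_file = generated });

       b.installArtifact(exe);
   }

   // src/main.zig: consumes the generated module like ordinary source
   const std = @import("std");
   const gen = @import("generated");

   pub fn main() void {
       std.debug.print("answer = {}\n", .{gen.answer});
   }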

Back in the day, when I had to glue perl and tcl via C, passing strings for perl generated through tcl is what this whole thing reminds me of. Sure, it works. I'm not happy about it. There's _another_ "macro" stage that you can't even see in your code (it's just @import).

The zig community bewilders me at times with their love for lashing themselves. The discussions about which new sort of self-harm they'd love to enforce on everybody are borderline disturbing.

replies(7): >>43745717 #>>43746029 #>>43749212 #>>43749261 #>>43750375 #>>43750463 #>>43750751 #
1. bsder ◴[] No.43746029[source]
> The zig community bewilders me at times with their love for lashing themselves. The discussions about which new sort of self-harm they'd love to enforce on everybody are borderline disturbing.

Personally, I find the idea that a compiler might be able to reach outside itself completely terrifying (Access the network or a database? Are you nuts?).

That should be 100% the job of a build system.

Now, you can certainly argue about whether generating a text file is the best way to reify the result back into the compiler. However, what the compiler gets and generates should be completely deterministic.

replies(8): >>43746364 #>>43746553 #>>43747061 #>>43747350 #>>43748448 #>>43749876 #>>43763255 #>>43772068 #
2. bmacho ◴[] No.43746364[source]
They are not advocating for IO in the compiler, but for everything else that other languages can do with macros: run commands at comptime, generate code, read code, modify code. It's proven to be very useful.
replies(1): >>43747577 #
3. ephaeton ◴[] No.43746553[source]
> Personally, I find the idea that a compiler might be able to reach outside itself completely terrifying (Access the network or a database? Are you nuts?).

What is "itself" here, please? Access a static 'external' source? Access a dynamically generated 'external' source? If that file is generated in the build system / build process as derived information, would you put it under version control? If not, are you as nuts as I am?

Some processes require sharp tools, and you can't always be afraid to handle one. If all you have is a blunt tool, well, you know how the saying goes for C++.

> However, what the compiler gets and generates should be completely deterministic.

The zig community treats 'zig build' as "the compile step", ergo what "the compiler" gets ultimately is decided "at compile, er, zig build time". What the compiler gets, i.e., what zig build generates within the same user-facing process, is not deterministic.

Why would it be? Generating an interface is something that you want to be part of a streamlined process. Appeasing C interfaces will move to a zig build-time multi-step process involving zig's 'translate-c', whose output you then import into your zig file. Do you think anybody is going to treat that output differently from what you'd get by doing this invisibly at comptime (which, btw, is what practically happens now)?
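
For illustration, a sketch of that multi-step flow ('mylib.h' and the output path are hypothetical; 'zig translate-c' itself is a real subcommand that writes the translation to stdout):

   // Step 1, outside the compiler proper (e.g. driven by build.zig or a script):
   //
   //     zig translate-c mylib.h > src/mylib_c.zig
   //
   // Step 2: import the generated file like any other checked-in Zig source.
   const c = @import("mylib_c.zig");

   pub fn main() void {
       _ = c; // call into the translated C declarations from here
   }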

replies(2): >>43747717 #>>43750460 #
4. panzi ◴[] No.43747061[source]
> Personally, I find the idea that a compiler might be able to reach outside itself completely terrifying (Access the network or a database? Are you nuts?).

Yeah, although so can build.rs or whatever you call in your Makefile. If something like cargo had built-in sandboxing, that would be interesting.

replies(1): >>43749083 #
5. forrestthewoods ◴[] No.43747350[source]
> Personally, I find the idea that a compiler might be able to reach outside itself completely terrifying (Access the network or a database? Are you nuts?).

It’s not the compiler per se.

Let’s say you want a build system that is capable of generating code. Ok we can all agree that’s super common and not crazy.

Wouldn’t it be great if the code that generates Zig code were also written in Zig? Why should codegen code be written in some completely unrelated language? Why should developers have to learn a brand new language to do compile-time codegen? Why yes, Rust macros, I’m staring angrily at you!

6. bsder ◴[] No.43747577[source]
I'm going to make you defend the statement that they are "useful". I would counter that macros are "powerful".

However, "macros" are a disaster to debug in every language that they appear. "comptime" sidesteps that because you can generally force it to run at runtime where your normal debugging mechanisms work just fine (returning a type being an exception).
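
A small sketch of that point (an illustrative example, not from the comment): the same Zig function can be forced to run at comptime or called normally at runtime, where ordinary breakpoints and tests apply.

   const std = @import("std");

   fn sum(comptime T: type, xs: []const T) T {
       var total: T = 0;
       for (xs) |x| total += x;
       return total;
   }

   test "same code path at comptime and at runtime" {
       // Forced comptime evaluation: a failure here is a compile error.
       comptime std.debug.assert(sum(u32, &.{ 1, 2, 3 }) == 6);
       // Runtime evaluation: a regular debugger can step through sum().
       const xs = [_]u32{ 1, 2, 3 };
       try std.testing.expect(sum(u32, &xs) == 6);
   }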

"Macros" generally impose extremely large cognitive overhead and making them hygienic has spawned the careers of countless CS professors. In addition, macros often impose significant compiler overhead (how many crates do Rust's proc-macros pull in?).

It is not at all clear that the full power of general macros is worth the downstream grief that they cause (I also hold this position for a lot of compiler optimizations, but that's a rant for a different day).

replies(1): >>43749023 #
7. bsder ◴[] No.43747717[source]
> The zig community treats 'zig build' as "the compile step", ergo what "the compiler" gets ultimately is decided "at compile, er, zig build time". What the compiler gets, i.e., what zig build generates within the same user-facing process, is not deterministic.

I know of no build system that is completely deterministic unless you go through the process of very explicitly pinning things. Whereas practically every compiler is deterministic (gcc, for example, would rebuild itself 3 times and compare the last two to make sure they were byte identical). Perhaps there needs to be "zigmeson" (work out and generate dependencies) and "zigninja" (just call compiler on static resources) to set things apart, but it doesn't change the fact that "zig build" dispatches to a "build system" and "zig"/"zig cc" dispatches to a "compiler".

> Appeasing C interfaces will move to a zig build-time multi-step process involving zig's 'translate-c', whose output you then import into your zig file. Do you think anybody is going to treat that output differently from what you'd get by doing this invisibly at comptime (which, btw, is what practically happens now)?

That's a completely different issue, but it illustrates the problem perfectly.

The problem is that @cImport() can be called from two different modules on the same file. What if there are three? What if they need different versions? What happens when a previous @cImport modifies how that file translates? How do you do link-time optimization on that?

This is exactly why your compiler needs to run on static resources that have already been resolved. I'm fine with my build system calling a SAT solver to work out a Gordian Knot of dependencies. I am not fine with my compiler needing to do that resolution.

8. eddythompson80 ◴[] No.43748448[source]
> Personally, I find the idea that a compiler might be able to reach outside itself completely terrifying (Access the network or a database? Are you nuts?).

Why though? F# has this feature called TypeProviders where you can emit types to the compiler. For example, you can do:

   type DbSchema = PostgresTypeProvider<"postgresql://postgres:...">
   type WikipediaArticle = WikipediaTypeProvider<"https://wikipedia.org/wiki/Hello">

and now you have a type that references that Article or that DB. You can treat it as if you had manually written all those types. You can fully inspect it in the IDE, debugger or logger. It's a full type that's autogenerated in a temp directory.

When I first saw it, I thought it was really strange. Then I thought about it a bit, played with it, and decided it was brilliant. Literally one of the smartest ideas ever. It's a first-class codegen framework. There were some limitations, but still.

After using it in a real project, you figure out why it didn't catch on. It's so close, but it's missing something; just one thing is out of place. The interaction is painful for anything that's not a file source (like a CsvTypeProvider or a public internet URL). It also creates this odd dependency that your code has that can't be source-controlled or reproduced. There were hacks and workarounds, but nothing felt right for me.

It was, however, the best attempt at a statically typed language imitating Python or JavaScript scripting syntax, where you just put in a DB URI and start assuming types.

9. disentanglement ◴[] No.43749023{3}[source]
> However, "macros" are a disaster to debug in every language that they appear.

I have only used proper macros in Common Lisp, but at least there they are developed and debugged just like any other function. You call `macroexpand` in the repl to see the output of the macro and if there's an error you automatically get thrown in the same debugger that you use to debug other functions.

replies(1): >>43756279 #
10. jenadine ◴[] No.43749083[source]
You can run cargo in a sandbox.
replies(1): >>43754060 #
11. SleepyMyroslav ◴[] No.43749876[source]
>Personally, I find the idea that a compiler might be able to reach outside itself completely terrifying (Access the network or a database? Are you nuts?).

In gamedev, code is a small part of the end product. "Data-driven" is the term if you want to look it up. Doing an optimization pass that partially evaluates data+code together as part of the build is normal. Code has a 'development version' that supports data modifications and a 'shipping version' that can assume the data is known.

The more traditional example of PGO+LTO is just another example of how code can be specialized for existing data. I don't know a toolchain that survives a change of PGO profiling data between builds without drastic changes in the resulting binary.

replies(1): >>43756356 #
12. throwawaymaths ◴[] No.43750460[source]
> What is "itself"

If I understand correctly, the zig compiler is sandboxed to the local directory of the project's build file. Except possibly for C headers.

The builder and linker can reach out a bit.

replies(1): >>43750534 #
13. ephaeton ◴[] No.43750534{3}[source]
At "build time", the default language's build tool, a zig program, can reach anywhere and everywhere. To build a zig project, you'd use a zig program to create dependencies and invoke the compiler, cache the results, create output binaries, link them, etc.

Distinguishing between `comptime` and `build time` is a distinction from the ivory tower. 'zig build' can happily reach anywhere, and generate anything.

replies(1): >>43750956 #
14. throwawaymaths ◴[] No.43750956{4}[source]
It's not just academic, because if you try to @import something from out of path in your code you'll not be happy. Moreover, 'zig build' is not the only tool in the zig suite; there are individual compilation commands too. So there are real implications to this.

It is also helpful for code/security review to have a one-stop place to look to see if anything outside of the git tree/submodule system can affect what's run.

15. panzi ◴[] No.43754060{3}[source]
Yeah, but I want cargo to do that for me. And tell me if any build.rs does something it shouldn't.
16. bsder ◴[] No.43756279{4}[source]
So, for debugging, we're already in the REPL--which means an interactive environment and the very significant amount of overhead baggage that goes with that (heap allocation, garbage collection, tty, interactive prompt, overhead of macroexpand, etc.).

At the very least, that places you outside the boundary of a lot of the types of system programming that languages like C, C++, Rust, and Zig are meant to do.

17. bsder ◴[] No.43756356[source]
Is the PGO data not a static file which is then fed into the compiler? That still gives you a deterministic compiler, no?
18. naasking ◴[] No.43763255[source]
> That should be 100% the job of a build system.

What is the primary difference between a build system and a compiler in your mind? Why not have the compiler know how to build things, so that the compile-time codegen you want to put in the build system happens during compilation?

19. CRConrad ◴[] No.43772068[source]
Personally, I find the idea of needing something called a "build system" completely terrifying.