Fil's Unbelievable Garbage Collector

(fil-c.org)

crawshaw ◴[05 Sep 25 02:56 UTC] No.45134578[source]▶

>>45133938 (OP) #

It is great that Fil-C exists. This is the sort of technique that is very effective for real programs, but that developers are convinced does not work. Existence proofs cut through long circular arguments.

replies(2): >>45134840 #>>45135366 #

johncolanduoni ◴[05 Sep 25 03:52 UTC] No.45134840[source]▶

>>45134578 #

What do the benchmarks look like? My main concern with this approach would be that the performance envelope would eliminate it for the use-cases where C/C++ are still popular. If throughput/latency/footprint are too similar to using Go or what have you, there end up being far fewer situations in which you would reach for it.

replies(1): >>45134852 #

pizlonator ◴[05 Sep 25 03:56 UTC] No.45134852[source]▶

>>45134840 #

Some programs run as fast as they normally would. That's admittedly not super common, but it happens.

Some programs have a ~4x slowdown. That's also not super common, but it happens.

Most programs are somewhere in the middle.

> for the use-cases where C/C++ are still popular

This is a myth. 99% of the C/C++ code you are using right now is not perf sensitive. It's written in C or C++ because:

- That's what it was originally written in and nobody bothered to write a better version in any other language.

- The code depends on a C/C++ library and there doesn't exist a high quality binding for that library in any other language, which forces the dev to write code in C/C++.

- C/C++ provides the best level of abstraction (memory and syscalls) for the use case.

Great examples are things like shells and text editors, where the syscalls you want to use are exposed at the highest level of fidelity in libc and if you wrote your code in any other language you'd be constrained by that language's library's limited (and perpetually outdated) view of those syscalls.

replies(8): >>45134950 #>>45135063 #>>45135080 #>>45135102 #>>45135517 #>>45136755 #>>45137524 #>>45143638 #

1. johncolanduoni ◴[05 Sep 25 04:56 UTC] No.45135102[source]▶

>>45134852 #

While there are certainly other reasons C/C++ get used in new projects, I think 99% not being performance or footprint sensitive is way overstating it. There's tons of embedded use cases where a GC is not going to fly just from a code size perspective, let alone latency. That's mostly where I've often seen C (not C++) for new programs. Also, if Chrome gets 2x slower I'll finally switch back to Firefox. That's tens of millions of lines of performance-sensitive C++ right there.

That actually brings up another question: how would trying to run a JIT like V8 inside Fil-C go? I assume there would have to be some bypass/exit before jumping to generated code - would there need to be other adjustments?

replies(7): >>45135144 #>>45135158 #>>45135395 #>>45135400 #>>45135515 #>>45136267 #>>45138618 #

2. kragen ◴[05 Sep 25 05:04 UTC] No.45135144[source]▶

>>45135102 (TP) #

Latency is the killer, I think. A GC can be on the order of 100 instructions.

replies(2): >>45135176 #>>45135412 #

3. pizlonator ◴[05 Sep 25 05:07 UTC] No.45135158[source]▶

>>45135102 (TP) #

Most C/C++ code for old or new programs runs on a desktop or server OS where you have lots of perf breathing room. That’s my experience. And that’s frankly your experience too, if you use Linux, Windows, or Apple’s OSes

> how would trying to run a JIT like V8 inside Fil-C go?

You’d get a Fil-C panic. Fil-C wouldn’t allow you to PROT_EXEC lol

replies(2): >>45135232 #>>45135330 #

4. pizlonator ◴[05 Sep 25 05:12 UTC] No.45135176[source]▶

>>45135144 #

It’s a concurrent GC. Latency won’t kill you

I’ll admit that if you are in the business of counting instructions then other things in Fil-C will kill you. Most of the overhead is from pointer chasing.

See https://fil-c.org/invisicaps

replies(3): >>45135323 #>>45135355 #>>45135870 #

5. addaon ◴[05 Sep 25 05:27 UTC] No.45135232[source]▶

>>45135158 #

> Most C/C++ code for old or new programs runs on a desktop or server OS where you have lots of perf breathing room. That’s my experience. And that’s frankly your experience too, if you use Linux, Windows, or Apple’s OSes

What if I also use cars, and airplanes, and dishwashers, and garage doors, and dozens of other systems? At what point does most of the code I interact with /not/ have lots of breathing room? Or does the embedded code that makes the modern world run not count as "programs"?

replies(2): >>45135279 #>>45135425 #

6. pizlonator ◴[05 Sep 25 05:35 UTC] No.45135279{3}[source]▶

>>45135232 #

You have a good point!

First of all, I’m not advocating that people use Fil-C in places where it makes no sense. I wouldn’t want my car’s control system to use it.

But car systems are big if they have 100 million lines of code or maybe a billion. But your desktop OS is at like 10 billion and growing! Throw in the code that runs in servers that you rely on and we might be at 100 billion lines of C or C++

7. kragen ◴[05 Sep 25 05:44 UTC] No.45135323{3}[source]▶

>>45135176 #

"Concurrent" doesn't usually mean "bounded in worst-case execution time", especially on a uniprocessor. Does it in this case?

InvisiCaps sound unbelievably amazing. Even CHERI hasn't managed to preserve pointer size.

replies(2): >>45136725 #>>45140094 #

8. johncolanduoni ◴[05 Sep 25 05:45 UTC] No.45135330[source]▶

>>45135158 #

Thanks for telling me what my experience is, but I can think of plenty of C/C++ code on my machine that would draw ire from ~all its users if it got 2x slower. I already mentioned browsers but I would also be pretty miffed if any of these CPU-bound programs got 2x slower:

* Compilers (including clang)

* Most interpreters (Python, Ruby, etc.)

* Any simulation-heavy video game (and some others)

* VSCode (guess I should've stuck with Sublime)

* Any scientific computing tools/libraries

Sure, I probably won't notice if zsh or bash got 2x slower and cp will be IO bound anyway. But if someone made a magic clang pass that made most programs 2x faster they'd be hailed as a once-in-a-generation genius, not blown off with "who really cares about C/C++ performance anyway?". I'm not saying there's no place for trading these overheads for making C/C++ safer, but treating it as a niche use-case for C/C++ is ludicrous.

replies(4): >>45135422 #>>45136378 #>>45136399 #>>45149655 #

9. johncolanduoni ◴[05 Sep 25 05:50 UTC] No.45135355{3}[source]▶

>>45135176 #

For embedded use cases, it can definitely kill you. Small microcontrollers frequently have constant IPC for a given instruction stream and you regularly see simple for loops get used for timing.
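
To illustrate the kind of loop-based timing mentioned above, here is a minimal sketch. The clock rate and the cycles-per-iteration figure are assumptions for a hypothetical microcontroller, and the function names are made up for illustration. The point is only that the delay is derived from an instruction count, so anything that changes the number of instructions per iteration silently changes the timing.

```c
#include <stdint.h>

/* Assumed figures for a hypothetical microcontroller; not from the thread. */
#define CPU_HZ 16000000u         /* 16 MHz core clock */
#define CYCLES_PER_ITERATION 4u  /* assumed cost of one pass through the loop */

/* Number of loop iterations needed to burn roughly `us` microseconds. */
uint32_t delay_iterations(uint32_t us) {
    uint64_t cycles = (uint64_t)us * CPU_HZ / 1000000u;
    return (uint32_t)(cycles / CYCLES_PER_ITERATION);
}

/* Busy-wait. `volatile` keeps the compiler from deleting the loop. */
void delay_us(uint32_t us) {
    volatile uint32_t n = delay_iterations(us);
    while (n != 0) {
        n--;
    }
}
```

With these assumed numbers, `delay_iterations(10)` is 40. If extra work were added to each iteration, the same 40 iterations would take longer than 10 microseconds and nothing in the code would notice.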

10. conradev ◴[05 Sep 25 05:56 UTC] No.45135395[source]▶

>>45135102 (TP) #

I feel like code size, toolchain availability and the universality of the C ABI are more good reasons for why code is written in C besides runtime performance. I’d be curious how much overhead Fil-C adds from a code size perspective, though!

replies(1): >>45140057 #

11. pjmlp ◴[05 Sep 25 05:57 UTC] No.45135400[source]▶

>>45135102 (TP) #

Books like Zen of Assembly Language exist exactly because, once upon a time, performance sensitive and C or C++ in the same sentence did not make any sense.

It is decades of backend optimisation work, some of it exploring UB-based optimizations, that has made that urban myth possible.

As the .NET team discovered, and points out with each release since .NET 5 in lengthy blog posts able to kill most browsers' buffers, if the team puts as much work into the JIT and AOT compilers as the Visual C++ team does, then performance looks quite different from what everyone naturally expects it to be.

replies(2): >>45135511 #>>45142482 #

12. rwmj ◴[05 Sep 25 06:00 UTC] No.45135412[source]▶

>>45135144 #

In the fast case allocations can be vastly cheaper than malloc, usually just a pointer decrement and compare. You'll need to ensure that your fast path never has the need to collect the minor heap, which can be done if you're careful. I hate this comparison that is always done as if malloc/free are completely cost-free primitives.
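
As an illustration of the "pointer decrement and compare" fast path described above, here is a minimal sketch of a bump allocator. The names (`bump_heap`, `bump_alloc`) and the 8-byte rounding are assumptions made for the example, not taken from any particular runtime, and the collection step a real GC would run when the heap fills up is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minor heap: a fixed buffer handed out from the top down. */
typedef struct {
    unsigned char *base; /* lowest usable address */
    unsigned char *top;  /* next allocation ends here; moves downward */
} bump_heap;

void bump_init(bump_heap *h, void *buf, size_t size) {
    assert(buf != NULL);
    h->base = buf;
    h->top = (unsigned char *)buf + size;
}

/* Fast path: one compare and one pointer decrement.
   Returns NULL when the heap is full; a real collector would
   trigger a minor collection at that point instead. */
void *bump_alloc(bump_heap *h, size_t size) {
    if (size > SIZE_MAX - 7u) return NULL;    /* rounding would overflow */
    size = (size + 7u) & ~(size_t)7u;         /* round up to 8 bytes */
    if ((size_t)(h->top - h->base) < size) return NULL;
    h->top -= size;
    return h->top;
}
```

The slow path is the `NULL` return. Keeping time-critical code off that path is the "if you're careful" part of the comment above.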

replies(1): >>45135541 #

13. pjmlp ◴[05 Sep 25 06:02 UTC] No.45135422{3}[source]▶

>>45135330 #

Many compilers are bootstrapped.

Ruby is partially written in Rust nowadays.

VSCode uses plenty of Rust and .NET AOT on its extensions, alongside C++, and more recently Webassembly, hence why it is the only Electron garbage with acceptable performance.

Unity and Unreal share a great deal of games, with plenty of C#, Blueprints, Verse, and a GC for C++.

14. pjmlp ◴[05 Sep 25 06:03 UTC] No.45135425{3}[source]▶

>>45135232 #

Some of that is thankfully running Ada.

replies(1): >>45139590 #

15. ngrilly ◴[05 Sep 25 06:21 UTC] No.45135511[source]▶

>>45135400 #

You got me curious and I visited one of these .NET performance posts and indeed, it crashed my browser tab!

16. aseipp ◴[05 Sep 25 06:22 UTC] No.45135515[source]▶

>>45135102 (TP) #

Chrome is not a good counterexample a priori. It is a project that has hundreds of engineers assigned to it, some of them world-class security engineers, so they can potentially accept the burden of hardening their code and handling security issues with a regular toolchain. They may have even evaluated such solutions already.

I think an important issue is that for performance-sensitive C++ stuff and related domains, it's somewhat all or nothing with a lot of these tools. Like, a CAD program is ideally highly performant, but I also don't want it to own my machine if I load a malicious file. I think that's the hardest thing and there isn't any easy lift-and-shift solution for that, I believe.

I think some C++ projects probably could actually accept a 2x slowdown, honestly. Like I'm not sure if LibrePCB taking 2x as long in cycles would really matter. Maybe it would.

17. kragen ◴[05 Sep 25 06:27 UTC] No.45135541{3}[source]▶

>>45135412 #

I agree, and I've written an allocator in C that works that way. The fast path is about 5 clock cycles on common superscalar processors, which is about 7–10× faster than malloc: http://canonical.org/~kragen/sw/dev3/kmregion.h

This is bottlenecked on memory access that is challenging to avoid in C. You could speed it up by at least 2× with some compiler support, and maybe even without it, but I haven't figured out how. Do you have any ideas?

Typically, though, when you are trying to do WCET analysis, as you know, you try to avoid any dynamic allocation in the time-sensitive part of the program. After all, if completing a computation after a deadline would cause a motor to catch fire or something, you definitely don't want to abort the computation entirely with an out-of-memory exception!

Some garbage collectors can satisfy this requirement just by not interfering with code that doesn't allocate, but typically not concurrent ones.

18. yvdriess ◴[05 Sep 25 07:24 UTC] No.45135870{3}[source]▶

>>45135176 #

There are tricks to improve the performance of pointer chasing on modern uarchs (cf. Go's Greentea GC). You want to batch the address calculation/loading, deref/load and subsequent dependent ops like marking. Reorder buffers and load-store buffers are pretty big these days, so anything that breaks the addr->load->do dependency chain is a huge win, especially if there are any near that traverse loop.

19. mike_hearn ◴[05 Sep 25 08:26 UTC] No.45136267[source]▶

>>45135102 (TP) #

Chrome is a bad example. It uses a tracing GC in its most performance sensitive parts explicitly to reduce the number of memory safety bugs (it's called Oilpan). And much of the rest is written in C++ simply because that's the language Chrome standardized on, they are comfortable relying on kernel sandboxes and IPC rather than switching to a more secure language.

replies(2): >>45138899 #>>45146262 #

20. zelphirkalt ◴[05 Sep 25 08:44 UTC] No.45136378{3}[source]▶

>>45135330 #

The question is whether one would really notice a 2x slowdown in a browser. For example, if it takes some imaginary 2ms to close a tab, would one notice if it now took 4ms? And for page rendering the bottleneck might be retrieving those pages.

replies(2): >>45136830 #>>45137704 #

21. spacechild1 ◴[05 Sep 25 08:48 UTC] No.45136399{3}[source]▶

>>45135330 #

I would like to add:

* DAWs and audio plugins

* video editors

Audio plugins in particular need to run as fast as possible because they share the tiny time budget of a few milliseconds with dozens or even hundreds of other plugin instances. If everything is suddenly 2x slower, some projects simply won't run in realtime anymore.

22. gf000 ◴[05 Sep 25 09:39 UTC] No.45136725{4}[source]▶

>>45135323 #

> "Concurrent" doesn't usually mean "bounded in worst-case execution time"

Sure, though this is also true for ordinary serial code, with all the intricate interactions between the OS scheduler, different caches, filesystem, networking, etc.

replies(1): >>45136794 #

23. kragen ◴[05 Sep 25 09:53 UTC] No.45136794{5}[source]▶

>>45136725 #

Usually when people care about worst-case execution time, they are running their code on a computer without caches and either no OS or an OS with a very simple, predictable scheduler. And they never access the filesystem (if there is one) or wait on the network (if there is one) in their WCET-constrained code.

Those are the environments that John upthread was talking about when he said:

> There's tons of embedded use cases where a GC is not going to fly just from a code size perspective, let alone latency. That's mostly where I've often seen C (not C++) for new programs.

But I've seen C++ there too.

If you're worried about the code size of a GC you probably don't have a filesystem.

replies(2): >>45137346 #>>45140130 #

24. saagarjha ◴[05 Sep 25 09:59 UTC] No.45136830{4}[source]▶

>>45136378 #

Yes, people will absolutely notice. There's plenty of interactions that take 500ms that will now take a second.

25. gf000 ◴[05 Sep 25 11:28 UTC] No.45137346{6}[source]▶

>>45136794 #

Well, there is a whole JVM implementation for hard real-time with a GC, that's used in avionics/military -- hard real time is a completely different story, slowness is not an issue here, you exchange fast execution for a promise of keeping a response time.

But I don't really think it's meaningful to bring that up as it is a niche of a niche. Soft real-time (which most people may end up touching, e.g. video games) is much more forgiving, see all the games running on Unity with a GC. An occasional frame drop won't cause an explosion here, and managed languages are more than fine.

replies(1): >>45137511 #

26. kragen ◴[05 Sep 25 11:54 UTC] No.45137511{7}[source]▶

>>45137346 #

Are you talking about Ovm https://dl.acm.org/doi/10.1145/1324969.1324974 https://apps.dtic.mil/sti/citations/ADA456895? pizlonator (the Fil-C author) was one of Ovm's authors 17 years ago. I don't think it's in current use, but hopefully he'll correct me if I'm wrong. The RTSJ didn't require a real-time GC (and IIRC at the time it wasn't known how to write a hard-real-time GC without truly enormous overheads) and it didn't have a real-time GC at the time. Perhaps one has been added since then.

I don't agree that "it is a niche of a niche". There are probably 32× as many computers in your house running hard-real-time software as computers that aren't. Even Linux used to disable interrupts during IDE disk accesses!

27. const_cast ◴[05 Sep 25 12:24 UTC] No.45137704{4}[source]▶

>>45136378 #

2 - 4 ms? No. The problem is that many web applications are already extremely slow and bogged down in the browser. 500 ms - 1s? Yes, definitely people will notice. Although that only really applies to React applications that do too much, network latency isn't affected.

28. pizlonator ◴[05 Sep 25 13:54 UTC] No.45138618[source]▶

>>45135102 (TP) #

> While there are certainly other reasons C/C++ get used in new projects, I think 99% not being performance or footprint sensitive is way overstating it.

Here’s my source. I’m porting Linux From Scratch to Fil-C

There is load bearing stuff in there that I’d never think of off the top of my head that I can assure you works just as well even with the Fil-C tax. Like I can’t tell the difference and don’t care that it is technically using more CPU and memory.

So then you’ve got to wonder, why aren’t those things written in JavaScript, or Python, or Java, or Haskell? And if you look inside you just see really complex syscall usage. Not for perf but for correctness. It’s code that would be zero fun to try to write in anything other than C or C++

replies(3): >>45141479 #>>45143527 #>>45146455 #

29. wffurr ◴[05 Sep 25 14:17 UTC] No.45138899[source]▶

>>45136267 #

Chrome security is encouraging use of memory safe languages via the Rule of 2: https://chromium.googlesource.com/chromium/src/+/main/docs/s...

IIRC Crubit C++/Rust Interop is from the chrome team: https://github.com/google/crubit

replies(1): >>45140153 #

30. addaon ◴[05 Sep 25 15:19 UTC] No.45139590{4}[source]▶

>>45135425 #

Not in my case.

31. pizlonator ◴[05 Sep 25 16:00 UTC] No.45140057[source]▶

>>45135395 #

Code size overhead is really bad right now, but I wouldn't read anything into that other than "Fil didn't optimize it yet".

Reasons why it's stupidly bad:

- So many missing compiler optimizations (obviously those will also improve perf too).

- When the compiler emits metadata for functions and globals, like to support accurate GC and the stack traces you get on Fil-C panic, I use a totally naive representation using LLVM structs. Zero attempt to compress anything. I'm not doing any of the tricks that DWARF would do, for example.

- In many cases it means that strings, like names of functions, appear twice (once for the purposes of the linker and a second time for the purposes of my metadata).

- Lastly, an industrially optimized version of Fil-C would ditch ELF and just have a Fil-C-optimized linker format. That would obviate the need for a lot of the cruft I emit that allows me to sneakily make ELF into a memory safe linker. Then code size would go down by a ton

I wish I had data handy on just how much I bloat code. My totally unscientific guess is like 5x

32. pizlonator ◴[05 Sep 25 16:02 UTC] No.45140094{4}[source]▶

>>45135323 #

> "Concurrent" doesn't usually mean "bounded in worst-case execution time", especially on a uniprocessor. Does it in this case?

Meh. I was in the real time GC game for a while, when I was younger. Nobody agrees on what it really means to bound the worst case. If you're a flight software engineer, it means one thing. If you're a game developer, it means something else entirely. And if you're working on the audio stack specifically, it means yet another thing (somewhere in between game and flight).

So let me put it this way, using the game-audio-flight framework:

- Games: I bound worst case execution time, just assuming a fair enough OS scheduler, even on uniprocessor.

- Audio: I bound worst case execution time if you have multiple cores.

- Flight: I don't bound worst case execution time. Your plane crashes and everyone is dead

replies(1): >>45143570 #

33. pizlonator ◴[05 Sep 25 16:05 UTC] No.45140130{6}[source]▶

>>45136794 #

Yeah totally, if you're in those kinds of environments, then I agree that a GC is a bad choice of tech.

I say that even though, as you noticed in another reply, I worked on research to try to make GC suitable for exactly those environments. I had some cool demos, and a lot of ideas in FUGC come from that. But I would not recommend you use GC in those environments!

There is a way to engineer Fil-C to not rely on GC. InvisiCaps would work with isoheaps (what those embedded dudes would just call "object pools"). So, if we wanted to make a Fil-C-for-flight-software then that's what it would look like, and honestly it might even be super cool
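
For readers unfamiliar with the term, an "object pool" in the embedded sense can be sketched as below. This is only an illustration of the general idea: a fixed set of slots that only ever hold one type, with constant-time allocate and free. It does not show InvisiCaps or how Fil-C would actually implement isoheaps, and the `sensor_reading` type, the capacity, and the function names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_CAPACITY 4 /* hypothetical, fixed at build time */

typedef struct sensor_reading {
    int channel;
    int value;
} sensor_reading;

/* A slot holds either a live object or a link in the free list. */
typedef union pool_slot {
    sensor_reading obj;
    union pool_slot *next_free;
} pool_slot;

typedef struct {
    pool_slot slots[POOL_CAPACITY];
    pool_slot *free_list;
} reading_pool;

void pool_init(reading_pool *p) {
    p->free_list = NULL;
    for (size_t i = 0; i < POOL_CAPACITY; i++) {
        p->slots[i].next_free = p->free_list;
        p->free_list = &p->slots[i];
    }
}

/* O(1): pop the head of the free list, or NULL if the pool is exhausted. */
sensor_reading *pool_alloc(reading_pool *p) {
    pool_slot *s = p->free_list;
    if (s == NULL) return NULL;
    p->free_list = s->next_free;
    return &s->obj;
}

/* O(1): push the slot back. The memory is only ever reused for this type. */
void pool_free(reading_pool *p, sensor_reading *r) {
    pool_slot *s = (pool_slot *)r;
    assert(s >= p->slots && s < p->slots + POOL_CAPACITY);
    s->next_free = p->free_list;
    p->free_list = s;
}
```

Both operations take a fixed number of steps and never call into a collector, which is what makes this style attractive when worst-case execution time has to be bounded.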

34. mike_hearn ◴[05 Sep 25 16:07 UTC] No.45140153{3}[source]▶

>>45138899 #

Memory safe languages aren't allowed in the Chrome codebase. Java is only for Android, Swift only for iOS/Mac, and Rust only for third party uses.

That might well change, but it's what their docs currently say.

replies(1): >>45140641 #

35. steveklabnik ◴[05 Sep 25 16:44 UTC] No.45140641{4}[source]▶

>>45140153 #

> That might well change, but it's what their docs currently say.

It's not, actually: https://source.chromium.org/chromium/chromium/src/+/main:doc...

> Rust can be used anywhere in the Chromium repository (not just //third_party) subject to current interop capabilities, however it is currently subject to a internal approval and FYI process. Googlers can view go/chrome-rust for details. New usages of Rust are documented at rust-fyi@chromium.org.

It is true that two years ago it was only third party, but it's been growing ever since.

36. reorder9695 ◴[05 Sep 25 17:53 UTC] No.45141479[source]▶

>>45138618 #

I have no credentials here, but I'd be interested in knowing what environmental impact things like this have (relatively high-overhead things like Fil-C, VMs, containers) as opposed to running optimised, well-designed code. I don't mean in regular projects, but specifically in things like the Linux kernel that run on potentially millions? billions? of computers.

37. johncolanduoni ◴[05 Sep 25 19:15 UTC] No.45142482[source]▶

>>45135400 #

What is in theory possible in a language/runtime is often less important than historically contingent factors like which languages it’s easy to hire developers for that can achieve certain performance envelopes and which ecosystems have good tooling for micro-optimization.

In JS for example, if you can write your code as a tight loop operating on ArrayBuffer views you can achieve near C performance. But that’s only if you know what optimizations JS engines are good at and have a mental model how processors respond to memory access patterns, which very few JS developers will have. It’s still valid to note that idiomatic JS code for an arbitrary CPU-bound task is usually at least tens of times slower than idiomatic C.

38. kragen ◴[05 Sep 25 20:58 UTC] No.45143527[source]▶

>>45138618 #

I wonder if something like LuaJIT would be an option. Certainly Objective-C would work.

39. kragen ◴[05 Sep 25 21:02 UTC] No.45143570{5}[source]▶

>>45140094 #

Haha, yeah, I know.

40. johncolanduoni ◴[06 Sep 25 03:02 UTC] No.45146262[source]▶

>>45136267 #

The only thing I intimated about Chrome is that if it got 2x slower, many users would in fact care. I have no doubt that they very well might not write it in C++ if they started today (well, if they decided not to start with a fork of the WebKit HTML engine). I’m not sure what Oilpan has to do with anything I said - I suspect that it would do memory operations too opaque for Fil-C’s model and V8 certainly would but I don’t see how that makes it a bad example of performance-sensitive C++ software.

41. johncolanduoni ◴[06 Sep 25 03:53 UTC] No.45146455[source]▶

>>45138618 #

My source is that Google spent a bunch of engineer time to write, test, and tweak complicated outlining passes for LLVM to get broad 1% performance gains in C++ software, and everybody hailed it as a masterstroke when it shipped. Was that performance art? 1% of C++ developers drowning out the apparent 99% of ones that didn’t (or shouldn’t) care?

I never said there was no place for taking a 2x performance hit for C or C++ code. I think Fil-C is a really interesting direction and definitely well executed. I just don’t see how you can claim that C++ code that can’t take a 2x performance hit is some bizarre, 1% edge case for C++.

42. pizlonator ◴[06 Sep 25 14:37 UTC] No.45149655{3}[source]▶

>>45135330 #

I’m already living on a Fil-C compiled CPython. It doesn’t matter.

And a Fil-C compiled text editor. Not VSCode, but still

I absolutely do think you could make the browser 5x slower (in CPU time - not in IO time) and you wouldn’t care. For example Lockdown Mode really doesn’t change your UX. Or using a browser on a 5x slower computer. You barely notice.

And most of the extant C++ code doesn’t fit into any of the categories you listed.
