John Carmack's arguments against building a custom XR OS at Meta

(twitter.com)

https://xcancel.com/ID_AA_Carmack/status/1961172409920491849

1. jnwatson ◴[29 Aug 25 17:46 UTC] No.45067216[source]▶

>>45066395 (OP) #

I've written a lot of low level software, BSPs, and most of an OS, and the main reason to not write your own OS these days is silicon vendors. Back in the day, they would provide you a spec detailed enough that you could feasibly write your own drivers.

These days, you get a medium-level description and a Linux driver of questionable quality. Part of this is just laziness, but mostly this is a function of complexity. Modern hardware is just so complicated it would take a long time to completely document, and even longer to write a driver for.

replies(13): >>45067491 #>>45069282 #>>45069287 #>>45069349 #>>45069690 #>>45070345 #>>45071036 #>>45071086 #>>45072259 #>>45072391 #>>45073789 #>>45075476 #>>45081942 #

2. tanvach ◴[29 Aug 25 18:09 UTC] No.45067491[source]▶

>>45067216 (TP) #

Yeah reverse engineering all the drivers is going to be a huge headache.

replies(1): >>45069918 #

3. boredatoms ◴[29 Aug 25 20:54 UTC] No.45069282[source]▶

>>45067216 (TP) #

Presumably if you’re Meta you could pay the vendors enough to write drivers for any arbitrary OS.

replies(5): >>45069321 #>>45069853 #>>45070444 #>>45071037 #>>45073107 #

4. mbac32768 ◴[29 Aug 25 20:54 UTC] No.45069287[source]▶

>>45067216 (TP) #

Yeah this. I tried to modify a hobby OS recently so it would process the "soft reboot" button (to speed up being rebooted in GCP) and it was so unbelievably hard to figure out how to support it. I tried following the instructions on the OS Dev Wiki and straight up reading what both Linux and FreeBSD do and still couldn't make progress. Yes. The thing that happens when you tell Windows or Linux to "restart". Gave up on this after spending days on it.

The people who develop OSes are cut from a different cloth and are not under the usual economic pressures.

replies(3): >>45069466 #>>45069733 #>>45070466 #

5. rwmj ◴[29 Aug 25 20:59 UTC] No.45069321[source]▶

>>45069282 #

But is that a good use of Meta's money? Compared to making a few patches to Linux to fix any performance problems they find.

(And I feel bad saying this since Meta obviously did waste eleventy billion on their ridiculous Second Life recreation project ...)

replies(1): >>45069369 #

6. bbarnett ◴[29 Aug 25 21:03 UTC] No.45069349[source]▶

>>45067216 (TP) #

> Modern hardware is just so complicated it would take a long time to completely document, and even longer to write a driver for.

That's what's claimed. That's what people say, yet it's just an excuse. I've heard the same sort of excuse from people who write a massive codebase and then say "Oops, sorry, didn't get around to documenting it".

And no, hardware is not more difficult than software to document.

If the system is complex, there's more need to document, just as with a huge codebase. On their end, they have new employees to train up, and they have to manage testing. So any excuse that silicon vendors have to deal with such immense complexity? My violin plays for them.

replies(3): >>45069603 #>>45070341 #>>45073345 #

7. bbarnett ◴[29 Aug 25 21:05 UTC] No.45069369{3}[source]▶

>>45069321 #

I don't like Meta, but there used to be a time where big corp used to spend 30% of its budget on R&D. It's how we got all the toys we have now, R&D labs of big Bell and others.

So please don't mock the spend. Big spends fail sometimes, and at least people were paid to do the work.

replies(3): >>45069405 #>>45069546 #>>45072217 #

8. rwmj ◴[29 Aug 25 21:08 UTC] No.45069405{4}[source]▶

>>45069369 #

It's just that it was so obviously going to fail, because there's no mass market for a product that you have to strap onto your face. You didn't need to spend billions to learn that.

If they'd spent the money researching nuclear fusion or space flight or a new way to develop microprocessors, I would be cheering their efforts even if they had failed in the end.

9. gmueckl ◴[29 Aug 25 21:14 UTC] No.45069466[source]▶

>>45069287 #

I also think that they have access to more helpful resources than people outside the field do, e.g. being able to contact people working on the lower layers to get the missing info. These channels exist in the professional world, but they are hard to access.

10. crote ◴[29 Aug 25 21:23 UTC] No.45069546{4}[source]▶

>>45069369 #

The difference is that organisations like Bell Labs and Xerox PARC were primarily tech-first: innovations were the result of very clever and creative people doing blue skies research. The most groundbreaking stuff shocked the world while it was still a hacked-together demo, and similarly the cost of failure was quite low.

On the other hand, Meta's experiment is primarily CEO-driven. The outcome is predetermined, changing direction is not possible. Sure, clever engineers get to draw the rest of the owl, but that's not very useful when it turns out that everyone needs a horse instead.

They are spending a fortune, but rather than getting 900 crappy ideas to throw away and 100 great ones to pick from for continued development, they are developing 1 technological marvel nobody is interested in.

replies(2): >>45069709 #>>45069825 #

11. supermatt ◴[29 Aug 25 21:28 UTC] No.45069603[source]▶

>>45069349 #

> If the system is complex, there's more need to document

It’s not first party documentation that’s the problem. The problem is that they don’t share that documentation, so in order to get documentation for an “unsupported” OS a 3rd party needs to reverse engineer it.

12. ◴[29 Aug 25 21:37 UTC] No.45069690[source]▶

>>45067216 (TP) #

13. throwway120385 ◴[29 Aug 25 21:39 UTC] No.45069709{5}[source]▶

>>45069546 #

It was also pretty obvious how the VR glasses would support Meta's existing goals. It would give Meta total power over what you see and who you can speak with through their system. It's a natural extension of their total control over how people interact on the Internet. And I think the only reason it failed is because it was expensive and dumb-looking.

14. sitkack ◴[29 Aug 25 21:42 UTC] No.45069733[source]▶

>>45069287 #

The VMM on GCP has only really been tested with Linux. You are kinda wasting your time, the only way to make it work is to make the hobby OS Linux.

replies(1): >>45070064 #

15. ForHackernews ◴[29 Aug 25 21:50 UTC] No.45069825{5}[source]▶

>>45069546 #

Arguably the distinction you're pointing at is macroinvention (the transistor) vs microinvention (a better VR headset): one is a refinement of something that exists, another is transformative opening up whole new worlds of possibility. https://www.antonhowes.com/blog/macroinvention-vs-microinven...

replies(2): >>45072225 #>>45073396 #

16. silvestrov ◴[29 Aug 25 21:52 UTC] No.45069853[source]▶

>>45069282 #

Vendors might say that they don't have the resources (man hours) and don't want to hand over documentation to external developers.

17. markus_zhang ◴[29 Aug 25 22:00 UTC] No.45069918[source]▶

>>45067491 #

Sounds like super fun if I could be paid a bit for it.

What is an easy gate task to get into “reverse engineering some drivers for some OS”?

Second thought: I don’t even know how to write a driver or a kernel, so I better start from there.

replies(3): >>45069947 #>>45070420 #>>45073406 #

18. wmf ◴[29 Aug 25 22:04 UTC] No.45069947{3}[source]▶

>>45069918 #

Asahi Linux.

19. toast0 ◴[29 Aug 25 22:19 UTC] No.45070064{3}[source]▶

>>45069733 #

> You are kinda wasting your time, the only way to make it work is to make the hobby OS Linux.

Not the parent, but of course they're wasting their time... That's the point of a hobby OS.

I'm working on a hobby OS, and I have no illusions: most likely fewer than 10 people will ever run it, and fewer than 100 will hear about it, but it lets me explore some interesting (to me) ideas, and forces me to learn a little more about random pieces of computing. If I ran on GCP, I'd want the reboot button to work. That sounds useful.

On the topic, I don't see why anyone would want to build a general purpose OS. There's enough already and even with the shrinking of hardware variety, there's a lot of stuff to support to make a general purpose OS work on enough hardware for people to consider using it. You can take Linux or a BSD and hack it up pretty good to explore a lot of OS ideas. Chances are you're going to borrow some of their drivers anyway, and then you'll end up with at least some similarity... may as well start there and save a lot of time. (My hobby OS has a custom kernel and custom drivers, but I only support a bare minimum of devices... (pc) console i/o, one real NIC, and virtio-net... that's all I need; I might add support for more NICs and more consoles later)

replies(1): >>45070231 #

20. sitkack ◴[29 Aug 25 22:41 UTC] No.45070231{4}[source]▶

>>45070064 #

I didn't say they were wasting their time on their hobby OS, they are wasting their time trying to get it to do a very esoteric thing on GCP.

They aren't trying to get reboot to work, they are trying to get their version of kexec to work so their hobby OS reboots faster.

https://wiki.archlinux.org/title/Kexec

The biggest scam in the OS world is drivers, we should demand more out of our hardware. Drivers shouldn't be necessary.

replies(4): >>45070352 #>>45072184 #>>45073077 #>>45075278 #

21. makeitdouble ◴[29 Aug 25 22:57 UTC] No.45070341[source]▶

>>45069349 #

> "Oops, sorry, didn't get around to documenting it".

That's obviously the wrong message. They should say "Go ask the engineering VP to get us off any other projects for another cycle while we're writing 'satisfying' documentation".

Extensive documentation comes at a price few companies are willing to pay (and that's not just a matter of resources. Look at Apple's documentation)

replies(3): >>45071005 #>>45072212 #>>45072560 #

22. dist1ll ◴[29 Aug 25 22:57 UTC] No.45070345[source]▶

>>45067216 (TP) #

Intel still does it. As far as I can see they're the only player in town that provide open, detailed documentation for their high-speed NICs [0]. You can actually write a driver for their 100Gb cards from scratch using their datasheet. Most other vendors would either (1) ignore you, (2) make you sign an NDA or (3) refer you to their poorly documented Linux/BSD driver.

Not sure what the situation is for other hardware like NVMe SSDs.

[0] 2750 page datasheet for the e810 Ethernet controller https://www.intel.com/content/www/us/en/content-details/6138...

replies(4): >>45070705 #>>45071380 #>>45072199 #>>45076796 #

23. toast0 ◴[29 Aug 25 22:58 UTC] No.45070352{5}[source]▶

>>45070231 #

They said they wanted the soft reboot button to work. I assumed they meant catching the button press, which having seen some of this stuff is probably very tricky.

I don't see why a kexec alike wouldn't work about the same on GCP vs qemu vs bare metal... Or what that has to do with a GCP soft reboot button (which again, I think is referring to the reboot button in the GCP console)

Either way, the whole thing is a waste of time, yes? Why not waste time on the part that's engaging?

> The biggest scam in the OS world is drivers, we should demand more out of our hardware. Drivers shouldn't be necessary.

I can't even fathom what you mean here? You've got to have some interface to communicate with hardware. That's a driver. Some hardware only needs a very small driver... Tell the hardware where to send input, how to notify when input is ready and when it's ready for output, and tell the hardware where data to output is. Maybe some setup stuff for modes and whatever if the needs aren't obvious and universal. I don't see how you could possibly avoid that.

It would certainly be possible for more devices to use common interfaces so a single driver could operate many different devices. Maybe that's what you mean? There's some movement towards that... SATA controllers generally speak AHCI, human interface devices generally appear as USB HID devices, etc. NICs tend to have a wide variety of setup sequences, but data queues usually fit into one of a limited number of patterns.
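To make the "data queues" pattern concrete, here is a minimal sketch in Python of a transmit descriptor ring shared between a driver and a device. It is a toy simulation, not any real NIC's interface: the names `Descriptor`, `TxRing`, `driver_submit` and `device_poll` are all made up for illustration, and real hardware would use DMA addresses, doorbell registers and interrupts instead of method calls.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Descriptor:
    """One ring slot: where the data lives and who currently owns the slot."""
    buffer: bytes = b""
    owned_by_device: bool = False  # True once the driver hands the slot over


class TxRing:
    """Toy transmit ring shared by a 'driver' and a simulated 'device'."""

    def __init__(self, size: int) -> None:
        self.slots = [Descriptor() for _ in range(size)]
        self.head = 0  # next slot the driver will fill
        self.tail = 0  # next slot the device will consume

    def driver_submit(self, data: bytes) -> bool:
        """Driver side: queue data for output; returns False if the ring is full."""
        slot = self.slots[self.head]
        if slot.owned_by_device:
            return False  # the device has not finished with this slot yet
        slot.buffer = data
        slot.owned_by_device = True  # hand ownership to the device
        self.head = (self.head + 1) % len(self.slots)
        return True

    def device_poll(self) -> Optional[bytes]:
        """Device side: consume one slot and hand it back to the driver."""
        slot = self.slots[self.tail]
        if not slot.owned_by_device:
            return None  # nothing queued
        data = slot.buffer
        slot.owned_by_device = False  # return ownership to the driver
        self.tail = (self.tail + 1) % len(self.slots)
        return data
```

The ownership flag per slot is the whole protocol here: the driver only writes slots it owns, the device only reads slots it owns, and the two indices wrap around the same fixed array.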

replies(1): >>45071090 #

24. toast0 ◴[29 Aug 25 23:08 UTC] No.45070420{3}[source]▶

>>45069918 #

I don't know how you get paid for it, but if you want to write your own kernel, I'd start with an osdev tutorial. I started with this one [1], but this one [2] has a promising name... and I haven't really looked around.

It helps to have a concept to guide you too, but you can certainly make some progress on the basics before you figure out what you really want to do.

[1] https://wiki.osdev.org/User:Zesterer/Bare_Bones

[2] https://osdev.wiki/wiki/Multiboot1_Bare_Bones

replies(1): >>45070553 #

25. eklitzke ◴[29 Aug 25 23:11 UTC] No.45070444[source]▶

>>45069282 #

Writing drivers is easy, getting vendors to write *correct* drivers is difficult. At work right now we are working with a Chinese OEM with a custom Wifi board with a chipset with firmware and drivers supplied by the vendor. It's actually not a new wifi chipset, they've used it in other products for years without issues. In conditions that are difficult to reproduce sometimes the chipset gets "stuck" and basically stops responding or doing any wifi things. This appears to be a firmware problem because unloading and reloading the kernel module doesn't fix the issue. We've supplied loads of pcap dumps to the vendor, but they're kind of useless to the vendor because (a) pcap can only capture what the kernel sees, not what the wifi chipset sees, (b) it's infeasible for the wifi chipset to log all its internal state and whatnot, and (c) even if this was all possible trying to debug the driver just from looking at gigabytes of low level protocol dumps would be impossible.

Realistically for the OEM to debug the issue they're going to need a way to reliably repro which we don't have for them, so we're kind of stuck.

This type of problem generalizes to the development of drivers and firmware for many complex pieces of modern hardware.

replies(1): >>45072223 #

26. toast0 ◴[29 Aug 25 23:14 UTC] No.45070466[source]▶

>>45069287 #

To clarify, are you having trouble getting the signal to reboot from the gcp console into your OS? Or are you having trouble rebooting on gcp?

replies(1): >>45075193 #

27. markus_zhang ◴[29 Aug 25 23:29 UTC] No.45070553{4}[source]▶

>>45070420 #

Thanks. I got all the resources covered. But I don’t have the energy to work on them as a side project any more. Alas! I wasted my younger days and hope you fare better!

replies(1): >>45073294 #

28. the-rc ◴[29 Aug 25 23:55 UTC] No.45070705[source]▶

>>45070345 #

On the other hand, see the complete mess that are the IPU6/7 camera chipsets and their Linux support.

replies(1): >>45072771 #

29. MathMonkeyMan ◴[30 Aug 25 00:55 UTC] No.45071005{3}[source]▶

>>45070341 #

I write documentation as I'm writing the code. In my opinion, the code is only as good as its documentation -- they're two parts of the same thing. It's mostly comments at the top of files, and sometimes a markdown file in the same directory.

This way, good documentation is priced into my estimate for the project. I don't have a work item "spend a few days documenting." Nope, if I'm doing a foo then that includes documenting a foo at the same time.

replies(3): >>45071326 #>>45072618 #>>45073289 #

30. andreww591 ◴[30 Aug 25 01:03 UTC] No.45071036[source]▶

>>45067216 (TP) #

At least for certain types of OSes, it should be relatively easy to get most of Linux's hardware support by porting LKL (https://github.com/lkl/linux) and adding appropriate hooks to access hardware.

Of course, your custom kernel will still have to have some of its own code to support core platform/chipset devices, but LKL should pretty much cover just about all I/O devices (and you also get stuff like disk filesystems and a network stack along with the device drivers).

Also, it probably wouldn't work so well for typical monolithic kernels, but it should work decently on something that has user-mode driver support.

replies(1): >>45071106 #

31. dedup-com ◴[30 Aug 25 01:03 UTC] No.45071037[source]▶

>>45069282 #

XROS had a completely new and rapidly evolving system call surface. No vendor would've been able to even start working on a driver for their device, let alone hand off a stable, complete result. It wasn't a case of "just rename a few symbols in a FreeBSD implementation and run a bunch of tests".

32. leoc ◴[30 Aug 25 01:14 UTC] No.45071086[source]▶

>>45067216 (TP) #

My hunch is that for nearly anyone who is serious about it these days, the way forward is either to have unusually tight control over the underlying platform, or to include a servant Linux installation with your OS. If Windows is a buggy set of device drivers, then Linux is a free set of buggy device drivers. If you're happy with your OS running as a client of a Linux hypervisor indefinitely then you could go for that; otherwise you'd have to try to gradually move bits of the hardware support into your OS over time—ideally faster than new Linux dependencies arise...

33. foxglacier ◴[30 Aug 25 01:15 UTC] No.45071090{6}[source]▶

>>45070352 #

> Tell the hardware where to send input, ...

I agree you need a driver but for most hardware, that should be pretty simple, and easily documented by the hardware vendor, shouldn't it? A button has to be about the simplest possible I/O device imaginable.

replies(1): >>45071200 #

34. snickerbockers ◴[30 Aug 25 01:18 UTC] No.45071106[source]▶

>>45071036 #

>but LKL should pretty much cover just about all I/O devices (and you also get stuff like disk filesystems and a network stack along with the device drivers).

thus calling into question why you ever bothered writing a new kernel in the first place if you were just going to piggyback Linux's device drivers onto some userspace wrapper thingy.

I'm not necessarily indoctrinated to the point where I can't conceive of Linux being suboptimal in a way which is so fundamental that it requires no less than a completely new OS from scratch, but you're never going to get there off of recycling Linux's device drivers, because that forces you to design your new OS as a Linux clone, in which case you definitely did not need to write an entire new kernel from scratch.

replies(4): >>45071257 #>>45072209 #>>45072630 #>>45073243 #

35. toast0 ◴[30 Aug 25 01:41 UTC] No.45071200{7}[source]▶

>>45071090 #

Yeah, problem is it's likely an acpi button, which ties you into all the fun of that.

Of course, ACPI is supposed to make interfacing with lots of similar things easier, kind of, so there you go.

36. andrekandre ◴[30 Aug 25 01:52 UTC] No.45071257{3}[source]▶

>>45071106 #

  > you're never going to get there off of recycling linux's device drivers because that forces you to design your new OS as a linux clone in which case you definitely did not need to write an entire new kernel from scratch.

that's an interesting point, and makes me wonder if some kind of open interface for drivers to write to (and OSes could implement) wouldn't be worthwhile?

probably it would have to be very general in design, but something along the lines of driverkit or iokit might work?

replies(1): >>45072167 #

37. branko_d ◴[30 Aug 25 02:09 UTC] No.45071326{4}[source]▶

>>45071005 #

In my experience, coding is much faster when doing it this way.

Yes, you can produce a small amount of code faster if you don’t “waste” your time on documentation, but that becomes counterproductive as soon as you can no longer keep the entire codebase in your head.

38. wtallis ◴[30 Aug 25 02:18 UTC] No.45071380[source]▶

>>45070345 #

The NVMe spec is freely downloadable and sufficient to write a driver with, if your OS already has PCIe support (which doesn't have open specifications). You don't need any vendor-specific features for ordinary everyday use, so it's a bit of a different situation from NICs. (Also, NVMe was in very large part an Intel creation, though it's maintained by an industry consortium.)
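As a rough illustration of how approachable the command format is, here is a sketch in Python that packs a 64-byte submission queue entry for an NVM Read. The field layout is written from memory of the NVMe base specification and should be checked against it before use; the function name `build_read_command` and its parameters are hypothetical, and it assumes PRP addressing with no metadata pointer.

```python
import struct

NVME_OPC_READ = 0x02  # NVM command set opcode for Read


def build_read_command(cid: int, nsid: int, slba: int, nblocks: int,
                       prp1: int, prp2: int = 0) -> bytes:
    """Pack a 64-byte submission queue entry for a Read (illustrative only)."""
    cdw0 = NVME_OPC_READ | ((cid & 0xFFFF) << 16)  # opcode in bits 7:0, command ID in 31:16
    cdw10 = slba & 0xFFFFFFFF                      # starting LBA, low 32 bits
    cdw11 = (slba >> 32) & 0xFFFFFFFF              # starting LBA, high 32 bits
    cdw12 = (nblocks - 1) & 0xFFFF                 # number of blocks, zero-based
    return struct.pack(
        "<IIQQQQIIIIII",
        cdw0,        # DW0: opcode and command identifier
        nsid,        # DW1: namespace ID
        0,           # DW2-3: reserved
        0,           # DW4-5: metadata pointer (unused in this sketch)
        prp1,        # DW6-7: first data pointer entry
        prp2,        # DW8-9: second data pointer entry
        cdw10, cdw11, cdw12,
        0, 0, 0,     # DW13-15: left at zero here
    )
```

A real driver would write this entry into a submission queue in DMA-visible memory and then ring the queue's doorbell register; none of that is shown here.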

39. pjmlp ◴[30 Aug 25 05:46 UTC] No.45072167{4}[source]▶

>>45071257 #

That is how all OSes with binary drivers kind of work.

However, it goes in the same direction as the previous commenter's point: device drivers are intertwined with the OS semantics, even on microkernels, so eventually it ends up being just something like POSIX.

40. eru ◴[30 Aug 25 05:50 UTC] No.45072184{5}[source]▶

>>45070231 #

> The biggest scam in the OS world is drivers, we should demand more out of our hardware. Drivers shouldn't be necessary.

What do you mean by that?

replies(1): >>45075827 #

41. throwaway2037 ◴[30 Aug 25 05:53 UTC] No.45072199[source]▶

>>45070345 #

Wow... that PDF is 2,750 pages! There must be an army of technical writers behind it. That is an incredible technical achievement.

Real question: Why do you think Intel does this? Does it guarantee a very strong foothold into data center NICs? I am sure competitors would argue two different angles: (1) this PDF shares too much info; some should be hidden behind an NDA, (2) it's too hard to write (and maintain) this PDF.

replies(9): >>45072325 #>>45072561 #>>45072665 #>>45073187 #>>45073195 #>>45073339 #>>45075185 #>>45076049 #>>45076880 #

42. eru ◴[30 Aug 25 05:54 UTC] No.45072209{3}[source]▶

>>45071106 #

You make a good argument, but let me take the other side:

What you describe is probably necessary for getting _fast_ Linux compatibility. However, if you are willing to take the overhead of a few layers of indirection, you can probably sandbox the Linux land somewhere, and not have it impact the rest of your design much.

Most hardware access doesn't have to be particularly efficient. And, yes, for the few pieces of hardware that you do want to support efficiently (eg your storage devices or networking, whatever you want to concentrate on in your design) these you can handle natively.

Btw, I would suggest that most people these days should write their toy operating systems to run as a VM on a hypervisor like Xen or similar. The surface to the outside world is smaller that way.

43. throwaway2037 ◴[30 Aug 25 05:55 UTC] No.45072212{3}[source]▶

>>45070341 #

    > Look at Apple's documentation

To clarify for me: Is this good or bad?

replies(1): >>45072300 #

44. eru ◴[30 Aug 25 05:56 UTC] No.45072217{4}[source]▶

>>45069369 #

> I don't like Meta, but there used to be a time where big corp used to spend 30% of its budget on R&D. It's how we got all the toys we have now, R&D labs of big Bell and others.

Just because you spend a lot of your money on R&D, doesn't mean that each R&D project is automatically a good one. You still have to make choices between them.

45. throwaway2037 ◴[30 Aug 25 05:57 UTC] No.45072223{3}[source]▶

>>45070444 #

    > custom Wifi board

Why didn't you use something more mainstream? Cost?

replies(1): >>45072441 #

46. eru ◴[30 Aug 25 05:57 UTC] No.45072225{6}[source]▶

>>45069825 #

Eh, the very first transistor they invented was pretty crappy and not all that useful.

Every improvement after that would count as micro-invention in your dichotomy.

47. deadbabe ◴[30 Aug 25 06:04 UTC] No.45072259[source]▶

>>45067216 (TP) #

Wouldn’t LLMs make it way easier

replies(3): >>45072271 #>>45072534 #>>45073116 #

48. mrbungie ◴[30 Aug 25 06:07 UTC] No.45072271[source]▶

>>45072259 #

Only if you are an expert who wants to use time debugging LLM code rather than coding it yourself.

PS: Half-joking, you can write some big portions with LLMs but the point stands.

49. simonw ◴[30 Aug 25 06:12 UTC] No.45072300{4}[source]▶

>>45072212 #

It's bad. Apple's documentation is notoriously weak, despite them being one of the most well-resourced companies in the world.

replies(1): >>45073279 #

50. pjjpo ◴[30 Aug 25 06:17 UTC] No.45072325{3}[source]▶

>>45072199 #

In terms of (2), I wonder if it's even possible to write a driver without such a document. In the end, the vendor is on the hook for the driver for major platforms (let's assume Linux) - if they can write a Linux driver without a similar spec to this doc, then the doc probably doesn't need to exist since the business wins from hobbyist drivers will be low. If they can't though, then it's just a matter of formatting an internal document for public consumption - the doc itself has to be maintained anyways so the cost seems lower and maybe reasonable. I have a feeling the doc is necessary but I am not specialized in the field.

Assumptions, fair or not, about (1) seem more likely somehow.

replies(1): >>45072752 #

51. Joker_vD ◴[30 Aug 25 06:35 UTC] No.45072391[source]▶

>>45067216 (TP) #

> Modern hardware is just so complicated it would take a long time to completely document, and even longer to write a driver for.

You know, one'd think that having complex hardware should make writing a driver easier because the hardware is able to take care of itself just fine, and provide a reasonable interface, as opposed to devices of yore which you had to babysit, wasting your main CPU's time, and doing silly stuff like sending them two identical initialization commands with 30 to 50 microseconds delay between or whatever.

replies(1): >>45072424 #

52. IshKebab ◴[30 Aug 25 06:43 UTC] No.45072424[source]▶

>>45072391 #

No, the complexity usually isn't hidden. It's the driver's job to do that.

I guess one exception maybe is Nvidia who have sort of hidden the complexity by moving most driver functionality onto software on the card. At least that's how I understood it. Don't quote me on that.

replies(3): >>45072457 #>>45073285 #>>45090707 #

53. typpilol ◴[30 Aug 25 06:47 UTC] No.45072441{4}[source]▶

>>45072223 #

Probably some weird design spec or size requirement

54. Joker_vD ◴[30 Aug 25 06:49 UTC] No.45072457{3}[source]▶

>>45072424 #

> No, the complexity usually isn't hidden. It's the driver's job to do that.

Why not, though? We used to have e.g. glass teletypes with microprocessors (8080/8051) in them that exposed a serial bus with very neat command protocol that we still use nowadays, that could boot up, init and self-test all on their own.

55. underdeserver ◴[30 Aug 25 07:05 UTC] No.45072534[source]▶

>>45072259 #

I think this is one area where LLMs would be particularly bad. Opaque code with no documentation across the field.

replies(1): >>45072884 #

56. PeterStuer ◴[30 Aug 25 07:12 UTC] No.45072560{3}[source]▶

>>45070341 #

With documentation one of the major hurdles is the maintenance. It is caring for a set of documents, created by people with different specializations, that describe the artefact from specific perspectives, but need to be kept in sync with the active creation and evolution of the artefact itself.

This is not impossible, but the effort and costs required are substantial and often lose out on a priority basis to just fixing or improving the product itself.

57. bhawks ◴[30 Aug 25 07:12 UTC] No.45072561{3}[source]▶

>>45072199 #

I'd wager high frequency trading applications.

58. PeterStuer ◴[30 Aug 25 07:23 UTC] No.45072618{4}[source]▶

>>45071005 #

Documenting what is there usually is not the hard part (and AI is getting pretty good at that part btw).

Documenting how to use or interact with it in a specific context, which often includes perspectives on interactions with other components, or e.g. protocols not explicit in the code, deciding where to draw the lines of what can be assumed trivial common knowledge and what should be specified or explicitly not specified without notices not to rely on these etc., that is a different thing.

If it wasn't, then truly as they used to say, the source code would be its own best documentation (I am a big fan of programming for readability, but even the best readable code, while it will be correct and up to date, will never be enough nor the best for all)

59. PeterStuer ◴[30 Aug 25 07:25 UTC] No.45072630{3}[source]▶

>>45071106 #

Is this the old 'an OS is just a bag of buggy device drivers' argument?

60. awjlogan ◴[30 Aug 25 07:32 UTC] No.45072665{3}[source]▶

>>45072199 #

This is a pretty standard document length. Modern microcontrollers have similar lengths (e.g. ATSAMD51 is ~2000 pages). Some of it is not software related, things like pin outs and electrical and mechanical descriptions.

It does take a huge amount of work to write and maintain. Typically the authors are not technical, so it also relies on the designers being available to answer questions as well. Then there’s a choice of how it’s written: narrative and potentially imprecise but readable, or terse and precise but hard to read. There’s both styles in the same document, terse for register descriptions.

61. ch33zer ◴[30 Aug 25 07:51 UTC] No.45072752{4}[source]▶

>>45072325 #

Didn't all the Asahi Linux Mac M1 drivers essentially get reverse engineered with little to no support from Apple and no public docs? If I'm remembering correctly then I guess it's possible with enough effort and reverse engineering skills.

replies(2): >>45073073 #>>45073987 #

62. XorNot ◴[30 Aug 25 07:54 UTC] No.45072771{3}[source]▶

>>45070705 #

Good christ this is my current work laptop. It...mostly doesn't work. Plug in a USB camera and it'll just go. Several drivers, userspace utilities and other daemons and sometimes gstreamer works, but does Zoom work? Who knows!

63. deadbabe ◴[30 Aug 25 08:16 UTC] No.45072884{3}[source]▶

>>45072534 #

Incredible job security

64. nicce ◴[30 Aug 25 08:53 UTC] No.45073073{5}[source]▶

>>45072752 #

But it took 5 years. And since the first model, there are many others. It is huge work.

65. baq ◴[30 Aug 25 08:53 UTC] No.45073077{5}[source]▶

>>45070231 #

Hardware is so broken that getting useful functionality basically amounts to casting magic spells and drivers are supposed to be master wizards who know all the points where the spell book is wrong or incomplete. If you think drivers are bad, don’t look at the hardware, you’ll get depressed.

replies(1): >>45075820 #

66. baq ◴[30 Aug 25 08:58 UTC] No.45073107[source]▶

>>45069282 #

Things you can’t buy: vendor who cares enough to replicate your exact use cases in their lab

67. baq ◴[30 Aug 25 09:01 UTC] No.45073116[source]▶

>>45072259 #

LLMs trust the docs. This is a rookie mistake in driver development, especially on prerelease hardware

68. lelanthran ◴[30 Aug 25 09:15 UTC] No.45073187{3}[source]▶

>>45072199 #

For datasheets that's normal. Might even be leaning towards smaller than average for the device in question.

For comparison, a data sheet for a single transistor can be around 12 to 30 pages. A data sheet for a tiny microcontroller is probably a few hundred pages.

I once wrote a driver for a flash chip and that had a data sheet of around 80 pages.

69. miki123211 ◴[30 Aug 25 09:17 UTC] No.45073195{3}[source]▶

>>45072199 #

Probably CPU vendor culture? I forgot how large Intel's manual set is, but ARM's was ~11k pages the last time I checked. Intel's was smaller, but not that much smaller, certainly within an order of magnitude.

70. lelanthran ◴[30 Aug 25 09:28 UTC] No.45073243{3}[source]▶

>>45071106 #

If you're going this route, I have found NetBSD a better option for this sort of thing.

It has a rump kernel architecture which makes reusing the drivers almost trivial compared to reusing Linux drivers with a new kernel.

71. saagarjha ◴[30 Aug 25 09:35 UTC] No.45073279{5}[source]▶

>>45072300 #

Don't worry they're writing documentation now for AI agents

replies(1): >>45073867 #

72. saagarjha ◴[30 Aug 25 09:36 UTC] No.45073285{3}[source]▶

>>45072424 #

Yes, and then you get odd behavior you can't introspect because the card is a black box to you.

73. makeitdouble ◴[30 Aug 25 09:37 UTC] No.45073289{4}[source]▶

>>45071005 #

> the code is only as good as its documentation

This heavily depends on your niche I think. If you're writing closed source vendor software and your client's only guiding light is your documentation, it's 100% true.

If you're working on a 5 people project that evolves at a fast pace, and everyone touching the code is expected to be familiar with the domain and operations, you'll mostly leave comments (todos, meta info, external ticket links etc), not documentation per se.

replies(1): >>45076157 #

74. saagarjha ◴[30 Aug 25 09:38 UTC] No.45073294{5}[source]▶

>>45070553 #

Unfortunately, you are unlikely to be able to jump into just being paid to write an OS with no experience.

75. WalterBright ◴[30 Aug 25 09:48 UTC] No.45073339{3}[source]▶

>>45072199 #

I may be the only person who ever understood every detail of C++, starting with the preprocessor. I can make that claim because I'm the only person who ever implemented all of it. (You cannot really know a language until you've implemented it.) I gave up on that in the 2000's. Modern C++ is simply terrifying in its complexity.

(I'm not including the C++ Standard Library, as I didn't implement it.)

replies(4): >>45073884 #>>45076201 #>>45077194 #>>45083146 #

76. WalterBright ◴[30 Aug 25 09:50 UTC] No.45073345[source]▶

>>45069349 #

I find myself largely unable to document code as I write it. It all seems obvious at the time. It's when I go back to it later, and I re-figure it out, that the documentation then can be written.

77. mastermage ◴[30 Aug 25 10:04 UTC] No.45073396{6}[source]▶

>>45069825 #

In my opinion the difference is rather invention versus innovation. A better VR headset is innovation, transistors are an invention.

78. mastermage ◴[30 Aug 25 10:07 UTC] No.45073406{3}[source]▶

>>45069918 #

Isn't that what Low Level does on his YouTube channel, teaching people to reverse engineer stuff?

replies(1): >>45078782 #

79. lstodd ◴[30 Aug 25 11:35 UTC] No.45073789[source]▶

>>45067216 (TP) #

Heh, in the mid-2000s I had a batch of misbehaving SATA controllers under FreeBSD, and an (actually quite well-written core of a) Linux driver was all I had to work with.

Without that, we would have probably just switched hw, because the quite obscure bug was in the ASIC, and debugging that on 2005-6-ish hw is just infeasible.

80. A4ET8a8uTh0_v2 ◴[30 Aug 25 11:54 UTC] No.45073867{6}[source]▶

>>45073279 #

Honestly, this is by far the most amusing side effect of AI thus far -- management demanding better documentation to help AI digest it.

81. WalterBright ◴[30 Aug 25 11:58 UTC] No.45073884{4}[source]▶

>>45073339 #

P.S. We're adding an "Editions" feature to D so we can simplify the language by removing obsolete and dead-end features. We didn't get everything right, and want to fix it!

replies(1): >>45074864 #

82. stefan_ ◴[30 Aug 25 12:18 UTC] No.45073987{5}[source]▶

>>45072752 #

It was reverse engineered from a driver. With no driver and purely some PCIE device registers mapped into memory you might as well be trying to guess lottery numbers.

replies(1): >>45074953 #

83. metaltyphoon ◴[30 Aug 25 14:14 UTC] No.45074864{5}[source]▶

>>45073884 #

This is one thing Rust did right, and I hope more languages adopt it.

84. ch33zer ◴[30 Aug 25 14:26 UTC] No.45074953{6}[source]▶

>>45073987 #

I guess the driver was the one that runs on Mac that they were able to refer to? Do you have any links to blog posts about this process? It sounds so cool.

replies(1): >>45075548 #

85. jovial_cavalier ◴[30 Aug 25 14:54 UTC] No.45075185{3}[source]▶

>>45072199 #

Look up the Texas Instruments am3358. It's a tiny SOC, it was used in the beaglebone black. Its technical reference manual[1] is over 5000 pages, and it details all peripherals, all of the interconnects and every single register in the system. This, by contrast, is really just an overview.

With regard to (1), if you don't publish this information you're not selling a CPU, you're selling a very expensive chunk of sand. There is simply no way that a customer can guess at what your implementation looks like. Additionally, Intel barely has IP in the traditional sense. They hold patents, but their only real competitor in making x86 processors, AMD, has a long-standing mutual non-enforcement agreement wrt patents.

With regard to (2), I'm guessing a majority of this PDF can be generated sort of like you generate API documentation from doxygen comments.

[1]: https://www.ti.com/lit/ug/spruh73q/spruh73q.pdf?ts=175651560...
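As a rough illustration of the "generated like doxygen output" guess, here is a minimal Python sketch that renders reference text for one register from a machine-readable description. The function name, the tuple layout, and the CTRL register in the usage note are all hypothetical; nothing here reflects how TI or Intel actually produce their manuals.

```python
def render_register(name, offset, fields):
    """Render one register as plain-text reference documentation.

    fields is a list of (high_bit, low_bit, field_name, description) tuples.
    The layout is an assumption made for this sketch.
    """
    lines = [f"{name} (offset 0x{offset:03X})"]
    # List fields from the most significant bit down, as manuals usually do.
    for high, low, field, desc in sorted(fields, reverse=True):
        bits = f"{high}" if high == low else f"{high}:{low}"
        lines.append(f"  [{bits}] {field}: {desc}")
    return "\n".join(lines)
```

Calling `render_register("CTRL", 0x10, [(0, 0, "EN", "Enable"), (7, 4, "MODE", "Operating mode")])` yields a three-line entry with MODE listed before EN. The prose around such tables would presumably still need a human writer.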

replies(2): >>45077557 #>>45081798 #

86. mbac32768 ◴[30 Aug 25 14:55 UTC] No.45075193{3}[source]▶

>>45070466 #

I mean when the hobby OS wants to shut down, it can power the machine it's running on down. Not unlike what would happen if you clicked power off on your desktop OS menu.

Getting it to work on GCP meant properly driving something called the Intel PIIX4 controller which was emulated into the VM.

Separately from the OS being able to turn itself off, the OS needs to process a signal sent by the hypervisor through this controller to support the hypervisor gracefully shutting it down. Otherwise GCP will wait 90 seconds after it has sent the shutdown signal before giving up and terminating the VM itself.

The problem I was trying to solve was (a) OS can shut itself down in GCP (b) restarts in GCP from the GCP console would be instant, rather than take 90+ seconds
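For anyone wondering what "processing the signal" amounts to, here is a minimal Python sketch of the register logic only. It assumes the standard ACPI fixed-hardware bit layout (PWRBTN_STS is bit 8 of the PM1 status register; SLP_TYP is bits 10-12 and SLP_EN is bit 13 of the PM1 control register). The `FakePm` class is a hypothetical stand-in for the emulated controller. A real kernel would find the register addresses in the FADT, take the S5 sleep type from the `\_S5` object, use port I/O, and normally react to an SCI interrupt rather than poll.

```python
PWRBTN_STS = 1 << 8    # PM1 status: power button event pending (write 1 to clear)
SLP_TYP_SHIFT = 10     # PM1 control: SLP_TYP occupies bits 10-12
SLP_EN = 1 << 13       # PM1 control: setting this bit enters the sleep state


class FakePm:
    """Hypothetical model of the PM1 status/control registers."""

    def __init__(self):
        self.status = 0
        self.control = 0
        self.powered_on = True

    def press_power_button(self):
        # Stands in for the hypervisor requesting a graceful shutdown.
        self.status |= PWRBTN_STS

    def write_status(self, value):
        # Status bits are write-one-to-clear.
        self.status &= ~value

    def write_control(self, value):
        self.control = value
        if value & SLP_EN:
            self.powered_on = False


def pm1_control_value(slp_typ):
    """Build the PM1 control value that requests sleep state slp_typ."""
    if not 0 <= slp_typ <= 7:
        raise ValueError("SLP_TYP is a 3-bit field")
    return (slp_typ << SLP_TYP_SHIFT) | SLP_EN


def poll_power_button(pm, s5_slp_typ):
    """Power off if a power button event is pending; return True if it was."""
    if not (pm.status & PWRBTN_STS):
        return False
    pm.write_status(PWRBTN_STS)                       # acknowledge the event
    pm.write_control(pm1_control_value(s5_slp_typ))   # request S5 (soft off)
    return True
```

The hard part described above is everything this sketch skips: finding the right registers and values for the platform, and getting the event delivered at all.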

87. mbac32768 ◴[30 Aug 25 15:04 UTC] No.45075278{5}[source]▶

>>45070231 #

I misremembered (since it was 4 years ago).

I was actually just trying to support "power off" in GCP, with the stretch goal of being able to support graceful power off from the GCP console (which is part of supporting power off then power back on restart).

88. anikom15 ◴[30 Aug 25 15:29 UTC] No.45075476[source]▶

>>45067216 (TP) #

It’s entirely laziness.

replies(1): >>45075777 #

89. morganw ◴[30 Aug 25 15:38 UTC] No.45075548{7}[source]▶

>>45074953 #

From https://news.ycombinator.com/item?id=39385382 Here's Hector Martin (marcan) talking about their tooling: https://asahilinux.org/2021/08/progress-report-august-2021/

Running macOS in a hypervisor controlled from a second machine (proxy mode + verbose==on) to watch drivers talk to the hardware:

https://asahilinux.org/docs/sw/m1n1-user-guide/#running-a-ma...
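The general idea of watching a driver talk to hardware can be sketched in a few lines of Python. This is not m1n1's API; `TracedMmio`, its dictionary-backed registers, and the offsets in the usage note are hypothetical and only show the shape of the technique: intercept every register access, log it, then pass it through.

```python
class TracedMmio:
    """Hypothetical MMIO window that logs every access made through it."""

    def __init__(self, base, backing=None):
        self.base = base
        self.regs = dict(backing or {})   # offset -> value, standing in for hardware
        self.log = []                     # (op, address, value) in access order

    def read32(self, offset):
        value = self.regs.get(offset, 0)
        self.log.append(("R", self.base + offset, value))
        return value

    def write32(self, offset, value):
        value &= 0xFFFFFFFF               # registers are 32 bits wide in this sketch
        self.regs[offset] = value
        self.log.append(("W", self.base + offset, value))


def format_trace(log):
    """Turn the access log into human-readable lines."""
    return [f"{op} 0x{addr:08x} = 0x{value:08x}" for op, addr, value in log]
```

A driver that reads a status register at offset 0x0 and then writes a control register at offset 0x4 leaves two entries in `log`, one read and one write, in that order. Reading such traces is where the reverse engineering work starts.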

replies(1): >>45082055 #

90. ◴[30 Aug 25 16:10 UTC] No.45075777[source]▶

>>45075476 #

91. sitkack ◴[30 Aug 25 16:16 UTC] No.45075820{6}[source]▶

>>45073077 #

This is fundamentally the problem. Just like being able to send OTA updates has enshittified all software, having this magic shim layer that fixes hardware problems has enabled shit hardware, and then foisted all this complexity into the OS. Many abstractions are like bondo, they just cover rot.

I am addressing your comment and eru's question about drivers.

The hardware that would normally need drivers should present itself over a fixed, well-documented protocol. Think virtio, or USB device classes but more comprehensive. This would also allow said hardware to be rigorously tested before it ever sees an OS. As it is now, because the hardware is shit and requires a driver, you can't really test the hardware in a way that an OS would expect because it requires the OS driver to even start to function. The job of the OS is now to repair broken hardware.

https://docs.oasis-open.org/virtio/virtio/v1.3/virtio-v1.3.h...

https://wiki.osdev.org/Virtio

https://en.wikipedia.org/wiki/USB_communications_device_clas... (the only good thing to come out of usb)

replies(1): >>45081874 #

92. sitkack ◴[30 Aug 25 16:17 UTC] No.45075827{6}[source]▶

>>45072184 #

Responded https://news.ycombinator.com/item?id=45075820

93. ◴[30 Aug 25 16:48 UTC] No.45076049{3}[source]▶

>>45072199 #

94. MathMonkeyMan ◴[30 Aug 25 17:01 UTC] No.45076157{5}[source]▶

>>45073289 #

Everywhere I've worked, there is software written by those five people, who are now all rich and don't write code anymore, and I still would appreciate the courtesy of an explainer.

The trouble with good docs is that they are work to maintain, like good code. If we decide to change this component in a substantial way soon, which we likely will, I'd have to practically rewrite the docs! Why bother?

Because the docs are part of the code. Write the docs.

I don't expect to win this crusade, but I'll keep writing docs anyway. Then later people will modify the code without modifying the docs, and so the docs will be a lie, but still useful, I think.

It's like asking three people who are not closely familiar with a component, but who have worked with it, "what is this thing, how does it work?" You will get three different answers. It would be nice if one of them were a written description straight from the horse's mouth, even if the component is now more of a camel.

95. cornstalks ◴[30 Aug 25 17:05 UTC] No.45076201{4}[source]▶

>>45073339 #

You've done some great work but I have to call BS with this claim:

> I'm the only person who ever implemented all of it.

Sean Baxter is an easy counter example.

replies(1): >>45076767 #

96. WalterBright ◴[30 Aug 25 18:05 UTC] No.45076767{5}[source]▶

>>45076201 #

I don't know much of anything about him. Did he implement the preprocessor? the optimizer? the code generator?

(For some context, back in the 80's, code generators needed enhancements to implement C++. You couldn't just use an existing one. Bjarne had to do some ugly workarounds because of this.)

replies(1): >>45077753 #

97. theideaofcoffee ◴[30 Aug 25 18:08 UTC] No.45076796[source]▶

>>45070345 #

It's interesting that it's that short. I remember a long while ago I had aspirations of implementing a custom board for Prestonia-/Gallatin-era Xeons and the datasheets and specs for those were around 3000 pages, iirc. Supporting infra was about that long as well. So I'm surprised to see a modern ethernet controller fit into the same space. I appreciated all of the docs because it was so open, I felt like I could actually achieve that project, but other things took priority.

98. mrandish ◴[30 Aug 25 18:22 UTC] No.45076880{3}[source]▶

>>45072199 #

> Real question: Why do you think Intel does this?

I'm not sure large traditional silicon vendors like Intel, TI, et al re-evaluate the documentation requirements (and costs) on a chip by chip basis. It's probably done by chip class and for companies who've been selling chips by the millions over many decades to industries as diverse as defense, aerospace, automotive, etc there are classes of chips where robust, complete documentation is not only expected but often a required part of the RFP, compliance or conformance processes.

While this level of effort probably isn't needed for every chip in that class, it could be hard to reliably predict when a general purpose chip is still in the design phase which customers may be interested in it during its life (which for some of these chips might be decades). Many chips which conform to MIL-SPEC or other similar standards which can require extensive documentation are simply enhanced versions of standard chips, so the docs exist anyway. Finally, there's the organizational capabilities and culture aspect. Once the org needs to maintain the systemic ability to generate serious documentation at scale, you end up with a lot of managers and staff who think this way.

99. Conscat ◴[30 Aug 25 19:11 UTC] No.45077194{4}[source]▶

>>45073339 #

Sean Baxter single-handedly implemented all of up to C++23, and some C++26, including a huge number of GNU extensions and possibly an even larger number of his own features.

replies(1): >>45078216 #

100. GeorgeTirebiter ◴[30 Aug 25 20:01 UTC] No.45077557{4}[source]▶

>>45075185 #

I worked on a similar TI SoC -- with War-and-Peace-sized datasheet. My eyes burned out and brain exploded. Ultimately, another engineer had to take over the project -- or rather TEAM of engineers, of which I did only a part. It's simply too much complexity to expect one engineer to grok it all, do the schematic & PCB & power supply & hi-speed MIPI connections and radios and... and THEN to write the software for it all. It's too much. (This is the Life one gets in Startups, it seems -- worked to the (beagle)bone!)

101. tux3 ◴[30 Aug 25 20:29 UTC] No.45077753{6}[source]▶

>>45076767 #

Sean Baxter's circle compiler uses LLVM as a backend, but I believe the rest is from scratch.

Arguably these days having a clear frontend/backend separation is good compiler architecture. It might slow down compile times a bit, but it's worth the cost.

replies(2): >>45077952 #>>45078541 #

102. WalterBright ◴[30 Aug 25 20:58 UTC] No.45077952{7}[source]▶

>>45077753 #

It wouldn't have made much sense to write the preprocessor these days, too, but it is part of the C++ compiler. Unless integrating it with the C++ lexer for speed purposes, as I did.

103. WalterBright ◴[30 Aug 25 21:39 UTC] No.45078216{5}[source]▶

>>45077194 #

Impressive!

104. vkazanov ◴[30 Aug 25 22:28 UTC] No.45078541{7}[source]▶

>>45077753 #

So it sounds like he wrote the frontend of a cpp compiler? There's a lot of work in other layers as well.

105. spauldo ◴[30 Aug 25 23:04 UTC] No.45078782{4}[source]▶

>>45073406 #

Mostly he seems to try to nail all the items on the "how to grow your YouTube channel" checklists. Clickbait sensationalist titles, exaggerated face on the thumbnail, etc. I have a rule that I don't click on any video where I can't tell the subject by the title, and that eliminated him from my watchlist a while ago.

I wish the guy luck on becoming a superstar influencer, since that seems to be his goal, but he can do it without my help.

106. rcxdude ◴[31 Aug 25 09:28 UTC] No.45081798{4}[source]▶

>>45075185 #

having used the AM3358 extensively, the TRM is not complete. There are some pretty important and complex systems that have literally no documentation at all in the TRM, not to mention the large number of quirks and small details that you can only pick up from a scattering of other areas (including a wiki that TI deleted some years ago). It is, however, miles better than the documentation available for most SOCs.

replies(1): >>45085376 #

107. rcxdude ◴[31 Aug 25 09:46 UTC] No.45081874{7}[source]▶

>>45075820 #

The problem with your wish is that you're kinda getting it in some cases, and it's a bit of a monkey's paw. Hardware vendors are increasingly creating systems that abstract away the underlying hardware from the OS (usually by writing their own software on some other core that really drives the hardware), but the problem is they're generally closed, buggy, and leaky, and so the OS stops really being the OS of the system and instead you have a collection of barely-related subsystems that it's really difficult to get to work together effectively, and way more security holes than you can shake a stick at. (Oh, and they're usually only ever tested against one particular OS and so they're not actually particularly portable.)

108. lelele ◴[31 Aug 25 10:01 UTC] No.45081942[source]▶

>>45067216 (TP) #

>> These days, you get a medium-level description and a Linux driver of questionable quality.

Then how do devices end up having drivers for major OSes? Is it all guesswork?

109. ch33zer ◴[31 Aug 25 10:27 UTC] No.45082055{8}[source]▶

>>45075548 #

Thanks for taking the time to round up these posts for me!

110. ezoe ◴[31 Aug 25 13:50 UTC] No.45083146{4}[source]▶

>>45073339 #

Now, who can claim such bold... oh, it's YOU.

111. jovial_cavalier ◴[31 Aug 25 18:04 UTC] No.45085376{5}[source]▶

>>45081798 #

Agreed. The actual technical writing portion of the TRM leaves much to be desired.

112. account42 ◴[01 Sep 25 08:23 UTC] No.45090707{3}[source]▶

>>45072424 #

In some cases it is hidden though. USB has a lot of "generic" device classes where (at least in theory) the OS only needs to deal with a standardized interface and whatever actual hardware is behind it is driven by an embedded controller.

It's just that most internal hardware initially cares only about Windows (or only Linux) so it makes more financial sense to develop a more complex driver than a complex firmware. The equation might change later on but by then you are stuck with the hardware design.
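A minimal Python sketch of how that "generic class" matching works on the OS side: the interface descriptor carries a class/subclass/protocol triple, and the OS picks a class driver from it without knowing anything about the actual chip. The numeric codes are the standard USB ones (0x03 HID, 0x08 mass storage, 0x0E video); the driver names and the function are invented for illustration.

```python
# (class, subclass, protocol) -> driver name. None means "any value".
CLASS_DRIVERS = {
    (0x03, 0x01, 0x01): "hid-boot-keyboard",
    (0x03, 0x01, 0x02): "hid-boot-mouse",
    (0x03, None, None): "hid-generic",
    (0x08, 0x06, 0x50): "mass-storage-scsi-bulk-only",
    (0x0E, None, None): "video",
}


def pick_class_driver(iface_class, iface_subclass, iface_protocol):
    """Pick a generic class driver for a USB interface, or None if there is none."""
    exact = CLASS_DRIVERS.get((iface_class, iface_subclass, iface_protocol))
    if exact is not None:
        return exact
    # Fall back to a class-wide driver if one is registered.
    return CLASS_DRIVERS.get((iface_class, None, None))
```

A vendor-specific interface (class 0xFF) matches nothing here, which is exactly the case where a custom driver becomes necessary.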
