
158 points kenjackson | 2 comments
roblabla ◴[] No.41031699[source]
This is some very poor journalism. The Linux issues are so, so very different from the Windows BSOD issue.

The Red Hat kernel panics were caused by a bug in the kernel eBPF implementation, likely a regression introduced by a RHEL-specific patch. Blaming CrowdStrike for this is stupid (just like blaming Microsoft for the CrowdStrike BSOD is stupid).

For background, I also work on a product using eBPF, and have had kernel updates cause kernel panics in my eBPF probes.

In my case, the panic happened because the kernel decided to change an LSM hook interface, adding a new argument in front of the others. When the probe gets loaded, the kernel doesn’t typecheck the arguments, and so doesn’t realise the probe isn’t compatible with the new kernel. When the probe runs, shit happens and you end up with a kernel panic.
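
A toy sketch of that failure mode, in plain Python rather than real eBPF. The argument names (`cred`, `file`, `mask`) and every function here are hypothetical and are not kernel APIs; the sketch only mimics the shape of the problem: arguments are passed positionally, the probe was written against the old layout, and a new argument inserted at the front silently shifts everything.

```python
# Toy model of a positional-argument mismatch between a kernel hook and a probe.
# All names are hypothetical; nothing here is a real kernel or eBPF API.

OLD_HOOK_ARGS = ("file", "mask")            # layout the probe was built against
NEW_HOOK_ARGS = ("cred", "file", "mask")    # new kernel: one argument added in front


def call_hook(layout, values):
    """Pass the hook arguments positionally, like a raw context array."""
    return [values[name] for name in layout]


def probe(ctx):
    """Probe written against OLD_HOOK_ARGS: it assumes ctx[0] is the file."""
    return ctx[0]


def load_check(probe_layout, kernel_layout):
    """A load-time compatibility check: refuse the probe if the layouts differ."""
    return tuple(probe_layout) == tuple(kernel_layout)


values = {"cred": "<cred object>", "file": "/etc/passwd", "mask": 4}

seen_on_old = probe(call_hook(OLD_HOOK_ARGS, values))  # the file, as intended
seen_on_new = probe(call_hook(NEW_HOOK_ARGS, values))  # the new first argument
```

In this toy the probe just ends up holding the wrong object. In a real kernel the misread value would be treated as a pointer of the wrong type, which is where things can go badly wrong. A check like `load_check` at load time would reject the probe before it ever runs.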

eBPF probes causing kernel panics are almost always an indication of a kernel bug, not a bug in the eBPF vendor's code. There are exceptions of course (such as an eBPF probe denying access to a resource, causing pid 1 to crash). But they're very few.

replies(4): >>41031896 #>>41032164 #>>41032610 #>>41034621 #
josefx ◴[] No.41031896[source]
> likely a regression introduced by a RHEL-specific patch. Blaming CrowdStrike for this is stupid (just like blaming Microsoft for the CrowdStrike BSOD is stupid).

Yeah, it isn't as if CrowdStrike was specifically advertising certified support for Red Hat Linux and related products.

https://www.crowdstrike.com/partners/falcon-for-red-hat/

replies(2): >>41032039 #>>41032241 #
dtx1 ◴[] No.41032039[source]
But being certified for Red Hat Linux doesn't protect you from bugs in the Red Hat kernel. That's on Red Hat.
replies(1): >>41032169 #
michaelt ◴[] No.41032169[source]
Back in The Good Old Days, an OS vendor would release a beta version and software vendors would test against it and fix problems before the stable OS version was released.

Obviously OS updates come out a lot more often these days than they used to - but we're also better at test automation than ever before, and beta software is easier to get than ever.

It sure would be nice if companies that decide to produce kernel modules and to support certain OSes could test those kernel modules against those OSes at the beta stage.

replies(1): >>41032257 #
roblabla ◴[] No.41032257[source]
1. An eBPF probe is not a kernel module. An eBPF probe should never cause kernel panics.

2. RHEL didn't provide beta kernels until very recently, as far as I can tell.

3. Even if you caught an error then, you're still at the mercy of RHEL to fix it. If RHEL breaks a feature, you report it to them, and they decide to ship anyways... well, your product will still kpanic. I'm not talking hypotheticals: I haven't seen RHEL do that, but I've seen other distros do it.

replies(3): >>41032955 #>>41034458 #>>41043172 #
1. freedomben ◴[] No.41034458[source]
Emphasis added:

> An eBPF probe *should* never cause kernel panics.

Should, but did. This is the point at which to learn and adapt.

Also, kernels are software just like nearly everything else, and software is buggy. It's a balance obviously, but some basic defensive development can be a real savior for your users.

I don't know the details about this CrowdStrike incident, but I would also be surprised if you couldn't write an automated test (even a "smoke test") to quickly test out these new kernels before they hit your customers. Given what happened, it seems like negligence not to do that.
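
As a rough illustration of what such a smoke test could look like, here is a minimal sketch of a release gate. It assumes a `run_probe_on` callable that boots a candidate kernel (for example in a VM), loads the probe, and exercises it; that callable, the kernel names, and the fake runner below are all hypothetical stand-ins, not anything CrowdStrike or Red Hat actually ships.

```python
# Hypothetical release gate: run a smoke test against each candidate kernel and
# collect every kernel where loading or exercising the probe fails.

def smoke_test(kernels, run_probe_on):
    """Return {kernel: error message} for every kernel that failed.

    `run_probe_on` is a stand-in for whatever boots that kernel, loads the
    probe, and exercises it; it is expected to raise on any failure.
    """
    failures = {}
    for kernel in kernels:
        try:
            run_probe_on(kernel)
        except Exception as exc:  # a crash or load error on this kernel
            failures[kernel] = str(exc)
    return failures


def fake_runner(kernel):
    """Stand-in runner for illustration: pretends one kernel build panics."""
    if kernel == "candidate-b":
        raise RuntimeError("kernel panic while running probe")


failures = smoke_test(["candidate-a", "candidate-b"], fake_runner)
```

An empty result would mean every candidate kernel passed; any entry in `failures` is a kernel that should not reach customers with this probe until the problem is understood.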

replies(1): >>41035656 #
2. roblabla ◴[] No.41035656[source]
It's possible CS can do better, of course. But it's just wrong to blame them for the Linux crashes - they're not the ones that introduced buggy code and broke their users. RHEL/Linux did.