
158 points kenjackson | 3 comments
roblabla ◴[] No.41031699[source]
This is some very poor journalism. The Linux issues are so, so very different from the Windows BSOD issue.

The Red Hat kernel panics were caused by a bug in the kernel eBPF implementation, likely a regression introduced by a RHEL-specific patch. Blaming CrowdStrike for this is stupid (just like blaming Microsoft for the CrowdStrike BSOD is stupid).

For background, I also work on a product that uses eBPF, and I have had kernel updates cause kernel panics in my eBPF probes.

In my case, the panic happened because the kernel decided to change an LSM hook interface, adding a new argument in front of the others. When the probe gets loaded, the kernel doesn’t typecheck the arguments, and so doesn’t realise the probe isn’t compatible with the new kernel. When the probe runs, shit happens and you end up with a kernel panic.
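
A toy model of that failure, in Python rather than real eBPF or C. It only shows why reading arguments by position breaks when a new argument is added in front. Every name here (`probe_built_for_old_hook`, `old_kernel_call`, `new_kernel_call`) is made up for illustration, and the two-argument hook signature is an assumption, not the actual LSM hook involved.

```python
# Toy model only: a real eBPF probe reads raw argument slots, not Python lists.

def probe_built_for_old_hook(ctx):
    """Probe compiled against the assumed old hook signature: (file, mask)."""
    file_arg = ctx[0]          # the probe assumes slot 0 holds the file object
    return file_arg["path"]    # stands in for dereferencing a kernel pointer


def old_kernel_call(probe, file_obj, mask):
    # Old kernel: arguments arrive in the order the probe was built for.
    return probe([file_obj, mask])


def new_kernel_call(probe, new_arg, file_obj, mask):
    # Hypothetical newer kernel: a new argument was added in front.
    # Nothing checks that the probe still matches, so it is called anyway.
    return probe([new_arg, file_obj, mask])


file_obj = {"path": "/etc/passwd"}

# On the old kernel the probe reads what it expects.
ok = old_kernel_call(probe_built_for_old_hook, file_obj, 0x4)

# On the new kernel, slot 0 is now an unrelated integer. Python raises a
# TypeError here; in the kernel the equivalent bad access is what panics.
try:
    new_kernel_call(probe_built_for_old_hook, 42, file_obj, 0x4)
    crashed = False
except TypeError:
    crashed = True
```

The probe itself never changed between the two calls; only the caller's argument layout did.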

eBPF probes causing kernel panics are almost always an indication of a kernel bug, not a bug in the eBPF vendor’s code. There are exceptions of course (such as an eBPF program denying access to a resource and causing pid1 to crash), but they’re very few.

replies(4): >>41031896 #>>41032164 #>>41032610 #>>41034621 #
josefx ◴[] No.41031896[source]
> likely a regression introduced by a RHEL-specific patch. Blaming CrowdStrike for this is stupid (just like blaming Microsoft for the CrowdStrike BSOD is stupid).

Yeah, it isn't as if CrowdStrike was specifically advertising certified support for Red Hat Linux and related products.

https://www.crowdstrike.com/partners/falcon-for-red-hat/

replies(2): >>41032039 #>>41032241 #
roblabla ◴[] No.41032241[source]
Yes, and? They probably do test their software on RHEL.

But how are they supposed to prevent a bug in a newly released kernel update? You can't test your software on future updates that aren't out yet.

If RHEL breaks some core functionality you depend on, in a newly released update, you can't really do much to prevent breakage, even with the best QA in the world. At best, they could have caught it as soon as RHEL published the new kernel... but by then it's already too late, all your currently-deployed probes now have a ticking time bomb, and need to be updated before the RHEL kernel update is applied, lest you kernel panic.

replies(1): >>41032327 #
hsbauauvhabzb ◴[] No.41032327[source]
Maybe by not loading the module into unknown kernels in the first place?

If you say you support a distro, you can’t turn around and complain that supporting the newest version is hard, no matter who caused the problem. Plenty of products say ‘this works on $x but it’s not officially supported’.

replies(3): >>41032799 #>>41033155 #>>41034502 #
1. roblabla ◴[] No.41033155[source]
Again: this is not a kernel module. eBPF probes are meant to be Compile Once, Run Everywhere, that's their whole point! https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf...

If you expect software to be future-bug-proof, well, I guess you live in a far better world than I do.

If you advertise your software as compatible with RHEL, but a glibc bug gets in and causes your software to crash for a couple of days before RHEL realises the problem and fixes it, does that mean your software should instantly no longer be advertised as RHEL compatible? That'd make things a lot more confusing, if you ask me.

replies(2): >>41033633 #>>41039055 #
2. linuxftw ◴[] No.41033633[source]
Not all eBPF programs are compile once, run everywhere.

RHEL 'updates' can mean different things. A patch release won't change the kernel ABI. A minor release will. A non-CO-RE eBPF program written for, say, RHEL 8.6 might break on RHEL 8.7. It's not advisable to update across minor releases without lots of testing. Most of the time, things 'just work', but RHEL is a very complex product with a specific support cycle, and the laziness of users and 3rd party vendors is not Red Hat's fault.
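
One way to act on that is to refuse to load a probe on a minor release it hasn't been tested against. A minimal sketch, assuming release strings of the form 'major.minor' like the 8.6 and 8.7 above; `parse_release` and `safe_to_load` are hypothetical helpers, not part of any real loader or vendor tool.

```python
def parse_release(version):
    """Split a 'major.minor' string such as '8.6' into a tuple of ints."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def safe_to_load(tested_on, running):
    """Return True only if the running release is one that was tested.

    A different minor release counts as untested, because the kernel ABI
    may have changed between minor releases.
    """
    running_release = parse_release(running)
    return any(parse_release(t) == running_release for t in tested_on)


tested = ["8.5", "8.6"]                 # releases the probe was tested against
print(safe_to_load(tested, "8.6"))      # tested minor release
print(safe_to_load(tested, "8.7"))      # new minor release, not yet tested
```

This trades availability for safety: the probe stays off on an untested minor release until someone tests it there.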

3. hsbauauvhabzb ◴[] No.41039055[source]
That doesn’t really change my point - if stability issues are known to occur in a dependency, you can’t say you support that system.