
158 points kenjackson | 68 comments
1. roblabla ◴[] No.41031699[source]
This is some very poor journalism. The Linux issues are so, so very different from the Windows BSOD issue.

The Red Hat kernel panics were caused by a bug in the kernel's eBPF implementation, likely a regression introduced by a RHEL-specific patch. Blaming CrowdStrike for this is stupid (just like blaming Microsoft for the CrowdStrike BSOD is stupid).

For background: I also work on a product that uses eBPF, and I've had kernel updates cause kernel panics in my eBPF probes.

In my case, the panic happened because the kernel decided to change an LSM hook interface, adding a new argument in front of the others. When the probe gets loaded, the kernel doesn't typecheck the arguments, so it doesn't realise the probe isn't compatible with the new kernel. When the probe runs, shit happens and you end up with a kernel panic.

eBPF probes causing kernel panics are almost always an indication of a kernel bug, not a bug in the eBPF vendor's code. There are exceptions, of course (such as an eBPF program denying access to a resource and causing pid 1 to crash), but they're very few.

replies(4): >>41031896 #>>41032164 #>>41032610 #>>41034621 #
2. josefx ◴[] No.41031896[source]
> likely a regression introduced by a rhel-specific patch. Blaming crowdstrike for this is stupid (just like blaming microsoft for the crowdstrike bsod is stupid).

Yeah, it isn't as if CrowdStrike was specifically advertising certified support for Red Hat Linux and related products.

https://www.crowdstrike.com/partners/falcon-for-red-hat/

replies(2): >>41032039 #>>41032241 #
3. dtx1 ◴[] No.41032039[source]
But being certified for Red Hat Linux doesn't protect you from bugs in the Red Hat kernel. That's on Red Hat.
replies(1): >>41032169 #
4. mbesto ◴[] No.41032164[source]
> just like blaming microsoft for the crowdstrike bsod is stupid

Wait, how is this stupid? Unless I'm missing something, wasn't the patch part of a Microsoft payload that included an update to CrowdStrike? Surely CrowdStrike is culpable, but that doesn't completely absolve Microsoft of any responsibility, as it's their payload.

replies(7): >>41032197 #>>41032249 #>>41032287 #>>41032415 #>>41032517 #>>41032630 #>>41032666 #
5. michaelt ◴[] No.41032169{3}[source]
Back in The Good Old Days, an OS vendor would release a beta version and software vendors would test against it and fix problems before the stable OS version was released.

Obviously OS updates come out a lot more often these days than they used to - but we're also better at test automation than ever before, and beta software is easier to get than ever.

It sure would be nice if companies that decide to produce kernel modules and to support certain OSes could test those kernel modules against those OSes at the beta stage.

replies(1): >>41032257 #
6. roblabla ◴[] No.41032197[source]
Do you have a source for this? It's the first time I've heard of this. From what I've understood (perhaps wrongly), the error came from the CrowdStrike driver (csagent.sys) having bugs in its configuration parser that could cause it to BSOD. CrowdStrike pushed a corrupted configuration (the CS-000whatever.sys file we're told to delete) that hit that bug. I'm not sure how Microsoft fits into this story.
replies(1): >>41032313 #
7. roblabla ◴[] No.41032241[source]
Yes, and? They probably do test their software on RHEL.

But how are they supposed to prevent a bug in a newly released kernel update? You can't test your software on future updates that aren't out yet.

If RHEL breaks some core functionality you depend on in a newly released update, you can't really do much to prevent breakage, even with the best QA in the world. At best, they could have caught it as soon as RHEL published the new kernel... but by then it's already too late: all your currently-deployed probes now have a ticking time bomb and need to be updated before the RHEL kernel update is applied, lest you get a kernel panic.

replies(1): >>41032327 #
8. angulardragon03 ◴[] No.41032249[source]
You’re missing something. The Crowdstrike issue was caused by a channel update (basically a definitions update) that they pushed that broke their own sensor. Microsoft wasn’t involved in the delivery of that update.
9. roblabla ◴[] No.41032257{4}[source]
1. An eBPF probe is not a kernel module. An eBPF probe should never cause kernel panics.

2. RHEL didn't provide beta kernels before very recently, as far as I can tell.

3. Even if you caught an error then, you're still at the mercy of RHEL to fix it. If RHEL breaks a feature, you report it to them, and they decide to ship anyways... well, your product will still kpanic. I'm not talking hypotheticals: I haven't seen RHEL do that, but I've seen other distros do it.

replies(3): >>41032955 #>>41034458 #>>41043172 #
10. ninepoints ◴[] No.41032287[source]
You're missing something. Many somethings.
11. mbesto ◴[] No.41032313{3}[source]
I just read more into it. You're correct. I think it would be dumb to solely blame MS, but I don't think you can completely absolve them either.

This comment right here sums it up:

> Sure, but Windows shares some portion of the blame for allowing third-party security vendors to “shit in the kernel”.

https://news.ycombinator.com/item?id=41006176

replies(1): >>41033074 #
12. hsbauauvhabzb ◴[] No.41032327{3}[source]
Maybe by not loading the module into unknown kernels in the first place?

If you say you support a distro, you can't turn around and complain that supporting the newest version is hard, no matter who caused the problem. Plenty of products say 'this works on $x but it's not officially supported'.

replies(3): >>41032799 #>>41033155 #>>41034502 #
13. GuB-42 ◴[] No.41032415[source]
What I understand is that some Azure VMs are running CrowdStrike and, like any other computer running CrowdStrike on Windows, they crashed. Totally not Microsoft's fault; CrowdStrike messed with the kernel. The only thing we can blame Microsoft for is allowing such software to exist.

Where Microsoft is to blame however is the unrelated Azure outage in the Central US region that happened (and was fixed) just before the CrowdStrike faulty update.

14. sschueller ◴[] No.41032517[source]
Microsoft should revoke the CrowdStrike driver signature and should do an internal check as to why CrowdStrike's driver was approved when it can execute arbitrary code at the kernel level without any checks. If your "driver" requires this feature, MS should require CrowdStrike to submit the entire source, and CrowdStrike should have to pay MS to do a review of the code.

What is the point of driver signing if a vendor can basically build in a back door and Microsoft doesn't validate that this back door is at least somewhat reasonable?

replies(6): >>41032936 #>>41033093 #>>41033397 #>>41033699 #>>41034816 #>>41034838 #
15. watt ◴[] No.41032610[source]
There should not be any software-caused crashes during operation. Every NPE that is not caused by a hardware issue is a null pointer not properly handled. Software needs to handle its null checks. Missing a check (or precondition, or validation) is squarely on Microsoft.

You said:

> almost always indication of a kernel bug

but also:

> blaming microsoft for the crowdstrike bsod is stupid

And who owns the kernel in Windows land? Microsoft. How is it stupid to blame Microsoft for not making the kernel safe?

replies(3): >>41032783 #>>41032802 #>>41033041 #
16. PedroBatista ◴[] No.41032630[source]
The "only" thing Microsoft should be blamed for is signing a driver that updates itself by taking "configurations (code)" from userspace, and turning a blind eye to all these loopholes, because they also know it's not practical for them to sign all the driver code that goes into the kernel.

Maybe most of these "drivers" shouldn't be in Ring 0 to begin with? This is a general problem and the norm; Windows is just another OS that allows it this way.

replies(1): >>41033958 #
17. beAbU ◴[] No.41032666[source]
Crowdstrike pushed out a world-wide update to their client software, which auto-updated itself.

This update was buggy and it caused the host machine to go into a BSOD boot loop.

The fact that the host machine happened to be running Windows has very little to nothing to do with it.

It's like blaming a pothole for your car exploding. Yes, there was a pothole; yes, it shouldn't have been there; yes, it could have been avoided. But the fact that your car self-immolated because of a mere pothole points to other underlying issues with your car.

18. Diggsey ◴[] No.41032783[source]
Microsoft doesn't own the kernel in that sense: anyone can write kernel drivers for Windows. While there are some things the kernel can do to protect against a bad driver, it's not a security boundary, so ultimately bad code can cause crashes.

AIUI, Microsoft actually has good tooling for validating drivers before they are deployed, but it requires that you actually run the validation...

19. broknbottle ◴[] No.41032799{4}[source]
This was their newer eBPF Falcon sensor, which was trying to load a BPF program into the kernel and triggered a kernel panic. This shouldn't have happened and was definitely a bug in the kernel.

For the kernel mode, their software will flag an unknown kernel as unsupported and go into a reduced-functionality mode (RFM).

The idiots didn't know that RH E4S was a thing for like 3+ years. I'm still baffled by how clueless most security people and vendors are when it comes to backporting and the different streams/channels offered by the various Linux OS vendors.

https://access.redhat.com/solutions/7001909

20. mebeim ◴[] No.41032802[source]
Let me give you an analogy: Volvo is known to manufacture very safe cars. Now let's say I drive a Volvo car with a box of dynamite on the passenger seat. I stop at a red light but hit the brake a bit too hard and the box of dynamite falls and causes an explosion, disintegrating everything in a 20-foot radius. So whose fault was it? Volvo?

> Missing a check (or precondition, or validation) is squarely on Microsoft.

Missing a check for presence of dynamite before allowing me to start the car is squarely on Volvo!

You see how silly that sounds?

Now, back to being serious: MS cannot possibly control and validate everything you decide to install and run on your system, especially if the things you install are kernel drivers. It is simply impossible. If you install a kernel driver developed by a 3rd-party company, and that driver crashes your system because the devs at that company forgot to perform proper validation of data, well... that's on them. Even if MS wanted to, they wouldn't be able to verify the soundness of any piece of code that is installed as a driver and runs with kernel-level privileges. That'd require solving the halting problem.

21. _flux ◴[] No.41032936{3}[source]
Do you think Microsoft customers using CrowdStrike would then be happier, being unable to run the software at all, due to an action Microsoft took?

Backdoors of all kinds can be installed to most any operating system without vendor co-operation. That is the nature of general-purpose operating systems.

replies(3): >>41033775 #>>41033840 #>>41034227 #
22. _flux ◴[] No.41032955{5}[source]
> and they decide to ship anyways... well, your product will still kpanic

But then you are in a position to tell your customers that this will happen before it actually does, and they can choose their way of proceeding.

One such way would be being careful with the update and then exercising their own support contracts with RH.

23. roblabla ◴[] No.41033041[source]
@watt there's a big difference here.

eBPF is a bytecode that is interpreted in the kernel, with the explicit goal to allow writing code that executes at the kernel-level in a safe way. Any kernel panic (again, short of pid1 kills) is considered a bug, and could even potentially be exploited to gain capabilities in some cases. Here, the kernel explicitly says "this is safe", so any problem within is a bug in the kernel.

In contrast, a kernel module/driver is just some third-party code that is loaded in the kernel. Here, all bets are off: it is up to the third-party to do their job properly and make sure their code is correct.

In this case, CrowdStrike explicitly opted into writing a kernel module, and then failed to, as you say, "handle their null check". The segfault wasn't in Windows code; it was in CrowdStrike code that lives in the kernel. When CrowdStrike fails to handle a null check, that leads to a BSOD.

To be clear: the only way Microsoft could make the kernel safer here is by disallowing kernel modules entirely. While there is an argument to be made that this could be a good idea, it is a bit beside the point.

24. roblabla ◴[] No.41033074{4}[source]
Yeah, the fact that Windows requires kernel-level access to be able to do EDR stuff is really unfortunate. macOS has been very successful with its userspace EndpointSecurity framework for this purpose.

On the other hand, Linux is similarly crippled: eBPF LSMs are fairly recent and don't work everywhere (I'm looking at you, Ubuntu[0]), and the only real alternative if you want to be able to block processes is a kernel module. Which comes with the same dangers as on Windows.

[0]: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2054810

replies(1): >>41033470 #
25. roblabla ◴[] No.41033093{3}[source]
Unless you have a source, you should really avoid spreading misinfo here. CrowdStrike doesn't have kernel-level ACE (arbitrary code execution). It has a buggy configuration parser, and they pushed a corrupted config that triggered those buggy codepaths in the parser.
replies(2): >>41033227 #>>41033765 #
26. roblabla ◴[] No.41033155{4}[source]
Again: this is not a kernel module. eBPF probes are meant to be Compile Once, Run Everywhere, that's their whole point! https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf...

If you expect software to be future-bug-proof, well, I guess you live in a far better world than I do.

If you advertise your software to be compatible with RHEL, but a glibc bug gets in and causes your sw to crash for a couple of days, before RHEL realises the problem and fixes it, does that mean your software should instantly no longer be advertised as RHEL compatible? That'd make things a lot more confusing, if you ask me.

replies(2): >>41033633 #>>41039055 #
27. sschueller ◴[] No.41033227{4}[source]
My source: https://www.youtube.com/watch?v=wAzEJxOo1ts

From my understanding, the CS driver lives in kernel space and parses configs/applications downloaded in user space. That's why the system BSODs at all.

"CrowdStrike doesn't have kernel-level ACE" please provide your source.

replies(1): >>41034800 #
28. AshamedCaptain ◴[] No.41033397{3}[source]
> when it can execute arbitrary code on the kernel level without any checks

That would be grounds for blacklisting indeed (out of experience). However, that's not the case here, no matter how you put it.

29. sgift ◴[] No.41033470{5}[source]
Well, Microsoft says that's because of the EU commission: https://news.ycombinator.com/item?id=41029590

I have my doubts, but that's at least what they give as the reason for the kernel-level access of EDR tools.

30. linuxftw ◴[] No.41033633{5}[source]
Not all eBPF programs are compile-once, run-everywhere.

RHEL 'updates' can mean different things. A patch release won't change the kernel ABI; a minor release will. Writing a non-CO-RE eBPF program for, say, RHEL 8.6 might break on RHEL 8.7. It's not advisable to update across minor releases without lots of testing. Most of the time things 'just work', but RHEL is a very complex product with a specific support cycle, and the laziness of users and 3rd-party vendors is not their fault.

31. cookiengineer ◴[] No.41033699{3}[source]
> What is the point of driver signing if a vendor can basically build in a back door and Microsoft doesn't validate that

Before you downvote that comment, I'd like to remind everyone that this has already happened. Realtek's driver cert was leaked, and a lot of malware used that cert to sign their drivers for _a decade_ before anything was done about it.

Microsoft's driver signing workflow is utterly pointless and doesn't mean anything. Any vendor that takes their security seriously should never trust those driver signatures.

32. dathinab ◴[] No.41033765{4}[source]
There are some places claiming that this "config" language is so flexible that it's basically an interpreted scripting language.

But AFAIK no sources I trust have claimed that yet.

It's probably where the idea comes from, though.

33. sigseg1v ◴[] No.41033775{4}[source]
I'm a customer that is forced to use CrowdStrike via IT policies and I would be giddy with delight if something came along and caused the removal of it from my systems. I don't need programs sitting on my computer preventing me from installing code that I've literally just compiled, preventing me from deleting or modifying folders on my machine, and causing extreme lag for many basic system operations even when it does work. At this point, the time in lost productivity (via normal operation) and downtime (via their recent bug) has easily exceeded a thousand times over the aggregate sum of all benefits that CrowdStrike will ever have provided from threat detection and prevention. It's time to remove the malware.
replies(3): >>41033889 #>>41034040 #>>41034857 #
34. mbreese ◴[] No.41033840{4}[source]
At this point… yes.

It would be one thing Microsoft could do to focus 100% of the attention/blame away from Windows and onto CrowdStrike. And customers will want their pound of flesh from somewhere.

Really, this should serve as a wake up call w/in Microsoft to start to harden the kernel against such vulnerabilities.

Was the crash the fault of Windows? No. But did a Windows design decision make this possible? Yes.

I'm sure the design decision made sense at the time (at least business sense). Keeping the kernel more open for others to add drivers makes it easier to write/add drivers, but it makes the system more vulnerable. This is a good opportunity within Microsoft to get support for changing that.

replies(1): >>41034028 #
35. _flux ◴[] No.41033889{5}[source]
You are not the customer, though, your employer is the customer.

Perhaps you should push this change up in the food chain, then, and if the company is good the request will be taken seriously. As I understand it, while CrowdStrike is the biggest name in EDR, it's far from the only one, if that's what your company requires to pass some checkboxes in certifications.

replies(1): >>41034900 #
36. dathinab ◴[] No.41033958{3}[source]
It's not that simple.

User-land drivers are a thing; heck, they are the standard for modern microkernel architectures.

And even with hybrid kernels, pushing part of the code out of the kernel into something like "user-land co-processes" is more than doable. It's not trivial to retrofit in a performant and flexible way, but it's possible.

macOS has somewhat done that (but I don't know the details).

On Linux it's also possible, though with BPF it's a bit of an in-between hybrid (leaving some parts of the drivers in the kernel, but as BPF programs, which are much less likely to cause such issues compared to "normal" drivers).

A good example is how graphics drivers have developed on Linux, with most code now living in a) the user-land part of the driver and b) on the GPU itself, leaving the in-kernel part to be mostly just memory management.

And the thing is, Windows has not enforced such a direction, or even pushed hard for it AFAIK, and that is something you can very well blame them for. You generally should not have a complicated config-file parser in a kernel driver; that's just a terrible idea, some would say negligent, and Microsoft shouldn't have certified drivers like that. (But then, given that CrowdStrike insists it _must_ be loaded at start (outside of recovery mode), I guess it would still have hung all systems even if the parsing had been moved out, because it can't start if it can't parse its config.)

replies(1): >>41034683 #
37. _flux ◴[] No.41034028{5}[source]
Ultimately this would have been almost a non-issue if there had been better deployment strategies in place for the data-file updates as well.

If by changing the system you mean adding some kind of in-kernel isolation to it, then I don't think it would be worth the effort to make that kind of major change to the way operating systems work just to give arguably a minor risk reduction to systems—in particular if CrowdStrike and other vendors take some learnings from this event.

Microsoft might improve their system rollback mechanism to also include files that are not strictly integrated to the system, merely used by the parts that are (the channel files loaded by the driver).

Actually, I think we can just be happy that the incident was a mistake, not an attack. Had this kind of "first ever" situation been an attack, it could have been extremely difficult to recover from. I wonder how well EDRs deal with "attacks from within".

CrowdStrike pulled off the update within 1.5 hours. I wonder if they actually use Falcon themselves? But then somehow missed the problem? Doesn't seem like they eat their own dog food :). (Or at least their own channel files.)

replies(2): >>41034669 #>>41034911 #
38. ◴[] No.41034040{5}[source]
39. CaptainZapp ◴[] No.41034227{4}[source]
> Backdoors of all kinds can be installed to most any operating system without vendor co-operation

Not at the kernel level. Not without active support from the vendor.

replies(1): >>41034340 #
40. _flux ◴[] No.41034340{5}[source]
How much does it really help you if your complete user space can still be messed up by an offending Windows SYSTEM process? As I understand it, they are able to hurt the system e.g. by killing processes, uninstalling applications, replacing binaries, allocating memory, starting too many processes, and so on.

Actually, I could easily see a buggy remote-system-management update just deciding to uninstall everything and nuke the system because it thinks the machine is stolen. And that would be designed functionality.

41. freedomben ◴[] No.41034458{5}[source]
Emphasis added:

> An eBPF probe should never cause kernel panics.

Should, but did. This is the point at which to learn and adapt.

Also, kernels are software just like nearly everything else, and software is buggy. It's a balance obviously, but some basic defensive development can be a real savior for your users.

I don't know the details about this CrowdStrike incident, but I would also be surprised if you couldn't write an automated test (even a "smoke test") to quickly test out these new kernels before they hit your customers. Given what happened, it seems like negligence not to do that.

replies(1): >>41035656 #
42. freedomben ◴[] No.41034502{4}[source]
> Maybe by not loading the module into unknown kernels in the first place?

Then you better tell your customers not to `dnf update` until you've had a chance to whitelist the new kernel and ship it in your own stream. Otherwise everyone who updates before you do ends up broken. If a vendor told me that, I would laugh, realize they were serious, thank them for their time but let them know that we will be going a different direction.

43. xyzzy123 ◴[] No.41034621[source]
It's not clear to me they are so different but maybe I am not "sufficiently smart".

To me this feels like a complicated question - both Linux and Windows organisations are quite good at kernel reliability engineering even though quite different organisational structures and engineering approaches are involved.

Yes "the wrong people were trusted" but I don't see how we can completely solve this with engineering.

replies(1): >>41040545 #
44. mbreese ◴[] No.41034669{6}[source]
If many things had gone differently, this could have been avoided. But I'm looking at this from the Microsoft perspective. No matter how loudly people scream that it was a CrowdStrike issue and not Microsoft's fault, Microsoft is still getting blamed. It's a Windows BSOD.

I talked to my dad (retired enterprise operations/IT) this weekend and he was telling me that the next computer he buys will probably be a Mac, largely because he doesn't want to deal with the possibility of a crash like this. Does he run CrowdStrike? Not at all. Does he know who they are? Nope. (He's been retired for a while.) What he does know (well, thinks) is that Windows now has an unstable kernel.

And Microsoft has no control over distribution policies for other vendors. How those vendors distribute updates is up to them. Even if a sane deployment strategy could have avoided the larger global problems, Microsoft can't control that.

So, if you have Microsoft dealing with negative publicity and public sentiment, with no way to control errors like this in the future, what can you do? To me, the best they can do is kneecap CrowdStrike, put the full blame on them, and use this as an excuse to change the kernel/driver model to one where they have more control over the stability of the OS.

replies(1): >>41034932 #
45. traject_ ◴[] No.41034683{4}[source]
> And the thing is Windows has not enforced such direction, or even pushed hard for it AFIK, and that is something you can very well blame then for.

Even here it's pretty hard to blame them, due to antitrust concerns. Just google the word PatchGuard.

replies(1): >>41054612 #
46. roblabla ◴[] No.41034800{5}[source]
> From my understanding the CS driver lives in the kernel space and parses configs/applications downloaded in the user space. Hence the system even does a BSOD.

That's my understanding as well, but not quite the same as

> execute arbitrary code on the kernel level without any checks

At least for me, when we talk about kernel-level ACE, it's something like libcapcom[0], which allowed executing arbitrary unsigned code in the kernel.

Here, the driver can only execute the code present within itself, which was signed by Microsoft. The configuration itself isn't signed by Microsoft, but the config isn't code (at least, as far as I can tell - I see some people claiming the CS-0000.sys files are essentially bytecode, but have yet to see conclusive proof of this).

Now, we could argue that it's weird that Microsoft signed a buggy driver, and MS should do better qualification of third-party drivers. But in practice, MS doesn't really vet driver quality. From what I can tell, the driver signing is mostly there so they can easily attribute provenance of drivers, and revoke the certs if it ends up in the hands of malicious actors.

[0]: https://github.com/notscimmy/libcapcom

47. SoftTalker ◴[] No.41034816{3}[source]
I never assumed that driver signing was any kind of indicator of quality. It simply says "this is the CrowdStrike driver, and it has not been modified".

Maybe I'm wrong and Microsoft does some QA on drivers before they are signed?

48. rramadass ◴[] No.41034838{3}[source]
Finally! You hit the nail right on the head!
49. hello_moto ◴[] No.41034857{5}[source]
Sounds like your IT (sec team, specifically) didn't set up the software correctly.

I've worked for a company that installs Falcon on its whole fleet, and I never ran into issues like yours.

50. hello_moto ◴[] No.41034900{6}[source]
Vendors are competing with one another to win contracts.

CIOs/CISOs don't select vendors lightly.

There seems to be a typical/classical engineer's mindset of "make a claim first, ask questions later" around this subject lately.

"My boss plays golf with the sales rep" needs more proof, because if they selected the less capable vendor and then got hit with ransomware, bet my ass your boss would never play golf with any sales rep again.

replies(1): >>41043441 #
51. roblabla ◴[] No.41034911{6}[source]
There's a simple thing Microsoft could do to avoid this that doesn't require anything too crazy. EDRs work in kernel-land because that's the only place you can put yourself to block certain things, like process creation, driver loading, etc.

macOS has a userland API for this, called EndpointSecurity, which allows doing all the things an EDR needs without ever touching kernel-land. Microsoft could introduce a similar API, and EDRs would no longer need a driver.

replies(2): >>41035024 #>>41035439 #
52. hello_moto ◴[] No.41034932{7}[source]
They would kneecap the security industry and open up another can of worms: Windows being insecure is back on the menu.
replies(1): >>41035376 #
53. _flux ◴[] No.41035024{7}[source]
I suppose that's what CrowdStrike's system on the Mac uses as well, then. Apparently on Linux they use eBPF, and Microsoft is researching that for Windows as well: https://github.com/microsoft/ebpf-for-windows . So maybe that's actually the solution they'll go with?

It would certainly help solve this particular problem, even if not kernel integration in general.

54. mbreese ◴[] No.41035376{8}[source]
There are other vendors.

Microsoft could even reinstate CrowdStrike at some point, but only after an extensive review process. And then probably require similar process reviews/checks for any other vendor that requires the same kernel access.

Or just remove the need for kernel access at all and migrate to a better driver architecture at the sacrifice of backwards compatibility. Security software doesn’t need to run in kernel space… there are other ways.

replies(1): >>41035499 #
55. mbreese ◴[] No.41035439{7}[source]
This is exactly what I'd advocate for. There are many things that run in kernel space that don't need to. The Mac model with user-land hooks is one model; eBPF from Linux (and Windows?) is another.

I'm sure the reason Apple migrated was all of the bugs/crashes security companies kept introducing into the kernel with kexts. Apple had the ability to change their architecture on a whim because they aren't quite as beholden to backwards compatibility as Windows.

Microsoft could take this as an opportunity to make some major changes that would be more readily accepted by the market.

56. hello_moto ◴[] No.41035499{9}[source]
That could potentially be a lawsuit against MSFT, since their own Microsoft Defender is in this space and potentially doing the same thing; otherwise it would have much less potency at catching attacks, no?
57. roblabla ◴[] No.41035656{6}[source]
It's possible CS can do better, of course. But it's just wrong to blame them for the Linux crashes - they're not the ones that introduced buggy code and broke their users. RHEL/Linux did.
58. hsbauauvhabzb ◴[] No.41039055{5}[source]
That doesn’t really change my point - if stability issues are known to occur in a dependency, you can’t say you support that system.
59. roblabla ◴[] No.41040545[source]
> It's not clear to me they are so different but maybe I am not "sufficiently smart".

They're different because linux promises "eBPF are safe and cannot crash the kernel", and failed to deliver on that, while Microsoft says "drivers are all-powerful and as such must be written with care", and CrowdStrike did not heed this warning.

> Yes "the wrong people were trusted" but I don't see how we can completely solve this with engineering.

I mean, we could solve the "third-party software fucks the kernel up" problem easily with engineering: providing userspace APIs to do the stuff that currently needs kernelspace access. There's no inherent reason security products (or, really, any products) need to live in the kernel; it's just that there are no APIs to do this job, so security products have to go there. If Microsoft provided a good API doing what the custom drivers currently do, most security products would drop their driver in a heartbeat.

For instance, macOS fixed this exact issue a couple years ago by introducing Endpoint Security Framework, a userspace API that allows watching a bunch of events, and authorizing whether they should be allowed or blocked. It's a well-designed API that should obsolete the need for kernelspace access in security products.

replies(1): >>41043586 #
60. CRConrad ◴[] No.41043172{5}[source]
> An eBPF probe is not a kernel module.

But if it runs on the same privilege level as the rest of the kernel, then isn't it, for the purposes of this discussion, in effect "a kernel module"?

replies(1): >>41056844 #
61. CRConrad ◴[] No.41043441{7}[source]
> Vendors are competing with one and another to win contracts.

Sure, in a well-functioning market economy without any distortions. But there are lots of those at play, so competition is severely hampered (by network effects, regulatory capture, and on and on... Up to and including, I suspect, mere ephemeral fashion). What we actually have in many areas of the "tech market" are oligopolies and near-monopolies, not perfect competition.

> CIO/CISO don't select vendors lightly.

Muahaha. Seems rather more like they're at least as naïve as any Web-surfing consumer on their sofa, easily bamboozled by trendy buzzwords and slick marketing campaigns.

62. j2bryson ◴[] No.41043586{3}[source]
So what happened with the linux bug? Presumably people fixed the OS side problem straight away?
replies(1): >>41058707 #
63. dathinab ◴[] No.41054612{5}[source]
That is misleading.

Falcon uses APIs like eBPF when available/usable. They are not stupid: if they can use something more secure and reliable, why would they not use it?

E.g. they use it on Linux, even though they could have created a custom kernel module (idk. if they maybe also have a custom kernel module tbh.).

And pushing for something doesn't mean banning other things. E.g. MS could offer a "follows best security practices" certification and withhold it from vendors not using the more modern APIs. They can't block drivers based on it, but with the right marketing, customers of CrowdStrike wouldn't want to buy the product without such a cert.

I.e. while MS doesn't provide viable ways to get the functionality Falcon and similar products need without kernel modules, it would indeed be a bit ridiculous for them to ban such software - and as of yet they do not.

64. kragen ◴[] No.41056844{6}[source]
it doesn't—the semantics of ebpf confine it—so it isn't
replies(1): >>41087243 #
65. roblabla ◴[] No.41058707{4}[source]
kernel-5.14.0-427.13.1.el9_4 broke it. It was released on Apr 30, 2024, with RHEL 9.4 (this was the RHEL 9.4 release kernel).

According to the comments on https://access.redhat.com/solutions/7068083, RHEL became aware of the issue on May 3, 2024.

A workaround was identified (configuring CS to use the kernel module backend instead of the ebpf backend) on May 9, 2024.

RHEL then fixed it in kernel-5.14.0-427.18.1.el9_4, on May 23, 2024.

So the bug was fixed in ~20 days from the moment it was reported.

It's unclear whether this issue was caused by a RHEL-specific backport/patch or was also present in mainline kernels.

66. CRConrad ◴[] No.41087243{7}[source]
Either

1) Those Crowdstrike unit files aren't ebpf probes, so the whole subject of ebpf probes is irrelevant here; or

2) They're obviously able to stop the rest of the kernel from even booting up (as Crowdstrike so convincingly demonstrated millions of times over[1]), so yes, they do indeed have at least as much power as any other bit of the kernel.

Either way, hunting around for nits to pick is a bit pathetic.

[1]: In July 0000002024...

replies(1): >>41087925 #
67. kragen ◴[] No.41087925{8}[source]
denial of service is not the same thing as arbitrary code execution, and that goes double in kernel mode, but yes, it does seem that the linux implementation of ebpf had buggy sandboxing; i don't think allowing clownstrike to prevent booting was part of the intended objective

i wasn't hunting around for nits to pick; i was hunting around to see if you'd ever contributed any useful comments to the site. instead i found you making authoritative pronouncements about ebpf that were so wrong that you had evidently never read so much as a one-line summary of what ebpf was for. do you have a more promising historical comment to offer? perhaps something where people complimented your contribution as being informative?

have you ever made a worthwhile comment on hn?

on thursday, wahern posted this comment https://news.ycombinator.com/item?id=41061179 where they traced through the illumos/opensolaris source code to track down how a peculiar solaris interprocess communication mechanism worked, an investigation i had started but gotten stuck on. why can't you make comments like that instead of harassing me about how i format my comments?

the reason i'm asking is because i'd like to be able to talk to more people like wahern, but most of them avoid this site. a major reason why is that comments here frequently receive vacuous, aggressive responses like the comment you made the day before in https://news.ycombinator.com/item?id=41056718, where you launched a personal attack on me because you didn't like how i was formatting my comments

i'd like you to ⓐ apologize for doing that (this is not the first time you've done that to me personally; so far i haven't looked through your comment history far enough to find out how many other people you have a history of repeatedly harassing) and ⓑ commit to not doing it again

because i'm sure you're capable of making comments that make the site better instead of worse

replies(1): >>41113364 #
68. CRConrad ◴[] No.41113364{9}[source]
> do you have a more promising historical comment to offer? perhaps something where people complimented your contribution as being informative?

> have you ever made a worthwhile comment on hn?

I might answer that. If I thought you were owed any justifications from me. Which I don't.

And no, I'm neither “harassing” you nor being “vacuous, aggressive”. This isn't ad hominem, it's ad habitem. You write here for other people to read, and I'd even appreciate many of your comments -- if they weren't so infuriatingly idiosyncratically formatted as to disrupt fluent reading. Have the fucking courtesy to write like a normal person, and you'll be treated like a normal person. To begin with, get the shift key on your keyboard unstuck so you can start your sentences with capitals. And in case your dot / period / full-stop key is totally gone, copy-paste some of these: ........... So you can end them properly too.

Because I'm sure you're capable of making comments without coming off like an illiterate buffoon.

And yes, BTW, you totally were. Careful now, you don't want to end up like chockablock again, do you?