Klarna users are being signed in to random accounts

1. henvic ◴[27 May 21 15:07 UTC] No.27304014[source]▶

As a software engineer, I hate when I add a check for something "that will never happen" but that if happens is awful, and people complain.

A classic example: you need to get a user from a session, check against a database, and continue if they're signed in.

Then I add a simple if databaseUser.Username != form.Username and people will say "if that happens we've something worse wrong". Geez, something might be wrong and such double checking might provide to be useful.

On a smaller scale, bits flip due to cosmic rays and so on. Of course, there must be a limit where we stop, but people are used to actively avoid doing such "silly assertions" even for important steps.

¯\_(ツ)_/¯

replies(9): >>27304123 #>>27304382 #>>27304569 #>>27304654 #>>27304687 #>>27304894 #>>27308296 #>>27308719 #>>27309906 #

2. bagacrap ◴[27 May 21 15:15 UTC] No.27304123[source]▶

>>27304014 (TP) #

it's fine to make the check but I hope you don't sweep it under the rug with an early out without at least logging the occurrence

replies(1): >>27304171 #

3. henvic ◴[27 May 21 15:21 UTC] No.27304171[source]▶

>>27304123 #

uh? Why would you make the check, find a critical internal inconsistency, and skip logging it? :)

replies(1): >>27304387 #

4. geofft ◴[27 May 21 15:40 UTC] No.27304382[source]▶

>>27304014 (TP) #

I think there's merit in objecting to "that will never happen" checks in some cases (though, to be clear, I'm not saying the people objecting to your code are thinking about the same thing I am).

Specifically, if you have data that is loaded from some other source, your extra safety check might be checking data that's loaded from the same source, in a way where if something did go wrong, it went wrong in both places you're checking.

In this case, it seems pretty unlikely that Klarna's bug was that they ran "SELECT * FROM users WHERE Username = 'joeuser'" and they got back a row where Username != 'joeuser'. I don't think there's a recorded case of that ever happening with databases.

However, it seems much more likely that Klarna's bug was in HTTP caching or something, that results were returned for the wrong user. Then there's no opportunity to see databaseUser.Username != form.Username: that check would have indicated that things are correct, but the username being passed into this code was wrong in the first place. That sort of problem definitely happens in the wild - see the "Kenneth" story elsewhere in these comments, or off the top of my head https://blog.zulip.com/2021/03/20/zulip-cloud-security-incid... from two months ago.

And if it is, somehow, a database bug, why do you trust the database at that point? What if the database returns part of one row and part of another? What if it returns the username you sent in because of some optimization to avoid copying data, but thanks to a bug (or a cosmic ray) it reads in the rest of the data from an unrelated row? In the unlikely but not totally impossible case that you need to protect yourself against this, validating the username isn't enough; you'd better sign the entire database row and validate the signature before trying to use any of the data that's been returned. (And come up with some reason why you trust your own app code more than the database.)

The problem with such "silly assertions" is that they make you feel like you've added test coverage, when the thing you're testing is something like a database that is extensively tested by its vendor and by everyone else using the database, and there are other seams in your code which are much more likely to break. Meanwhile, they make the code longer and harder to read, which prevents readers of the code from easily identifying what those seams are.

(And by slowing down the API endpoint that talks to the database, it motivates other developers to try to put some caching in front of that endpoint, which may actually cause this sort of problem!)

replies(1): >>27304636 #

5. dsego ◴[27 May 21 15:40 UTC] No.27304387{3}[source]▶

>>27304171 #

log("this should never happen")

6. rightbyte ◴[27 May 21 15:58 UTC] No.27304569[source]▶

>>27304014 (TP) #

I like defensive programming. Even though I think the state is unreachable, it feels nice to add a panic assert just in case.

7. henvic ◴[27 May 21 16:05 UTC] No.27304636[source]▶

>>27304382 #

> I don't think there's a recorded case of that ever happening with databases. > and there are other seams in your code which are much more likely to break.

One such thing is the abuse of layers and layers of abstractions. For example, many people (unfortunately, in my view) love to use ORMs and query builds, and things like these are much more easier to happen when things are too generic.

And signing the entire database row and validating it, and so on, might be unjustified for most people, especially if you already count with correction from a TLS layer, and you can just have the trade-off of adding a simple conditional to check if the data you receive is sane.

This is not something essential for everything, but that is nice to have, especially the further you're out of control.

For example, if you retrieve data from an external API you should not trust it blindly, but rely on your internal references (security concerns aside, I'm talking about other kind of erratic behavior or bad data).

8. jojohohanon ◴[27 May 21 16:06 UTC] No.27304654[source]▶

>>27304014 (TP) #

A lifetime ago I was writing code for airline data processing. The specs are very clear about what the valid representation of every field was (less so about what they meant, but...).

So we generated our parser to fail if field ORG/1457 (made up) was not numeric max 8 digits. Or missing where mandatory.

Even if we never touched the data in that field.

Turns out that no-one else used the spec that way. No two were the same, so we had to basically implement two layers of parsing. One to put the data in a common parse tree, and the other to per-sending-mainframe interpret the data as how the sender had implemented.

We assumed that the mainframe would never send illformed data, and indeed that-could-never-happen. But they differed in what they thought was well formed.

9. jacquesm ◴[27 May 21 16:08 UTC] No.27304687[source]▶

>>27304014 (TP) #

This is very good practice as far as I'm concerned. Functions should treat their arguments as potentially hostile input.

replies(1): >>27305821 #

10. YeBanKo ◴[27 May 21 16:26 UTC] No.27304894[source]▶

>>27304014 (TP) #

If it is due to cache, then extra check like you described probably would not help.

11. cerved ◴[27 May 21 17:39 UTC] No.27305821[source]▶

>>27304687 #

maybe if it helps to fail fast and only public functions

12. mekkkkkk ◴[27 May 21 20:47 UTC] No.27308296[source]▶

>>27304014 (TP) #

I agree, and I've also been called out for doing "stupid" defensive assertions. It's almost certainly not a code-level issue this time though. This whole thing reeks of infrastructure/caching issues.

13. anticristi ◴[27 May 21 21:22 UTC] No.27308719[source]▶

>>27304014 (TP) #

Most people I met who do double checks would simply return "not loggen in" and issue a WARN deep within the other 200 WARNs-per-second. That is IMHO a very bad usage of double checks. It gives a false sense of security and masks the deeper problem until it's too late.

However, if you make the assertion fail loud, then it provides an additional security layer and should be used as often as makes sense.

14. rad_gruchalski ◴[27 May 21 23:31 UTC] No.27309906[source]▶

>>27304014 (TP) #

Klarna uses erlang and erlang enforces exhaustive conditionals and pattern matching so the error / unsupported case is always handled.