GitLab has released a fix on their end for anyone else wondering
https://about.gitlab.com/releases/2025/03/12/patch-release-g...
He's mentioned in the article, but a major shout-out is warranted for ahacker1. He's doing really sophisticated and valuable work to secure SAML implementations. We at SSOReady are really appreciative of his work.
Earlier this week, WorkOS put together a nice write-up on their own collaboration with ahacker1: https://workos.com/blog/samlstorm
The SAML spec itself is fairly reasonable, but is built upon XML signatures (and in turn, XML canonicalization) which are truly insane standards, if they can even be called such.
Only a committee could produce such a twisted and depraved specification, no single mind would be capable of holding and combining such contradictory ideas.
It would be so simple to just transmit signatures out-of-band and SAML would be a pleasure to implement.
Coming up with your own protocol on top of a protocol for a tiny amount of data amounting to not much more than what’s in an authentication cookie is the special kind of stupid that only the largest and most bureaucratic committees can produce.
A SAML session is only required if said app fetches data via a token obtained from that user - and in my glance around, this was almost never the case - SAST tools almost always use app instance tokens and are happy to show anyone with a GitHub account in your organization your code. Tailscale fixed this when I pointed it out, Sonarcloud told me to please don't tell anyone and GitHub took a few weeks to say this is totally expected behavior - when no vendor I told did, and their docs contradicted them.
I swear, reporting security bugs is a thankless endeavor, even if you just randomly stumble over them. I couldn't imagine doing this as a job.
It’s quite literally parsing xml using regular expressions. It’s an excellent case study for why you shouldn’t do that.
Projects didn’t start using Nokogiri for performance. They used it because it’s correct.
Parser differentials are expected and even necessary. What you intend to get from a signed response is very meaningful. A dilemma in modern TLS is that sometimes you want to trust one internal CA; That's the easy path. Sometimes you want to accept a certificate from a partner's CA, and you've got multiple partners - and you can no longer examine just the end certificate, but the root of that chain is equally important in your decisions.
This is also why I recommend whenever possible against AWS Sig algorithms; V4 is theoretically secure, but they screwed it up twice - SigV1 and SigV3 were insecure by design, and yet somehow made it past design review and into the public.
I was testing o3 recently and it kept changing the library used by a block of code every time it tried to fix an issue in the block that was unrelated to the library used (haven't seen that happen with Sonnet)
Easy to see how issues could creep in because a modification is made that switches to an inferior library/gem that exists in the code base or standard library so still passes tests etc. but doesn't need a Gemfile change
Also using comments to bypass saml is very old news. https://duo.com/blog/duo-finds-saml-vulnerabilities-affectin... is a post from 2018 about it.
In the context of saml that's hardly the least of it. Lots of the problems are things like allowing comments to sort of change the meaning of the document, allowing signatures to sign only part of the document. Allowing multiple signatures to sign different parts of the document, etc.
The issues goes beyond authorization. I’ve had Github randomly once in a blue use my personal email address as the default when merging a work PR. If anyone asks, I advice against mixing personal and professional stuff in the same Github account (or anywhere).
Any type of password store, even a physical one, or just reusing passwords, ends up being safer.
Minimalism wins again
If you are an IT admin with any pride, SAML is out of any future plans. The idea of SSO is suspect as a whole. Xml parsing has been hit twice in a week, avoid it in the future, anything wrong with a policy that replaces xml with json?
The nature of web software is 100 times riskier than anything else because of the risk profiles and 100% connectivity
At first read, I think you’re JSplaining, but I’m willing to give you the benefit of the doubt.
How difficult is it exactly? Can you provide examples, perhaps even of the particular difficulties? Are the difficulties on the side of the convincer or the convincee, or both?
Obviously, if you can avoid doing SSO with SAML, you should.
Someone should go deep on the mailing list and standards body horrors of WS-* and OASIS/XACML and all that crap
Github also makes their OAuth permissions picker extremely confusing. When I "login with Github" I am never sure exactly what I'm sharing, from which organizations I'm a member of.
Related submission a year ago: https://news.ycombinator.com/item?id=38743029
When we get vulnerabilities in the SSO protocol (SAML or otherwise) these vulnerabilities generally only affect some of the clients (identity consumers) who have implemented the protocol incorrectly or are using a feature that the provider has implemented incorrectly. Vulnerabilities that break the entire provider are less common.
When comparing this situation to having multiple different accounts, I can't see how SSO is less secure. Sure, when you have breach that affects the entire identity provider the damage is high, but the risk of having a breach (any breach!) is lower, since implementations are fewer, more consolidated and usually developed by people with better expertise.
I wonder how many other libraries are vulnerable to similar parser differential attacks. It's a good reminder to be extremely careful when dealing with XML and SAML, which are complex beasts at the best of times. As asmor pointed out, Github's SAML implementation has other issues too. It seems like SAML is just inherently difficult to get right.
Also, to the person who suggested not mixing personal and professional stuff in the same Github account: wise words! I've seen that cause headaches more than once.
OAuth 2.0 and its extension Open ID Connect have been around for over a decade. They have their own gotchas (like in badly defined ID token in OIDC and the ill-thought implicit and hybrid flows), but nothing there is nearly as dangerous as SAML.
Most applications support Open ID Connect now, but I'm still seeing organization choosing to use SAML out of inertia even when they are fully capable of using Open ID Connect.
“The code I write doesn’t have XSS or SQL injection vulnerabilities,” sure. At least those are plausible things to believe.
Client side validation?? How could anybody believe in that?
Is this speaking from experience?
In short, nesting trees and signing them is difficult and prone to pitfalls. It's easier if the envelope holds the message as a raw string, and the signing is performed on the raw string.
But then if you do that you also lose all your open source work history, which is important from a hiring/resume perspective.
Q: Is there any non-legacy reason to use SAML instead of libsodium’s public key authenticated encryption (crypto_box)?
Another Q: Is there any non-theoretical risk of parser differential when using libsodium’s cyrpto_box on one end and Golang’s x/crypto/nacl/box on the other end?
Now, xml has also been used for a lot of things where a hierarchical format like json would have worked better than a markup format, of which SAML would be a good example. But there are also cases where a markup format makes more sense, like svg or docbook, or odf.
Implicitly, that means no security software dealing with json should be written in Go, Javascript, ruby, python, etc (where practically everyone uses json parsers that silently ignore duplicate keys)
Plenty of languages do have common json libraries w/ duplicate key errors, like haskell (aeson), rust (serde_json), java (gson, org.json, probably others), so there's plenty of good choices.
So yeah, correct parse result is '400 bad request'
I did a full writeup here: https://notes.acuteaura.net/posts/github-enterprise-security...
Do they? You don't have to mess with syncing teams, memberships, or assignment to repos if you don't want to. You can make one api call:
> The authenticated user has explicit permission to access repositories they own, repositories where they are a collaborator, and repositories that they can access through an organization membership.
https://docs.github.com/en/rest/repos/repos?apiVersion=2022-...
So I'd give it about a 50:50 chance of working.
Edit: I just realized it eats your non-gated notifications too, if they're further down than position 25, and the "Next" button just leads to the same page with "?query=". Yay, another ticket about how glued on GitHub Enterprise Cloud is. The last one (GitHub eats API calls to accept invites to SAML organizations, deletes the invite, and sends a 200, writes success to the audit log... but ends up being a no-op) only has been 2 months or so ago. Thanks Microsoft.
In some cases (even in the US), if the employer does something that would be considered a "breach of contract", you can force them to remove all your code as well.
So, it would not be in the company's best interest to scrub their git history.
https://github.com/protocolbuffers/protobuf/blob/6aefdde9736...
Some other headaches:
Having decentralised authentication means that onboarding and offboarding need to have a bunch of tedious manual steps, or custom automation.
Whoever does user support for the organization has to be trained to reset passwords/unlock accounts in a hodgepodge of systems.
Any security controls the organization wants to implement need to be reimplemented or approximated in a bunch of different systems. E.g. if there are regulatory requirements for account lockouts, time between explicit reauthentication, etc.
It becomes much more critical to collect the authentication logs/event data for all of those systems, and harmonize its formatting with everything else so that the security ops team isn't maintaining separate monitoring/alerting rules for every system.
For large-scale systems, there are also at least theoretically performance advantages to the kind of signed ticket approach that SSO mechanisms tend to use, versus having to do database lookups of session IDs or verify a password. It's possible to do that without SSO, but if you're going to the trouble of implementing that kind of mechanism, you're most of the way to having SSO anyway, and might as well just finish the job IMO.
I'm sure the specifics will come out sooner or later.
i.e. it looks like a reasonably good way of exchanging encrypted messages, but I don't see anything in the docs indicating that it would provide the equivalent of group membership/roles/permissions.
Building something like that as custom code is a huge commitment, and could easily result in severe vulnerabilities specific to that system.
Strictly not a parser problem.
Csv is also available.
And binary protocols, with index based implicit keys are and byte length prepended to variable length fields. Those are the gold standard (see ip and tcp headers.)
I don’t even trust Git profiles. I buy a new license for GitKraken at any job I go to, even if I could avoid it; to me the possibility of accidentally trying to commit to work GitHub with my personal GitHub or vice verse is not worth it.
It’s the same with Microsoft accounts and their infamously bad-tech-debt-caused spaghetti.
Like if you try login to Outlook on iOS and you get a threatening message to the effect of “your system administrator will be able to remotely control and wipe your entire device if you proceed”. If it’s even a possibility that an incompetent or malicious IT department wipes your personal device, then no thank you.
See also that HN thread where a father let his child use his laptop, where they signed into their Microsoft school account, and somehow his personal Microsoft account was merged into their school account and from what I could tell he was never able to fix it and the school IT department didn’t care.
The idea of separating work and personal seems to be becoming old-fashioned.
Now, if you're a contractor performing work for a company, this may be quite different. But as an employee, I don't think you have any claim of authorship to the code you right as part of your job.
Particularly something someone might reasonably need 3 or more different instances of. E.G. Personal SemiProfessional, Personal NSFW stuff, Work but they didn't give an X this service demands.
Furthermore, SAML SSO alone does not save you from worrying about this, ideally you'd also implement SCIM, to have actually automated + real-time identity updates, which is yet another protocol separate from SAML.
For any hate JWT gets, JSON-inside-JSON,etc, at least it architecturally avoids these kinds of security issues since you verify once and only read data from what has been verified and nothing else, instead of having to re-create data and hope that the loose structure doesn't mess things up.
I ended up using the GraphQL API and making a query like this:
query($cursor0: String, $cursor1: String) {
search(query:"org:peterldowns", type: REPOSITORY, first: 100, after: $cursor0) {
pageInfo {
hasNextPage
endCursor
}
repositories: nodes {
... on Repository {
id
name
collaborators(first: 100, after: $cursor1) {
pageInfo {
hasNextPage
endCursor
}
edges {
permission
node {
login
id
}
}
}
}
}
}
}
Removing permissions that are no longer present in this result set is left as an exercise for the reader.I stopped working on the product so I never implemented the event stream consumer that would let me listen for "this user was removed as a collaborator" or "this team no longer has admin access to that repo". The entire permissioning model for Github is extremely complex and learning about all of its intricacies was half the battle.
Hence the whole "enterprise" IT.
For awhile GitHub was rather unavoidably the only place in my company where there was no reliable line between personal and professional accounts/systems.
I moved us to Forgejo after trialing it against Github (and GitLab, and Gitea).
At a prior employer everyone just used their personal GitHub accounts for the business. Once it became a “capital-E-Enterprise” making promises about things like employee SSO, they quickly retreated to an on-premise platform (not GitHub EE).
Often it takes several penetrations via compromised/replaced clients to get the message through.
Just look at all the discussions about why browser-based javascript encryption is problematic.