Removing PGP from PyPI (2023)

(blog.pypi.org)


politelemon ◴[17 Oct 24 21:51 UTC] No.41874187[source]▶

This feels like perfect being the enemy of good enough. There are examples where the system falls over but that doesn't mean that it completely negates the benefits.

It is very easy to get blinkered into thinking that the specific problems they're citing absolutely need to be solved, and there is quite possibly an element of using that as an excuse to reduce some maintenance overhead without understanding the system's benefits.

replies(2): >>41874198 #>>41874289 #

creatonez ◴[17 Oct 24 21:53 UTC] No.41874198[source]▶

>>41874187 #

Its benefits are very much completely negated in real-world use. See https://blog.yossarian.net/2023/05/21/PGP-signatures-on-PyPI... - the data suggests that nobody is verifying these PGP signatures at all.

replies(2): >>41874468 #>>41874473 #

Diti ◴[17 Oct 24 22:31 UTC] No.41874473[source]▶

>>41874198 #

The article you linked to doesn’t seem to say anything about “nobody verifying PGP signatures”. We would need PyPI to publish their Datadog & Google Analytics data, but I’d say the set of users who actually verify OpenPGP signatures intersects with the set of users faking/scrambling telemetry.

replies(1): >>41874512 #

1. woodruffw ◴[17 Oct 24 22:36 UTC] No.41874512[source]▶

>>41874473 #

I wrote the blog post in question. The claim that "nobody is verifying PGP signatures (from PyPI)" comes from the fact that around 1/3rd of the signing keys had no discoverable public key on what remains of the keyserver network.

Of the 2/3rds that did have discoverable keys, ~50% had no valid binding signature at the time of my audit, meaning that obtaining a living public key has worse-than-coin-toss odds for recent (post-2020) PGP signatures on PyPI.

Combined, these datapoints (and a lack of public noise about signatures failing to verify) strongly suggest that nobody was attempting to verify PGP signatures from PyPI at any meaningful scale. This was more or less confirmed by the near-zero amount of feedback PyPI got once it disabled PGP uploads.
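
The "worse-than-coin-toss" arithmetic can be sketched as follows. This is one reading of the rounded figures quoted above, not the audit's exact counts, and the variable names are made up for illustration.

```python
from fractions import Fraction

# Rounded figures quoted in the comment above (assumption: treated as exact here).
no_discoverable_key = Fraction(1, 3)           # keys missing from the keyservers
invalid_binding_if_discoverable = Fraction(1, 2)  # "~50%" of the remaining keys

# Share of keys that could be fetched at all.
discoverable = 1 - no_discoverable_key

# Share of keys that were both discoverable and had a valid binding signature.
usable = discoverable * (1 - invalid_binding_if_discoverable)

print(f"discoverable keys: {float(discoverable):.0%}")
print(f"usable keys:       {float(usable):.0%}")
```

Under those rounded inputs only about a third of keys would be usable for verification, which is below one half.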

replies(1): >>41875814 #

2. opello ◴[18 Oct 24 02:16 UTC] No.41875814[source]▶

>>41874512 (TP) #

This all makes sense.

PEP 740 mentions:

> In their previously supported form on PyPI, PGP signatures satisfied considerations (1) and (3) above but not (2) (owing to the need for external keyservers and key distribution) or (4) (due to PGP signatures typically being constructed over just an input file, without any associated signed metadata).

It seems to me that investing in sigstore.dev infrastructure rather than PGP is somewhat arbitrary. For example, on the PGP side, a PyPI keyserver and tooling to validate uploads could address (2) above, and (4) could be handled similarly to PEP 740 with, say, signatures over provenance objects. Maybe Sigstore is "just way better", but it doesn't seem like such a cut-and-dried technical argument from the things discussed in these comments and the linked material.

It's perfectly responsible to make a choice. What remains unclear is just how large the difference in scope of work would be, despite a somewhat implicit suggestion across the discussions and links in the comments that it was great. Maybe that's an unreasonable level of detail to expect? But together with what comes across as "dogging on PGP", that is what I've found disappointing in my casual brush with this particular instance of PGP coming up in the news.

replies(1): >>41875869 #

3. woodruffw ◴[18 Oct 24 02:29 UTC] No.41875869[source]▶

>>41875814 #

(2) is addressed by Sigstore having its own infrastructure and a full-time rotation staff. PyPI doesn't need to run or operationalize anything, which is a significant relief compared to the prospect of having to operationalize a PGP keyserver with volunteer staffing.

(I'm intentionally glossing over details here, like the fact that PyPI doesn't need to perform any online operations to validate Sigstore's signatures. The bottom line is that everything about it is operationally simpler and more modern than anything that could be shaped out of the primitives PGP offers.)

(4) could be done with PGP, but would go against the long-standing pattern of "sign the file" that most PGP tooling is ossified around. It also doesn't change the fact that PGP's signing defaults aren't great, that there's a huge tail of junk signing keys out there, and that to address those problems PyPI would need to be in the business of parsing PGP packets during package upload. That's just not a good use of anybody's time.

replies(1): >>41876039 #

4. opello ◴[18 Oct 24 03:08 UTC] No.41876039{3}[source]▶

>>41875869 #

> having its own infrastructure

This seems like a different brand of the keyserver network?

> PyPI doesn't need to run or operationalize anything

So it's not a new operational dependency because it's index metadata? That seems more like an implementation detail (aside from the imagined PGP keyserver dependency) that seems accommodatable given either system.

> like the fact that PyPI doesn't need to perform any online operations to validate Sigstore's signatures

I may be missing something subtle (or glaring) but "online operations" would be interactions with some other service or a non-PSF service? Or simply a service not-wholly-pypi? Regardless, the index seems like it must be a "verifier" for design consideration (2) from PEP 740 to hold, which would mean that the index must perform the verification step on the uploaded data--which seems inconsequentially different between an imagined PGP system (other than it would have to access the imagined PyPI keyserver) and sigstore/in-toto.

> ... PyPI would need to be in the business of parsing PGP packets during package upload.

But the sigstore analog is the JSON array of in-toto attestation statement objects.

replies(1): >>41879728 #

5. woodruffw ◴[18 Oct 24 14:19 UTC] No.41879728{4}[source]▶

>>41876039 #

> This seems like a different brand of the keyserver network?

It serves a vaguely similar purpose, if that's what you mean. That shouldn't be surprising, since this is all PKI-shaped problems under the hood.

To reiterate: the operational constraints here are (1) simplicity and reliability for PyPI, and (2) secure defaults for the signing scheme itself. Running a PGP keyserver would not offer (1), and PGP as an ecosystem does not offer (2). This is even before desired properties, like strong identity binding, which PGP cannot offer in its current form.

> "online operations" would be interactions with some other service or a non-PSF service? Or simply a service not-wholly-pypi?

In the PGP setting, that means PyPI would need to pull from a keyserver. That keyserver would need to be one that PyPI doesn't control in order for the threat model to be coherent.

In the PEP 740 setting, PyPI does not need to pull any material from anywhere besides what the uploader is providing: the signatures in the uploaded attestations are signed with an attached ephemeral signing certificate, which has a trusted publisher as its identity. That signing certificate can then be chained to an already established root of trust, in "normal" X.509 PKI fashion.

You could approximate this design with PGP, but none of the primitives currently exist (or if they exist, they are non-operational).

> But the sigstore analog is the JSON array of in-toto attestation statement objects.

Yes. Believe it or not, a big ugly JSON blob is simpler than dealing with PGP's mess of packet versions and formats.
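
As a rough illustration of why the JSON form is easy to work with, here is a minimal sketch that decodes a statement and checks that it names a given file and digest. The field names (`envelope`, `statement`, `subject`, `digest`) follow my reading of PEP 740 and the in-toto Statement layout and should be treated as assumptions; the helper name and sample data are hypothetical. It does not verify the signature or certificate, which a real verifier must do first.

```python
import base64
import hashlib
import json


def statement_matches_file(attestation: dict, filename: str, content: bytes) -> bool:
    """Return True if the (unverified) statement names this file and its SHA-256."""
    raw = base64.b64decode(attestation["envelope"]["statement"])
    statement = json.loads(raw)
    digest = hashlib.sha256(content).hexdigest()
    return any(
        subject.get("name") == filename
        and subject.get("digest", {}).get("sha256") == digest
        for subject in statement.get("subject", [])
    )


# Hypothetical sample data, built locally for illustration only.
content = b"example distribution bytes"
statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [
        {
            "name": "sampleproject-1.0.0.tar.gz",
            "digest": {"sha256": hashlib.sha256(content).hexdigest()},
        }
    ],
    "predicateType": "https://example.invalid/hypothetical-predicate/v1",
    "predicate": None,
}
attestation = {
    "version": 1,
    "envelope": {
        "statement": base64.b64encode(json.dumps(statement).encode()).decode(),
        "signature": "",  # placeholder: no real signature in this sketch
    },
}

print(statement_matches_file(attestation, "sampleproject-1.0.0.tar.gz", content))
```

Because the statement carries the file name and digest inside the signed payload, this is the kind of "associated signed metadata" that consideration (4) refers to.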

replies(1): >>41883521 #

6. opello ◴[18 Oct 24 21:11 UTC] No.41883521{5}[source]▶

>>41879728 #

> This is even before desired properties, like strong identity binding, which PGP cannot offer in its current form.

If the strong identity binding is OIDC then I disagree. It's convenient but no more evidence of identity than being able to unlock a private key.

> ... one that PyPI doesn't control in order for the threat model to be coherent.

This doesn't make sense unless the author key material was only ever published on the PyPI keyserver.

> PyPI does not need to pull any material from anywhere besides what the uploader is providing

> in "normal" X.509 PKI fashion.

What about checking for revocation?

replies(1): >>41883654 #

7. woodruffw ◴[18 Oct 24 21:28 UTC] No.41883654{6}[source]▶

>>41883521 #

> If the strong identity binding is OIDC then I disagree. It's convenient but no more evidence of identity than being able to unlock a private key.

Modulo the security of the OIDC provider, it's a very strong proof of identity. The world already assumes this in practice (via pervasive OAuth and OIDC in other contexts); all PEP 740 does is make the same assumption rather than trying to bolt strong identity onto PGP.

(I think there are good objections to OIDC, including the risk of centralization. But I've yet to see a better widely adopted system for publicly verifying identity claims.)

> This doesn't make sense unless the author key material was only ever published on the PyPI keyserver.

Is there any evidence that anybody is reconciling results from different PGP keyservers? I don't think anybody was doing this even at the peak of the SKS network.

(But even this assumes more sophistication than the average user is putting into verification: 99% of Python distribution installations aren't doing even hash-checking. Expecting that users will begin to do keyring curation isn't reasonable, and - critically - is not empirically supported by the last 2 decades of PGP support on PyPI.)
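
For reference, "hash-checking" here means something like the sketch below: comparing a downloaded file's digest with a value pinned in advance. The function name is hypothetical; pip offers a comparable check through its `--require-hashes` mode.

```python
import hashlib
import hmac


def digest_matches(content: bytes, expected_sha256_hex: str) -> bool:
    """Compare the SHA-256 of the downloaded bytes with a pinned hex digest."""
    actual = hashlib.sha256(content).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected_sha256_hex.lower())


# Hypothetical example: the pinned digest is computed locally for illustration.
downloaded = b"example distribution bytes"
pinned = hashlib.sha256(downloaded).hexdigest()

print(digest_matches(downloaded, pinned))
print(digest_matches(downloaded + b"!", pinned))
```

This only detects that a file differs from what was pinned; it says nothing about who produced the file, which is the gap signatures are meant to fill.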

> What about checking for revocation?

PEP 740 assumes short-lived (~10 minute) signing certificates with ephemeral signing keys. Subscriber-level revocation hasn't scaled well for the Web PKI, so the underlying stack here (Sigstore) prefers limiting the scope of signing materials and enforcing transparency instead.
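
A minimal sketch of the idea, assuming the verifier has a trustworthy signing time (for example, one recorded by a transparency log): instead of consulting a revocation list, it checks that the signature was made inside the certificate's short validity window. The function name and timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def signed_within_validity(
    signed_at: datetime, not_before: datetime, not_after: datetime
) -> bool:
    """Return True if the signing time falls inside the certificate's validity window."""
    return not_before <= signed_at <= not_after


# Hypothetical certificate valid for 10 minutes.
issued = datetime(2024, 10, 18, 21, 0, tzinfo=timezone.utc)
expires = issued + timedelta(minutes=10)

print(signed_within_validity(issued + timedelta(minutes=5), issued, expires))
print(signed_within_validity(issued + timedelta(minutes=11), issued, expires))
```

A key that leaks after the window closes cannot produce a signature that passes this check, provided the recorded signing time itself can be trusted.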

replies(1): >>41885010 #

8. opello ◴[19 Oct 24 01:47 UTC] No.41885010{7}[source]▶

>>41883654 #

> it's a very strong proof of identity

My point is that it is not stronger than the corresponding ability to unlock a private PGP key. OAuth/OIDC is convenient and has more friendly tooling, I concede this easily. But to make a claim that it's a strong proof of identity would require the account provider behind the OAuth to follow some sort of "know your customer"-like verification to claim more than "the request came from a system which has access to the account."

> I don't think anybody was doing this even at the peak of the SKS network.

Maybe not? Most of my interaction with PGP was as an attestation of content, I really produced this text. That was relevant in sub-networks of contacts which had mutual trust via signatures, or fingerprints shared through another medium.

But that doesn't change my perspective on the threat model. If there is an "evil PyPI" and end users need to have trust in the author, it seems to me the same trust relationship graph needs to be constructed. That Sigstore provides some means to do that means it's less work, and that's great and compelling. But it doesn't actually change the number of points to inspect to conclude "trusted", so it is hard to call either approach more or less difficult when the same number of trust checks need to occur.

> But even this assumes more sophistication than the average user is putting into verification

Is there some user story for how an end user might "build warm fuzzies" about trusting packages in the post-PEP 740 world? If pip (or some tool) ends up with functionality to effect the verification, the "expecting users to manage keyrings" argument seems to go away, as that tool would also be the logical place to put that functionality. Or is it that `cosign verify` is expected to be a lighter lift to document than `gpg --verify`?

I see how the ephemeral keys mean revocation of the signing keys is less of a concern. And even how transparency provides an avenue to discover misuse, so long as it is monitored. It seems like PyPI doing some consistency monitoring on behalf of authors would be required to make a claim of trustworthiness, which strikes me as an expanded operational concern.
