1318 points xvector | 118 comments
1. needle0 ◴[] No.19823806[source]
I’ll still keep using Firefox since I recognize the importance of browser diversity and the hazards of a Chrome monoculture (that and vertical tabs), but, yikes.

Still, this type of oversight seems all too common even in large companies. I remember several cases from Fortune 500 companies in the past few years alone. What would be a good way to automate checking for them? Has anyone developed a tool designed specifically to avoid certificate expiry disasters?

replies(18): >>19823825 #>>19823829 #>>19823831 #>>19823840 #>>19823848 #>>19823861 #>>19823913 #>>19823994 #>>19824009 #>>19824223 #>>19824243 #>>19824298 #>>19824668 #>>19824724 #>>19824795 #>>19824840 #>>19824927 #>>19825103 #
2. wbl ◴[] No.19823825[source]
We scan our codebase for anything that looks like a cert and send emails when it gets close. Might not have helped here if it was an intermediate owned by a CA. There but for the grace of God go I.
replies(2): >>19824231 #>>19824332 #
3. crazygringo ◴[] No.19823829[source]
That's a great question.

I've never seen a bulletproof solution for organizational tasks that need to be done yearly.

If someone's in charge... and both they and their manager happen to leave in the same year... and whatever system they had in place to remember (probably their personal calendars) is gone... and the manager's manager has 1,000 other things to remember...

...how does an organization ensure the task still gets done?

replies(4): >>19823893 #>>19823969 #>>19824149 #>>19824734 #
4. ◴[] No.19823831[source]
5. ShinTakuya ◴[] No.19823840[source]
It's not that complicated: just add scheduled health checks to the same system you use for checking whether the website and such are up. If the expiry date hasn't been updated within a week of expiry, start paging engineers.

I'm willing to bet Mozilla already does something like this but an engineer didn't set it up correctly for this certificate.
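
A scheduled check of this kind can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not what Mozilla or anyone in this thread actually runs: the function names and the 7-day default threshold are made up for illustration, and only the standard-library `ssl`, `socket` and `datetime` modules are used.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days left before a certificate's notAfter time.

    not_after uses the format returned by SSLSocket.getpeercert(),
    e.g. "Jun  1 12:00:00 2019 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def should_page(not_after: str, now: datetime, threshold_days: float = 7) -> bool:
    """True once the certificate is within threshold_days of expiry (or past it)."""
    return days_until_expiry(not_after, now) <= threshold_days


def fetch_not_after(host: str, port: int = 443, timeout: float = 10) -> str:
    """Fetch the leaf certificate's notAfter field from a live TLS server."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A monitoring job would call `should_page(fetch_not_after(host), datetime.now(timezone.utc))` for each host it knows about. Two caveats: an already expired certificate makes the handshake itself fail, which the job should also treat as an alert, and this only looks at the leaf certificate, so it would not catch an expiring intermediate.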

6. _wmd ◴[] No.19823848[source]
Let's not forget multiple mobile networks across Europe went down on the same day last year because Ericsson(?) let a cert expire on some internal management system that had not been updated. SSL cert renewal is one of the great unsolved problems in computer science

edit: not Europe, just UK and Japan apparently: https://www.zdnet.com/article/ericsson-expired-certificate-c...

replies(2): >>19823910 #>>19824709 #
7. kam ◴[] No.19823861[source]
ACME / Let's Encrypt go in the direction of making expiry happen so often that renewal gets automated, rather than being a rare manual process that can be forgotten about.

Not sure that's viable for a signing certificate like this, but that's the way to solve it for the web PKI.

replies(3): >>19824049 #>>19824088 #>>19824159 #
8. emgee_1 ◴[] No.19823893[source]
Just one : Emacs orgmode
9. MrEldritch ◴[] No.19823910[source]
There was also an issue last year where every single Oculus Rift was essentially bricked because they forgot to renew a cert (apparently, what with the chaos of the Rift launch and the Facebook acquisition between the cert issuance and expiration, they just kind of ... lost track).

It took like two days before there was any kind of fix available, and they couldn't even roll it out automatically because the expiration had also disabled the auto-updating.

10. minetest2048 ◴[] No.19823913[source]
Talking about vertical tabs, I was in the middle of studying for an upcoming exam, then when I alt-tabbed back into Firefox, all of my tabs were missing with that unsupported addon error. Fortunately refreshing Firefox gave me back normal tabs, at the cost of uninstalling all of my addons.

The problem is that Tree Style Tabs relies on a userchrome.css edit to hide the tab bar, and when TST is forcibly removed there is no way to access the tabs, because that edited userchrome.css is still there. This is very disruptive. At least with the pre-WebExtension addon, TST itself hides the tab bar, so if TST is removed then the original tab bar comes back on automatically.

replies(1): >>19823951 #
11. frosted-flakes ◴[] No.19823951[source]
I have it set up so that the tab bar is only displayed if the menu bar is visible, and I can use the Alt key to toggle them together.

https://github.com/eoger/tabcenter-redux/wiki/Custom-CSS-Twe...

  #toolbar-menubar[inactive="true"] + #TabsToolbar {
    visibility: collapse !important;
  }
12. Complexicate ◴[] No.19823969[source]
"...how does an organization ensure the task still gets done?"

With something almost stupidly simple and low-tech: checklists.

(I'm reading "The Checklist Manifesto" right now, and the points it makes seem to fit perfectly with everything you mention.)

replies(1): >>19824387 #
13. revvx ◴[] No.19823994[source]
> Still, this type of oversight seems all too common even in large companies. (...) Has anyone developed a tool designed specifically to avoid certificate expiry disasters?

LetsEncrypt renewal is supposed to be automated. [1]

I know of a company that hosted blogs for thousands of customers. They used LetsEncrypt, but the CTO considered automatic renewals a possible security risk, so they did it manually. Problem is, the expiration happened on a weekend and they "forgot" to update the certificates before that. Suffice it to say that the next Monday wasn't pleasant. They automated after that.

[1] https://letsencrypt.org/about/

replies(9): >>19824056 #>>19824264 #>>19824303 #>>19824403 #>>19824729 #>>19824926 #>>19825434 #>>19825826 #>>19826191 #
14. js2 ◴[] No.19824009[source]
You can find lots of programs like this one to monitor certs:

https://pypi.org/project/check-tls-certs/

I run one daily from cron and have it email me a report with the days to expiration for the certs I’m responsible for, even for certs that auto renew. I don’t filter the email. Daily is not too frequent for it to go to my inbox, but frequent enough that I’ll notice if it doesn’t mail me. YMMV.

replies(1): >>19824097 #
15. SomeHacker44 ◴[] No.19824049[source]
This is just abusive to the vast majority of users who do not care but still want to use SSL for their servers, frankly. I should be allowed to choose a near unlimited lifetime for my server's certificate if I don't care about the risks that may present.
replies(3): >>19824072 #>>19824221 #>>19824499 #
16. mc32 ◴[] No.19824056[source]
So did they conclude it wasn’t a security concern or did they conclude the security risk was worth the uptime?
replies(2): >>19824094 #>>19824258 #
17. httpsterio ◴[] No.19824072{3}[source]
As the service provider, you shouldn't get to decide. I think it's the users who can decide how long lived certs they're willing to trust.
replies(1): >>19824228 #
18. dev_dull ◴[] No.19824088[source]
It’s funny to me that people talk about this limitation as if it were some kind of virtue.
replies(2): >>19824114 #>>19824142 #
19. mehrdadn ◴[] No.19824094{3}[source]
I'm curious as well. My intuition would be that it's not a concern, since servers already keep their private keys stored locally in order to be able to communicate with clients anyway? Being able to update them doesn't really seem to make things any different. But I feel like I could be missing something/not have thought through it properly. (I imagine security implications can get more complicated if a different server decrypts traffic vs. processes it, etc.)
replies(1): >>19824339 #
20. dev_dull ◴[] No.19824097[source]
Discovery of all the certs is what I think is the harder problem.
replies(2): >>19824189 #>>19824206 #
21. tty2300 ◴[] No.19824114{3}[source]
It's also more secure. Long lived certs risk the possibility that someone who used to own the domain got a certificate on it and it still works after the domain is resold. Once you automate it there is no downside to short lived certs.
replies(1): >>19824233 #
22. lvh ◴[] No.19824142{3}[source]
Short-term certs _are_ a virtue. Not only do you not have a manual event rare enough for people to forget how to do it, you also don't have to worry about which 15 services someone granted a 10 year wildcard cert to early in the company's history.
replies(1): >>19824865 #
23. dguo ◴[] No.19824149[source]
There should be a separation between the things that need to get done and the people that do them. As in, tasks should be created first and then assigned.
24. jetrink ◴[] No.19824159[source]
See also: GPS vs GLONASS time encoding. GPS rolls over every 19 years, so devices, cars and even Boeing aircraft saw their GPS-based clocks turn back to 1999 last month. Meanwhile, GLONASS epochs are only four years long, so every device that uses it as a time reference is built to handle rollover.
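
The rollover is easy to reproduce. The legacy GPS navigation message carries the week number in a 10-bit field counted from 6 January 1980, so it wraps every 1024 weeks (about 19.6 years). The sketch below uses hypothetical function names, ignores leap seconds and works at day resolution only; it shows how a receiver that hard-codes the era lands back in August 1999.

```python
from datetime import date, timedelta

GPS_EPOCH = date(1980, 1, 6)   # start of GPS week 0
WEEKS_PER_ERA = 1024           # 10-bit week counter


def broadcast_week(day: date) -> int:
    """Week number as transmitted: the true week count modulo 1024."""
    return ((day - GPS_EPOCH).days // 7) % WEEKS_PER_ERA


def naive_decode(week: int, era: int = 1) -> date:
    """Start of the given broadcast week, assuming a fixed, hard-coded era."""
    return GPS_EPOCH + timedelta(weeks=era * WEEKS_PER_ERA + week)
```

On 6 April 2019 the broadcast week is 1023; on 7 April 2019 it wraps to 0. A device that still assumes `era=1` decodes week 0 as 22 August 1999, while one that knows to use `era=2` gets 7 April 2019.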
25. human20190310 ◴[] No.19824189{3}[source]
I agree. What can be done to prevent developers from adding a certificate dependency without monitoring during the move-fast-and-break-things days of early development, which then sits for X years as developers come and go, and nobody notices until it fails?
replies(4): >>19824242 #>>19824331 #>>19824464 #>>19824969 #
26. stubish ◴[] No.19824206{3}[source]
We have an agent that pulls certs from an internal service and stores them on disk where apps can use them. We no longer manually install certificates. This solves discovery, and gives us alerts on services that have stopped refreshing their certs for any reason. The internal service is wired into Let's Encrypt and a commercial certificate provider. Setup is minimal, and after that completely automated.
27. lugg ◴[] No.19824221{3}[source]
Security tends towards the lowest common denominator. I'd rather you just figured out how to run a cron job.

The problem comes if your keys ever get compromised or cracked all your historical traffic becomes vulnerable instead of just the most recent window.

replies(1): >>19824270 #
28. otakucode ◴[] No.19824223[source]
>Has anyone developed a tool designed specifically to avoid certificate expiry disasters?

Is anything more than a calendar reminder on the phone of someone important enough to shake the Earth and get it fixed For. Certain. needed? Like, say, the CEO, CTO, and CFO should at a minimum get a notification so they can ask if the refresh was done when necessary?

replies(1): >>19826216 #
29. Godel_unicode ◴[] No.19824228{4}[source]
That's cool and all, but what percentage of users do you think even know certs expire? I'd put the over/under at 1%.
30. adtac ◴[] No.19824231[source]
Why do you have certificates in your code to begin with?
replies(1): >>19824457 #
31. Godel_unicode ◴[] No.19824233{4}[source]
If only there were a way to revoke certificates. Like, some kind of list.
replies(5): >>19824271 #>>19824320 #>>19824476 #>>19824509 #>>19824811 #
32. rhizome ◴[] No.19824242{4}[source]
>What can be done to prevent developers from adding a certificate dependency

Discipline? Experience? PIP?

33. cesarb ◴[] No.19824243[source]
> I’ll still keep using Firefox since I recognize the importance of browser diversity

Also, Chrome is not immune to "crashes for everyone at the same time" bugs. Like that time when the start of daylight saving time made it crash for a full day (a quick search tells me it probably was https://bugs.chromium.org/p/chromium/issues/detail?id=287821).

replies(2): >>19824389 #>>19824698 #
34. revvx ◴[] No.19824258{3}[source]
When pressed, they admitted it was just "gut feeling". The team audited a couple ACME clients and couldn't find anything to justify not automating.
replies(1): >>19824591 #
35. Abishek_Muthian ◴[] No.19824264[source]
Some shared hosts like Bluehost now provide LetsEncrypt by default for all their sites with auto-renewal (but I don't recommend Bluehost shared plans for anything even close to a serious hobby, due to absurd downtimes, as with most other shared hosting).

I used manual renewal for LetsEncrypt for about 4 websites on other shared hosts & renewing them every 3 months was a pain; had to keep reminders and schedules just not to miss renewals until I synchronised their renewal schedules to batch (manual) renewing them.

I had automated renewal for 1 website on a cloud server; it was a one-time effort, I never had to bother about the SSL cert for that site, and it was the most favourable of them all.

replies(3): >>19824323 #>>19824971 #>>19825046 #
36. gboudrias ◴[] No.19824270{4}[source]
Yeah "just" a cron job except the implementation changes several times a year. Somehow this automated process was more time-consuming than the previous, manual one.
replies(1): >>19824317 #
37. yjftsjthsd-h ◴[] No.19824271{5}[source]
If only such a list were actually effective rather than the majority of clients not bothering to check it.
38. luckylion ◴[] No.19824298[source]
Maybe we need more browser diversity than just two different teams with two different systems. Both are sitting very close to each other geographically, and both are produced in the same culture (as in Silicon Valley), so it would seem likely that, while they compete with each other, they will apply very similar answers to problems they face.
39. tedunangst ◴[] No.19824303[source]
I have no idea why you'd deliberately wait the full 90 days to do a manual renew. For reasons, I renew manually, but every 60 days or so. Nowhere close to the deadline.
replies(1): >>19824319 #
40. lvh ◴[] No.19824317{5}[source]
Many cloud providers will make this process pretty much entirely automated. But let's say you don't want to do that: when is the last time the way you run caddy changed? Or the last time python-certbot-nginx changed?
replies(1): >>19824338 #
41. revvx ◴[] No.19824319{3}[source]
Exactly.

Their FAQ [1] recommends exactly that: renewing every 60 days.

[1] https://letsencrypt.org/docs/faq/#what-is-the-lifetime-for-l...

42. lvh ◴[] No.19824320{5}[source]
CRLs do not work in practice, and major clients routinely ignore them.
43. revvx ◴[] No.19824323{3}[source]
Another option is using a Web Server/Reverse Proxy that supports Let's Encrypt automatically, like Caddy [1]. I believe Apache HTTPD has partial support [2], too.

[1] https://caddyserver.com

[2] https://httpd.apache.org/docs/2.4/mod/mod_md.html

replies(3): >>19824384 #>>19824418 #>>19824619 #
44. lvh ◴[] No.19824331{4}[source]
Certificate Transparency works pretty darn well for most usecases, we (Latacora) have found while trying to solve exactly this problem (or at least the figure out which certs exist that aren't being regularly re-issued part) :-)
replies(1): >>19825112 #
45. lvh ◴[] No.19824332[source]
If you want to get rid of those and they're public certs: odds are they're in Certificate Transparency logs and you can monitor them from there.
replies(1): >>19825076 #
46. gboudrias ◴[] No.19824338{6}[source]
This was a few years ago, so things may have changed by now. But as they say, once bitten twice shy, and the wisdom of "just cron it" doesn't work with highly experimental tools like LE was for what I estimate to be the majority of its lifetime.
replies(2): >>19824359 #>>19825252 #
47. revvx ◴[] No.19824339{4}[source]
The "manual" process used previously by the company already involved some form of automation, so it was more about trusting CertBot not to do anything horrendous.

But now that you mention it, I wonder what's the opinion of security experts like tptacek on cert renewal automation.

replies(1): >>19824703 #
48. lvh ◴[] No.19824359{7}[source]
I'm sure there's a way to make your LE experience consistently suck but the way to run caddy for a static website has been the same for about as long as caddy has had support for automatic HTTPS, and that's also true for python-certbot-nginx. But more importantly: we can argue about what it was 4 years ago, or we can just observe that it's really easy now.
49. nsomaru ◴[] No.19824384{4}[source]
Nginx works well and there's a tool that automates most of the extra config stuff for you.
50. marcosdumay ◴[] No.19824387{3}[source]
A year is enough time for everybody that knows about the checklist to leave.
replies(2): >>19824432 #>>19825167 #
51. username223 ◴[] No.19824389[source]
> "crashes for everyone at the same time" bugs

What else would you expect for auto-updating software that relies on the internet to work? It's a monoculture attached to a firehose of disease.

This is exactly the same as "pushing out a security fix to all users," except it apparently wasn't intentional. You can't have one without the other.

replies(1): >>19824775 #
52. n42 ◴[] No.19824403[source]
Just curious, are you talking about Webflow? Because I had to hunt down and make sure our Let's Encrypt auto renewal was working until I realized the certificate was served by them. They wait until the last 12 hours to renew the certificate. I have no idea what type of rationalization would lead to that decision.
replies(3): >>19824727 #>>19824834 #>>19826811 #
53. Abishek_Muthian ◴[] No.19824418{4}[source]
Apache HTTPD looks interesting. So with it we can renew the LetsEncrypt cert without using certbot?
replies(1): >>19824442 #
54. andrewflnr ◴[] No.19824432{4}[source]
Put "make sure someone else knows all this person's checklists" on the employee exit checklist.
replies(1): >>19824793 #
55. revvx ◴[] No.19824442{5}[source]
It requires some fiddling and it's in experimental state, but yes! Here's the documentation:

https://github.com/icing/mod_md/wiki/Migration

56. justinclift ◴[] No.19824457{3}[source]
If you have your own CA for whatever reason, it's common to distribute the root and intermediate certs with your code so things can resolve.

You don't ship the signing keys with the certs, as that would be bad. ;)

replies(1): >>19827020 #
57. technion ◴[] No.19824464{4}[source]
Whilst I'll say "disclaimer, this is my project", monitoring Certificate Transparency with CT Advisor has helped me find out about certificates marketing people deployed and expected me to maintain without my knowledge.

[0] https://ctadvisor.lolware.net/

58. dwaite ◴[] No.19824476{5}[source]
Revocation lists get huge, ultimately becoming another reason to limit cert lifetime (you don't have to tell people you revoked a certificate which is expired naturally).

Very few things check revocation, unfortunately - it puts an extra hop on the fast path of connecting to a server. OCSP stapling is pretty much the only thing a browser would care about - having the server fetch a signed OCSP response that is good for a limited period of time (say, hours), and send that along with the certificate during negotiation.

Or, you could just have the server fetch a certificate that's good for a limited period of time.

59. MrStonedOne ◴[] No.19824499{3}[source]
It's not your risk to decide on. You will not always own that domain name, and allowing you to still have a valid cert for it afterwards is silly.
replies(1): >>19825298 #
60. MrStonedOne ◴[] No.19824509{5}[source]
Revocation requires the private key
replies(1): >>19824796 #
61. dingaling ◴[] No.19824591{4}[source]
Having a root process with write-privileges to /etc on production machines and also able to communicate over the Internet definitely is a security risk.

To mitigate that you end up building a series of privilege-restricted jobs flowing from the DMZ back into the internal network. And maintaining that might be more complicated than just manually renewing, depending upon the processes and architecture of the company.

replies(1): >>19824753 #
62. glitchcrab ◴[] No.19824619{4}[source]
Traefik is another option here for a reverse proxy with automated renewals; I use it in a ton of places.

https://traefik.io

63. bartread ◴[] No.19824668[source]
> Still, this type of oversight seems all too common even in large companies.

The npm self-signed certificate fiasco of early 2014 springs immediately to mind.

64. tssva ◴[] No.19824698[source]
That bug seems to have affected only users on Android versions earlier than 4.3 and in Brazil or Chile.
65. inflatableDodo ◴[] No.19824703{5}[source]
We could attempt a summoning. Quick, make a wildly inaccurate claim about the correct way to implement an encryption library.
66. AmericanChopper ◴[] No.19824709[source]
>SSL cert renewal is one of the great unsolved problems in computer science

Certificate expiry really only exists to make money for CAs. It doesn’t solve any security problem that CRLs don’t already solve (and solve better). There’s lots of unsolved problems relating to ‘how do you make a reliable PKI’, but cert expiry is really just an unrelated business requirement for CAs.

replies(4): >>19824751 #>>19824854 #>>19824875 #>>19825054 #
67. AmericanChopper ◴[] No.19824724[source]
There’s lots of monitoring services out there that do it. A long time ago I worked at a place that used a service called site24x7 for cert and API monitoring. That was before Pingdom kinda got better than most API monitoring services, but I don’t know if they monitor cert expiry.

Taking a look around, you’ll find lots of service providers, or tools you could use. But the main issue is all they do is tell a human being to do something, which they can still fail to do. Which is why automating cert rotation (with things like let’s encrypt or ACM) is arguably a better solution than monitoring it.

68. tass ◴[] No.19824727{3}[source]
90 days is 4 times a year. 60 is 6 times, 50% more expensive when you’re paying someone to perform the task.
replies(1): >>19824768 #
69. owaislone ◴[] No.19824729[source]
They didn't have to renew automatically, but they could have automated notifications, alerts or even banners in their internal apps once 60-70% of the certificate's lifetime was exhausted. If I were given such a restriction, I'd still automate it 100% but require a human to authorize it every time by clicking a magic link in their email, slack or some dashboard, and nag them with notifications until someone authorized it.
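
A percentage-of-lifetime alert like this can be sketched as follows. The function names and the exact 60% and 70% thresholds are assumptions for illustration; how the warning is delivered (email, Slack, banner) is left out.

```python
from datetime import datetime


def lifetime_used(not_before: datetime, not_after: datetime, now: datetime) -> float:
    """Fraction of the certificate's validity period that has elapsed (can exceed 1.0)."""
    total = (not_after - not_before).total_seconds()
    if total <= 0:
        raise ValueError("not_after must be later than not_before")
    return (now - not_before).total_seconds() / total


def alert_level(fraction: float, warn_at: float = 0.6, nag_at: float = 0.7) -> str:
    """Map elapsed lifetime to an escalation level; both thresholds are assumptions."""
    if fraction >= 1.0:
        return "expired"
    if fraction >= nag_at:
        return "nag"
    if fraction >= warn_at:
        return "warn"
    return "ok"
```

For a 90-day certificate, the 60-day renewal point recommended elsewhere in this thread is about 67% of the lifetime, which falls between the two thresholds used here.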
70. darrenf ◴[] No.19824734[source]
Surely tasks are performed by and assigned to roles, not individuals (who just happen to be in those roles at some moment in time). If a role disappears, e.g. in redundancy, then the role's tasks are evaluated for either transfer to a role that remains, or being discarded.
71. kevingadd ◴[] No.19824751{3}[source]
I'd argue it's a blunt hammer extra layer of defense, where if a certificate gets compromised and the owner never finds out at least it eventually stops working. This kind of compromise is pretty common.
72. Whitestrake ◴[] No.19824753{5}[source]
Why would a process need to run as root or have write privileges to /etc in order to automate LetsEncrypt renewals?

I run Caddy (which uses acme-go/lego as its ACME provider) as a non-root user with no access to /etc at all. It seems to be running fine.

replies(2): >>19824866 #>>19825032 #
73. n42 ◴[] No.19824768{4}[source]
I had the same thought, but I still find that absurd. Say they host 500,000 websites with HTTPS. 1,000,000 renewals they save spread across the year, roughly 2 renewals a minute. That is pennies. A t2.medium could handle that type of load increase
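
Spelling that estimate out, using the round figures from this subthread (4 versus 6 renewals per site per year, and a hypothetical 500,000 hosted sites):

```python
SITES = 500_000                 # hypothetical figure from the comment above
RENEWALS_AT_90_DAYS = 4         # per site per year, renewing at the deadline
RENEWALS_AT_60_DAYS = 6         # per site per year, renewing at 60 days
MINUTES_PER_YEAR = 365 * 24 * 60


def extra_renewals_per_minute(sites: int) -> float:
    """Additional renewals per minute caused by renewing every 60 days instead of 90."""
    extra_per_year = sites * (RENEWALS_AT_60_DAYS - RENEWALS_AT_90_DAYS)
    return extra_per_year / MINUTES_PER_YEAR


print(round(extra_renewals_per_minute(SITES), 2))
```

That is 1,000,000 extra renewals spread over 525,600 minutes, or roughly 1.9 per minute, in line with the "roughly 2 renewals a minute" above.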
replies(1): >>19824806 #
74. craftinator ◴[] No.19824775{3}[source]
I love "firehose of disease", and will steal it. And I agree that bugs are bugs; every time you add a new capability, you add all the possible bugs that can occur with that capability.
75. craftinator ◴[] No.19824793{5}[source]
Put the checklist on the home page of the company website!
76. kitotik ◴[] No.19824795[source]
Systems designed around long TTLs make this problem worse. I love the default of 90 days for Let’s Encrypt. It forces some good discipline and hygiene. Wish there was a better solution for short lived CAs
77. pinjiz ◴[] No.19824796{6}[source]
This is not true. In Let's Encrypt/ACME for example, you can simply obtain authorizations for all the domains a certificate is valid for and request revocation [1]. The only thing you still need to revoke the certificate, is the certificate itself. The certificate can be obtained from CT logs.

[1] https://tools.ietf.org/html/rfc8555#section-7.6

78. albru123 ◴[] No.19824806{5}[source]
A bit OT, but what's up with this usage of Amazon EC2 tiers as a unit of computational power?
replies(2): >>19825023 #>>19825079 #
79. pinjiz ◴[] No.19824811{5}[source]
OCSP stapling together with OCSP Must Staple is the way to go here. All major browsers support these.

Firefox still does normal OCSP requests, Chrome does not. So if you are a Chrome user, to my understanding, there is no way to know if the server certificate was revoked or not, other than OCSP stapling together with OCSP Must Staple. Additionally, both Chrome and Firefox ship a list of revoked certificates, but it may not be updated quickly enough and as far as I can tell it mostly contains roots and intermediates.

80. revvx ◴[] No.19824834{3}[source]
Nope, content marketing company
81. mattbillenstein ◴[] No.19824840[source]
I built a tool for checking ssl certs some time ago: https://ismycertexpired.com but I'm not checking intermediate certs...
82. jefftk ◴[] No.19824854{3}[source]
If it really was only to make money for CAs we'd see LetsEncrypt offering very long lifetime certs. But:

* Very short lifetimes get people to automate, preventing problems where one cert lasts long enough to lose the institutional knowledge around it.

* CRLs don't work. For performance you don't want to check for a revocation in serial with the request, and you don't want to block all browsing if the revocation list server is down. Revoking a cert will cover some users, but lots will still get "https://" and no warnings.

83. zimpenfish ◴[] No.19824865{4}[source]
Having once had to regenerate 600+ self-signed certs, test that everything still worked, and then insert them into the 600+ live app servers without breaking anything, all within a two week window because no-one had realised the 10 year expiry was just about to bring everything down, I concur.
84. tedunangst ◴[] No.19824866{6}[source]
Depends on setup, but frequently private keys are inaccessible to the web server worker process. (Which starts as root, loads keys, drops privs, etc.)
replies(1): >>19825026 #
85. Beldin ◴[] No.19824875{3}[source]
CRLs are not equivalent at all. They are a last-ditch effort to fix a problem when all else (expiry) has failed.

CRLs require maintenance and distribution of a list by a 3rd party. Creating an accurate, all-inclusive CRL of all website keys that your browser should reject is far, far from easy. (Case in point: "how many web sites are there?" is not an easy question.)

Properly propagating such a list to any browser that might need it is another daunting task - less than 100% propagation means end users are exposed to security risks.

Certificate expiry is much more elegant: the client can check the certificate's validity himself, without relying on input from 3rd parties.

If certificates didn't expire, CRLs would (by now) be huge and growing enormously every day. They'd be so big that by the time you'd have downloaded one, it'd be outdated.

replies(1): >>19825048 #
86. pmontra ◴[] No.19824926[source]
It's automated but things can go wrong even when correctly configured and tested. Real world example: certbot version got old, the renewal server didn't support it anymore, the certificate didn't renew, the web site got the dreaded https warning page.

Of course that is also a kind of misconfiguration. The site has Debian security auto updates on but certbot is not among them. It should be forced to be updated. Furthermore there was no monitoring of errors in its log file.

Still it's not as simple as one believes Letsencrypt to be.

87. rixed ◴[] No.19824927[source]
> Has anyone developed a tool designed specifically to avoid certificate expiry disasters?

Not perfect, but I've added a TLS certificate extraction tool into a DPI that displays all visible certificates ordered by expiry date.

One could then mirror all one's site traffic to it and let it run in the background. Coupled with some alerting tool it would catch most of those cases I guess.

I could polish the tool a bit more if there is some interest, but anyone could do it as well.

See

https://github.com/rixed/junkie

and more specifically the plugin called 'sslogram'.

88. adrianN ◴[] No.19824969{4}[source]
Hook the alerting for expiring certificates into the library that is used for handling certificates, at least in debug builds.
89. king_phil ◴[] No.19824971{3}[source]
I own a webhosting provider. We offer Let's Encrypt with automatic issuing and renewal, securing 184,961 hostnames (SANs) at this moment.

We issue certificates automatically if none is existing when connecting to a website and renew the certificates in batches 30 days before they expire. When renewing, we merge certificates/hostnames into bigger certificates with 90 hostnames so we don't have so many moving parts.

If renewal would break, however (as it did once or twice before), nothing bad would happen because on page load there would be a new certificate issued.
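
The batching step described above might look roughly like this. It is a sketch, not this provider's actual code: the function names are hypothetical, and the 30-day window and 90-hostname batch size are simply the figures quoted in the comment.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def due_for_renewal(expiry_by_host: Dict[str, datetime], now: datetime,
                    window: timedelta = timedelta(days=30)) -> List[str]:
    """Hostnames whose certificate expires within the window, sorted for stable batches."""
    return sorted(h for h, exp in expiry_by_host.items() if exp - now <= window)


def batch_hostnames(hostnames: List[str], size: int = 90) -> List[List[str]]:
    """Split hostnames into groups of at most `size`, one multi-SAN certificate per group."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [hostnames[i:i + size] for i in range(0, len(hostnames), size)]
```

Each inner list from `batch_hostnames(due_for_renewal(...))` would then be submitted as the SAN list of one certificate request.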

90. rocqua ◴[] No.19825023{6}[source]
It is a clearly priced unit of computational power maybe?
91. tialaramex ◴[] No.19825026{7}[source]
Most popular ACME (Let's Encrypt) clients allow you to provide a CSR instead of generating the keys themselves. That means a bunch more work for you, but if you're worried about this, that's what you should do. Have your safe (even manual if you insist) process make keys, make CSRs for the keys, and put those somewhere readable. The ACME client will hand them over to the CA saying "I want certs corresponding to these CSRs" without needing access to your TLS private keys at all.
replies(1): >>19825034 #
92. rocqua ◴[] No.19825032{6}[source]
Using http renewal requires listening on port 80 which, by default, requires root.
replies(3): >>19825104 #>>19825110 #>>19826112 #
93. rocqua ◴[] No.19825034{8}[source]
That does mean you aren't automatically rotating keys anymore.
replies(1): >>19826292 #
94. schwurb ◴[] No.19825046{3}[source]
> had to keep reminders and schedules just not to miss renewals until I synchronised their renewal schedules to batch (manual) renewing them.

Another use case for the app I am developing! The basic idea: You can enter an item (i.e. "MyOwnShop Cert") into the list. From that time on, it will be tracked how much time passed since the item was entered or renewed (by clicking the renew button). The item with the longest time since entering/renewing is at the top of the list.

Compared to schedules and reminders it has the advantage that the item is not out of our mind once the reminder or schedule passes. It just sits there dutifully and its timer keeps increasing.

I use it for keeping up with middle-term contacts ("Wow, I have not written Carl for 3 weeks?") and health-related issues. Logging in stuff that easily spoils would be another use case. And, apparently, cert renewals :)

95. tialaramex ◴[] No.19825048{4}[source]
CRLs can be sharded, the cert carries the URL for the relevant CRL inside it. So they wouldn't need to have grown as huge as you suggest.

But, this sharding carries a cost for user privacy, if I shard certs 16 ways then each CRL download gives me 4 bits of info about which sites you were visiting.

OCSP effectively takes this to the extreme, each lookup is tiny because it's just for one cert, but it gives away exactly which cert you cared about each time.
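
The 4-bit figure is just the base-2 logarithm of the shard count. A small sketch, assuming (hypothetically) that a CA assigns certificates to equal-sized shards by hashing the serial number; real CAs may partition differently:

```python
import hashlib
import math


def shard_for(serial: int, shards: int) -> int:
    """Assign a certificate serial number to one of `shards` CRL shards (hypothetical scheme)."""
    digest = hashlib.sha256(str(serial).encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % shards


def bits_leaked(shards: int) -> float:
    """Information revealed by which shard a client downloads, assuming equal-sized shards."""
    return math.log2(shards)
```

With a single shard nothing leaks but the list is as large as it can be; as the shard count approaches the number of certificates, each download approaches identifying a single certificate, which is the OCSP case described above.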

replies(1): >>19825070 #
96. rocqua ◴[] No.19825054{3}[source]
Cert revocation suffers from a very simple issue. If your check for revocation fails, do you fail open (ie accept the cert) or fail closed (ie reject the cert).

For any method, fail closed is user hostile and often a DOS vulnerability whilst fail open is another way for an attacker to use a revoked cert.

This is a big issue with on-line methods like OCSP as a MitM using a bad cert can probably block OCSP traffic as well.

CRLs grow out of proportion, and leak information to the outside world.

Cert expiry serves as a backstop to these other revocation methods, and as a bonus ensures that simply forgetting about a cert cannot bite you 10 years later.
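
To make the dilemma concrete, here is a small sketch of the decision a client has to make. `Status` and `accept_cert` are illustrative names, not any real TLS library's API.

```python
from enum import Enum


class Status(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    UNAVAILABLE = "unavailable"  # check timed out, 404'd, or was blocked


def accept_cert(status: Status, fail_open: bool) -> bool:
    """Decide whether to accept a cert given the revocation check result."""
    if status is Status.GOOD:
        return True
    if status is Status.REVOKED:
        return False
    # The check itself failed. Fail open accepts the cert (an attacker who
    # can block the check wins); fail closed rejects it (an outage at the
    # CA now takes the site down).
    return fail_open


if __name__ == "__main__":
    for policy in (True, False):
        print("fail_open" if policy else "fail_closed",
              accept_cert(Status.UNAVAILABLE, fail_open=policy))
```

The only branch where the policy matters is `UNAVAILABLE`, and that is exactly the branch a MitM holding a revoked cert can force.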

replies(1): >>19825183 #
97. rocqua ◴[] No.19825070{5}[source]
Besides leaking data through on-demand CRL checking, you also have a difficult fail open vs fail closed decision.

Failing closed means failure of a third party immediately breaks your site. Failing open means a MitM can simply block the CRL check.

OCSP stapling and the 'must staple' header are a lot better for privacy, and OCSP responses have some validity so at least a 5 hour outage of your CA doesn't bring your site down immediately.

It is still vulnerable to a DOS and trust on first use though.

replies(1): >>19825872 #
98. tialaramex ◴[] No.19825076{3}[source]
Monitoring CT lets you verify that somebody renewed the certificate, but it doesn't verify they actually installed the replacement correctly.

My employer (Kynd.io) currently monitors public web sites for customers so we can flag e.g. "Hey this site cert expires in a week! If it's dead probably just switch it off, otherwise renew the certificate" and we're in the process of integrating CT but mostly so we can say "You already have a newer cert but need to go install it" in our How To Fix instructions.
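
For anyone who wants a rough idea of what the "expires in a week" check looks like, here is a sketch using only Python's standard library. It is not how any particular monitoring product works; the function names and the 7-day threshold are assumptions for illustration.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the notAfter string of the cert the server actually presents.

    Note: with the default context an already-expired cert fails the
    handshake with an SSL error instead of returning a date.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` (timezone-aware) and the cert's notAfter time."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def needs_attention(not_after: str, now: datetime, warn_days: int = 7) -> bool:
    # Flag the site once it is inside the warning window.
    return days_until_expiry(not_after, now) <= warn_days


if __name__ == "__main__":
    not_after = fetch_not_after("example.com")
    print(not_after, needs_attention(not_after, datetime.now(timezone.utc)))
```

Because this looks at the cert the server is serving rather than at CT, it catches the "renewed but never installed" case.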

99. rickycook ◴[] No.19825079{6}[source]
i think it’s a combined “fixed cost” rather than just computational power... like you could do it with x, thus it should cost at most y

similar to saying that you could do it with a raspberry pi

100. nothrabannosir ◴[] No.19825103[source]
Realistically: reduce your own cert renewal window to weekly, if not daily. This forces you to have a good renewal system in place and alerts you to failures long before actual expiration.

Quixotically: make cert failure a randomised number, linearly related to how long ago the cert expired. This slowly introduces more and more failures, over a certain “grace period”, which makes the problem less of an extinction level event. It’s not a solution but it definitely would help.
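
A sketch of the quixotic version, assuming the failure probability ramps linearly from 0 at expiry to 1 at the end of the grace period. `failure_probability` and `should_reject` are made-up names; no browser or TLS stack does this.

```python
import random


def failure_probability(seconds_since_expiry: float, grace_seconds: float) -> float:
    """Chance of rejecting the cert, linear in time since expiry."""
    if seconds_since_expiry <= 0:
        return 0.0  # not expired yet: never fail
    return min(1.0, seconds_since_expiry / grace_seconds)


def should_reject(seconds_since_expiry: float, grace_seconds: float,
                  rng=random.random) -> bool:
    # rng() is in [0, 1), so probability 0 never rejects and 1 always does.
    return rng() < failure_probability(seconds_since_expiry, grace_seconds)


if __name__ == "__main__":
    week = 7 * 86400
    for days in (0, 1, 3.5, 7, 10):
        print(days, failure_probability(days * 86400, week))
```

With a one-week grace period, a cert that expired a day ago would fail roughly one connection in seven, which is loud enough to get noticed without taking everything down at once.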

101. Whitestrake ◴[] No.19825104{7}[source]
This is technically true, but contextually lacking.

acme-go/lego doesn't use HTTP validation unless you disable just about every other form of validation first. TLS-ALPN validation is much more likely, so port 443.

That said, it is very easy to allow software to bind to privileged ports without providing it root access; this has been solved for a very, very long time.

102. ◴[] No.19825110{7}[source]
103. tialaramex ◴[] No.19825112{5}[source]
Caveats:

Certificates that aren't from the Web PKI almost invariably won't be logged. Most logs explicitly refuse everything except certs from the Web PKI so as not to be burdened storing garbage. So this won't find certs issued by the custom OpenSSL CA on that one guy's Linux laptop.

Not all Web PKI certs are logged. There is no BR obligation and no root store programme rule that requires logging. The only things in place that strongly encourage logging are the Chrome and Safari policies. For systems that aren't designed to be accessed with a web browser or, much more rarely, enterprises that have persuaded themselves only IE is authorised anyway, the certs might deliberately not be logged. Yes there are (small) CAs doing this in the Web PKI, on purpose, in 2019.

replies(1): >>19827451 #
104. BuckRogers ◴[] No.19825167{4}[source]
We resolved this issue at my last company with sufficiently large mailing groups for cert renewal reminders. Once you get to 12 people on a mailing list, with new employees being added all the time, it's hard to miss. Usually a manager on that list is pinging people about it. There is the chance of the tragedy of the commons occurring, but I never saw it.

Once you do this, the only checklists that matter are the procedural ones for adding a new client or new cert to the renewal notification list. When you use a standard group email for all cert purchases, that one becomes tough to miss.

In my 7 years of being involved, we never missed a cert renewal with this process for ~300 client sites with multiple or wildcard certs.

105. AmericanChopper ◴[] No.19825183{4}[source]
All TLS failures fail closed. The idea that if a cert is compromised it will eventually expire sometime within the next five years is a completely laughable security control. Leaking information is a complete non-concern too. Have you heard of certificate transparency logs?

Short lived certs are quite obviously better from a security perspective, but the security difference between a certificate that expires in five years, and one that expires never is irrelevant.

replies(1): >>19825293 #
106. lugg ◴[] No.19825252{7}[source]
A tool not working well or being "experimental" does not dismiss the premise that frequently run automated tools are better than infrequently run manual tasks when those manual tasks can take down your infrastructure if done improperly, missed or forgotten.

All it being new means is that depending on your risk ratio you need to decide whether updates to the software need testing or whether you need to invest in your own solution - or, how about just wait until it matures and keep the old process until then.

Waiting doesn't invalidate the premise either. It just means you lack the resources to implement it safely and that's ok.

107. rocqua ◴[] No.19825293{5}[source]
A missing OCSP response does not fail closed, nor does a CRL URL 404-ing fail closed.

The information leakage of CRLs is stating to the public that a cert needed to be revoked.

Obviously, a compromised cert that will expire in 5 years is horrible. However, a non-compromised cert you are no longer using that will never expire is more of a risk than a disused cert that will expire in a year. Not to say you should leave the one year cert lying around. However, there is no desire to put the one year cert on a pre-shipped CRL.

108. pmontra ◴[] No.19825298{4}[source]
Actually it could be not negligence but a way to perform an attack.

Register a domain, get a certificate lasting forever, let the domain expire and somebody buy it. Then somehow redirect all or part of the traffic to that domain to your own server with a valid certificate. Chances are that few people will notice something has changed in the details of the certificate.

However you'll have left traces all over the place: credit cards, phone numbers, etc.

109. magicalhippo ◴[] No.19825434[source]
Then the automatic update process stops for some reason and your certificate expires...

At the end of the day, someone needs to verify that new certificates get acquired and installed before the old ones expire. Automation makes acquiring them less tedious, but does not do much for making sure someone pays attention.

110. bluejay2387 ◴[] No.19825826[source]
We update automatically AND manually check periodically to make sure the update took place. That company must be overly fond of drama...
111. tialaramex ◴[] No.19825872{6}[source]
I would like to live in a world where OCSP stapling is widely deployed and we can require OCSP and advise people to set must-staple if possible while everybody who doesn't staple will just have to eat the privacy implications. But this is not (yet and for the foreseeable future) that world.

Apache and nginx both shipped OCSP stapling implementations that are very bad, awful enough that for almost anyone I'd say "No, don't enable that" rather than try to explain how they need to use it and get them to a place where it's useful and safe. Adam Langley wrote years ago about how to do this correctly, and there does seem to be a little bit of movement in the correct direction at Apache, but the situation remains pretty poor.

112. revvx ◴[] No.19826112{7}[source]
You can just use the web server that is already running on the machine.

You (normally) don't want downtime in your website, so you just let your regular webserver serve the acme challenge instead of stopping it.

113. djhaskin987 ◴[] No.19826191[source]
Didn't Mozilla invent Let's encrypt? That would make this disaster doubly embarrassing.
114. tialaramex ◴[] No.19826216[source]
Admin people. Often the most senior ones get the title "Personal Assistant (to senior person job title)" but not always. They're lead bureaucrats, and tracking things that need to be done and ensuring they get done, either by doing them themselves or assigning them to reliable underlings is the purpose of their role.

Corporations are often not very good at putting the right people in these roles but good ones are invaluable. Since the Marvel Universe is everywhere, Pepper Potts is the archetype in that setting to give you an idea of why you'd need people like this. Tony Stark would be "too busy" to renew the certificates, but Pepper would make sure it gets done.

115. revvx ◴[] No.19826292{9}[source]
If you trust your automation, you put private key rotation into it.

If you don't trust your automation, you rotate the keys manually, as you would normally.

There are no valid reasons to throw the baby away with the bathwater.

116. brryant ◴[] No.19826811{3}[source]
Not webflow. We auto renew way before LE expires the cert.
117. justinclift ◴[] No.19827020{4}[source]
s/resolve/validate/
118. lvh ◴[] No.19827451{6}[source]
You can tell ACM your CT preference!

(But seriously, sure you’re right but for my audience (which is essentially Latacora’s and HN’s), CT is fine.)