698 points jgrahamc | 2 comments
_wmd ◴[] No.20422740[source]
So in response to a catastrophic failure due to testing in prod, they're going to push out a brand new regex engine with an ETA of 2 weeks. Can anyone say testing in prod?

The constant use of 'I' and 'me' (19 occurrences in total) deeply tarnishes this report, and repeatedly singling out a responsible engineer, nameless or not, is a failure in its own right. This was a collective failure; any individual identity is totally irrelevant. We're not looking for an account of your superman-like heroism, sprinting from meeting rooms or otherwise; we want to know whether anything has been learned in the 2 years since Cloudflare leaked heap all across the Internet without noticing, and the answer to that seems fantastically clear.

replies(6): >>20422871 #>>20422873 #>>20422891 #>>20422903 #>>20422924 #>>20424743 #
jgrahamc ◴[] No.20422903[source]
This report is written by me, the CTO of Cloudflare. I say "I" throughout because organizational failings are my responsibility. If I'd said "we" I imagine you'd be criticizing me for NOT taking responsibility.

If you read the report you'd see I do not blame the engineer responsible at all. Not once. I made that perfectly clear.

replies(1): >>20423631 #
pvg ◴[] No.20423631[source]
I wonder if you are able to talk a bit about the development of the Lua-based WAF. I imagine the possible unbounded performance of feeding requests into PCRE must have occurred to you or others at the time - or at least, long before this outage.

I don't mean this as some sort of lame 'lol shoulda known better' dunk - stories about technical organizations' decision-making and tradeoff-handling are just more interesting than the details of how regexes typed in a control panel grow up to become Jira tickets.

replies(1): >>20434247 #
1. jgrahamc ◴[] No.20434247[source]
I did a talk about this years ago: https://www.youtube.com/watch?v=nlt4XKhucS4
replies(1): >>20434419 #
2. pvg ◴[] No.20434419[source]
It sounds like one of the primary factors was compatibility with existing (or customer-provided) mod_security rules, if I've understood 1.75x speed hyper-you right.