
698 points by jgrahamc
blr246 No.20422316
Appreciate the detail here. It's a great writeup. Wondering what folks think about one of the changes:

  5. Changing the SOP to do staged rollouts of rules in
     the same manner used for other software at Cloudflare
     while retaining the ability to do emergency global
     deployment for active attacks.
One concern I'd have is whether or not I'm exercising the global rollout procedure often enough to be confident it works when it's needed. Of the hundreds of WAF rule changes rolled out every month, how many are global emergencies?

It's a fact of managing process that branches are a liability and the hot path is the one that ends up most reliable. I wonder whether anyone there is concerned that this process change dilutes the rapid-response path (the one with the highest associated risk).

edit: fix verbatim formatting
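One way to read the hot-path concern: if the emergency global deploy is a separate branch of code, it rarely runs and rarely gets tested. A minimal Python sketch, assuming hypothetical `apply`/`healthy` hooks and made-up stage fractions and soak times, in which an emergency deploy is just a one-stage plan fed through the same rollout loop, so every routine rollout keeps exercising it:

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Stage:
    name: str
    fraction: float      # share of the fleet running the rule after this stage
    soak_seconds: float  # how long to wait before checking health


# Hypothetical plans: the emergency plan is simply a one-stage schedule.
STAGED_PLAN = (
    Stage("canary", 0.01, 600),
    Stage("region", 0.25, 1800),
    Stage("global", 1.0, 0),
)
EMERGENCY_PLAN = (Stage("global", 1.0, 0),)


def roll_out(
    rule_id: str,
    plan: Sequence[Stage],
    apply: Callable[[str, float], None],
    healthy: Callable[[], bool],
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Push rule_id through each stage; roll it back fleet-wide on a failed health check."""
    completed: List[str] = []
    for stage in plan:
        apply(rule_id, stage.fraction)
        sleep(stage.soak_seconds)
        if not healthy():
            apply(rule_id, 0.0)  # remove the rule from the whole fleet
            raise RuntimeError(f"{rule_id} failed health check at stage {stage.name!r}")
        completed.append(stage.name)
    return completed


def deploy(rule_id, emergency, apply, healthy, sleep=time.sleep):
    # Routine and emergency deploys differ only in the plan (data, not control
    # flow), so the apply/check/rollback code is the same hot path for both.
    plan = EMERGENCY_PLAN if emergency else STAGED_PLAN
    return roll_out(rule_id, plan, apply, healthy, sleep)
```

Whether this matches how Cloudflare's tooling is actually structured is unknown; it only illustrates keeping the emergency path on the frequently exercised code path.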

nullwasamistake No.20425628
The main problem is that their regex library doesn't have a recursion limit. I'm honestly amazed they've been able to scale Lua scripts to the point where they can use them as a global WAF. Knowing this, it may be easy to craft attacks against their filters.
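To show why a missing limit matters, here is a toy Python model (not any real engine's code) of how a backtracking matcher handles the anchored pattern `.*.*=.*`: the first `.*` greedily takes everything and gives characters back one at a time, and for each of those choices the second `.*` does the same. On an input with no `=`, that is (n+1)(n+2)/2 attempts, i.e. quadratic, and an unanchored search repeats it from every start position. The `max_steps` budget and `BacktrackLimitExceeded` are hypothetical names; engines such as PCRE2 expose a comparable cap via `pcre2_set_match_limit`.

```python
class BacktrackLimitExceeded(Exception):
    """Raised when the matcher uses more steps than its budget allows."""


def match_dotstar_eq(s: str, max_steps: int) -> tuple:
    """Toy model of a backtracking match of the anchored pattern `.*.*=.*`.

    Returns (matched, steps), or raises once the step budget is spent."""
    steps = 0
    n = len(s)
    # First .* is greedy: try taking all of s, then give back one char at a time.
    for i in range(n, -1, -1):
        # For each choice of i, the second .* also starts greedy and backs off.
        for j in range(n, i - 1, -1):
            steps += 1
            if steps > max_steps:
                raise BacktrackLimitExceeded(f"gave up after {max_steps} steps")
            # After both .* groups the engine needs a literal '=' at position j;
            # the trailing .* then matches whatever is left.
            if j < n and s[j] == "=":
                return True, steps
    return False, steps
```

For example, `match_dotstar_eq("x" * 10, 1000)` returns `(False, 66)`, while a 1000-character input without `=` needs 501,501 steps and trips a 100,000-step budget instead of running unbounded.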

My takeaway is that it's time to move to a custom solution built in a more flexible language. A simple async watchdog on total rule execution time would have prevented this. Given how many regex rules they run, I'm amazed they didn't have one.
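A minimal Python sketch of a total-time budget on rule evaluation. `evaluate_rules`, `Verdict`, the two sample rules, and the fail-open policy are all hypothetical. One caveat: the deadline here is only checked between rules, so it caps the total only if each individual match finishes; a single runaway regex still needs a per-match step limit or a worker process that can be killed.

```python
from __future__ import annotations

import re
import time
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Verdict:
    blocked: bool
    matched_rule: Optional[str]
    timed_out: bool


def evaluate_rules(
    request_text: str,
    rules: Sequence[Tuple[str, re.Pattern[str]]],
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Verdict:
    """Run rules in order under one total time budget for the whole request."""
    deadline = clock() + budget_seconds
    for name, pattern in rules:
        if clock() > deadline:
            # Hypothetical policy: fail open (let the request through) and flag it.
            return Verdict(blocked=False, matched_rule=None, timed_out=True)
        if pattern.search(request_text):
            return Verdict(blocked=True, matched_rule=name, timed_out=False)
    return Verdict(blocked=False, matched_rule=None, timed_out=False)


# Hypothetical example rules.
RULES = [
    ("sqli", re.compile(r"union\s+select", re.IGNORECASE)),
    ("xss", re.compile(r"<script", re.IGNORECASE)),
]
```

The injectable `clock` makes the budget testable without real delays; whether to fail open or closed on timeout is a policy decision this sketch does not settle.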

jacques_chester No.20427625
I'm interested in why they wouldn't use LPeg instead. PEGs seem a lot easier to compose, reason about, and debug; plus, they have restricted backtracking.
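To illustrate what restricted backtracking means, here is a tiny Python sketch of PEG semantics. It is not LPeg, which is a Lua library with its own operator syntax, and the combinator names are hypothetical. PEG repetition is greedy and never gives characters back, so the regex-style `.*=.*` has to be written as "anything except `=`, then `=`, then the rest"; for this grammar each character is examined a bounded number of times.

```python
from typing import Callable, Optional

# A parser takes (text, pos) and returns the new position, or None on failure.
Parser = Callable[[str, int], Optional[int]]


def lit(s: str) -> Parser:
    def p(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return p


def any_char() -> Parser:
    def p(text, pos):
        return pos + 1 if pos < len(text) else None
    return p


def not_(q: Parser) -> Parser:
    """Negative lookahead: succeeds without consuming input if q fails."""
    def p(text, pos):
        return pos if q(text, pos) is None else None
    return p


def seq(*ps: Parser) -> Parser:
    """Match each parser in order; fail as soon as one fails."""
    def p(text, pos):
        for q in ps:
            pos = q(text, pos)
            if pos is None:
                return None
        return pos
    return p


def star(q: Parser) -> Parser:
    """Greedy, possessive repetition: never gives characters back."""
    def p(text, pos):
        while True:
            r = q(text, pos)
            if r is None or r == pos:
                return pos
            pos = r
    return p


# PEG analogue of `.*=.*`: non-'=' characters, then '=', then the rest.
not_eq = seq(not_(lit("=")), any_char())
key_eq_value = seq(star(not_eq), lit("="), star(any_char()))

# The literal translation fails: star(any_char()) eats the '=' and never
# backtracks to hand it back.
naive = seq(star(any_char()), lit("="), star(any_char()))
```

Here `key_eq_value("a=b", 0)` returns `3`, while `naive("a=b", 0)` returns `None`: the grammar has to say exactly what it means, which is the trade-off that keeps backtracking in check.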