jroseattle No.41895613
We reviewed Redis back in 2018 as a potential solution for our use case. In the end, we opted for a less sexy solution that has never failed us, no joke.

Our use case: handing out a ticket (something with an identifier) from a campaign's finite pool of tickets. It's akin to Ticketmaster allocating seats in a venue for a concert. Our operation was as you might expect: provide a ticket to a request if one is available, assign some metadata from the request to the allocated ticket, and remove that ticket from consideration for future client requests.

We had failed campaigns in the past (over-allocation, under-allocation, duplicate allocation, etc.), so our concern was accuracy. Clients would connect and request a ticket; we wanted to distribute only the tickets available in the pool, each exactly once. If the number of client requests exceeded the number of tickets, the system had to protect against that.

We tried Redis, including the naive implementation: acquire the lock, check the lock, do our thing, release the lock. It worked okay, but the administrative overhead was too much for us at the time. I'm glad we didn't go that route.

We ultimately settled on...Postgres. Our "distributed lock" was just a composite UPDATE statement using some Postgres-specific features. We effectively turned each request into a SET operation, where the database would return either a record indicating the request succeeded or a result indicating it failed. ACID transactions for the win!
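
To sketch the shape of it (table and column names here are invented for illustration, not our actual schema), something in this spirit:

    -- Atomically claim one free ticket for a requester ($1) in a
    -- campaign ($2). The inner SELECT finds an unclaimed row;
    -- SKIP LOCKED lets concurrent claimers pass over rows another
    -- transaction is mid-claim on instead of blocking; RETURNING
    -- hands back the winning ticket id. Zero rows updated means
    -- the pool is exhausted.
    UPDATE tickets
    SET    claimed_by = $1,
           claimed_at = now()
    WHERE  id = (
             SELECT id
             FROM   tickets
             WHERE  campaign_id = $2
               AND  claimed_by IS NULL
             FOR UPDATE SKIP LOCKED
             LIMIT  1
           )
    RETURNING id;

Either a row comes back (the allocated ticket, with the request's metadata now attached) or nothing does, and that empty result is the failure signal.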

With accuracy solved, we next looked at scale and performance. We didn't need to support millions of requests/sec, but we did have spiky load to absorb. We were able to tune the read/write database instances within our cluster and route larger, higher-demand campaigns to dedicated systems. We kept optimizing over the next two years, and not once did we have a campaign with ticket-distribution failures.

Note: I am not an expert of any kind in distributed-lock technology. I'm just someone who did their homework, focused on the problem to be solved, and found a solution after trying a few things.

stickfigure No.41895977
I think this illustrates something important: you don't need locking. You need <some high-level business constraint that may or may not require some form of locking>.

In your case, the constraint is "don't sell more than N tickets". For most realistic traffic volumes on that kind of problem, you can satisfy it with traditional RDBMS transactional behavior and let the database manage whatever locking it uses internally.
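
For the "no more than N" flavor specifically, a single guarded UPDATE is often all it takes. A minimal sketch, assuming a hypothetical campaigns table with sold and capacity columns:

    -- Enforce "don't sell more than N" in one atomic statement,
    -- leaving the row locking to the database. A request that
    -- loses the race matches zero rows, and the app treats an
    -- empty result as "sold out".
    UPDATE campaigns
    SET    sold = sold + 1
    WHERE  id = $1
      AND  sold < capacity
    RETURNING capacity - sold AS remaining;

No lock service, no explicit locks in application code, and the constraint holds no matter how many app servers race on it.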

I wish developers were a lot slower to reach for "I'll build distributed locks". There's almost always a better answer, but it's specific to each application.

jroseattle No.41900150
This is exactly how we arrived at our solution. We needed to satisfy the constraint; locking was just one means of doing so.

Maybe we were lucky in our implementation, but a key factor in our decision was understanding how to manage the systems in our environment. We could have skilled up on Redis, but we felt our Postgres solution was a good first step. We just haven't needed a second step yet.