
240 points by yusufaytas | 1 comment | source
jroseattle ◴[] No.41895613[source]
We reviewed Redis back in 2018 as a potential solution for our use case. In the end, we opted for a less sexy solution (not Redis) that never failed us, no joke.

Our use case: handing out a ticket (something with an identifier) from a finite set of tickets from a campaign. It's something akin to Ticketmaster allocating seats in a venue for a concert. Our operation was as you might expect: provide a ticket to a request if one is available, assign some metadata from the request to the allocated ticket, and remove it from consideration for future client requests.

We had failed campaigns in the past (over-allocation, under-allocation, duplicate allocation, etc.), so our concern was accuracy. Clients would connect and request a ticket; we wanted to exclusively distribute only the set of tickets available in the pool. If the number of client requests exceeded the number of tickets, the system had to guard against that.

We tried Redis, including the naive implementation: get the lock, check the lock, do our thing, release the lock. It was OK, but the administrative overhead was too much for us at the time. I'm glad we didn't go that route, though.
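For reference, the naive lock pattern described here can be sketched as below. This is a self-contained illustration, not the commenter's actual code: an in-memory `FakeRedis` class stands in for a real client (with redis-py the equivalent calls would be `client.set(key, token, nx=True, px=ttl)` and `client.delete(key)`), and the key name is made up.

```python
import uuid

class FakeRedis:
    """In-memory stand-in for a Redis client so the sketch runs
    without a server. Only the operations the lock needs."""
    def __init__(self):
        self.store = {}

    def set_nx(self, key, value):
        # Mimics SET key value NX: succeeds only if the key is absent.
        if key in self.store:
            return False
        self.store[key] = value
        return True

    def get(self, key):
        return self.store.get(key)

    def delete(self, key):
        self.store.pop(key, None)

def with_lock(client, key, work):
    """Naive get-the-lock / check / do-the-thing / release-the-lock
    pattern. A random token guards against deleting a lock that some
    other process now holds."""
    token = str(uuid.uuid4())
    if not client.set_nx(key, token):
        return None  # someone else holds the lock
    try:
        return work()
    finally:
        # Release only if we still own the lock.
        if client.get(key) == token:
            client.delete(key)
```

Even in this toy form you can see the administrative surface: token management, release-only-if-owner checks, and (in real Redis) TTLs to cope with a crashed holder.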

We ultimately settled on...Postgres. Our "distributed lock" was just a composite UPDATE statement using some Postgres-specific features. We effectively turned requests into a SET operation, where the database would return either a record that indicated the request was successful, or something that indicated it failed. ACID transactions for the win!
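The single-UPDATE approach described above can be sketched roughly as follows. Table and column names are invented for illustration, and SQLite's stdlib bindings stand in for Postgres (where you would typically reach for `UPDATE ... RETURNING` or `SELECT ... FOR UPDATE SKIP LOCKED`); the commenter's actual statement is not shown in the thread.

```python
import sqlite3

def claim_ticket(conn, campaign_id, buyer):
    """Atomically claim one unassigned ticket for `buyer`, or return
    None if the pool is exhausted. The UPDATE's outer WHERE clause
    re-checks that the chosen row is still unassigned, so the claim
    is an all-or-nothing SET operation."""
    cur = conn.execute(
        """
        UPDATE tickets
           SET buyer = ?
         WHERE id = (SELECT id FROM tickets
                      WHERE campaign_id = ? AND buyer IS NULL
                      LIMIT 1)
           AND buyer IS NULL
        """,
        (buyer, campaign_id),
    )
    conn.commit()
    if cur.rowcount == 0:
        return None  # no tickets left (or lost a race; caller may retry)
    row = conn.execute(
        "SELECT id FROM tickets WHERE campaign_id = ? AND buyer = ?",
        (campaign_id, buyer),
    ).fetchone()
    return row[0] if row else None
```

The point is that the database either returns a row proving the request succeeded or nothing at all; there is no separate lock object to acquire, check, and release.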

With accuracy solved, we next looked at scale/performance. We didn't need to support millions of requests/sec, but we did have some spikiness thresholds. We optimized the read/write DB instances within our cluster and strategically routed larger, higher-demand campaigns to dedicated systems. We kept optimizing over the next two years, and not once did we have a campaign with ticket-distribution failures.

Note: I am not an expert of any kind in distributed-lock technology. I'm just someone who did their homework, focused on the problem to be solved, and found a solution after trying a few things.

replies(8): >>41895681 #>>41895829 #>>41895977 #>>41896180 #>>41896281 #>>41896833 #>>41897029 #>>41897993 #
wwarner ◴[] No.41895681[source]
This is the best way, and actually the only sensible way to approach the problem. I first read about it here https://code.flickr.net/2010/02/08/ticket-servers-distribute...
replies(1): >>41895868 #
hansvm ◴[] No.41895868[source]
> only sensible way

That's a bit strong. Like most of engineering, it depends. Postgres is a good solution if you only have maybe 100k QPS, the locks are logically (if not necessarily fully physically) partially independent, and they aren't held for long. Break any of those constraints, or add anything weird (inefficient postgres clients, high DB load, ...), and you start having to explore either removing those seeming constraints or using other solutions.

replies(1): >>41895896 #
wwarner ◴[] No.41895896[source]
Ok, fair; I'm not really talking about Postgres (the link I shared uses MySQL). I'm saying that creating a ticket server that just issues and persists unique tokens is a way to provide coordination between loosely coupled applications.
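The linked Flickr post describes dedicated MySQL ticket servers that hand out unique ids via `REPLACE INTO` plus `LAST_INSERT_ID()`. A rough sketch of the same idea, leaning on the database's auto-increment counter (SQLite here purely for self-containment; names are illustrative):

```python
import sqlite3

class TicketServer:
    """Issues and persists unique tokens by delegating uniqueness to
    the database's auto-increment counter, roughly the scheme in the
    Flickr ticket-server post (which uses MySQL REPLACE INTO +
    LAST_INSERT_ID across two servers with offset counters)."""
    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        # AUTOINCREMENT guarantees ids are never reused, even after deletes.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS tickets "
            "(id INTEGER PRIMARY KEY AUTOINCREMENT, stub TEXT)"
        )

    def next_id(self):
        cur = self.conn.execute("INSERT INTO tickets (stub) VALUES ('a')")
        self.conn.commit()
        return cur.lastrowid
```

Loosely coupled applications can then coordinate simply by agreeing that whoever holds a given token owns the corresponding resource.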
replies(1): >>41896096 #
zbobet2012 ◴[] No.41896096[source]
Yeah that's cookies. They are great.