205 points anurag | 15 comments
shanemhansen ◴[] No.45765342[source]
The unreasonable effectiveness of profiling and digging deep strikes again.
replies(1): >>45776616 #
hinkley ◴[] No.45776616[source]
The biggest tool in the performance toolbox is stubbornness. Without it all the mechanical sympathy in the world will go unexploited.

There’s about a factor of 3 improvement that can be made to most code after the profiler has given up. That probably means there are better profilers that could be written, but in 20 years of having them I’ve only seen 2 that tried. Sadly I think flame graphs made profiling more accessible to the unmotivated but didn’t actually improve overall results.

replies(4): >>45777180 #>>45777265 #>>45777691 #>>45783146 #
Negitivefrags ◴[] No.45777265[source]
I think the biggest tool is higher expectations. Most programmers really haven't come to grips with the idea that computers are fast.

If you see a database query that takes 1 hour to run and only touches a few GB of data, you should be thinking "Well, NVMe bandwidth is multiple gigabytes per second, why can't it run in 1 second or less?"
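
A back-of-envelope check of that intuition, with illustrative numbers (a 4GB working set and 3GB/s of sequential NVMe read are assumptions, not measurements):

    # Lower bound on scan time: data size / drive bandwidth.
    data_gb = 4          # data the query actually touches (assumed)
    nvme_gb_per_s = 3    # sequential read bandwidth of one NVMe drive (assumed)
    print(f"lower bound: {data_gb / nvme_gb_per_s:.1f} s")  # ~1.3 s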

The idea that anyone would accept a request to a website taking longer than 30ms (the time it takes for a game to render its entire world, including both the CPU and GPU parts, at 60fps) is insane, and nobody should really accept it, but we commonly do.

replies(4): >>45777574 #>>45777649 #>>45777878 #>>45779600 #
1. javier2 ◴[] No.45777574[source]
It's also about cost. My gaming computer has 8 cores + 1 expensive GPU + 32GB RAM for me alone. We don't have that per customer.
replies(3): >>45777680 #>>45777764 #>>45778893 #
2. avidiax ◴[] No.45777680[source]
It's also about revenue.

Uber could run the complete global rider/driver flow from a single server.

It doesn't, in part because all of those individual trips earn $1 or more each, so it's perfectly acceptable to the business to be more inefficient and use hundreds of servers for this task.

Similarly, a small website taking 150ms to render the page only matters if the lost productivity costs more than the engineering time to fix it, and even then, only makes sense if that engineering time isn't more productively used to add features or reliability.

replies(2): >>45779777 #>>45783407 #
3. oivey ◴[] No.45777764[source]
This is again a problem understanding that computers are fast. A toaster can run an old 3D game like Quake at hundreds of FPS. A website primarily displaying text should be way faster. The reasons websites often aren’t have nothing to do with the user’s computer.
replies(1): >>45778020 #
4. paulryanrogers ◴[] No.45778020[source]
That's a dedicated toaster serving only one client. Websites usually aren't backed by bare metal per visitor.
replies(1): >>45778187 #
5. oivey ◴[] No.45778187{3}[source]
Right. I’m replying to someone talking about their personal computer.
6. Aeolun ◴[] No.45778893[source]
If your website takes less than 16ms to serve, you can serve 60 customers per second with that. So you sorta do have it per customer?
replies(3): >>45779205 #>>45780299 #>>45786223 #
7. vlovich123 ◴[] No.45779205[source]
That’s per core, assuming the 16ms is CPU-bound activity (so 100 cores would serve 100 customers at a time). If it’s I/O, you can overlap a lot of customers, since a single core could easily keep track of thousands of in-flight requests.
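
A minimal sketch of that overlap on one core (the 16ms sleep is a stand-in for a database or network wait, not real work):

    import asyncio, time

    async def handle(req_id: int) -> None:
        # Simulated 16ms of I/O wait; the event loop serves
        # other requests while this one is parked.
        await asyncio.sleep(0.016)

    async def main() -> None:
        start = time.perf_counter()
        await asyncio.gather(*(handle(i) for i in range(10_000)))
        # 10,000 requests finish in well under 10,000 * 16ms of wall time.
        print(f"{time.perf_counter() - start:.2f} s")

    asyncio.run(main())
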
8. onethumb ◴[] No.45779777[source]
Uber could not run the complete global rider/driver flow from a single server.
replies(2): >>45780373 #>>45783170 #
9. OJFord ◴[] No.45780299[source]
With a latency of up to 984ms (the 60th request in that second can't start until the preceding 59 have finished)
10. exe34 ◴[] No.45780373{3}[source]
I believe the argument was that somebody competent could do it.
replies(1): >>45787003 #
11. avidiax ◴[] No.45783170{3}[source]
I'm saying you can keep track of all the riders and drivers, matchmake, start/progress/complete trips, with a single server, for the entire world.

Billing, serving assets like map tiles, etc. not included.

Some key things to understand:

* The scale of Uber is not that high. A big city surely has < 10,000 drivers simultaneously, probably less than 1,000.

* The driver and rider phones participate in the state keeping. They send updates every 4 seconds, but they only have to be online to start a trip. Both mobiles cache a trip log that gets uploaded when network is available.

* Since driver/rider send updates every 4 seconds, and since you don't need to be online to continue or end a trip, you don't even need an active spare for the server. A hot spare can rebuild the world state in 4 seconds. State for a rider and driver is just a few bytes each for id, position, and status (a rough sketch follows at the end of this comment).

* Since you'll have the rider and driver trip logs from their phones, you don't necessarily have to log the ride server side either. It's also OK to lose a little data on the server. You can use UDP.

Don't forget that in the olden times, all the taxis in a city like New York were dispatched by humans. All the police in the city were dispatched by humans. You can replace a building of dispatchers with a good server and mobile hardware working together.
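
A rough sketch of how small that per-entity state could be (the 25-byte layout here is an assumption for illustration, not Uber's actual schema):

    import struct

    # Hypothetical packed record: 8-byte id, two 8-byte doubles for
    # lat/lon, 1-byte status. "<Qddb" = 25 bytes, unaligned.
    RECORD = struct.Struct("<Qddb")

    def pack_entity(entity_id: int, lat: float, lon: float, status: int) -> bytes:
        return RECORD.pack(entity_id, lat, lon, status)

    # Even 10 million concurrent riders + drivers fit easily in RAM:
    print(f"{10_000_000 * RECORD.size / 2**20:.0f} MiB")  # ~238 MiB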

replies(1): >>45783447 #
12. hinkley ◴[] No.45783407[source]
Practically, you have to parcel out points of contention to a larger and larger team to stop them from spending 30 hours a week just coordinating changes to the servers. So the servers divide to follow Conway’s Law, or the company goes bankrupt (why not both?).

Microservices try to fix that. But then you need bin packing, so microservices beget Kubernetes.

13. hinkley ◴[] No.45783447{4}[source]
You could envision a system that used one server per county, and that’s 3k servers. Combine rural counties to get that down to 1,000, and that’s probably fewer servers than Uber runs.

What the internet will tell me is that Uber has 4,500 distinct services, which is more services than there are counties in the US.

14. javier2 ◴[] No.45786223[source]
I'm just saying that we don't have gaming PC specs per customer to chug through that 7GB of data for every request in 30ms.
15. lazide ◴[] No.45787003{4}[source]
The reality is that, no, that is not possible. If a single core can render and return a web page in 16ms, what do you do when you have a million requests/sec?

The reality is most of those requests (now) get mixed in with a firehose of traffic, and could be served much faster than 16ms if that is all that was going on. But it’s never all that is going on.
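
The arithmetic behind that objection, treating the thread's own numbers as assumptions:

    requests_per_s = 1_000_000   # hypothetical load, per the comment above
    cpu_s_per_request = 0.016    # 16ms of actual CPU time per request (assumed)
    print(f"{requests_per_s * cpu_s_per_request:.0f} cores of pure CPU work")  # 16000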