PaulHoule:
I went through a phase of writing asyncio servers for my side projects. Probably the most fun I had was writing things that were responsive in complex ways, such as a websockets server that was also listening on message queues or on a TCP connection to a Denon HEOS music player.

Eventually I wrote an "image sorter" that I found would hang when the browser tried to download images in parallel. Serving the images should not have been CPU-bound (I was even using sendfile()), but I think other requests would hold the CPU and block the tiny amount of CPU time needed to set up that sendfile.
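
A minimal sketch of that failure mode (not the actual image-sorter code): one coroutine doing synchronous CPU work stalls every other coroutine sharing the event loop, even one that does no work at all.

    import asyncio
    import time

    async def cpu_bound():
        # Synchronous CPU work: no await points, so the event loop's
        # single thread is held until this returns.
        sum(i * i for i in range(20_000_000))

    async def snappy():
        t0 = time.monotonic()
        await asyncio.sleep(0)  # yield once; should resume immediately
        print(f"snappy was stalled for {time.monotonic() - t0:.2f}s")

    async def main():
        # snappy() ends up waiting for cpu_bound() to finish even
        # though it does no work of its own.
        await asyncio.gather(snappy(), cpu_bound())

    asyncio.run(main())

On a typical machine this prints a stall of a second or more rather than the near-zero wait you would expect.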

So I switched from aiohttp to the Flask API, serving with either Flask or Gunicorn, and I even front it with Microsoft IIS or nginx to handle the images so Python doesn't have to. It's a minor hassle because I develop on Windows, so I have to run Gunicorn inside WSL2, but it works great and I don't have to think about server performance anymore.
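
A sketch of that split, with hypothetical paths: Flask keeps only the dynamic routes, and in production the front proxy (nginx or IIS) is configured to serve /images/ straight from disk, so those requests never reach Python.

    from flask import Flask, send_from_directory

    app = Flask(__name__)
    IMAGE_DIR = "/srv/images"  # hypothetical path

    @app.route("/images/<path:name>")
    def image(name):
        # In production the front proxy serves /images/ directly;
        # this handler only matters in local development.
        return send_from_directory(IMAGE_DIR, name)

    # Under WSL2, serve with e.g.: gunicorn -w 4 app:app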

tdumitrescu:
That's the main problem with evented servers in general, isn't it? If any one of your workloads is CPU-intensive, it has the potential to block everything else served from the same thread, so requests that should always be snappy can take arbitrarily long in practice. Basically, if you have any CPU-heavy work, it shouldn't go in that same server.
materielle:
Traditionally, there are two strategies:

1) Use the network thread pool to also run application code. Then your entire program has to be very careful never to block or do CPU-intensive work. This is efficient but leads to difficult-to-maintain programs.

2) The network thread pool passes work back and forth to a separate application executor. That way the network thread pool is never starved by the application, since there are essentially two different work queues. This works great, but now every request performs multiple thread hops, which increases latency (see the sketch below).
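
In asyncio terms, strategy 2 is roughly loop.run_in_executor() with a dedicated pool; a minimal sketch (handler and pool size are made up):

    import asyncio
    from concurrent.futures import ThreadPoolExecutor

    # Dedicated application pool; the event loop's thread stays free
    # to keep accepting and parsing network traffic.
    app_pool = ThreadPoolExecutor(max_workers=8)

    def handle_request(payload):
        # CPU-intensive application code (placeholder).
        return sum(i * i for i in range(10_000_000)) + len(payload)

    async def on_request(payload):
        loop = asyncio.get_running_loop()
        # Hop 1: loop thread -> app pool. Hop 2: result handed back
        # to the loop. Two queue transfers per request, as described.
        return await loop.run_in_executor(app_pool, handle_request, payload)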

There has been a lot of interest lately in combining scheduling and work-stealing algorithms to create a best-of-both-worlds executor.

You could imagine, theoretically, an executor that auto-scales, maintains different work queues, and avoids thread hops when possible, while ensuring there are always threads available for the network.
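
As a toy illustration of that idea (every name here is invented, and the cost heuristic is deliberately crude): run tasks the caller declares cheap inline on the calling thread, skipping the hop, and push everything else to a separate pool so the network threads are never starved.

    from concurrent.futures import Future, ThreadPoolExecutor

    class HybridExecutor:
        """Toy sketch, not a real library: cheap tasks run inline
        (no thread hop); expensive ones go to a worker pool."""

        def __init__(self, max_workers=4, inline_budget_ms=1.0):
            self._pool = ThreadPoolExecutor(max_workers)
            self._inline_budget_ms = inline_budget_ms

        def submit(self, fn, *args, estimated_ms=None):
            # Heuristic: if the caller claims the task is cheap,
            # run it right here and avoid both queue transfers.
            if estimated_ms is not None and estimated_ms <= self._inline_budget_ms:
                f = Future()
                try:
                    f.set_result(fn(*args))
                except Exception as exc:
                    f.set_exception(exc)
                return f
            # Otherwise hand it off, keeping this thread free.
            return self._pool.submit(fn, *args)

A real implementation would measure costs itself and steal queued work between threads rather than trusting a caller-supplied estimate, but the queue split is the essential point.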