Why HN was slow and how Rtm fixed it

(ycombinator.com)

rarrrrrr ◴[19 Jan 11 20:49 UTC] No.2121225[source]▶

>>2120756 (OP) #

Since no one has mentioned it yet: Varnish (varnish-cache.org), written by a FreeBSD kernel hacker, has a very nice feature. It puts all overlapping concurrent requests for the same cacheable resource "on hold", fetches that resource only once from the backend, then serves the same copy to all of them. Nearly all the expensive content on HN would be cacheable by Varnish. You can then get down to pretty close to "1 backend request per content change" and stop worrying about how slow the actual backend server is, how many threads it runs, how it deals with the socket, garbage collection, and all that.
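To make the "on hold" behaviour concrete, here is a minimal Python sketch of request coalescing. It is not how Varnish implements it; the class name `CoalescingCache`, the `slow_backend` function and the `/news` key are made up for illustration.

```python
import threading
import time


class CoalescingCache:
    """Serve concurrent requests for one key from a single backend fetch."""

    def __init__(self, fetch):
        self._fetch = fetch            # slow backend call: key -> body
        self._lock = threading.Lock()
        self._cache = {}               # key -> cached body
        self._inflight = {}            # key -> Event set when the fetch ends

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            event = self._inflight.get(key)
            leader = event is None
            if leader:                 # first request: it will do the fetch
                event = threading.Event()
                self._inflight[key] = event
        if leader:
            try:
                body = self._fetch(key)
                with self._lock:
                    self._cache[key] = body
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()            # release everyone who was on hold
            return body
        event.wait()                   # "on hold" until the leader finishes
        return self.get(key)           # cache hit now, or retry if fetch failed

    def invalidate(self, key):
        """Drop the cached copy when the content changes."""
        with self._lock:
            self._cache.pop(key, None)


backend_calls = []


def slow_backend(key):
    # Hypothetical stand-in for an arbitrarily slow page render.
    backend_calls.append(key)
    time.sleep(0.2)
    return "<html>page for %s</html>" % key


cache = CoalescingCache(slow_backend)
workers = [threading.Thread(target=cache.get, args=("/news",)) for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(backend_calls))  # 1 backend fetch for 8 concurrent requests
```

Calling `invalidate("/news")` on a content change is what gets you to roughly one backend request per change.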

replies(4): >>2121261 #>>2121274 #>>2121319 #>>2122946 #

1. nuclear_eclipse ◴[19 Jan 11 21:21 UTC] No.2121319[source]▶

>>2121225 #

Reverse proxies won't work for HN, because requests for the same resource from multiple users can't use the same results. Not only are certain bits of info customized for the user (like your name/link at the top), but even things like the comments and links are custom per user.

Things like a user's showdead setting, as well as whether the user is deaded, can drastically change the output of each page. E.g., comments by a deaded user won't show as dead to that user, but they will for everyone else.

replies(5): >>2121335 #>>2121419 #>>2121430 #>>2121544 #>>2122733 #

2. rfugger ◴[19 Jan 11 21:26 UTC] No.2121335[source]▶

>>2121319 (TP) #

To make this work you could do all the per-user stuff with JavaScript and Ajax calls in the browser. It would be quite a bit of revamping though.

3. Aaronontheweb ◴[19 Jan 11 21:53 UTC] No.2121419[source]▶

>>2121319 (TP) #

So, you can't do donut caching in Varnish?

4. aonic ◴[19 Jan 11 21:57 UTC] No.2121430[source]▶

>>2121319 (TP) #

Varnish supports Edge Side Includes (ESI). The header bar could be an ESI include, and the rest of the page could be cached.
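A rough sketch of what the ESI step amounts to, written in Python rather than as Varnish configuration. The `/userbar` fragment path, the page markup and the helper names are assumptions for illustration only; it covers just the header bar, not anything else that varies per user.

```python
import re

# Matches self-closing tags such as <esi:include src="/userbar"/>
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def assemble(cached_page, fetch_fragment):
    """Replace each <esi:include src="..."/> tag with the fragment it names."""
    return ESI_INCLUDE.sub(lambda m: fetch_fragment(m.group(1)), cached_page)


# One shared, cached copy of the thread page (hypothetical markup).
CACHED_THREAD = (
    '<esi:include src="/userbar"/>\n'
    "<ul><li>comment 1</li><li>comment 2</li></ul>"
)


def userbar_for(username):
    """Build a per-user fragment fetcher; only the header bar is uncached."""
    def fetch_fragment(src):
        if src == "/userbar":
            return "<div>%s | logout</div>" % username
        raise KeyError(src)
    return fetch_fragment


page = assemble(CACHED_THREAD, userbar_for("alice"))
print(page)
```

The expensive part of the page is rendered once and shared; only the small fragment is fetched per request.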

replies(1): >>2121555 #

5. jjoe ◴[19 Jan 11 22:34 UTC] No.2121544[source]▶

>>2121319 (TP) #

There's cookie-based caching in Varnish (and in some other proxy caches too). Essentially, the key is going to be made of the usual hash + the cookie like this:

sub vcl_hash { set req.hash += req.http.cookie; }

What this means is that the cache is per-logged-in-user and pretty much personalized. The server's going to need a lot more RAM than usual. You can set a low TTL on the cache entries so they're flushed and not kept in memory indefinitely. But the performance boost is great.
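For anyone who wants to see the idea outside of VCL, here is a small Python sketch of a cache keyed on the URL plus the Cookie header, with a low TTL. It simplifies the "usual hash" to just the URL, and `PerCookieCache` and `render` are hypothetical names, not anything from Varnish.

```python
import hashlib
import time


class PerCookieCache:
    """Cache keyed on URL + Cookie header, with a TTL per entry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, body)

    @staticmethod
    def key(url, cookie):
        # Usual hash input (simplified here to the URL) plus the cookie.
        h = hashlib.sha256()
        h.update(url.encode())
        h.update(b"\0")
        h.update((cookie or "").encode())
        return h.hexdigest()

    def get(self, url, cookie, render):
        k = self.key(url, cookie)
        now = self._clock()
        hit = self._entries.get(k)
        if hit is not None and hit[0] > now:
            return hit[1]                  # fresh per-user copy
        body = render(url, cookie)         # miss or expired: ask the backend
        self._entries[k] = (now + self._ttl, body)
        return body


def render(url, cookie):
    # Hypothetical backend render; personalised by cookie.
    return "%s rendered for %s" % (url, cookie)


cache = PerCookieCache(ttl_seconds=30)
print(cache.get("/news", "user=alice", render))
print(cache.get("/news", "user=bob", render))   # separate entry per cookie
```

Every distinct cookie gets its own entry, which is where the extra RAM goes; the short TTL is what keeps that bounded in practice.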

This is not recommended as an always-on measure. We wrote an entry about accomplishing something similar with Python and Varnish. Here it is if you're interested in reading about it: http://blog.unixy.net/2010/11/3-state-throttle-web-server/

Regards

6. nuclear_eclipse ◴[19 Jan 11 22:37 UTC] No.2121555[source]▶

>>2121430 #

> and the rest of the page could be cached

Except it can't, for the reasons I mentioned above. E.g., if my account is deaded, a thread containing one of my own comments looks different to me than it does to someone else viewing that same thread, and it differs again depending on whether the viewer has showdead checked in their profile.

It's not as straightforward as you would like it to be.

replies(2): >>2121915 #>>2122019 #

7. aonic ◴[20 Jan 11 00:14 UTC] No.2121915{3}[source]▶

>>2121555 #

Special cookies could be set for dead users and users who enable showdead to bypass the cache.

For example, one of the sites I run has about 50K pageviews/day by logged-in users, and another 600K pageviews/day by anonymous users coming from referrals or search engines. Logged-in users have similar customization options, so we bypass the cache for these users by detecting a cookie.

Obviously, going the cache route would require some changes to how things are set up; it's not a turn-key solution. The small amount of change is well worth it for most content sites, but for a user-generated content site like HN it would also depend on how the TTLs and cache purging are set up.
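Roughly, the bypass decision looks like the Python sketch below. The cookie names `showdead` and `deaded` are invented for the example (HN sets no such cookies as far as I know), and the share calculation simply assumes every anonymous pageview is cacheable and every logged-in one bypasses the cache.

```python
from http.cookies import SimpleCookie

# Hypothetical cookie names that mark a request as uncacheable.
BYPASS_COOKIES = {"showdead", "deaded"}


def should_bypass_cache(cookie_header):
    """Return True if the request carries one of the bypass cookies."""
    if not cookie_header:
        return False               # anonymous request: serve from cache
    jar = SimpleCookie()
    jar.load(cookie_header)
    return any(name in jar for name in BYPASS_COOKIES)


def cacheable_share(anonymous_views, logged_in_views):
    """Upper bound on the fraction of pageviews the cache could serve."""
    return anonymous_views / (anonymous_views + logged_in_views)


print(should_bypass_cache("user=alice; showdead=1"))   # True
print(should_bypass_cache("user=alice"))               # False
print(round(cacheable_share(600_000, 50_000), 3))      # 0.923
```

With the numbers from my site that is about 92% of pageviews that never need to reach the backend, before accounting for TTL expiry and purges.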

8. nitrogen ◴[20 Jan 11 00:47 UTC] No.2122019{3}[source]▶

>>2121555 #

The majority of requests probably come from live accounts in good standing or from people not even logged in, so the majority of requests could still be cached.

replies(1): >>2122860 #

9. piotrSikora ◴[20 Jan 11 05:09 UTC] No.2122733[source]▶

>>2121319 (TP) #

Of course it will work. The whole point of a reverse proxy is to buffer slow client requests and forward them quickly over the LAN to back-end servers that cannot handle high concurrency efficiently.

FreeBSD's accept_filter() used by Rtm does more or less that (you can think of it as a reverse proxy in the kernel), but it only works for plain HTTP and HEAD/GET methods.

10. nkurz ◴[20 Jan 11 06:05 UTC] No.2122860{4}[source]▶

>>2122019 #

Interesting point: what percentage of viewers are logged in? I was presuming it was high, but I guess I really don't know.
