
257 points pg | 26 comments
1. mmaunder ◴[] No.2121495[source]
"In 7 seconds, a hundred or more connections accumulate. So the server ends up with hundreds of threads, most of them probably waiting for input (waiting for the HTTP request). MzScheme can be inefficient when there are 100s of threads waiting for input -- when it wants to find a thread to run, it asks the O/S kernel about each socket in turn to see if any input is ready, and that's a lot of asking per thread switch if there are lots of threads. So the server is able to complete fewer requests per second when there is a big backlog, which lets more backlog accumulate, and perhaps it takes a long time for the server to recover."

I may have misunderstood, but it sounds like you have MzScheme facing the open internet? Try putting nginx (or another epoll/kqueue-based server) in front of MzScheme. It will handle the thousands of connections that are waiting for IO with very little incremental CPU load, in a single thread. Then, when nginx reverse-proxies to MzScheme, each request completes very quickly because it's local, which means you need far fewer threads in your app server. That means less memory and less of the other overhead that comes with a high thread count.

An additional advantage is that you can re-enable keepalive (right now it looks like you have it disabled), which makes things faster for first-time visitors. It also makes things slightly faster for us regulars, because the conditional GETs we do for the GIFs and CSS won't have to re-establish connections. Fewer connections established means you give your OS a break too, with fewer SYN/SYN-ACK/ACK TCP handshakes.

Someone mentioned below that reverse proxies won't work for HN. What they mean is that caching won't work -- but a reverse proxy like nginx that doesn't cache, yet handles high concurrency efficiently, should still give you a huge perf improvement.

PS: I'd love to help implement this for free. I run a 600 req/sec site using nginx reverse-proxying to Apache.
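A minimal sketch of the front end mmaunder is proposing, assuming the MzScheme server is moved to a local port such as 8080 (the port, and all names here, are my assumptions, not details from the thread):

```nginx
# nginx as a reverse proxy in front of the app server.
# epoll/kqueue lets one worker hold thousands of idle keepalive
# connections; only complete requests are forwarded upstream.
server {
    listen 80;
    server_name news.ycombinator.com;

    # Hold client connections open cheaply at the proxy layer.
    keepalive_timeout 30s;

    location / {
        proxy_pass http://127.0.0.1:8080;  # assumed local MzScheme port
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

Because nginx only forwards a request once it has fully arrived, the app server's threads spend almost no time waiting on slow clients.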

replies(6): >>2121641 #>>2121644 #>>2122343 #>>2122679 #>>2125682 #>>2126225 #
2. joshu ◴[] No.2121641[source]
Exactly this.
3. sedachv ◴[] No.2121644[source]
Or, I don't know, use continuations in a place where they're actually appropriate? John Fremlin showed that even with horrible CPS rewriting and epoll you can get way better throughput in SBCL (TPD2) than nginx. MzScheme comes with native continuations. It's not hard to call out to epoll.

Instead everyone in the Lisp community (pg included) is still enamored with using continuations to produce ugly URLs and unmaintainable web applications.

replies(2): >>2121816 #>>2122083 #
4. pg ◴[] No.2121816[source]
> Instead everyone in the Lisp community (pg included) is still enamored with using continuations to produce ugly URLs and unmaintainable web applications.

If you read the source of HN, you'll see that it doesn't actually use continuations.

I find the source of HN very clear. Have you read it? Is there a specific part you found so complicated as to be unmaintainable?

replies(2): >>2121859 #>>2121983 #
5. axod ◴[] No.2121859{3}[source]
> If you read the source of HN, you'll see that it doesn't actually use continuations.

> It had to be some dialect of Lisp with continuations, which meant Scheme, and MzScheme seemed the best.

(From further down the page).

I'm confused. What needs continuations?

replies(2): >>2121863 #>>2121867 #
6. dauphin ◴[] No.2121863{4}[source]
Errors/exceptions, for one, are implemented using continuations.
replies(1): >>2121874 #
7. pg ◴[] No.2121867{4}[source]
I just wanted to have them in the language. The fact that I don't currently use them in HN doesn't mean they're useless.
replies(2): >>2121885 #>>2122026 #
8. axod ◴[] No.2121874{5}[source]
Sounds terribly inefficient to me, but what do I know -shrug-
replies(2): >>2121906 #>>2121970 #
9. axod ◴[] No.2121885{5}[source]
ah ok thanks for clarifying.
10. jrockway ◴[] No.2121970{6}[source]
All control flow is a subset of continuations. The stack is a continuation (calling a function is call-with-current-continuation, return is just calling the "current continuation"), loops are continuations (with the non-local control flow, like break/last/redo/etc.), exceptions are continuations (like functions, but returning to the frame with the error handler), etc. Continuations are the general solution to things that are normally treated as different. So continuations are just as efficient (or inefficient) as calling functions or throwing exceptions.

In a web app context, though, it's kind of silly to keep a stack around to handle something like clicking a link that returns the contents of database row foo. People do this, call it continuations, and then run into problems. The problem is not continuations; the problem is that you are treating HTTP as a session, not as a series of request/responses. (The opposite of this style is REST.)
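jrockway's point that ordinary control flow is a special case of continuations can be sketched in Python (an illustrative sketch, not code from the thread): a product function written in continuation-passing style, where "break out early" is just invoking an outer continuation instead of the pending one.

```python
def product_cps(nums, k, abort):
    """Multiply nums, passing the result to continuation k.

    `abort` is the "rest of the program" captured at the call site;
    invoking it skips every pending multiplication -- the CPS
    analogue of an early return or a thrown exception."""
    if not nums:
        return k(1)
    head, tail = nums[0], nums[1:]
    if head == 0:
        return abort(0)          # non-local exit: jump straight out
    # "Calling a function" = building the next continuation.
    return product_cps(tail, lambda rest: k(head * rest), abort)

def product(nums):
    # The identity continuation is "just hand me the answer".
    return product_cps(nums, lambda v: v, lambda v: v)

print(product([2, 3, 4]))   # 24
print(product([2, 0, 4]))   # 0, without doing any multiplications
```

Nothing here is slower than ordinary calls and returns, which is jrockway's point: continuations are as cheap or expensive as the control flow they generalize.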

replies(2): >>2122010 #>>2122390 #
11. sedachv ◴[] No.2121983{3}[source]
Pagination/"More" uses fnids; looking at the source it's a callback, but from an HTTP client perspective it might as well be a continuation.

How do you test and debug things like that, which have random URIs and function names and get GCed on a regular basis? That's what I mean when I say continuations lead to unmaintainable web apps.
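The fnid mechanism sedachv is describing can be sketched as a server-side table mapping random ids to stored closures; the names and the tiny table size below are hypothetical, not HN's actual code. The testability complaint follows directly: the URL is meaningless without the live table, and eviction turns old links into "expired".

```python
import secrets
from collections import OrderedDict

# Hypothetical fnid table: random id -> stored closure.
MAX_FNIDS = 2          # tiny cap so eviction is visible in the demo
fnids = OrderedDict()

def register(closure):
    """Stash a closure and return the opaque id to embed in a URL."""
    fnid = secrets.token_hex(4)
    if len(fnids) >= MAX_FNIDS:
        fnids.popitem(last=False)      # evict oldest ("link expired")
    fnids[fnid] = closure
    return fnid

def handle(fnid):
    closure = fnids.get(fnid)
    return closure() if closure else "Unknown or expired link."

# A "More" link closing over its position in the listing:
more = register(lambda: "items 30-60")
print(handle(more))                    # items 30-60
register(lambda: "a"); register(lambda: "b")   # churn evicts `more`
print(handle(more))                    # Unknown or expired link.
```

The state lives only in that in-memory table, so a restart or eviction breaks every outstanding link, which is the failure mode discussed further down the thread.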

replies(1): >>2122943 #
12. ezalor ◴[] No.2122026{5}[source]
I always thought the purpose of Arc was to be cruft-free, "don't include it unless it is actually needed".
replies(1): >>2122748 #
13. gchpaco ◴[] No.2122083[source]
MzScheme/Racket's continuations are of the "copy the C stack" variety, or were last time I checked. They are in no way efficient; it would probably be better to CPS transform your own code than try to use MzScheme/Racket's continuations directly in performance sensitive code.
replies(1): >>2124185 #
14. sedachv ◴[] No.2122390{7}[source]
In theory yes, in practice you need to reify the stack (even for one-shot continuations). Clinger, Hartheimer and Ost have a really good survey paper of the different ways to do that:

http://www.scribd.com/doc/47221367/Clinger-Implementation-St...

15. esbcupper ◴[] No.2122679[source]
Filo's BSD hack to buffer the entire HTTP request and then pass it on has the same effect as using nginx here.

It doesn't help with keepalive, but that's probably not needed.

16. pg ◴[] No.2122943{4}[source]
I've been using this technique since 1995 and it has never once been a problem. It's an instance of programming with closures, which has been common in the Lisp world for even longer. One doesn't need to examine something represented as a closure any more than one needs to examine a particular invocation of a recursive function.

Perhaps the reason I've never had a problem is that I've been careful to use this technique in fairly restricted ways. Lisp macros are the same sort of thing. They could yield unreadable code if abused. But good programmers use them in principled ways, and when used with restraint they are clearly a net win.

replies(2): >>2123220 #>>2123604 #
17. blasdel ◴[] No.2123220{5}[source]
It's a problem for me when the fnids in every reply <form> cause them to expire several times a day when the server crashes.

Edit: also when the server redirects me back to the wrong origin, I was sent to http://news.ycombinator.com/threads?id=pg instead of http://news.ycombinator.com/item?id=2120756 after posting this reply initially.

replies(1): >>2124843 #
18. jules ◴[] No.2123604{5}[source]
> I've been using this technique since 1995 and it has never once been a problem.

Thousands of HN users experience the problem every day: link expired.

replies(2): >>2123749 #>>2124763 #
19. Flow ◴[] No.2123749{6}[source]
I get the same problem on Reddit, except they call it "there does not seem to be anything here".
20. ◴[] No.2124185{3}[source]
21. pg ◴[] No.2124763{6}[source]
The issue we were talking about was maintainability.

Giving links longer expirations is trivially easy, and I already do it in cases where it matters, like submit buttons on big forms.

replies(1): >>2124874 #
22. pg ◴[] No.2124843{6}[source]
> also when the server redirects me back to the wrong origin

Sounds like that could be a bug. Was http://news.ycombinator.com/item?id=2120756 the page you were on when you replied?

replies(1): >>2124899 #
23. jules ◴[] No.2124874{7}[source]
The problem is that links expire at all. There are two products: HN with "link expired" and HN without "link expired". You can write the former in a highly maintainable way.

Also, even if you accept that links expire, it's not trivial to make the links not expire for a long time. You can add a lot of RAM to a single machine, but only up to a point. Supporting multiple machines is very hard, although it can be done (distributed object system). RAM is not the only problem however. The server is inevitably going to restart once in a while. Perhaps not in the literal sense of killing and restarting the mzscheme process if you're very careful, but still in the practical sense. The data structures for storing content change as you develop a web app/site, thereby invalidating old closures hanging around.

24. blasdel ◴[] No.2124899{7}[source]
Yes, I clicked the reply link on your comment from there.

I figured you had to know about this bug, since it happens to me regularly (maybe 10% of eligible comments) when I comment in active threads during standard procrastination hours. The misdirect is usually to the threads page of a user further up the comment tree, though sometimes it's to the permalink of a grandparent comment.

It seems like you're mixing up the redirects of concurrent users, but never across comment hierarchies, so it's not omnipresent.

25. akkartik ◴[] No.2125682[source]
"it sounds like you have MzScheme facing the open internet?"

Yep. The arc webserver runs directly on port 80, for which it needs to run as root. To avoid all sorts of security headaches, it runs:

  (setuid 2)
soon after startup.

The whole thing seems hacky.
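The pattern akkartik is describing is the standard one: bind the privileged port while still root, then drop to an unprivileged uid before serving. A Python sketch of that ordering (an ephemeral port here so it runs without root; uid/gid 65534, "nobody", is my assumption rather than HN's actual uid 2):

```python
import os
import socket

def bind_and_drop(port=0, uid=65534, gid=65534):
    """Bind while (possibly) privileged, then shed root.

    Order matters: the already-bound socket keeps working after
    setuid, and setgid must come before setuid or it would fail
    once root is gone."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", port))   # port 80 in the real setup
    s.listen(16)
    if os.getuid() == 0:          # only root can actually drop
        os.setgid(gid)
        os.setuid(uid)
    return s

srv = bind_and_drop()
print("listening on port", srv.getsockname()[1])
srv.close()
```

The hacky part of HN's version is doing this inside the interpreter rather than letting a front-end process own port 80.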

26. sayrer ◴[] No.2126225[source]
They said caching wouldn't work, but they could be wrong. You can't change the Cache-Control header to public for HN responses, because the same URL can appear different to different users. There may be some ways around this, including giving each logged-in user their own URL to browse with.

But that might be a lot of work. You can still set up a proxy that kicks in only for requests that don't contain a session cookie. Then, requests without a cookie can be responded to with a cached copy from Varnish, and Varnish could refresh every 30 seconds or so. That might reduce the number of connections to MzScheme by quite a lot.
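That cookie-based split can be sketched in Varnish's configuration language (written from memory of VCL 4.x syntax, which postdates this thread; the cookie name and backend port are assumptions):

```
vcl 4.0;

backend default {
    .host = "127.0.0.1";
    .port = "8080";    # assumed local MzScheme port
}

sub vcl_recv {
    # Logged-in users (session cookie present) bypass the cache.
    if (req.http.Cookie ~ "user") {
        return (pass);
    }
    # Anonymous traffic: drop stray cookies so requests hash together.
    unset req.http.Cookie;
}

sub vcl_backend_response {
    # Refresh cached anonymous pages every 30 seconds or so.
    set beresp.ttl = 30s;
}
```

With this split, the (large) anonymous crowd is served entirely from the cache, and only logged-in requests ever reach MzScheme.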