They've never heard of select()? </snark>
But really, is there some reason that it's hard to collect up all the fds at once or something?
[0] http://ycombinator.com/images/hntraffic-17jan11.png [1] http://news.ycombinator.com/item?id=2090191
Pretty whizzy, definitely helped server scaling.
We started shipping in 2001; the dot-com bust more or less canceled any interest in the product, and canceled the company, too . . .
I wouldn't call it a hack, but a feature ;-)
# Buffer an HTTP request in the kernel
# until it's completely read.
apache22_http_accept_enable="yes"
Is HackerNews web scale?
I have no idea what MzScheme is, but I am curious why HN is running threads in user space in 2011. The OS kernel knows best which thread to pick to run, and that is a very well-tuned O(1) operation on Linux and Solaris.
Serving static content via Apache was a first step ;-)
Don't reinvent the wheel!
The bottleneck is the amount of garbage created by generating pages. IIRC there is some horrible inefficiency involving UTF-8 characters.
Anyone know if they're referring to "accept filters" here? FreeBSD folks can "man accf_http" if they're curious; it prevents a request from being handed off to the application until the complete (and valid?) request has been received. Certainly not a "hack" but a feature of the OS itself.
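For the curious, here's roughly what it looks like from the application side on FreeBSD (a sketch of mine, not HN's or Apache's code; it assumes the accf_http module is loaded, e.g. via kldload accf_http):

#include <sys/types.h>
#include <sys/socket.h>
#include <string.h>

/* Sketch only: ask the kernel to buffer each incoming HTTP request and
   not wake accept(2) until a full GET/HEAD request has arrived. */
static int enable_httpready(int listen_fd)
{
    struct accept_filter_arg afa;

    memset(&afa, 0, sizeof(afa));
    strcpy(afa.af_name, "httpready");   /* the filter accf_http registers */

    /* Must be set on a socket that is already listening. */
    return setsockopt(listen_fd, SOL_SOCKET, SO_ACCEPTFILTER,
                      &afa, sizeof(afa));
}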
Or they could use a proxy. All this "fuck me I'm famous" attitude is stupid.
Regardless of what you think of Yahoo's current situation, somebody who could easily retire wealthy but still hacks and flies economy class on Southwest to meetings in remote offices is worthy of respect.
Of course, these days, N+1 is probably 2, since everything except Windows supports pthreads.
The issue, in the case of HN, is with O(n) IO watchers. Most sockets are idle most of the time, so you really want an algorithm that is O(n) over active sockets, not O(n) over active and inactive sockets. You typically have so few active fds at any time that the n is really tiny, making massively scalable network servers trivial to write. But you also have a lot of connections at any one time, so if you are O(n) over active and inactive fds, then you are going to have performance issues. Basically, you don't want to pay for connections that aren't doing anything.
Fortunately, we have the technology; epoll on Linux, kqueue on BSDs, /dev/poll on Solaris. You just need to use an event loop, so it does all the hard stuff for you (and so you don't have to worry about the OS differences). Hacking a proper event loop into MzScheme may be hard, but it's absolutely necessary for writing scalable network servers. Handling 10k+ open connections is trivial with today's technology. And, all the cool kids are doing it (node.js, GHC, etc.).
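To make that concrete, here's a bare-bones epoll loop (Linux-only, an untested sketch of mine; handle_client is a placeholder, and in practice you'd reach for libevent/libev or your language's event loop rather than raw epoll):

#include <sys/epoll.h>
#include <sys/socket.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_EVENTS 64

void handle_client(int fd);   /* placeholder, defined elsewhere */

void event_loop(int listen_fd)
{
    struct epoll_event ev, events[MAX_EVENTS];
    int epfd = epoll_create(1024);   /* size hint, ignored on modern kernels */
    if (epfd < 0) { perror("epoll_create"); exit(1); }

    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

    for (;;) {
        /* Blocks until something is actually ready: the per-iteration cost
           scales with ready fds, not with the total number of idle connections. */
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++) {
            if (events[i].data.fd == listen_fd) {
                int client = accept(listen_fd, NULL, NULL);
                ev.events = EPOLLIN;
                ev.data.fd = client;
                epoll_ctl(epfd, EPOLL_CTL_ADD, client, &ev);
            } else {
                handle_client(events[i].data.fd);
            }
        }
    }
}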
I write time-critical applications in Clojure and JVM's -XX:+UseConcMarkSweepGC flag is a lifesaver. We no longer get those multi-second pauses when full GC occurs.
They think they can build a better wheel. They seem to like doing it and have a habit of it. There's nothing wrong with that.
I think there is a tendency among server developers, partly a correct one, to fear anything that looks like a busy wait (e.g., anything with "poll" in its name). But really, poll is just as asynchronous as select in this context (I don't know about FreeBSD's implementation, but Linux puts callers to sleep on wait queues the same way, afaik). It just doesn't suffer from select's crazy indexing scheme...
At any rate, I didn't get a chance to finish probing the internals of what MzScheme uses. But if there's a way to substitute poll for select, it can often alleviate the problem where 900 requests queue up and you eventually end up with an fd numbered 1024 or higher, even though you may not have 1024 actual concurrent requests...
Though others feel free to correct me if I'm wrong. I only comment because I came across a similar issue recently. This link may be useful too:
http://www.makelinux.net/ldd3/chp-6-sect-3.shtml
ETA: I finally got a copy of the most recent Racket source (though probably not the one rtm and pg are using). If anyone is curious, browse racket/src/network.c. The Mac build uses a bunch of select() calls (e.g., for tcp-accept); replacing them with poll might help... Also, the max number of fds per login session is often 1024 by default, so you might want to bump that up if it isn't already. And consider using poll... just an idea.
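For anyone who hasn't looked at the two APIs side by side, the difference is roughly this (a sketch, not Racket's code): select() can't watch any fd whose numeric value is >= FD_SETSIZE (usually 1024), while poll() takes an array of pollfds, so only the count of watched fds matters, not their values.

#include <poll.h>

/* Wait for any of nfds descriptors to become readable. The descriptors in
   fds[] may be numbered well above 1023; poll() doesn't care. */
int wait_for_readable(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
    for (nfds_t i = 0; i < nfds; i++)
        fds[i].events = POLLIN;

    return poll(fds, nfds, timeout_ms);   /* <0 error, 0 timeout, else # ready */
}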
Also, if I understand correctly, you use flat files that are loaded into memory at startup. Switching to Redis could be an interesting idea, as it is more or less an efficient, networked implementation of the same concept.
With changes like these you could probably go from 20 to a few hundred requests per second without problems.
http://joshua.schachter.org/2008/01/proxy.html
(Like I suggested in 2009...)
Last time I had lunch with him we talked about the minutia of DNS server implementations because I was working on some optimization tricks and he was really interested in seeing them get implemented.
filo is a really amazing guy. He's the most down-to-earth billionaire I know. He talks way more about his family and hacking than about 'stuff'. If he isn't hacking on code as much as he used to, it's because he cares about his company and is doing an important job looking after technical stuff that needs doing, even if it isn't interesting.
Things like users' showdead value, as well as whether the user is deaded, can drastically change the output of each page. Eg, comments by a deaded user won't show as dead to that user, but they will for everyone else...
The philosophy "Don't reinvent the wheel" however, is definitely inconsistent with their philosophy. They will reinvent the wheel whenever they feel they can make a better one. Just because they haven't reinvented every wheel does not mean "don't reinvent the wheel" applies to this group.
They chose to create the best solution they think they can. They don't seem to care whether or not that involves reinventing wheels. The original argument that they should seems pretty silly.
I once posted a comment which was immediately invisible to everyone besides me - I'm guessing it was marked as spam for some reason, but left visible to me so that I'd think I had successfully posted it.
I may have misunderstood but it sounds like you have MzScheme facing the open internet? Try putting nginx (or another epoll/kqueue based server) in front of MzScheme. It will handle the thousands of connections you have that are waiting for IO with very little incremental CPU load and with a single thread. Then when nginx reverse proxies to MzScheme each request happens very fast because it's local which means you need much fewer threads for your app server. That means less memory and less of the other overhead that you get with a high thread count.
An additional advantage is that you can enable keepalive again (right now it looks like you have it disabled), which makes things faster for first-time visitors. It also makes it slightly faster for us regulars, because the conditional gets we do for the gifs and css won't have to reestablish connections. Fewer connections established means you give your OS a break too, with fewer syn/syn-ack/ack TCP handshakes.
Someone mentioned below that reverse proxies won't work for HN. They mean that caching won't work - but a reverse proxy like nginx that doesn't cache but handles high concurrency efficiently should give you a huge perf improvement.
PS: I'd love to help implement this free. I run a 600 req/sec site using nginx reverse proxying to apache.
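Something along these lines is all it takes (a sketch; the backend port, paths and file extensions are my guesses, not HN's actual setup): MzScheme moves off port 80 to a local port, nginx soaks up the thousands of slow client connections, serves the static files itself, and proxies the rest.

server {
    listen 80;
    server_name news.ycombinator.com;

    # Static assets straight from disk, cacheable by the browser.
    location ~ \.(gif|css)$ {
        root /var/www/hn/static;         # hypothetical path
        expires 1d;
    }

    # Everything else goes to MzScheme over a fast local connection.
    location / {
        proxy_pass http://127.0.0.1:8080;    # hypothetical backend port
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $remote_addr;
    }
}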
I understand if you are in tech you might not know figures in history or literature... but these guys?
Every time you log in to a UNIX/Linux system you use the passwd file and related setup - authored at least in part by Rtm's father.
http://www.manpages.info/freebsd/passwd.1.html
Rtm has done plenty in his own right, as the Wikipedia pages show.
But seriously - if you don't know who these people are you really should.
Read this: http://www.princeton.edu/~hos/Mahoney/unixhistory
and maybe ESR's writings and that online anthology of the early Apple days and old issues of 2600, etc, etc
I am sorry - but it is really irritating to me that someone would be on this site and really not be aware of the deeper history and culture. It is not that deep - 1950s to present (to cover Lisp).
As Jay-Z (whom you probably know) says - "Go read a book you illiterate son of a bitch and step up your vocab ..."
sub vcl_hash { set req.hash += req.http.cookie; }
What this means is that the cache is per-logged-in-user and pretty much personalized. The server's going to need a lot more RAM than usual. You can set a low TTL on the cache entries so they're flushed and not kept in memory indefinitely. But the performance boost is great.
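The TTL part is only a couple of lines on top of that (a sketch; 60s is an arbitrary figure, and depending on the Varnish version it's beresp.ttl or obj.ttl in vcl_fetch):

sub vcl_fetch {
    # Keep the personalized copies only briefly so they don't pile up in RAM.
    set beresp.ttl = 60s;
}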
This is not recommended as an always-on measure. We wrote an entry about accomplishing something similar with Python and Varnish. Here it is if you're interested in reading about it: http://blog.unixy.net/2010/11/3-state-throttle-web-server/
Regards
Except they can't, for the reasons I mentioned above. Eg, if my account is deaded, when I view a thread with one of my own comments, it looks different than if someone else were viewing that same thread, especially for those of us with or without showdead checked in our profiles.
It's not as straightforward as you would like it to be.
> In 7 seconds, a hundred or more connections accumulate. So
> the server ends up with hundreds of threads, most of them
> probably waiting for input
This is why nginx handles large sites much better. The requests are queued without spawning threads. Evented I/O to the rescue.
You sure you want to do that? :-)
> Java technology is not fault tolerant and is not designed, manufactured, or intended for use or resale as on-line control equipment in hazardous environments requiring fail-safe performance, such as in the operation of nuclear facilities, aircraft navigation or communication systems, air traffic control, direct life support machines, or weapons systems, in which the failure of Java technology could lead directly to death, personal injury, or severe physical or environmental damage
Aside from the header, there's only a relatively small number of variations for any given content, right? Showdead, ability to downvote, etc? So, each of these variations gets a distinct ESI URL. Like /item_$showdead_$downvote_$etc right in the internal URL, so any combination of these is a distinct URL. Only the first user to hit any particular combination would result in a request to the backend, and that could remain in cache until its content changed. No wizardry required.
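So the page skeleton gets cached once and each fragment is pulled in with something like this (purely hypothetical markup and URL; the id and flag values are placeholders, and Varnish would need ESI processing enabled for these responses):

<!-- One include per fragment; the user's flags are baked into the URL so
     Varnish caches each combination separately (values here are made up). -->
<esi:include src="/item_1_0?id=1234"/>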
Instead everyone in the Lisp community (pg included) is still enamored with using continuations to produce ugly URLs and unmaintainable web applications.
I work with VoIP daily and could name lots of people who you "really should" know - you're using a phone all the time, after all. Or people who create amazing stuff right now. But no... actually I don't expect that. Everyone has their own area of interest. I appreciate that someone wrote `cat` or one of hundreds of other nice utilities, but I'm not going to read their history unless I've got a lot of free time and want to do that.
If you read the source of HN, you'll see that it doesn't actually use continuations.
I find the source of HN very clear. Have you read it? Is there a specific part you found so complicated as to be unmaintainable?
> It had to be some dialect of Lisp with continuations, which meant Scheme, and MzScheme seemed the best.
(From further down the page).
I'm confused. What needs continuations?
After 4 levels of back and forth (Joe says "...", Tim replies, then Joe replies once more, then Tim replies again), freeze that branch, hide it from the general public, and turn the branch into a settlement: both Tim and Joe are allowed one final comment each, that they both approve. Only once they have posted this compromise, is it shown in-place, where the original sub-thread used to be.
Simple. Prevents endless arguments. Good for everyone.
Civility people. What happened to that?
Btw, downvote me if you like, but it's true. It's easy for us to get so caught up in our own brilliance that we talk down to others who don't know as much about a particular subject as we do.
Ironically, it shows more about you, than it does them.
For example, one of the sites I run has about 50K pageviews/day by logged in users, and another 600K pageviews/day by anonymous users coming from referrals or search engines. Logged in users have similar customization options so we bypass cache for these users by detecting a cookie.
Obviously going the cache route would require some changes to how things are set up; it's not a turn-key solution. But the small amount of change is well worth it for most content sites. For a user-generated-content site like HN it would also depend on how the TTLs and cache purging are set up.
In a web app context, though, it's kind of silly to keep a stack around to handle something like clicking a link that returns the contents of database row foo. People do this, call it continuations, and then run into problems. The problem is not continuations; the problem is that you are treating HTTP as a session, not as a series of request/responses. (The opposite of this style is REST.)
It is not the year, but the naivety that is the problem here, if any.
How do you test and debug things like that that have random URIs and function names and get GCed on a regular basis? That's what I mean when I say continuations lead to unmaintainable web apps.
http://www.scribd.com/doc/47221367/Clinger-Implementation-St...
This guy here broke the 10k barrier:
FreeBSD's accept_filter(), used by Rtm, does more or less that (you can think of it as a reverse proxy in the kernel), but it only works for plain HTTP and HEAD/GET methods.
Perhaps the reason I've never had a problem is that I've been careful to use this technique in fairly restricted ways. Lisp macros are the same sort of thing. They could yield unreadable code if abused. But good programmers use them in principled ways, and when used with restraint they are clearly a net win.
Either way, it's 2011 and that really is some spectacular slowness.
Edit: also when the server redirects me back to the wrong origin, I was sent to http://news.ycombinator.com/threads?id=pg instead of http://news.ycombinator.com/item?id=2120756 after posting this reply initially.
I don't value "being able to write it in my favorite language" at all. From what I've read, pg does. To the extent that the product suffers.
There would be absolutely no point in me trying to improve MzScheme when you can do exactly the same job in other languages/platforms, and the user doesn't care/know the difference. HN could be rewritten in a weekend in PHP/Python/whatever, and we wouldn't be sitting here waiting for pages to load.
(I run Mibbit, which handles a few thousand HTTP requests a second, on VPS level hardware. In Java).
If you were talking about Facebook, Twitter, or Basecamp, that would be a different matter.
Don't slam something, say you could do better, and then have your bluff called.
Makes you look silly. This applies even if the person calling your bluff was NOT pg.
Also, even if you accept that links expire, it's not trivial to make the links not expire for a long time. You can add a lot of RAM to a single machine, but only up to a point. Supporting multiple machines is very hard, although it can be done (distributed object system). RAM is not the only problem however. The server is inevitably going to restart once in a while. Perhaps not in the literal sense of killing and restarting the mzscheme process if you're very careful, but still in the practical sense. The data structures for storing content change as you develop a web app/site, thereby invalidating old closures hanging around.
I figured you had to know about this bug, since it happens to me regularly (maybe 10% of eligible comments) when I comment in active threads during standard procrastination hours. The misdirect is usually to the threads page of a user further up the comment tree, though sometimes it's to the permalink of a grandparent comment.
Seems like you're mixing up the redirects of concurrent users but never across comment hierarchies so it's not omnipresent.
Yep. The arc webserver runs directly on port 80, for which it needs to run as root. To avoid all sorts of security headaches, it runs:
(setuid 2) ; drop root privileges once port 80 is bound
soon after startup.
The whole thing seems hacky.
But that might be a lot of work. You can still set up a proxy that kicks in only for requests that don't contain a session cookie. Then, requests without a cookie can be responded to with a cached copy from Varnish, and Varnish could refresh every 30 seconds or so. That might reduce the number of connections to MzScheme by quite a lot.
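Roughly like this (a sketch only; the cookie name, the 30-second figure, and the exact VCL syntax for your Varnish version are all assumptions):

sub vcl_recv {
    if (req.http.Cookie ~ "user=") {
        return (pass);            # session cookie present: hand straight off to MzScheme
    }
    unset req.http.Cookie;        # anonymous: everyone shares one cached copy per URL
}

sub vcl_fetch {
    set beresp.ttl = 30s;         # anonymous pages get re-fetched roughly every 30 seconds
}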