Serving 200M requests per day with a CGI-bin

1. simonw ◴[04 Jul 25 14:30 UTC] No.44464893[source]▶

I got my start in the CGI era, and it baked into me an extremely strong bias against running short-lived subprocesses for things.

We invented PHP and FastCGI mainly to get away from the performance hit of starting a new process just to handle a web request!

It was only a few years ago that I realized that modern hardware means that it really isn't prohibitively expensive to do that any more - this benchmark gets to 2,000 requests a second, and if you can even get to a few hundred requests a second it's easy enough to scale across multiple instances these days.

I have seen AWS Lambda described as the CGI model reborn and that's a pretty fair analogy.

replies(3): >>44465143 #>>44465227 #>>44465926 #

2. pjc50 ◴[04 Jul 25 15:01 UTC] No.44465143[source]▶

>>44464893 (TP) #

> We invented PHP and FastCGI mainly to get away from the performance hit of starting a new process just to handle a web request!

Yes! Note that the author is using a technology that wasn't available when I too was writing cgi-bin programs in the 00's: Go. It produces AOT compiled executables but is also significantly easier to develop in and safer than trying to do the same with C/C++ in the 00's. Back then we tended to use Perl (now basically dead). Perl and Python would incur significant interpreter startup and compilation costs. Java was often worse in practice.

> I have seen AWS Lambda described as the CGI model reborn and that's a pretty fair analogy.

Yes, it's almost exactly identical to managed FastCGI. We're back to the challenges of deployment: can't we just upload and run an executable? But of course so many technologies make things much, much more complicated than that.

replies(1): >>44467043 #

3. geocar ◴[04 Jul 25 15:10 UTC] No.44465227[source]▶

>>44464893 (TP) #

I think that if you had deployed CGI scripts as statically-linked C binaries, with some attention given to size, you might not have been so disappointed.

The "performance hit of starting a new process" is bigger, and not just by a little bit, if the process is a dynamically-linked PHP interpreter that has to load gobs of shared libraries and then read, parse, and compile some source file. It always has been. So what the author is doing with Go would, I think, still have been competitive 25 years ago if Go had been around then.

Opening an SQLite database is probably (surprisingly?) competitive with passing a few sockets through a context switch, across all server(ish) CPUs of this era and that. Both are much faster than opening a socket and authenticating to a remote MySQL process, and programs that are not guestbook.cgi often have many more resource acquisitions, which is why I think FastCGI is still pretty good for new applications today.

replies(1): >>44465544 #

4. simonw ◴[04 Jul 25 15:48 UTC] No.44465544[source]▶

>>44465227 #

That's likely true - but C is a scary language to write web-facing applications in because it's so easy to have things like buffer overflows or memory leaks.

replies(7): >>44465786 #>>44466009 #>>44466418 #>>44467031 #>>44470699 #>>44470836 #>>44477576 #

5. rascul ◴[04 Jul 25 16:18 UTC] No.44465786{3}[source]▶

>>44465544 #

Can use rust, go, or whatever compiled language you want and it'll probably be much more performant starting up than any interpreted language.

replies(2): >>44466015 #>>44466046 #

6. citrin_ru ◴[04 Jul 25 16:36 UTC] No.44465926[source]▶

>>44464893 (TP) #

CGI never was prohibitively expensive for low load, and for high load a persistent process (e.g. FastCGI) is still better. CGI may allow you to handle 2k rps, but a FastCGI app doing the same job should handle more. You would need to start an additional server process (and restart it on upgrade), but it's worth doing if performance matters.
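To make the trade-off concrete, here is a minimal sketch (not from the article) of one Go handler that can run either way, using the standard library's `net/http/cgi` and `net/http/fcgi` packages. The `-fcgi` flag, the listen address, and the `greeting` helper are made up for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"log"
	"net"
	"net/http"
	"net/http/cgi"
	"net/http/fcgi"
)

// greeting is the application logic; it is identical in both modes.
// (Hypothetical helper, named only for this sketch.)
func greeting(name string) string {
	if name == "" {
		name = "world"
	}
	return fmt.Sprintf("hello, %s\n", name)
}

func handler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; charset=utf-8")
	fmt.Fprint(w, greeting(r.URL.Query().Get("name")))
}

func main() {
	// -fcgi is a hypothetical flag: empty means plain CGI.
	addr := flag.String("fcgi", "", "FastCGI listen address, e.g. 127.0.0.1:9000")
	flag.Parse()
	h := http.HandlerFunc(handler)

	if *addr == "" {
		// Plain CGI: the web server starts this binary once per request,
		// and the process exits after writing one response.
		if err := cgi.Serve(h); err != nil {
			log.Fatal(err)
		}
		return
	}

	// FastCGI: one long-lived process that the web server connects to.
	// This is the extra process you have to start and restart on upgrade.
	l, err := net.Listen("tcp", *addr)
	if err != nil {
		log.Fatal(err)
	}
	log.Fatal(fcgi.Serve(l, h))
}
```

The handler code does not change between the two modes; only the process lifetime does, which is where the per-request startup cost is paid or avoided.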

replies(1): >>44466710 #

7. foobiekr ◴[04 Jul 25 16:46 UTC] No.44466009{3}[source]▶

>>44465544 #

CGI basically is a protocol between the webserver and some process. It didn't go out of fashion because of C (many/most CGI scripting for a time was Perl and PHP or even bash), it went out of fashion because people wanted (or were told that smart people did this) to use languages, like Java, whose runtimes were expensive to start in a fork()+exec() execution model.

You can use it with any language that can read stdin and write stdout. Yes, printing and reading.

replies(1): >>44466901 #

8. simonw ◴[04 Jul 25 16:46 UTC] No.44466015{4}[source]▶

>>44465786 #

We didn't have so many options in the 90s!

9. dewitt ◴[04 Jul 25 16:51 UTC] No.44466046{4}[source]▶

>>44465786 #

> Can use rust, go, or whatever compiled language you want and it'll probably be much more performant starting up than any interpreted language.

One additional bit of context is that the person you’re replying to, simonw, is also a co-creator of Django, which at the time was the world’s de facto standard Python web framework, and was created at a time (2005) that long predates either Go (2009) or Rust (2012).

10. qingcharles ◴[04 Jul 25 17:38 UTC] No.44466418{3}[source]▶

>>44465544 #

Neither of these things came up in the early days of web app development in the mid-90s when I was doing this.

I had no real debugging environment. I was probably writing all my code in vi and then just compiling and deploying. I guarantee there were buffer overflows and off-by-ones etc.

Web app code was so simple back then, though. The most complex one I wrote was a webmail app, which I was so pleased with, and then HoTMaiL was released three weeks later, with this awesome logo:

https://tenor.com/view/hotmail-outlook-microsoft-outlookcom-...

11. cenamus ◴[04 Jul 25 18:13 UTC] No.44466710[source]▶

>>44465926 #

I agree, but if you're doing fastcgi, you might as well do http directly, with a relay in front of it (load balancing, tls termination, whatever).

replies(1): >>44470900 #

12. mh- ◴[04 Jul 25 18:43 UTC] No.44466901{4}[source]▶

>>44466009 #

I wrote my first web app upon realizing that cgi-bin stuff could just be .bas files I compiled in QB 4.5. I pretty quickly switched to Perl for the ecosystem of helper libs, but at the time Basic was all I knew. I think I was 10 or 11 years old.

13. geocar ◴[04 Jul 25 19:03 UTC] No.44467031{3}[source]▶

>>44465544 #

Don't be afraid.

Look at qmail, which has the best track record of any piece of software I am aware of in wide distribution, and it was written in C.

Also: Memory leaks go away when you exit(), so they are actually more common in dynamic languages in my experience, although they manifest as fragmentation that the interpreter simply lacks the ability to do anything about.

Buffer overflows seem pretty common to people who do a lot of dynamic memory allocation: I would recommend not doing that in response to user-input.

The result is that your C-based guestbook CGI is probably written very differently than a PHP-based guestbook. Mine basically just wrote to a logfile because since Linux 2.6.35 we have been able to easily make a 1mb PIPE_BUF and get lock-free stores with no synchronisation and trivial recovery, and thus know exactly where each post began and ended. I'm not sure I wanted more than 1mb of user input back in those days, but the design made me very confident there were no memory leaks or buffer overflows in what was like 5 system calls. No libraries.

You could do this.

You can do this.

But you want more? That C-based guestbook also only ever needs to write to one file, so permissions could be (carefully) arranged to make that the only file it can write to. A PHP-based guestbook needs read (and possibly write) access to lots of files. Some of those things can be shared objects. It is so much easier to secure a single static binary than a dynamic language with dynamic loading that if you actually care about security, you could focus on how to make those static binaries easier.

14. shrubble ◴[04 Jul 25 19:04 UTC] No.44467043[source]▶

>>44465143 #

I know of two large telecoms that internally develop with Perl, and a telecom product sold by Oracle that heavily relies on Perl. For text munging etc. it is still used, though I grant that other languages like Python are more popular.

15. senko ◴[05 Jul 25 07:11 UTC] No.44470699{3}[source]▶

>>44465544 #

That didn’t bother me so much (less than it should have), but string manipulation in C is tedious, man! And there was soo much string manipulation (none of it done by a helpful framework)…

I used Perl instead, which worked way better in that regard (+ taint based security was welcome in handling untrusted user input), and an enormous (for the time) CPAN ecosystem, but had other problems.

Python web ecosystem was a mess, so PHP3 it was (ah the “good” ol days of mysql_real_escape_string()) … until some enterprising individuals wrote Django and I happily switched. Thank you :)

16. anonzzzies ◴[05 Jul 25 07:40 UTC] No.44470836{3}[source]▶

>>44465544 #

I made fork run a chroot and if it crashes, it doesn't really matter as it's just a fork. At that time, people weren't really generally good at breaking out of a chroot using exploits.

17. immibis ◴[05 Jul 25 07:52 UTC] No.44470900{3}[source]▶

>>44466710 #

CGI-based protocols transfer a bunch of metadata from the front end - such as the client IP address - without any injection or double-parsing vulnerabilities. Using HTTP twice means having more code and a greater security risk.

By the way, if you're using nginx, then instead of FastCGI you might prefer SCGI, which does one connection per request and no multiplexing, so it's much simpler.

replies(1): >>44472219 #

18. petee ◴[05 Jul 25 12:11 UTC] No.44472219{4}[source]▶

>>44470900 #

I always wished that FastCGI's Filter & Authorizer roles became popular, it's a nice separation of duties

19. ryao ◴[06 Jul 25 03:26 UTC] No.44477576{3}[source]▶

>>44465544 #

If you use secure string functions, you can generally avoid buffer overflows in C. The problem is that not everyone does. That said, exploiting buffer overflows in programs whose source code and binaries are not public is very difficult. It likely can be done, but most people would likely go after easier targets. If you deploy AddressSanitizer in production, you can get the program to terminate whenever a buffer overflow occurs, at the expense of additional overhead when there is no buffer overflow.

Memory leaks are considered a feature in short lived programs, since not freeing memory in favor of relying on the kernel to free it at program exit lets them run faster.