
2024 points randlet | 1 comments
adamnemecek ◴[] No.17516031[source]
Even though I write Python only if it’s going to be like 200 LOC tops, it’s insane how gracefully the language has aged. It came out in ’91, thus predating, for example, Java. Fun fact: I believe the very first Google scraper was written in it.

At the same time I can’t wait for the next language to replace it.

replies(4): >>17516304 #>>17516428 #>>17516829 #>>17518450 #
petters ◴[] No.17516829[source]
Wasn't the first Google scraper written in Java? I saw an old Usenet post by Page from about 1997 asking about user agents in Java.

Perhaps that was the second scraper.

replies(1): >>17516900 #
adamnemecek ◴[] No.17516900[source]
> Perhaps that was the second scraper.

I believe that is the case, yes. However, correct me if I’m wrong.

replies(1): >>17517318 #
secure ◴[] No.17517318[source]
Yes, Larry Page’s first crawler was in Java, and Scott Hassan rewrote it in Python.

From https://www.vanityfair.com/news/2018/07/valley-of-genius-exc...:

> Scott Hassan: In the fall of ’95, for some reason, I started hanging out with Larry in his office. . . . At the time, Larry was trying to download a hundred pages simultaneously. And I was fixing some of the bugs that he was having with Java itself, and this went on for weeks, if not months. And I remember thinking, Wow, this is insane!, because I was spending a lot of time fixing this underlying tool. And so one weekend, I just took all his code, I took his whole entire thing, and threw it all out, and rewrote the thing that he’s been working on for months very quickly—over a weekend—because I was just sick and tired of it. I knew I could get the thing working if I used a language I knew very well, called Python. I wrote it in such a way that it could download 32,000 pages simultaneously. So Larry went from barely downloading a 100, to doing 32,000 [pages] simultaneously on a single machine.

replies(1): >>17519862 #
cutler ◴[] No.17519862[source]
32,000 pages simultaneously - in 1995? Async has only recently been added to Python, no?
replies(2): >>17519885 #>>17520105 #
mathgenius ◴[] No.17520105{3}[source]
Using select: https://en.wikipedia.org/wiki/Select_(Unix)
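
For illustration only, here is a minimal sketch of what select-based multiplexing looks like in present-day Python. It is not Hassan's crawler and makes no claim about how that code worked. The names `read_all` and `demo` are hypothetical, and local socket pairs stand in for real HTTP connections so the example runs without a network.

```python
import select
import socket


def read_all(socks, chunk=4096):
    """Read every socket to EOF from a single thread, multiplexing with select()."""
    buffers = {s: bytearray() for s in socks}
    pending = set(socks)
    while pending:
        # Block until at least one pending socket has data (or EOF) available.
        readable, _, _ = select.select(list(pending), [], [])
        for s in readable:
            data = s.recv(chunk)
            if data:
                buffers[s].extend(data)
            else:
                # An empty read means the peer closed its end of the connection.
                pending.discard(s)
    # Return the collected bytes in the same order as the input sockets.
    return [bytes(buffers[s]) for s in socks]


def demo(n=8):
    """Hypothetical stand-in for n downloads: each 'server' sends one payload."""
    pairs = [socket.socketpair() for _ in range(n)]
    for i, (writer, _) in enumerate(pairs):
        writer.sendall(b"page %d" % i)
        writer.close()
    readers = [reader for _, reader in pairs]
    try:
        return read_all(readers)
    finally:
        for reader in readers:
            reader.close()


if __name__ == "__main__":
    print(demo(3))
```

One thread services every connection because `select.select` only returns the sockets that are ready, so no read blocks on a slow peer. Note that on many systems `select()` is capped at `FD_SETSIZE` descriptors (often 1024), so watching tens of thousands of sockets this way would need a raised limit or a different mechanism.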
replies(3): >>17520187 #>>17520198 #>>17521231 #
1. gcb0 ◴[] No.17520187{4}[source]
is that a fact or are you guessing?