2024 points randlet | 18 comments
1. adamnemecek ◴[] No.17516031[source]
Even though I write Python only if it’s going to be like 200 LOC tops, it’s insane how gracefully the language has aged. It came out in ’91, thus predating, for example, Java. Fun fact, I believe the very first google scraper was written in it.

At the same time I can’t wait for the next language to replace it.

replies(4): >>17516304 #>>17516428 #>>17516829 #>>17518450 #
2. anothergoogler ◴[] No.17516304[source]
Python hasn't changed much, but the world around it has. For better or worse, when jobs in the language were few and far between, it was a friendlier and less competitive community.
3. sametmax ◴[] No.17516428[source]
People often don't realize this. We hear a lot that Python is moving too fast or too slow. To say that disregards the delicate balance the language has had to maintain for 20 years. It's crazy.
replies(1): >>17516476 #
4. adamnemecek ◴[] No.17516476[source]
Agree but it’s closer to 30.
replies(1): >>17518959 #
5. petters ◴[] No.17516829[source]
Wasn't the first Google scraper written in Java? I saw an old Usenet post by Page from about 1997 asking about user agents in Java.

Perhaps that was the second scraper.

replies(1): >>17516900 #
6. adamnemecek ◴[] No.17516900[source]
> Perhaps that was the second scraper.

I believe that is the case, yes. However, correct me if I'm wrong.

replies(1): >>17517318 #
7. secure ◴[] No.17517318{3}[source]
Yes, Larry Page’s first crawler was in Java, and Scott Hassan rewrote it in Python.

From https://www.vanityfair.com/news/2018/07/valley-of-genius-exc...:

> Scott Hassan: In the fall of ’95, for some reason, I started hanging out with Larry in his office. . . . At the time, Larry was trying to download a hundred pages simultaneously. And I was fixing some of the bugs that he was having with Java itself, and this went on for weeks, if not months. And I remember thinking, Wow, this is insane!, because I was spending a lot of time fixing this underlying tool. And so one weekend, I just took all his code, I took his whole entire thing, and threw it all out, and rewrote the thing that he’s been working on for months very quickly—over a weekend—because I was just sick and tired of it. I knew I could get the thing working if I used a language I knew very well, called Python. I wrote it in such a way that it could download 32,000 pages simultaneously. So Larry went from barely downloading a 100, to doing 32,000 [pages] simultaneously on a single machine.

replies(1): >>17519862 #
8. vram22 ◴[] No.17518450[source]
>Fun fact, I believe the very first google scraper was written in it.

Fun "fact": I need to check my facts, but it could be that the very first Google was written in it, too :)

Checked:

https://www.google.co.in/search?q=what+language+was+google+s...

https://en.wikipedia.org/wiki/Google_Search

https://en.wikipedia.org/wiki/History_of_Google

replies(1): >>17521638 #
9. brettcannon ◴[] No.17518959{3}[source]
28 if you want the exact number. :) Guido started developing Python in December 1989 and went public with it on Usenet in February 1991 (Unicode 1.0 was standardized in October later that year to give perspective of how far back that was in the tech world).
10. cutler ◴[] No.17519862{4}[source]
32,000 pages simultaneously - in 1995? Async has only recently been added to Python, no?
replies(2): >>17519885 #>>17520105 #
11. boyter ◴[] No.17519885{5}[source]
Using threads maybe? I'd be curious to see how you would achieve that in Python easily back in 1995.
12. mathgenius ◴[] No.17520105{5}[source]
Using select: https://en.wikipedia.org/wiki/Select_(Unix)
replies(3): >>17520187 #>>17520198 #>>17521231 #
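For readers unfamiliar with the technique named above, here is a minimal sketch of select-based multiplexing in Python: one thread waits on many sockets and services whichever ones are ready. It is an illustration only, not a reconstruction of the 1995 crawler. The function name `multiplex`, the use of `socket.socketpair()` as a stand-in for remote servers, and the 5-second timeout are all assumptions made for the example.

```python
import select
import socket


def multiplex(num_connections=8):
    """Read one message from each of num_connections sockets in one thread."""
    # Assumption: socketpair() stands in for remote servers so that the
    # sketch runs without any network access.
    pairs = [socket.socketpair() for _ in range(num_connections)]
    index_of = {}
    for index, (local, remote) in enumerate(pairs):
        local.setblocking(False)
        remote.sendall(f"page {index}".encode())  # pretend server response
        index_of[local] = index

    results = {}
    pending = set(index_of)
    while pending:
        # Block until at least one socket is readable (or the timeout hits),
        # then service every socket that select() reported as ready.
        ready, _, _ = select.select(list(pending), [], [], 5.0)
        if not ready:
            break  # timed out; give up on whatever is still pending
        for sock in ready:
            results[index_of[sock]] = sock.recv(4096).decode()
            pending.discard(sock)

    for local, remote in pairs:
        local.close()
        remote.close()
    return results
```

Calling `multiplex(4)` returns a dict mapping each connection index to the text read from it. The point of the design is that no thread is dedicated to any single connection; the loop only touches sockets that already have data waiting.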
13. gcb0 ◴[] No.17520187{6}[source]
Is that a fact or are you guessing?
14. adamnemecek ◴[] No.17520198{6}[source]
select doesn't scale well.
15. scandox ◴[] No.17521231{6}[source]
Doesn't select have a maximum of 1024 file descriptors it can handle at any one time? Or some such?
replies(1): >>17531379 #
16. kgwgk ◴[] No.17521638[source]
That’s what he said :-) (first google scraper = scraper used in original google implementation)
replies(1): >>17523490 #
17. vram22 ◴[] No.17523490{3}[source]
Okay, I thought he meant a product that scraped Google sites.
18. __sr__ ◴[] No.17531379{7}[source]
By default, yes. But it can be increased. See limit(1) and ulimit(1).
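A hedged note on the limits discussed above: there are two separate ceilings. The per-process open-file limit is what `ulimit -n` adjusts, while `select()` itself is bounded by the compile-time constant `FD_SETSIZE` (commonly 1024), which raising the per-process limit does not change; interfaces such as `poll()` do not have that fixed bound. The sketch below only inspects the per-process limit, using Python's Unix-only `resource` module. The function name `open_file_limits` is hypothetical.

```python
import resource  # Unix-only standard library module


def open_file_limits():
    """Return the (soft, hard) per-process limit on open file descriptors."""
    # RLIMIT_NOFILE is the limit that `ulimit -n` reports and adjusts.
    # It is independent of FD_SETSIZE, the fixed bound on select().
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard
```

For example, `soft, hard = open_file_limits()` gives the current values; the soft limit is the one actually enforced, and it can be raised up to the hard limit without extra privileges.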