

768 points cyndunlop | 24 comments
1. ChuckMcM ◴[] No.43106098[source]
As a systems enthusiast I enjoy articles like this. It is really easy to get into the mindset of "this must be perfect".

In the Blekko search engine back end we built an index that was 'eventually consistent' which allowed updates to the index to be propagated to the user facing index more quickly, at the expense that two users doing the exact same query would get slightly different results. If they kept doing those same queries they would eventually get the exact same results.
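The behavior described above can be sketched in a few lines. This is a toy illustration of an eventually consistent index (my own hypothetical model, not Blekko's actual design): writes are queued and applied to replicas asynchronously, so two users routed to different replicas briefly see different results until replication catches up.

```python
from collections import deque

class EventuallyConsistentIndex:
    """Toy eventually-consistent index: writes replicate asynchronously."""

    def __init__(self, n_replicas=2):
        self.replicas = [dict() for _ in range(n_replicas)]
        self.pending = deque()  # (replica_id, key, value) writes not yet applied

    def update(self, key, value):
        # Queue the write for every replica instead of applying it synchronously.
        for rid in range(len(self.replicas)):
            self.pending.append((rid, key, value))

    def propagate(self, n=None):
        # Apply up to n queued writes (simulates background replication).
        count = len(self.pending) if n is None else min(n, len(self.pending))
        for _ in range(count):
            rid, key, value = self.pending.popleft()
            self.replicas[rid][key] = value

    def query(self, key, user_id):
        # Each user is routed to a (possibly stale) replica.
        return self.replicas[user_id % len(self.replicas)].get(key)

idx = EventuallyConsistentIndex()
idx.update("rust", ["rust-lang.org"])
idx.propagate(1)                      # only replica 0 has the write so far
print(idx.query("rust", user_id=0))   # ['rust-lang.org']
print(idx.query("rust", user_id=1))   # None -- replica 1 is still stale
idx.propagate()                       # finish replication
print(idx.query("rust", user_id=1))   # ['rust-lang.org'] -- now consistent
```

Repeating the same query, as the comment says, eventually returns identical results once the pending queue drains.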

Systems like this bring in a lot of control systems theory because they have the potential to oscillate if there is positive feedback (and in search engines that positive feedback comes from the ranker which is looking at which link you clicked and giving it a higher weight) and it is important that they not go crazy. Some of the most interesting, and most subtle, algorithm work was done keeping that system "critically damped" so that it would converge quickly.
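The "critically damped" idea can be shown with a tiny simulation (again a hypothetical sketch, not Blekko's algorithm): treat a click-feedback weight as a second-order system. With damping ratio below 1 the weight overshoots its target and oscillates; at exactly 1 it converges as fast as possible without overshooting.

```python
def simulate(zeta, target=1.0, omega=1.0, dt=0.01, steps=2000):
    """Simulate a damped second-order system: w'' = omega^2*(target - w) - 2*zeta*omega*w'."""
    w, v = 0.0, 0.0   # weight and its rate of change
    peak = 0.0
    for _ in range(steps):
        a = omega ** 2 * (target - w) - 2 * zeta * omega * v
        v += a * dt
        w += v * dt
        peak = max(peak, w)
    return w, peak

w_c, peak_c = simulate(zeta=1.0)  # critically damped: no overshoot
w_u, peak_u = simulate(zeta=0.2)  # underdamped: overshoots and oscillates
print(f"critically damped: final={w_c:.3f}, peak={peak_c:.3f}")
print(f"underdamped:       final={w_u:.3f}, peak={peak_u:.3f}")
```

In a real ranker the "force" would come from click signals rather than a fixed target, but the failure mode is the same: too little damping and weights oscillate or "go crazy", too much and they converge slowly.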

Reading this description of how users' timelines are sharded, with the same sorts of feedback loops (in this case 'likes' or 'reposts'), this sounds like a pretty interesting problem space to explore.

replies(7): >>43106334 #>>43106982 #>>43107018 #>>43107888 #>>43110527 #>>43114706 #>>43116290 #
2. culi ◴[] No.43106334[source]
What became of Blekko?
replies(1): >>43106738 #
3. an_ko ◴[] No.43106738[source]
> It was acquired by IBM in March 2015, and the service was discontinued.

https://en.wikipedia.org/wiki/Blekko

Perhaps GP has a more interesting answer though.

replies(1): >>43108177 #
4. gregw134 ◴[] No.43106982[source]
Would you be willing to share more about how you guys did click ranking at Blekko? It's an interesting problem.
5. snailmailman ◴[] No.43107018[source]
I guess I hadn’t considered that search engines could be reranking pages on the fly as I click them. I’ve been seeing my DuckDuckGo results shuffle around for a while now thinking it’s an awful bug.

Like I click one page, don’t find what I want, and go back thinking “no, I want that other result that was below” and it’s an entirely different page with shuffled results, missing the one that I think might have been good.

replies(4): >>43107341 #>>43107425 #>>43114322 #>>43115728 #
6. PaulHoule ◴[] No.43107341[source]
That's connected with a basic usability complaint about current web interfaces: ads and recommended content aren't stable. You very well might want to engage with an ad after you are done engaging with what you originally wanted, but you might never see it again. Similarly, you might see two or three videos that you want to click on next to a YouTube video you're watching, but you can only click on one (though if you are thinking ahead you can open them in another tab.)

On top of that immediate frustration, the YouTube style interface here

https://marvelpresentssalo.com/wp-content/uploads/2015/09/id...

collects terrible data for recommendations because, even though clicking tells them you liked one thumbnail, they can't come to any conclusion about whether or not you liked any of the other videos. TikTok, by focusing on one video at a time, collects much better information.

replies(1): >>43112444 #
7. cgriswald ◴[] No.43107425[source]
I don't use DDG, but in my (very limited, just now) testing it doesn't seem to shuffle results unless you reload the page in some way. Is it possible your browser is reloading the page when you go back? If so, setting DDG to open links in new tabs might fix this problem.
replies(1): >>43109129 #
8. dwedge ◴[] No.43107888[source]
Similar to how Google images loads lower quality blurred thumbnails towards the bottom of the window at first so that the user thinks they loaded faster
9. ChuckMcM ◴[] No.43108177{3}[source]
That's the correct answer; IBM wanted the crawler mostly to feed Watson. Building a full search engine (crawler, indexer, ranker, API, web application) for the English language was a hell of an accomplishment, but by the time Blekko was acquired, Google was paying out tens of billions of dollars to people to send them, and only them, their search queries. For a service that nominally has to live on advertising revenue, getting humans to use it was the only way to be net profitable, and you can't spend billions buying traffic and hope to make it back on advertising as the #3 search engine in English-speaking markets.

There are other ways to monetize search than advertising (look at Kagi, for example). Blekko missed that window though. (Too early; Google needed to get as crappy as it is today to make a spam-free search engine desirable.)

replies(2): >>43108467 #>>43111123 #
10. chrisweekly ◴[] No.43108467{4}[source]
Not my Q but thanks for the interesting history.

Also, (for other readers), I'm a huge fan of Kagi. Highly recommended.

replies(1): >>43111842 #
11. snailmailman ◴[] No.43109129{3}[source]
Interesting. Maybe something in my configuration is affecting it. I’ll have to look into it
12. aqueueaqueue ◴[] No.43110527[source]
This is less a question of perfection than one of trade-offs. The laws of physics put a limit on how efficiently you can keep data in NYC and London in perfect sync, so you choose CAP-style trade-offs. There are also $/SLO trade-offs: each 9 costs more money.

I like your example; it is very interesting. If I get to work on such interesting problems (or even hear that someone on my team is working on one), I get happy.

Interesting problems are rare because, like a house, you might talk about brick vs. timber framing once, but you'll talk about cleaning the house every week!

13. NetOpWibby ◴[] No.43111123{4}[source]
Blekko was gone by the time I learned about it. Recently (past few years) I emailed someone who worked on Blekko to get his opinion on a search engine concept I still have yet to start. His advice was to not bother competing with Google (obviously) LOL!

I don’t know if anyone’s embarked on a P2P search engine but that’s essentially my concept. Anyhoo, thanks for the inspiration!

replies(2): >>43111232 #>>43113831 #
14. ChuckMcM ◴[] No.43111232{5}[source]
Peer to peer would be tough, you really need a 10G network connection to some tier 1 provider, and about 2500 machines to distribute the crawling/serving load. (that is if you want to do a full stack search engine). And while you can run that infrastructure for on the order of $100K/month (not counting depreciation) that means you need roughly $5K/day in revenue from that cluster. At $10 RPM ($10 revenue per thousand queries) you're looking at a minimum of 500,000 'real' search queries during 'English time' (roughly 7AM to 11PM GMT). That's 31,250 queries per hour or ~9 queries per second (average).

And that just pays to keep the lights on at the colocation center. If you're paying off the development costs (30 - 50 developers over 2 - 3 years) and the cost of an office somewhere, you'll want at least double that revenue or you'll go broke before you break even.

Ideally you are the 'go to' place for people looking to buy something as those queries make money. People researching Douglas Fairbanks for a high school essay consume queries but don't generate ad revenue.

It isn't for the faint of heart.
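The arithmetic above checks out; here it is spelled out, using only the parent comment's stated assumptions (not measured data):

```python
# Back-of-envelope check of the figures in the comment above.
rpm = 10.0                       # revenue per 1,000 queries, USD (assumed)
daily_revenue_target = 5_000.0   # USD/day, covering ~$100K/month colo cost
serving_hours = 16               # "English time", roughly 07:00-23:00 GMT

queries_per_day = daily_revenue_target / (rpm / 1000)   # queries needed per day
queries_per_hour = queries_per_day / serving_hours
queries_per_second = queries_per_hour / 3600

print(f"{queries_per_day:,.0f} queries/day")      # 500,000 queries/day
print(f"{queries_per_hour:,.0f} queries/hour")    # 31,250 queries/hour
print(f"{queries_per_second:.1f} qps (average)")  # 8.7 qps (average)
```

Note that peak load would be well above that 8.7 qps average, so the 2,500-machine cluster has to be provisioned for bursts, not the mean.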

replies(1): >>43111774 #
15. NetOpWibby ◴[] No.43111774{6}[source]
When you don't know what you don't know...wow.

I know "search is hard" in the general sense but context is lacking (not a lot of details online from ex-search teams). It's always been apparent to me that you must have some other high-grossing product if you want to get into search or video, if only to pay for the servers.

Thank you for providing your context!

16. NetOpWibby ◴[] No.43111842{5}[source]
I really thought Neeva was gonna make it. I'm glad Kagi swooped in when they exited.
17. 4ggr0 ◴[] No.43112444{3}[source]
> though if you are thinking ahead you can open these in another tab

or add it to the "Watch Later" playlist :) so you can watch it...later.

18. immibis ◴[] No.43113831{5}[source]
Darknet Lantern is a decentralized searchable directory. It's probably not going to take off, but it could inspire something else. Servers spider other servers running the same software and synchronize their data.
replies(2): >>43115427 #>>43118693 #
19. numeri ◴[] No.43114322[source]
This behavior started happening for me in the last few months. If I click on a result, then go back, I have different search results.

I've found a workaround, though – click back into the DDG search box at the top of the page and hit enter. This then returns the original search results.

20. genewitch ◴[] No.43114706[source]
PID techniques useful?
21. NetOpWibby ◴[] No.43115427{6}[source]
I’ve never heard of this before but it looks interesting. Thanks for the tip!
22. gtfiorentino ◴[] No.43115728[source]
Hi - I work on search at DuckDuckGo. Do you mind sharing a bit more detail about this issue? What steps would allow us to reproduce what you're seeing?
23. gopher_space ◴[] No.43116290[source]
> Some of the most interesting, and most subtle, algorithm work was done keeping that system "critically damped" so that it would converge quickly.

Looking back at my early work with microservices I'm wondering how much time I would have saved by just manually setting a tongue weight.

24. ChuckMcM ◴[] No.43118693{6}[source]
Yup, directory services are a lot easier to do peer-to-peer. Pinboard.in is a good shared directory (sort of Yahoo! without the editorial). They can yield excellent quality when you're searching for something that someone has 'indexed' with them, but poor recall when it comes to the set of all possible answers.

Doing it peer to peer without editorial allows sites to 'get into' the index easily which has its own plusses and minuses.