
1743 points caspii | 2 comments
ilamont No.27428272
Same story for various Wordpress plugins and widgety things that live in site footers.

Google has turned into a cesspool. Half the time I find myself having to do ridiculous search contortions to get somewhat useful results - appending site:.edu or site:.gov to search strings, searching by time period to eliminate new "articles" that have been SEO'd to the hilt, or excluding Yelp and other chronic abusers that hijack local business results.

elchupanebre No.27430207
The reason for that is actually rational: when Amit Singhal was in charge, the search rules were written by hand. Once he was fired, the Search Quality team switched to machine learning. The ML was better in many ways: it produced higher-quality results with a lot less effort. It just had one possibly fatal flaw: if some result was wrong, there was no recourse. And that's what you are observing now: search quality is good or excellent most of the time, while sometimes it's very bad and G can't fix it.
cookiengineer No.27430753
> G can't fix it.

Yes, they can. They could simply stop measuring only positives and start measuring negatives, e.g. people who press the back button of their browser, or who click the second, third, or fourth result afterwards... which should hint to the ML classifiers that the first result was total crap in the first place.
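A minimal sketch of such a negative signal, assuming a hypothetical click log with dwell times (none of these field names are Google's actual telemetry; the idea is just that a fast return to the results page counts as a dissatisfied click):

```python
from dataclasses import dataclass

@dataclass
class Click:
    query: str
    url: str
    position: int          # rank of the clicked result (1 = top)
    dwell_seconds: float   # time on the page before returning to the SERP

def negative_signal(clicks: list[Click], url: str, quick_back: float = 10.0) -> float:
    """Fraction of clicks on `url` that look dissatisfied:
    the user bounced back to the results page almost immediately."""
    hits = [c for c in clicks if c.url == url]
    if not hits:
        return 0.0
    bounces = sum(1 for c in hits if c.dwell_seconds < quick_back)
    return bounces / len(hits)
```

A ranker could then penalize pages with a high score, so that results people immediately bounce off of sink even when they attract lots of clicks.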

But I guess this is exactly what happens when your business model is built on leads to sites that carry your ads: it gives you a weird set of ethics, because your company profits more from those scammers than from legit websites.

From an ML point of view, Google's search results are a perfect example of overfitting. Kinda ironic that they lead the data-science research field and teach about this flaw everywhere, yet don't recognize it in their own product.

quantumofalpha No.27430831
They have already been doing this for a loooong time; it's low-hanging fruit.

Take a look sometime at the wealth of data the Google SERP sends back about your interactions with it.

friendzis No.27430908
The fact that they collect the data does not mean that they use it in any meaningful way, or at all.

They ought to see humongous bounce rates on those fake SEO'd pages. Normally that would suggest shit-tier quality and black-hat SEO, which is in theory punishable. Yet they throw that data away and still rank those sites higher up.

You mean to say that no one at Google has even heard of "external SEO", which is nothing more than a fancy way of saying link farming? They do know; it is punishable according to their own rules, yet it works, because they either cannot fix it or do not care to.

quantumofalpha No.27433262
They'll never tell how they use the data, for obvious reasons, and I also can't go into any details. But any obvious thing you can think of has almost certainly been tried; they've been doing it for 20+ years, and ranking alone is staffed with several hundred smart engineers. Mining clickthrough logs is a fairly old topic in itself and has been around since at least the early 2000s.
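The classic example of that early-2000s work is Joachims-style preference mining: a result that was displayed above a clicked one, but not clicked itself, counts as a weak negative judgment. A rough sketch, with a made-up session format (a ranked list of `(url, was_clicked)` pairs per query):

```python
from collections import defaultdict

def skip_counts(sessions: list[list[tuple[str, bool]]]) -> dict[str, dict[str, int]]:
    """Count 'skipped above a click' events per URL.
    A result shown above the lowest clicked result, but not clicked
    itself, is treated as skipped: a weak negative preference."""
    counts: dict[str, dict[str, int]] = defaultdict(lambda: {"skipped": 0, "clicked": 0})
    for results in sessions:
        # Index of the lowest-ranked clicked result (-1 if nothing was clicked).
        last_click = max((i for i, (_, c) in enumerate(results) if c), default=-1)
        for i, (url, clicked) in enumerate(results):
            if clicked:
                counts[url]["clicked"] += 1
            elif i < last_click:
                counts[url]["skipped"] += 1
    return dict(counts)
```

Aggregated skipped/clicked ratios per URL can then feed a learning-to-rank model as implicit relevance labels, which is roughly the pipeline the clickthrough-mining literature describes.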
They'll never tell how they use the data for obvious reasons and I also can't go into any details. But any obvious thing you can think of almost certainly has been tried, they've been doing it for 20+ years and ranking alone is staffed with several hundreds of smart engineers. Mining clickthrough logs is a fairly old topic itself, has been around since at least early 2000s.