1743 points caspii | 139 comments
1. ilamont ◴[] No.27428272[source]
Same story for various Wordpress plugins and widgety things that live in site footers.

Google has turned into a cesspool. Half the time I find myself having to do ridiculous search contortions to get somewhat useful results - appending site: .edu or .gov to search strings, searching by time periods to eliminate new "articles" that have been SEOed to the hilt, or taking out yelp and other chronic abusers that hijack local business results.

replies(19): >>27428410 #>>27428439 #>>27428441 #>>27428466 #>>27428594 #>>27428652 #>>27428717 #>>27428807 #>>27429076 #>>27429483 #>>27429797 #>>27429818 #>>27429843 #>>27429859 #>>27430023 #>>27430207 #>>27430285 #>>27430707 #>>27430783 #
2. jamiek88 ◴[] No.27428410[source]
Ugh Pinterest results.
replies(2): >>27428673 #>>27429655 #
3. duskwuff ◴[] No.27428439[source]
Free WordPress* themes are particularly bad in this regard. Since they're expected to contain HTML anyway, it's altogether too easy for the author of a theme to include a couple of links to a site they want to promote. Some themes take this to the next level by obfuscating the code that generates the promotional links, and/or including other code which makes the site not work properly if the links are removed.

*: and themes for other web applications, but mostly WordPress these days

4. naikrovek ◴[] No.27428441[source]
google isn't the cesspool, people who want to appear at the top of a list of search results are doing whatever it takes to create a cesspool, because that's what it takes to earn more money.

being willing to make other things worse in order to have more money always creates cesspools.

replies(3): >>27428507 #>>27428552 #>>27428886 #
5. wingworks ◴[] No.27428466[source]
I really don't like how easy it is to fake a "new" article on Google. You can just re-publish an old article and stick a new date on it, and Google takes it at face value and uses the new date.
replies(2): >>27429860 #>>27430653 #
6. Retric ◴[] No.27428507[source]
Google is a cesspool because it’s their job to fix it and they failed. I stopped using Google search because of how far it’s fallen.
7. beepbooptheory ◴[] No.27428552[source]
If it's the only way to make money, it doesn't really feel like the burden is on the people to make a cleaner pool.
replies(1): >>27429649 #
8. XorNot ◴[] No.27428594[source]
Also phone problems: Google a problem with a phone and the top hit will be a whole bunch of churned out articles with generic copy on the cause (sometimes there are bugs in the software, so reboot your phone).
replies(1): >>27428715 #
9. newacct583 ◴[] No.27428652[source]
> Google has turned into a cesspool.

All these same sites appear near the top of Bing searches too. There's nothing particularly Google-specific to this story. It's about SEO hacking that will work against anyone with a PageRank-style system.

replies(3): >>27428739 #>>27428757 #>>27429080 #
10. wlesieutre ◴[] No.27428673[source]
I swear, Pinterest must have employees working undercover in the Image Search team for Google to have let them destroy image search results the way they have.

It's literally never the original source for anything, but you can bet it's most of the first 10 pages of results. Then it doesn't even let you right click to open the image file, and dumps you to a login prompt if you click on anything. THAT'S NOT EVEN YOUR IMAGE STOP TELLING ME WHAT I CAN DO WITH IT.

replies(2): >>27428795 #>>27429452 #
11. duskwuff ◴[] No.27428715[source]
Any technical issue, really. There's a ton of autogenerated content out there with low-effort troubleshooting tips. A lot of it is used as lead generation for scammy antivirus/antimalware/"cleaner" software, paid tech support, or outright tech support scams.
replies(4): >>27428831 #>>27429722 #>>27429760 #>>27430638 #
12. aarchi ◴[] No.27428717[source]
Anecdotally, DuckDuckGo seems to have fewer sponsored sites than Google. DDG also makes it easy to block low-quality sites because it adds a data-domain attribute to the root of every search result. I recently started this mini uBlock Origin filter list for that (suggestions welcome!):

    ! Hide low-quality results on DuckDuckGo
    duckduckgo.com##[data-domain="w3schools.com"]
    duckduckgo.com##[data-domain$=".w3schools.com"]
    duckduckgo.com##[data-domain="w3schools.in"]
    duckduckgo.com##[data-domain$=".w3schools.in"]
    duckduckgo.com##[data-domain="download.cnet.com"]
    !! Stack Exchange mirrors
    duckduckgo.com##[data-domain="exceptionshub.com"]
    duckduckgo.com##[data-domain="intellipaat.com"]
replies(3): >>27430899 #>>27431653 #>>27433528 #
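
The filter list in aarchi's comment above follows a regular pattern: one exact-match rule per domain plus, for some domains, a `$=` suffix rule that also catches subdomains. As a rough sketch only, a list like that could be generated from a plain list of domains. The function name `ddg_filters` and its `include_subdomains` flag are hypothetical and not part of uBlock Origin or DuckDuckGo; the rule syntax is copied from the comment.

```python
def ddg_filters(domains, include_subdomains=True):
    """Build uBlock Origin cosmetic filters that hide DuckDuckGo results.

    Hypothetical helper: it only reproduces the rule shapes shown in the
    comment above and assumes DuckDuckGo keeps its data-domain attribute.
    """
    lines = []
    for domain in domains:
        domain = domain.strip().lower()
        if not domain:
            continue  # skip blank entries
        # Exact match on the result's data-domain attribute.
        lines.append(f'duckduckgo.com##[data-domain="{domain}"]')
        if include_subdomains:
            # Suffix match so that e.g. www.<domain> is hidden as well.
            lines.append(f'duckduckgo.com##[data-domain$=".{domain}"]')
    return lines


if __name__ == "__main__":
    print("\n".join(ddg_filters(["w3schools.com", "exceptionshub.com"])))
```

The printed lines can be pasted into uBlock Origin's "My filters" pane.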
13. worble ◴[] No.27428739[source]
I think it's high time we had a webring resurgence. It's impossible to get anywhere with plain search anymore; what we need is curated websites where other domain owners are happy to say "I endorse the people running this site, so if you like my stuff you'll like them too".
replies(3): >>27428818 #>>27430011 #>>27430786 #
14. jimbob45 ◴[] No.27428757[source]
This is my view too. Yes, I’d love to go back to a time when Google’s algorithms were unknown enough for SEO to be futile but those days are gone and the problem isn’t limited to Google.
15. kemotep ◴[] No.27428795{3}[source]
And if it is not a Pinterest link it is an AMP link, which is equally bad in my experience. I just want to link a picture. Not a link to a page that might have the picture but might also have the entire article/reddit discussion and not the image which I was searching for.
replies(2): >>27428882 #>>27432081 #
16. colordrops ◴[] No.27428807[source]
Google Search is ripe for disruption. It's been over 20 years now and they are not dynamic or interesting at all anymore.
replies(4): >>27428814 #>>27428840 #>>27429036 #>>27429066 #
17. emodendroket ◴[] No.27428814[source]
It's so easy to do better! Just look at what a rousing success Cuil was.
replies(1): >>27428942 #
18. emodendroket ◴[] No.27428818{3}[source]
Isn't that what people go to social media for?
replies(1): >>27429062 #
19. initplus ◴[] No.27428831{3}[source]
These results are incredibly frustrating. Google should de-rank these autogenerated tech troubleshooting sites.

Yes, I clicked the link because it exactly referenced my issue. But it's not helpful to just see the same 5 tips copy pasted from elsewhere by an algorithm.

replies(2): >>27429606 #>>27430181 #
20. LeoPanthera ◴[] No.27428840[source]
I still think that the "Yahoo!" style web directory is a good model. A catalogue of hand-curated links has increasing value as the quality of Google results goes down.

I was briefly going to write "I'm surprised that DMOZ[1] still exists" but it says "Copyright 2017 AOL" at the bottom so maybe it doesn't.

Edit: ...and using the search box results in a 404 so I guess it's really dead huh.

Edit 2: Apparently this is the successor! https://curlie.org/en

[1]: https://dmoz-odp.org

replies(3): >>27428976 #>>27429016 #>>27429077 #
21. wlesieutre ◴[] No.27428882{4}[source]
When I'm reverse image searching something it's often to find the original artist of an illustration, photo, or whatever. I want to know who made it, see their other work, and find it in its original quality without 15 generations of jpg recompression artifacts.

But no, Pinterest has better SEO than the artist does, so it's just endless reposts upon reposts and never the original work.

Occasionally you get lucky and it's not the sort of image that Pinterest users share. Then you might actually find where it came from.

replies(3): >>27429275 #>>27429857 #>>27429918 #
22. prepend ◴[] No.27428886[source]
Google’s mission was “organize the world’s information and make it useful” and they are doing a poorer job now than historically.

Of course there are scammers, that’s part of what makes organizing so hard.

Cynically, I think that Google is worse at filtering scammers because they care less now. Half the page is ads so they make money either way.

23. kortilla ◴[] No.27428942{3}[source]
Nobody said it would be easy. Industries ripe for disruption are often very hard to break into. Being ripe for disruption is more about giving up on innovating so you stagnate.
24. Apocryphon ◴[] No.27428976{3}[source]
The creation and maintenance of such a directory might additionally be more feasible now because, sadly, there are far fewer personal or independent websites; content is instead hosted on large platforms.
25. 0xbadcafebee ◴[] No.27429016{3}[source]
I just tried to use both to look up pharmacies via navigation. With DMOZ, after my second try I was able to find CVS, but I wasn't able to find it with Curlie.

It's not a bad idea to have a curated dataset of information. But clearly there are much better ways to navigate said information, which would include search, but also dynamic filters, predictive text, sorting algorithms, context awareness, etc. All of which... is built into modern search engines.

So perhaps what we really want is a Wikipedia/OpenStreetMap of curated, indexed, semantic content/links, that anyone can consume and write their own search interface for. Basically, an open data warehouse of website information.

26. lemmiwinks ◴[] No.27429036[source]
The irony being that 20 (more like 25?) years ago Yahoo search was ripe for disruption... by Google :)

Halt and Catch Fire [1] (as a nerd, I can say it's one of the few TV series that got the hacker spirit right) had a few episodes about the Google disruption.

Like some people often say here, things come and go in circles...

[1]: https://en.wikipedia.org/wiki/Halt_and_Catch_Fire_(TV_series...

27. bottled_poe ◴[] No.27429062{4}[source]
Social media is gamed the same way? Sharebots, etc
replies(1): >>27429082 #
28. rickspencer3 ◴[] No.27429066[source]
Neeva.com

I am in the pre-release program. The hardest initial thing to get used to was not immediately scrolling down to the bottom to avoid all of the spam.

I suspect that their methods are not much different than Google, but the experience has been so much better.

replies(3): >>27429391 #>>27429548 #>>27429912 #
29. gerdesj ◴[] No.27429076[source]
"Google has turned into a cesspool."

That's a bit harsh but I agree that it is starting to fail to live up to the expectations I had with Google when it came out and destroyed Altavista in a spectacular shower of sparks.

Could I tender: "uBlacklist" as a stop gap, amongst others as we await Google being given a right old kicking?

Despite being a staunch Arch Linux user I have to deal with rather a lot of MS Windows related stuff. Being able to filter out that bloody awful Microsoft Social thing gets me closer to decent results. The majority of the next 10-100 results will be CnP clones of someone's blog but a human is able to get in reasonably quickly. I'm toying with blocking Stackoverflow and other (cough) stalwarts to see if results get better for me.

In my opinion: the www has hit a crossroads or perhaps a Spaghetti Junction or a Magic Roundabout for the last five years or so and continuing. However the exits are connected to the entrances on these road systems (take a look at them - they are real junctions. The MR is particularly terrifying but it works really well.)

I still won't use words like cesspool for this but I am increasingly losing my patience over the standard of results from Google. Those featured things (not the Ads - that's fine) at the top which add #blah_blah to the URL to colour search terms yellow are not working for me. The quality of the returns featured in a box is often rubbish too. It would be nice to be able to turn all that stuff off.

I understand that Google are trying to "be" the internet to try and keep the stock ticker pointing north but there seems to be a point when they have overreached themselves and I think that was passed several years ago. I also increasingly feel that Google thinks that it knows best and has removed many choices from their various UIs - that comes across as a bit arrogant.

Many years ago I left Altavista behind for Google. I will move again if I feel I have to. Of course that's not much in the grand scheme of things and I'll probably only take around 100,000 people with me but they have friends - still probably not a big deal.

replies(5): >>27429367 #>>27429643 #>>27429710 #>>27429831 #>>27430321 #
30. mschuster91 ◴[] No.27429077{3}[source]
> A catalogue of hand-curated links has increasing value as the quality of Google results goes down.

Who will pay for its creation, maintenance and hosting? Who will judge ranking, disputes, hacks?

Who will have an eye on discrimination issues? Whose jurisdiction will be relevant (think GDPR or the Australian press "gag order" law in the case of that cleric accused of fondling kids)?

Who will take care that the humans who will get exposed to anything from generic violence over vore/gore to pedo content get access to counseling and be fairly paid? Facebook, the world's largest website, hasn't figured out that one ffs.

These questions are ... relatively easy to bypass with an automated engine (all issues can be explained away as "it was the algorithm" and IT-illiterate judges and politicians will accept this), but as soon as you have meaningful human interaction in the loop, you suddenly have humans that can be targeted by lawsuits, police measures and other abuse.

replies(2): >>27429482 #>>27429730 #
31. Mediterraneo10 ◴[] No.27429080[source]
Indeed. I recently noticed this while relying on DDG for documentation for Common Lisp, a language I am still learning. The top-ranking site for any Common Lisp function was an SEO scam site, where clearly someone had hired freelancers to take preexisting CLisp documentation and rewrite it – in poor-quality English – until it would no longer be detectable as copyright violation, then loaded it with ads.

(I just checked and this copycat documentation site has, thankfully, now been pushed down a bit in DDG results.)

replies(2): >>27430451 #>>27430948 #
32. emodendroket ◴[] No.27429082{5}[source]
Do you suppose Web rings wouldn't be when there's money in it? There was plenty of that when they were just for fun.
replies(1): >>27429314 #
33. IggleSniggle ◴[] No.27429275{5}[source]
THIS. So much this. Time was when you could actually discover the provenance of an image. Almost every time, when I’m doing a reverse image search, that is my intent. It used to work. It seldom does these days.
replies(1): >>27436254 #
34. IggleSniggle ◴[] No.27429314{6}[source]
It’s fundamentally about trust models. That is to say, about the audience.

Everybody gave up trusting webrings because Google provided better results. Now that Google results are shit, there’s room for other information vendors to come along, even if it’s in narrow areas.

Actually, HN is already this for me in some respects.

replies(1): >>27429867 #
35. emptyparadise ◴[] No.27429367[source]
I'm amazed that there isn't anything like uBlock Origin for search results.
replies(3): >>27429646 #>>27429653 #>>27444791 #
36. kmonsen ◴[] No.27429391{3}[source]
I'm also testing neeva, do you know what they use to get the search results?
replies(1): >>27431203 #
37. bobcostas55 ◴[] No.27429452{3}[source]
Really makes you wonder if the people at google actually use their own product. Anyone who has ever used google image search in the past couple of years will have noticed that it's filled to the brim with garbage results from pinterest.
replies(1): >>27430265 #
38. rchaud ◴[] No.27429482{4}[source]
It doesn't need to be a corporate enterprise that has to worry about all those things. People already share directories of links via Google Docs, Notion notebooks and the like.
39. luke2m ◴[] No.27429483[source]
I don’t like google and don’t really want to defend it, but this is more of a lots of crappy websites problem than a google problem.
replies(1): >>27429492 #
40. worik ◴[] No.27429492[source]
Google, to justify its huge capital worth, should deal with that crap. Why else bother?
41. luke2m ◴[] No.27429548{3}[source]
I would rather not have a required sign in to a search engine, but looks interesting.
replies(1): >>27429842 #
42. ◴[] No.27429606{4}[source]
43. smegger001 ◴[] No.27429643[source]
I wish I could have 2010 Google search as an alternative to 2021 Google search.
replies(3): >>27429672 #>>27429817 #>>27430191 #
44. gerdesj ◴[] No.27429646{3}[source]
"My eyes are bent, my back is grey etc"

I think we have loads of tools to play with but fundamentally there is a problem when you are fighting with your search engine to find stuff you want to find.

My laptop (Arch) still has Chromium as default with uBlock Origin, Privacy Badger, uBlacklist and a few others running. I will be moving back to FF and running a sync server because I am that pissed off and able to do so. I'll also take a few others with me (between 2 and rather more)

When I say move back to FF, I'm talking about something like reverting a 10-15 years change.

I've always had FF available but it fell short back in the day for long enough for me to move to the Goggle thing. Now I think I'll go back.

No one at G will lament their loss, I'm not even a rounding error. I'm sure that all is fine there.

replies(1): >>27431137 #
45. naikrovek ◴[] No.27429649{3}[source]
there is never only a single way to make money. some ways are easier. some ways let you take advantage of others; these are of the variety that create cesspools.
46. aarchi ◴[] No.27429653{3}[source]
If you're referring to user-curated search result blocking, that's very easy with DuckDuckGo and uBlock Origin (just block elements like [data-domain="w3schools.com"]; see my comment to the GP). I don't know of any large extant lists like this though.
replies(1): >>27429715 #
47. ajsnigrutin ◴[] No.27429655[source]
I'd expect a company like Google, which tracks what kind of socks you have on every day, to also track its own search engine: a user mistakenly clicks on a Pinterest link, immediately clicks back, and looks for something else. Is it so hard to infer that they don't want Pinterest results because they're useless, and somehow lower Pinterest's SEO score? Nooo, of course not, just put the Pinterest results near the top until the user puts "-pinterest" in the search bar.
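
The signal described in the comment above (click a result, come straight back) can be sketched as a per-domain "quick return" rate. This is an illustration only and makes no claim about how Google actually ranks results; the function name, the `(domain, dwell_seconds)` log format, and the 5-second threshold are all assumptions.

```python
from collections import defaultdict


def quick_return_rate(clicks, threshold_seconds=5.0):
    """Fraction of clicks per domain where the user came back quickly.

    `clicks` is an iterable of (domain, dwell_seconds) pairs, a made-up
    log format used only for this sketch.
    """
    totals = defaultdict(int)
    quick = defaultdict(int)
    for domain, dwell_seconds in clicks:
        totals[domain] += 1
        if dwell_seconds < threshold_seconds:
            quick[domain] += 1  # user bounced back almost immediately
    return {domain: quick[domain] / totals[domain] for domain in totals}


if __name__ == "__main__":
    log = [
        ("pinterest.example", 1.0),
        ("pinterest.example", 2.0),
        ("artist.example", 60.0),
    ]
    print(quick_return_rate(log))
```

A domain whose rate stays high across many searches would be a candidate for demotion, although a short visit can also mean the page answered the question at a glance.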
48. gerdesj ◴[] No.27429672{3}[source]
How so? I haven't seen much change apart from that crappy yellow streak of piss thing that dribbles on pages.

How do you recall 2010 search? (I suspect I've lost it a bit - I'm 50.5 years old)

replies(1): >>27429878 #
49. oska ◴[] No.27429710[source]
I appreciate a lot of what you're saying in this comment but I disagree with this sentiment:

> not the Ads - that's fine

In my strongly held opinion, push advertising is not fine and it's the root cause of all the problems you are discussing. We will only exit this mess that the web has become when everyone blocks push advertising by default. People should only see advertising when they are interested in being advertised to, e.g. sites you consciously choose to go to that advertise products & services, like the old Yellow Pages phonebooks.

50. derefr ◴[] No.27429715{4}[source]
That won't do much if every result on the first page is blocked. Ideally a filter list like this could be pushed to the server side as a per-user preference to go with your query, so that if e.g. the top 10000 results were all filtered out, then you wouldn't have to click through (or infinite-scroll autoload) 100 empty pages before getting anything.
replies(2): >>27430183 #>>27430231 #
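
derefr's point above is about ordering: if the blocklist is applied before pagination rather than after, the first page is always full of allowed results. A minimal sketch of that idea follows, assuming the server already has a ranked list of result URLs; `is_blocked` and `filtered_page` are hypothetical names, not any search engine's API.

```python
from urllib.parse import urlsplit


def is_blocked(url, blocked_domains):
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked_domains)


def filtered_page(ranked_urls, blocked_domains, page, page_size=10):
    """Apply the user's blocklist first, then slice out the requested page."""
    kept = [u for u in ranked_urls if not is_blocked(u, blocked_domains)]
    start = page * page_size
    return kept[start:start + page_size]


if __name__ == "__main__":
    ranked = [
        "https://www.w3schools.com/a",
        "https://developer.mozilla.org/b",
        "https://w3schools.com/c",
        "https://example.org/d",
    ]
    print(filtered_page(ranked, {"w3schools.com"}, page=0, page_size=2))
```

Filtering in the browser, as a uBlock Origin cosmetic filter does, can only hide results that have already been sent, which is why a heavily filtered query may show near-empty pages.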
51. toeget ◴[] No.27429722{3}[source]
That's why I append reddit, stackoverflow, superuser when I search for technical solutions. At least those sites are still full of user-generated content with good answers upvoted to the top.
replies(1): >>27429774 #
52. derefr ◴[] No.27429730{4}[source]
> as soon as you have meaningful human interaction in the loop, you suddenly have humans that can be targeted by lawsuits, police measures and other abuse.

In theory, you could have a curated directory whose hosting works like ThePirateBay, and whose maintainership is entirely anonymous authors operating over Tor (even though the directory itself holds nothing the average person would find all that objectionable.)

Of course, there's no business model in that...

replies(1): >>27431739 #
53. PoignardAzur ◴[] No.27429760{3}[source]
The last few weeks I've started noticing a very specific type of SEO that pops up when I'm doing technical search, where the first page will be a Stack Overflow result, and the 3rd or 4th result will be from some content farm, copy-pasted from SO, sometimes translated in French.

It's a little unsettling.

replies(3): >>27429986 #>>27430312 #>>27430366 #
54. PoignardAzur ◴[] No.27429774{4}[source]
You know, I was joking the last few times the subject came up, but I'm getting seriously worried that the more people mention using that kind of trick on HN, the faster advertisers will catch on and start building reddit-based SEO strategies.

Not sure how we should react :/

replies(3): >>27430059 #>>27430344 #>>27430510 #
55. paulpauper ◴[] No.27429797[source]
I wish DuckDuckGo had better results. Google is still better.
56. tempestn ◴[] No.27429817{3}[source]
Problem is, I expect 2010 google search would be considerably worse now than it was in 2010, because "SEO" has had another decade to evolve.
replies(2): >>27430810 #>>27431016 #
57. cyanydeez ◴[] No.27429818[source]
don't forget adding quotes to things to stop the random "did you mean to spell this?" crap

basically, like everything in modernity, it's a race to the bottom of the infinite dullards of popular

58. p5a0u9l ◴[] No.27429831[source]
Comparing Google now to Alta Vista is not very helpful. They don't get to rest on their laurels. Search is less helpful now, and it's not clear to me that they care enough to do something about it.
replies(1): >>27430334 #
59. texasbigdata ◴[] No.27429842{4}[source]
That just implies locking into an ad supported model. Personally, would prefer to pay. Stuart Russell wrote in his book that when surveying humans the value they ascribed to not being able to Google for a year was something like $17,000 per year. Just some absurd number.
replies(1): >>27429915 #
60. torbital ◴[] No.27429843[source]
I can't remember the last time I searched on Google without appending "reddit" to the end.
61. tempestn ◴[] No.27429857{5}[source]
And the interesting thing about that is, you'd think it would be (relatively speaking) straightforward for Google to keep track of the first place a given image was indexed (or possibly the first few places, or everywhere it was seen over the first X period of time since you couldn't guarantee the very first would always be the original). Assuming that original was still online, it would seem to be the place to direct searchers to, regardless of pagerank or whatever.
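
The bookkeeping tempestn describes above can be sketched as an index from image fingerprint to the earliest crawl that saw it. This is a simplified illustration, not Google's method: it fingerprints exact bytes with SHA-256, so a recompressed or resized copy would not match (a real system would need some form of perceptual similarity), and the class name and the numeric `crawled_at` timestamps are assumptions.

```python
import hashlib


class FirstSeenIndex:
    """Remember the earliest (timestamp, URL) at which each image was crawled."""

    def __init__(self):
        self._first = {}  # fingerprint -> (crawled_at, url)

    @staticmethod
    def _fingerprint(image_bytes):
        # Exact-content hash; identical bytes give identical fingerprints.
        return hashlib.sha256(image_bytes).hexdigest()

    def record(self, image_bytes, url, crawled_at):
        key = self._fingerprint(image_bytes)
        current = self._first.get(key)
        if current is None or crawled_at < current[0]:
            self._first[key] = (crawled_at, url)

    def earliest_source(self, image_bytes):
        entry = self._first.get(self._fingerprint(image_bytes))
        return entry[1] if entry else None


if __name__ == "__main__":
    index = FirstSeenIndex()
    index.record(b"image-data", "https://pinterest.example/pin/1", crawled_at=200)
    index.record(b"image-data", "https://artist.example/work", crawled_at=100)
    print(index.earliest_source(b"image-data"))
```

As the comment notes, the first crawl is not guaranteed to be the original, so keeping the first few sightings instead of a single one would be a natural extension.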
62. normac2 ◴[] No.27429859[source]
Hmm. I would agree about all the crap being mixed in there, but in terms of overall results (both wrt. SEO crap and other irrelevant stuff), my experience has been that the quality troughed something like 2-3 years ago and then came back (my guess is that they're incorporating all of the AI they've been doing throughout the company into search). To me it feels like it's about 80% of its best right now.

I bet it's that we do different types of searches.

63. BigJono ◴[] No.27429860[source]
I ran into this for the first time yesterday when trying to find out new info about a footy player. Some article from 15 years ago talking about how he had a good first game, tagged as 5th June 2021. Like, wtf?
replies(1): >>27430766 #
64. emodendroket ◴[] No.27429867{7}[source]
In my opinion this site is not really so different from, say, Reddit, beyond having more focused rules and being smaller. So I don't think my idea that social media have supplanted the Web ring is wide of the mark.
65. smegger001 ◴[] No.27429878{4}[source]
In general I had more relevant results on my first search query compared to now. Admittedly that's hard to prove, as I can't rerun the search side by side for a comparison now.

Additionally, ads were firmly separated into a colored box away from actual results.

replies(1): >>27430100 #
66. justinbaker84 ◴[] No.27429912{3}[source]
I just signed up for a trial with them after reading this post.
67. justinbaker84 ◴[] No.27429915{5}[source]
It is not an ad supported model - it is a subscription model. I just signed up for it.
68. jsjohnst ◴[] No.27429918{5}[source]
Try using tineye.com. It has noise too, but seems to be easier to find the original source than Google these days, at least for me anyway.
69. ihnorton ◴[] No.27429986{4}[source]
It can be worse than that when those sites get a full multi-line result billing whereas the original stackoverflow answer gets a single-line subheading under some other SO result.
70. cortesoft ◴[] No.27430011{3}[source]
If they can inject a random link into a page, why couldn't they also be able to inject a web ring link?
71. ◴[] No.27430023[source]
72. Camillo ◴[] No.27430059{5}[source]
Oh, it's no secret. Google's autocomplete will actually suggest appending "reddit" to certain queries. For example, let's take one of the most SEO-spammy queries imaginable, "best mattress 2021". Google will suggest:

- best mattress 2021

- best mattress 2021 consumer reports

- best mattress 2021 reddit

- best mattress 2021 for back pain

- best mattress 2021 wirecutter

etc.

But of course Reddit is already rife with shills. Not sure about CR.

replies(2): >>27430122 #>>27430305 #
73. wernercd ◴[] No.27430100{5}[source]
As mentioned, I think removing the rose-colored glasses won't put lipstick on this pig. Google Search (and I'm not sure how Bing or similar would do better, barring their censorship problems) is increasingly a minefield...

This is the same problem with something like WoW classic... you can get the game that existed 15 years ago. But even if it is the exact same game, the world itself isn't. Online walkthroughs, videos, modding knowledge, theory crafting, etc. Those things are much more fleshed out today so even if the system didn't change 1 bit, WoW Original vs WoW Classic are really two separate games.

Likewise... if you dropped Google Original down today? I'd love to see how fast it would get owned by these sorts of operations that have had a decade+ of practice in skills like SEO that didn't exist in 2010.

You had more relevant results? That wouldn't change because companies live and die off of SEO now and didn't then. Highlighted ads are such a small thing on the website when compared to getting a full front page of the same Stack Overflow answers in 20 different websites that all have SO cloned and reskinned.

74. sixothree ◴[] No.27430122{6}[source]
I remember in the late 2000's I had a CR account. I had two weeks left on the period I had paid for. But when I cancelled the account... poof. My access was revoked immediately. Very much not consumer friendly. I was done enough with their crap that I didn't even bother with an email.
replies(2): >>27430200 #>>27430914 #
75. minikites ◴[] No.27430181{4}[source]
>These results are incredibly frustrating. Google should de-rank these autogenerated tech troubleshooting sites.

Why? Google makes money from advertisements either way, it's not in their interest to improve search results. If anything, terrible search results make users more likely to click on ads, which now look better by comparison.

replies(2): >>27430296 #>>27430481 #
76. bombcar ◴[] No.27430183{5}[source]
https://millionshort.com/ tries something like this.
77. narrator ◴[] No.27430191{3}[source]
Yandex.com is 2010 Google search, IMHO. It's not filtered at all and seems to have that pure pagerank feel of the old Google search engine, while the modern Google seems to be hand tweaked quite a bit to only quote "authoritative sources". Search for a politically controversial topic all you want on Google and you will not have your first couple of pages being debunking or fact check sites. Compare Google's search results for "who is zhengli shi" vs. the Yandex.com results for example. You can even find Putin scandals and "Tank Man" on there, even though it's a search engine based in Russia.
78. Gracana ◴[] No.27430200{7}[source]
FWIW I signed up for CR recently when I was car shopping, and I canceled my subscription within the first month. They assured me that I would still have access for the remainder of the period. Of course, you're forced to subscribe rather than buy access for a set period, and they sent me a couple dozen emails during the time I was signed up, so they're not completely innocent... but at least that part felt reasonable.
79. elchupanebre ◴[] No.27430207[source]
The reason for that is actually rational: when Amit Singhal was in charge the search rules were written by hand. Once he was fired, the Search Quality team switched to machine learning. The ML was better in many ways: it produced higher quality results with a lot less effort. It just had one possibly fatal flaw: if some result was wrong there was no recourse. And that's what you are observing now: search quality is good or excellent most of the time while sometimes it's very bad and G can't fix it.
replies(5): >>27430295 #>>27430301 #>>27430306 #>>27430308 #>>27430753 #
80. aarchi ◴[] No.27430231{5}[source]
DDG will add more results, if enough are hidden. If I search "w3schools" with my filter, there are only two results on the first page that are not hidden, so it immediately displays the second page below. It seems that they planned for this use case.
81. visarga ◴[] No.27430265{4}[source]
I have fallen in love with Yandex image similarity search (search by providing a query image, not text). You can find so much more with it, it's like Pinterest but without the crap. For example I could find images for my ML model but also furniture ideas for my house and check if my kid is objectively cuter than average (lol, yeah, objectively!).
82. lupire ◴[] No.27430285[source]
> Half the time I find myself having to do ridiculous search contortions to get somewhat useful results - appending site: .edu or .gov

A great opportunity for students and public servants to sell premium URLs.

83. robbrown451 ◴[] No.27430295[source]
I wouldn't call that rational. There is no reason you can't apply human weighting on top of ML.

Honestly, I don't believe for a minute they "can't fix it." They do this sort of thing all the time, for instance when ML shows dark skinned people for a search for gorilla, they obviously have recourse.

replies(1): >>27430378 #
84. lupire ◴[] No.27430296{5}[source]
The entire reason Google is the most successful search engine is that people don't use search engines that behave this way.
replies(1): >>27430955 #
85. coliveira ◴[] No.27430301[source]
My impression is that the ML algorithms at Google have the goal of increasing profitability from search. If that is the case, the quality of search will tend to be secondary to displaying pages that bring more revenue.
86. lupire ◴[] No.27430305{6}[source]
Don't search for "best". That's specifically requesting spam.
replies(1): >>27430904 #
87. humaniania ◴[] No.27430306[source]
"Request manual review of search results" button?
replies(1): >>27430440 #
88. jeromegv ◴[] No.27430308[source]
It's blatantly false that Google has "no recourse": Google can apply a penalty and bring domains down.
89. lupire ◴[] No.27430312{4}[source]
That's a years-old scam, but occasionally a new site pops through Google's filters.
replies(1): >>27430962 #
90. Spooky23 ◴[] No.27430321[source]
I don’t think Google is the cesspool, I think Google is a search engine for an internet that is the cesspool.

We’re moving to the vision of information services that were pioneered by AOL, Prodigy, etc. Honestly, we’re there already.

replies(1): >>27430379 #
91. lupire ◴[] No.27430334{3}[source]
You mean besides spending far more on people and computers than any other company, perhaps combined?
replies(1): >>27430534 #
92. eyelidlessness ◴[] No.27430344{5}[source]
Prefer resources that have some governance and aren’t entirely crowdsourced. For example if I’m looking for web tech answers my first search is ‘[whatever topic] mdn’.
93. eyelidlessness ◴[] No.27430366{4}[source]
If you start getting a little esoteric in your searches you’ll get tons of results that are clearly crawled from personal blogs, and hosted on personal-blog-looking domains that redirect to godawful garbage. Especially bad on mobile because Google truncates the URLs.
94. htrp ◴[] No.27430378{3}[source]
You do know that Google basically slapped a patch on that one right?

https://www.theverge.com/2018/1/12/16882408/google-racist-go...

replies(2): >>27430763 #>>27438366 #
95. eyelidlessness ◴[] No.27430379{3}[source]
We were already there when Google was the hot thing all the nerds loved. At the time their search was a way to cut through that, not the primary window into it. The cesspool isn’t Google, now it’s just hosted by them.
96. bhartzer ◴[] No.27430440{3}[source]
Since this is now the top spot here on H/N I suspect it just got the attention of some Googlers who I’m sure will review it.

They may not give the site a manual action, though. They’d rather tweak the algorithm so it naturally doesn’t rank. Google’s algo should be able to see stuff like this.

I know that I’ve seen sites tank in the rankings because they got too many links too quickly. It could be that the link part of the algorithm hasn’t fully analyzed the links yet.

I’d be interested in seeing what the Majestic link graph says about this site, ahrefs doesn’t have tier 2 and tier 3 link data.

97. birktj ◴[] No.27430451{3}[source]
Note that, as I quite recently learned, DDG supports a bunch of bang-commands, listed at [1]. There are a bunch of them for documentation sites for all kinds of programming languages, including a couple for Lisp, it seems.

[1]: https://duckduckgo.com/bang_lite.html

98. ineedasername ◴[] No.27430481{5}[source]
Google became very popular very quickly because it gave much better results much faster. The more that Google allows quality to decline, the faster they approach a non-recoverable tipping point. Just ask Yahoo how quickly that can happen. Google may seem entrenched, but they have a shaky hold on search that is only as strong as its result quality. They are entrenched in advertising, but only because that's where searchers go to search.

Users may be entrenched in other Google products-- Gmail, gcal, docs, etc-- but not search. Someone using all those other Google products could change their default search engine and have zero impact on the rest of their digital life.

I'm shopping around for a preferred alternative right now, I just haven't settled yet.

replies(2): >>27430671 #>>27434392 #
99. na85 ◴[] No.27430510{5}[source]
Reddit has been gamed by guerilla advertisers for years, everyone knows it, and the admins there don't seem to care/are unable to do anything about it.

r/HailCorporate used to be about calling out stealth marketing/advertising but it's morphed into just discussing how things can inadvertently act as an advertisement aka society is full of branding and consumerism. It's a shame because it used to be a very high quality sub.

100. p5a0u9l ◴[] No.27430534{4}[source]
You're giving their entire search budget credit for dealing with spam results? My observation is that it's bad and has been for some time. They are either unable or unwilling to solve the problem.
101. bashtoni ◴[] No.27430638{3}[source]
I keep getting results to a site 'gitmemory.com' which is just GitHub issues scraped. Super annoying that they outrank the actual GitHub issues they've taken the content from.
replies(1): >>27430968 #
102. sellyme ◴[] No.27430653[source]
You can also do the opposite: post something today and say it was up on your site in 2003.

Makes it really difficult to find old pages about something that recently exploded in popularity, because the age filter just doesn't work.

103. kevin_thibedeau ◴[] No.27430671{6}[source]
That was pre-IPO Google. That company doesn't exist anymore. Money is their God now. Every Googler's high salary depends on it.
replies(1): >>27431154 #
104. ping_pong ◴[] No.27430707[source]
Google is a cesspool because the spammers and SEO-hackers are in full force, and Google is only reactive to these threats these days. I mean, does it really matter if they are making hundreds of billions of dollars a year? They seem to be doing something right.

The only time something will change is when traffic starts decreasing to their site, but it's good enough such that people won't change. Look at Facebook, I don't know anyone who uses it as much as they used to 10 years ago, but it's making the most money it ever has. Why on earth would any behavior change? From their points of view, everyone is happy with it!

105. cookiengineer ◴[] No.27430753[source]
> G can't fix it.

Yes, they can. They should simply stop measuring only positives, and start measuring negatives - e.g. people that press the back button of their browser, or click the second, third, fourth result afterwards...which should hint the ML classifiers that the first result was total crap in the first place.
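To make the "measure negatives" idea concrete, here is a minimal sketch of one such signal: the share of clicks on a result that are followed by a quick return to the results page. The log format, the field names, and the 10-second threshold are all assumptions made up for illustration; nothing here reflects how any real search engine stores or uses this data.

```python
from collections import defaultdict

# Hypothetical click-log rows:
# (query, clicked_url, dwell_seconds, returned_to_results)
def quick_return_rate(click_log, max_dwell=10.0):
    """Per (query, url): fraction of clicks where the user came back
    to the results page within max_dwell seconds."""
    clicks = defaultdict(int)
    quick_returns = defaultdict(int)
    for query, url, dwell, returned in click_log:
        key = (query, url)
        clicks[key] += 1
        if returned and dwell <= max_dwell:
            quick_returns[key] += 1
    return {key: quick_returns[key] / clicks[key] for key in clicks}

# Made-up sample data, purely for illustration.
log = [
    ("fix wifi", "spam.example", 3.0, True),
    ("fix wifi", "spam.example", 5.0, True),
    ("fix wifi", "docs.example", 120.0, False),
    ("fix wifi", "docs.example", 4.0, True),
]
rates = quick_return_rate(log)
```

A result with a rate close to 1.0 would be a candidate for demotion for that query, though a real ranking system would presumably have to weigh it against many other signals.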

But I guess this is exactly what happens when your business model gives you weird ethics: the sites you send leads to are the same sites where you serve ads, so your company profits more from those scammers than from legit websites.

From an ML point of view google's search results are the perfect example of overfitting. Kinda ironic that they lead the data science research field and don't realize this in their own product, but teach this flaw everywhere.

replies(1): >>27430831 #
106. brigandish ◴[] No.27430763{4}[source]
I’m confused. I read that article and it has this:

> But, as a new report from Wired shows, nearly three years on and Google hasn’t really fixed anything. The company has simply blocked its image recognition algorithms from identifying gorillas altogether — preferring, presumably, to limit the service rather than risk another miscategorization.

Is that not an example of human intervention in ML?

107. lethologica ◴[] No.27430766{3}[source]
I have been seeing this a lot recently too. Especially with the first result or two. Or the section up top that gives you a partial answer without having to click through. All of them always seem to have been freshly written like some made to order meal at a restaurant. It’s just too suspicious really.
108. cookiengineer ◴[] No.27430783[source]
I also noticed that Apple users see way more fake online shop results than Linux users, from the same IP, with regularly cleared browser cache and identical search terms.

Those fake shops are part of discussions in politics right now. Usually they're registered in Ireland or Malta as companies due to their specific banking laws. They make millions with those scams and people can't differ between legit online shops and fake ones - because the legit ones actually look crappier than the fake ones when it comes to the website designs.

In Germany, we have at least for hardware the "geizhals" website which is kind of an index for all kinds of electronics shops and they try to verify as much as possible.

But for other online shop sectors (e.g. clothing or home stuff) I wouldn't trust anything. Even on Amazon I got scammed a lot and heard absurd things from others...like getting packages with no content in them and Amazon refusing to see that the seller is a scammer etc.

109. stevenicr ◴[] No.27430786{3}[source]
I'd like to see that, and I'd be happy to post various web rings and blogrolls..

One of the things that killed them imho is when google started penalizing sites that linked to some other sites.

This was compounded by the expired-domain market..

wordpress even took out linkrolls around that time, people that had them in sidebar widgets would have them disappear unless they installed a new plugin to bring them back.

Webrings that auto-add the "nofollow tag" I guess could make them okay for people again.

Might be cool to have a github type page with a list of rings to recommend.. a script auto-pulls it into your page, adding nofollow - and then other people could copy your list or clone/fork..
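A rough sketch of the "auto-add nofollow" part, assuming the ring is just a list of (title, url) pairs that has already been pulled from somewhere. The function name and the sample entries are hypothetical, and fetching the shared list is left out entirely.

```python
from html import escape

def render_ring(links):
    """Render (title, url) pairs as an HTML list, with rel="nofollow"
    added to every link."""
    items = [
        f'<li><a href="{escape(url, quote=True)}" rel="nofollow">'
        f'{escape(title)}</a></li>'
        for title, url in links
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

# Hypothetical ring members.
ring = [
    ("Alice's blog", "https://alice.example/"),
    ("Bob & co", "https://bob.example/?a=1&b=2"),
]
markup = render_ring(ring)
```

Escaping the titles and URLs matters here because the list comes from other people, which is exactly the situation a shared, forkable ring would create.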

110. bigger_cheese ◴[] No.27430810{4}[source]
There was already SEO stuff going on back then; people were just less aware of it. I can remember that during the height of the Iraq war people manipulated Google to display George Bush as the top result for "Miserable Failure", and there were other exercises like that happening.

It's hard for me to pick a sweet spot for the internet; in many ways I feel like I've grown up with it.

I can remember the web of circa 1995 to 1997 with Gif's that wouldn't render properly in internet explorer, HTML marquee scrolling text and the dreaded blink tag being used everywhere. You needed to play search engine bingo with Altavista, Metacrawler, Yahoo, Infoseek, Lycos etc etc. And it was a crap shoot if search engines would give you useful results.

I can remember the web of 1998 to 2000 where every web developer seemed to discover html frames at the same time. We had good search with Google but pop up ads were so rife that the internet was borderline unusable. I can remember all the free webmail sites like hotmail, yahoo etc. ICQ chat was massive (whatever happened to that - it was a staple of my teen internet).

In the early 2000's Firefox came along and saved the internet by virtue of its built in popup blocking. But there was a mishmash of "Applets" and "Plugins" everywhere: Flash Player, Java Applets, Real Player etc. Video (and audio) on the web was terrible; half the time it would complain about missing codecs, it would buffer forever, and if something did load it would be the size of a postage stamp and look pixelated as all hell. I remember Gmail came out and everyone went gaga over its interface.

The last period that really stands out is the mid to late 00's with the development of big Social Media sites: Facebook, Twitter, Youtube etc. The web got more and more javascript heavy. Web video streaming finally became useable. Google Chrome came out and flash player finally died despite Microsoft trying to revive it with Silverlight.

I kind of feel like this last 10 years are a continuation with increased surveillance and tracking.

111. quantumofalpha ◴[] No.27430831{3}[source]
They have been already doing this for a loooong time, it's a low hanging fruit.

Take a look sometime at the wealth of data google serp sends back about your interactions with it

replies(2): >>27430846 #>>27430908 #
112. cookiengineer ◴[] No.27430846{4}[source]
Please provide proof for this theory that google measures this also.
replies(1): >>27430862 #
113. quantumofalpha ◴[] No.27430862{5}[source]
I worked in ranking for two major search engines. They all measure this, this is a really low hanging fruit - how much time it took you to come up with this idea? Why do you think so lowly of people who put decades of life into their systems that they didn't think of it?

Technically just open google serp in developer tools, network tab, set preserve/persist logs option, and watch the requests flowing back - all your clicks and back navigations are reported back for analysis. Same on other search engines. Only DDG doesn't collect your clicks/dwell time - but that's a distinguishing feature of their brand, they stripped themselves of this valuable data on purpose.

replies(2): >>27430985 #>>27431337 #
114. bassdropvroom ◴[] No.27430899[source]
Great tip! I've been using DDG's official addon but this means one less addon. Thanks!
115. ehnto ◴[] No.27430904{7}[source]
I use colloquial language to try and target actual human reviews on forums. "Are audio-technica any good?"

Mostly works, but Google drops keywords pretty quickly now so you still get lots of spam or shopping sites.

116. friendzis ◴[] No.27430908{4}[source]
The fact that they do collect data does not mean that they use that data in any meaningful way or at all.

They ought to see humongous bounce rates with those fake SEOd pages. Normally, that would suggest shit tier quality and black-hat SEO, which is in theory punishable. Yet, they throw that data away and still rank those sites higher up.

You mean to say that no one at Google has even heard of "external SEO", which is nothing more than fancy way of saying link farming? They do know, this is punishable according to their own rules, yet it works, because either they cannot fix it or do not care to.

replies(1): >>27433262 #
117. abawany ◴[] No.27430914{7}[source]
I've been trying to unsubscribe from CR email spam for months now to no avail. Looking at the browser tools, it seems that their api can't handle the fact that I registered with a single letter first/last name so therefore my attempts to unsubscribe silently fail. There also appears to be no way to change my name since the api for that also fails on the single letter first/last name. I wish ungood things to happen to the people who 'designed' this Kafkaesque rubbish and in the meantime, thank GMail's mark-as-spam feature for throwing away their unrelenting pablum to the memory hole. This experience has led to me canceling my print subscription to CR plus my donations to their organization.
118. abhinav22 ◴[] No.27430948{3}[source]
For learning Common Lisp, I highly recommend https://github.com/ashok-khanna/common-lisp-by-example
119. MiguelX413 ◴[] No.27430955{6}[source]
They obviously do use search engines, like Google, that behave this way.
120. kuschku ◴[] No.27430962{5}[source]
I haven't gotten real SO as google result in years, only those content farms, constantly. Nowadays the same even happens for github issues, they're also mostly outranked by content farms copying from them.

If I search on mobile, often all my results are these content farms. (Google used in English from Germany)

121. sdoering ◴[] No.27430968{4}[source]
How is this not just spam and duplicate content? I remember when I was punished by G for duplicate content on my very small private blog when I was using jekyll and had the markdown sources and the code stored in GitHub. I didn't know of the canonical tag back then and was punished because the GitHub domain had more trust.

It is sad, but nowadays I often just directly jump onto page 3 at Google or use other "tricks" to get okayish results.

122. skinkestek ◴[] No.27430985{6}[source]
So they do collect it, they only ignore it - just like the 10 - 30 (or more) clicks I've spent on the tiny tiny [x] in the top corner of scammy-looking-dating-site-slash-mail-order-bride ads that they served me for a decade?
123. eitland ◴[] No.27431016{4}[source]
I think matt_cutts or someone who was active at the same time used to say that.

But it still doesn't defend not blocking sites that don't contain anything except autogenerated content.

And it still doesn't defend ignoring my keywords.

replies(1): >>27431488 #
124. eitland ◴[] No.27431137{4}[source]
> I'm not even a rounding error. I'm sure that all is fine there.

I'm already here :-)

If 5 or so devs read it and change too and they start mentioning it then we have a fast chain reaction.

Just look at WhatsApp or even Microsoft or IBM: they seemed unstoppable but are very much just another alternative today.

125. ineedasername ◴[] No.27431154{7}[source]
Yep, not disagreeing. My point is that a short term pursuit of money over at least a reasonable quality of search will destroy what they have built very quickly if quality gets low enough to make it easy for an upstart rival to have obviously better search results. And the evidence for that is in the history of their own rise to search dominance.
126. ColinHayhurst ◴[] No.27431203{4}[source]
Bing
127. friendzis ◴[] No.27431337{6}[source]
Again, this is not about data being collected, we do know how much data Google collects, it is all about what is being done with the data and by extension how good the end result is.

This touches the broader subject of systems engineering and especially validation. As far as I am aware, there are currently no tools/models for validation of machine learning models and the task gets exponentially harder with degrees of freedom given to the ML system. The more data Google collects and tries to use in ranking, the less bounded ranking task is and therefore less validatable, therefore more prone to errors.

Google is such a big player in search space that they can quantify/qualify behavior of their ranking system, publish that as SEO guidelines and have majority of good-faith actors behave in accordance, reinforcing the quality of the model - the more good-faith actors actively compete for the top spot, the more top results are of good-faith actors. However, as evidenced by the OP and other black hat SEO stories, the ranking system can be gamed and datums which should produce negative ranking score are either not weighted appropriately or in some cases contribute to positive score.

Google search results are notoriously plagued with Pinterest results, shop-looking sites which redirect to Chinese marketplaces and similar. It looks like the only tool Google has to combat such actors is manual domain-based blacklisting, because, well, otherwise they would have done something systematic about it by now. It seems to me that the ranking algorithm at Google is given so many different inputs that it essentially lives its own life and changes are no longer proactive, but rather reactive, because Google does not have sufficient tools to monitor black hat SEO activity to punish sites accordingly.

128. tempestn ◴[] No.27431488{5}[source]
No, the keyword ignoring stems more from catering to the majority of people who don't know how to logically formulate a search for a search engine that expects every word to match. Most people will intuitively just try to ask the search engine a question (even if not literally phrased as such), and so Google has adapted to fill that need. Which even for those of us who would prefer something a bit more clear cut, is honestly handy a lot of the time.

I think using +plus +before +keywords still works for situations when you don't want any words ignored?

Certainly agree it seems like they could do a better job of burying auto-generated sites though. (Although I'm sure it's a difficult problem!)

129. raverbashing ◴[] No.27431653[source]
Great idea. Though I've noticed DDG promotes "blogspam" articles more often than the authoritative sources.

Let's say, if I search for a python builtin library, I want to go to the python website, not some "Python 101" blog post about it.

130. mschuster91 ◴[] No.27431739{5}[source]
TPB is not a good example since they're allowing everything except pedo content, thus drastically shrinking their moderation workload.

A site that wants to be compliant to the law in the major jurisdictions (US, EU) can't operate that way, not with NetzDG, copyright and other laws in play.

131. eythian ◴[] No.27432081{4}[source]
I find this helps: https://addons.mozilla.org/nl/firefox/addon/view-image/

It puts the "view image" button back.

132. quantumofalpha ◴[] No.27433262{5}[source]
They'll never tell how they use the data for obvious reasons and I also can't go into any details. But any obvious thing you can think of almost certainly has been tried, they've been doing it for 20+ years and ranking alone is staffed with several hundreds of smart engineers. Mining clickthrough logs is a fairly old topic itself, has been around since at least early 2000s.
133. zem ◴[] No.27433528[source]
pinterest.com would clean up another large chunk of crap
134. minikites ◴[] No.27434392{6}[source]
>The more that Google allows quality to decline, the faster they approach a non-recoverable tipping point. Just ask Yahoo how quickly that can happen.

Do you think we're in the same situation now as we were fully 20 years ago? I don't. Facebook killed MySpace, but Facebook is now too big to be disrupted, same with Google. The word "google" is a verb now. This is why the quality of their search results doesn't matter, people are too entrenched to switch now, which was not true in 2001.

replies(1): >>27435017 #
135. ineedasername ◴[] No.27435017{7}[source]
With respect to getting users to switch, Facebook and MySpace are much more complicated services in terms of user interactions and the need for network effects. Search is literally a text box you type into, and its usefulness does not directly depend on how many other people use it.

In that respect, not much has changed in 20 years. Switching your search bar is a very low friction activity, and if quality of results is too low then people will look elsewhere. There's only so many times someone will tolerate seeing the exact same copy/paste useless answers to questions as most of the first page of results.

-#-#-#-#-#-#-

In General:

The tech industry is filled with examples of companies that had an entrenched product end up failing very rapidly. I think Google probably understands this well enough to ensure search quality remains better than a scrappy under funded startup can accomplish, but then again Google achieved search dominance by coming up with a different way to determine results, relevancy, etc. There's no reason to believe that someone couldn't come up with something superior now either.

I think the most significant threat to that possibility is 1) FAANG companies buying up many of the most talented people. 2) If a competitor did come along, buying them up as well.

But it's also hard to predict the antitrust future. Microsoft had the most dominant web browser for longer than Chrome has held that crown, but it got knocked down very quickly. I doubt that would have happened as easily if not for its antitrust issues. Of course it doesn't help that IE grew into a slow bloated mess, but in that respect, refer back to what I said about search quality: Microsoft was entrenched, if sliding, in the browser space even after its antitrust issues, but it let its quality slip too much for users to accept. Given viable options, users switched.

That switch was truly remarkable due to the much higher friction. IE still came bundled with Windows; Chrome did not. Every home computer with Chrome required a user to ignore the option right in front of them and choose Chrome instead. Now just think about how much easier it is to use a different search engine.

I'm not saying Google is doomed, but 20 years of market dominance guarantees nothing. The "big 3" US automakers owned the market for longer than Google's founders have been alive, but those days are now just another cautionary tale of poor quality and unassailable arrogance.

136. wlesieutre ◴[] No.27436254{6}[source]
In my recent experience, Bing, Tineye, and Yandex are all better at finding image sources than Google Images. But who knows how long that will last.
137. robbrown451 ◴[] No.27438366{4}[source]
Yes but then they fixed it right.
replies(1): >>27441012 #
138. htrp ◴[] No.27441012{5}[source]
Fixing it right would be re-training the ML algo.... they basically told the algo to never ID anything as a gorilla (even actual gorillas)
139. drilldrive ◴[] No.27444791{3}[source]
I used to have an automatic google search-domain blocker. It was just front-end though so if a page would have website domains that were useless, it would only have 1 or 2 results on it unfortunately. Something a little better integrated would be nicer.
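For what it's worth, the front-end approach boils down to something like the sketch below: drop any result whose host is a blocked domain or a subdomain of one. The blocklist and the URLs are invented for the example, and this is not the code of the blocker mentioned above.

```python
from urllib.parse import urlparse

def is_blocked(url, blocked_domains):
    """True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked_domains)

def filter_results(urls, blocked_domains):
    """Keep only the results that are not on the blocklist."""
    return [u for u in urls if not is_blocked(u, blocked_domains)]

# Hypothetical blocklist and one page of results.
blocked = {"pinterest.com", "gitmemory.com"}
page = [
    "https://www.pinterest.com/pin/123",
    "https://github.com/example/repo/issues/1",
    "https://gitmemory.com/issue/example/repo/1",
    "https://notpinterest.com/",
]
kept = filter_results(page, blocked)
```

Because the filtering happens after the page has been fetched, a page full of blocked domains still shrinks to whatever is left, which is the limitation described above. Fixing that would need the exclusion to happen on the search side.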