I hope someone figures out which other campaigns were run with these tools. Also, whether you can find output with the link injections in source code, like on GitHub or distro packages.
The network tab in devtools shows Google Analytics isn't loading on your site. I think the bigger conspiracy is that Google isn't giving high search rankings to websites that don't include Google Analytics. Part of the reason is that they use time on site after following a search result link as a quality signal for that result, if that makes sense. They give 10 search results, and their algorithm can tell a result satisfied the user's request if they don't go back to the search results but rather continue on that site.
Lastly, clicking through a search result to your site might not give the searching user what they are looking for. Amazon discovered that every extra click makes a person far less likely to purchase an item, so they created one-click ordering. Your competition makes it visually clear what their site does. You would probably get far more retention on the original click to your site if you put an image of what the end product looks like in a hero, front and center (with all the meta tags described in Google's SEO documentation, of course). That way people won't click back to the search results page, which Google is tracking as a signal.
[0] https://static.googleusercontent.com/media/www.google.dk/en/...
Doesn't say much for Google's ability to determine relevancy in linking or recognizing suspicious link growth. Or perhaps it just takes some time ...
I've been making websites for 24 years. Making a website has always been quite hard, especially for a nontechnical user, and there have always been scammers happy to take their money. What's worse is that a lot of the time the scammers believe they're actually selling a good service. There have always been people happy to chuck any old rubbish up on a domain and call it a website, even if it was full of scammy links, stuffed keywords the same color as the background or in tiny text, with JS that overwrote your browser history and blocked the back button, with no context menu, etc etc.
It's annoying, and sad, for those of us who care and consider ourselves professionals. But it definitely wasn't any better years ago.
While I like the thought progression you're going through, this is a "not really." Google has confirmed a number of times over the past 15+ years (going back to the Matt Cutts era) and even in the document you linked that the meta description does nothing to influence ranking in the SERPs. However, the meta title does influence ranking.
I'm on mobile, so unable to dig in right now - but my guess is either this has something to do with the meta title, or the specific anchor text of the backlinks that are getting inserted via the app in question.
Aside from that, agree 100% with your other assessments.
In 2015 I was fired over some issues on a site I was working on, after some friction with the company owner. Two months before I was fired, I had reported that links to other sites unrelated to our service (some porn and some scam pages) were appearing on the landing page. Afterwards I heard from my ex-coworkers that a manager from another area of the company said I was fired for linking porn on pages of our service. I didn't know at the time that these tools existed; only today did I realize that this is what may have happened.
I was really upset with that manager and didn't understand why he would lie to my friends about the reason for my dismissal. But it's nice to finally know what may have caused the issue. Better late than never, hahaha.
Churchill: “Madam, would you sleep with me for five million pounds?”
Socialite: “My goodness, Mr. Churchill… Well, I suppose… we would have to discuss terms, of course…”
Churchill: “Would you sleep with me for five pounds?”
Socialite: “Mr. Churchill, what kind of woman do you think I am?!”
Churchill: “Madam, we’ve already established that. Now we are haggling about the price.”
http://weblog.raganwald.com/2007/07/haggling-about-price.htm...
Using google search console you can determine if a manual action has been applied to your own website: https://support.google.com/webmasters/answer/9044175?hl=en
Rather than determine the ranks, these actions remove / punish offending websites from the ranks, effectively making room for 'good' actors.
Manual actions often come after a significant change in ranking algorithm or policy, and can be reverted / resolved in some cases. This usually requires removing or disavowing (in the case of unauthorized or unresponsive sites) the links pointing to a website.
Google has turned into a cesspool. Half the time I find myself having to do ridiculous search contortions to get somewhat useful results - appending site: .edu or .gov to search strings, searching by time periods to eliminate new "articles" that have been SEOed to the hilt, or taking out yelp and other chronic abusers that hijack local business results.
And, buying or otherwise, I am not sure what the mechanism is for bringing this to Google's attention.
I doubt there is another acquisition channel for a project like this that would compare to SEO (and not just Google).
Why is Shopify worth $150 billion? Well, other than the bubble, this effect is why. People can't easily build their own ecommerce sites, can't integrate everything they need to, in a way that doesn't cost them a small fortune.
Wix is a pretty mediocre service, clunky and slow. It's worth $15 billion? How in the world does that happen? Well, building sites is super difficult for most people. The opportunity to make that problem better is, apparently, huge.
*: and themes for other web applications, but mostly WordPress these days
Being willing to sacrifice other things in order to have more money always creates cesspools.
This unknown exchange of value for “free” products and services is what everyone from Facebook and Google down to malware-like browser extensions do to extract difficult-to-acquire resources.
People don’t understand how their personal data, internet connection (residential proxy network node), or in this case, publicly displayed website are being monetized or used indirectly for monetization.
People don’t know or are tricked into allowing themselves or their resources to serve as an ugly cost externality to some other clean-looking business endeavor.
Wow, that's amazing. I guess I sort of quit reading blogs like this when all the RSS readers died.
All these same sites appear near the top of Bing searches too. There's nothing particularly Google-specific to this story. It's about SEO hacking that will work against anyone with a PageRank-style system.
It's literally never the original source for anything, but you can bet it's most of the first 10 pages of results. Then it doesn't even let you right click to open the image file, and dumps you to a login prompt if you click on anything. THAT'S NOT EVEN YOUR IMAGE STOP TELLING ME WHAT I CAN DO WITH IT.
Well, unfortunately this is basically how every freemium tool works. They have some way of advertising, in exchange for free use of the tool.
Even reputable CMS tools like WordPress include backlinks to WordPress on new sites and in themes.
Although, this is much less common with open-source free tools, as the community resists these kinds of changes.
No such thing as a free lunch!
! Hide low-quality results on DuckDuckGo
duckduckgo.com##[data-domain="w3schools.com"]
duckduckgo.com##[data-domain$=".w3schools.com"]
duckduckgo.com##[data-domain="w3schools.in"]
duckduckgo.com##[data-domain$=".w3schools.in"]
duckduckgo.com##[data-domain="download.cnet.com"]
!! Stack Exchange mirrors
duckduckgo.com##[data-domain="exceptionshub.com"]
duckduckgo.com##[data-domain="intellipaat.com"]Yes, I clicked the link because it exactly referenced my issue. But it's not helpful to just see the same 5 tips copy pasted from elsewhere by an algorithm.
As others have pointed out and the author acknowledges, he is technically injecting links when his users embed their scoreboard on their website through an auto-included link-back to his site.
Now, I don't frown upon this. It is not deceptive and its placement is more than relevant.
The same cannot be said for the scheme the author uncovered. But whether it is violating Google's TOS is another question. I'm not sure of the answer.
I was briefly going to write "I'm surprised that DMOZ[1] still exists" but it says "Copyright 2017 AOL" at the bottom so maybe it doesn't.
Edit: ...and using the search box results in a 404 so I guess it's really dead huh.
Edit 2: Apparently this is the successor! https://curlie.org/en
[1]: https://dmoz-odp.org
Any notes on how to reproduce?
Could someone inject links into content in such a way that you cannot find the link in your own source or even your hosting stack?
But no, Pinterest has better SEO than the artist does, so it's just endless reposts upon reposts and never the original work.
Occasionally you get lucky and it's not the sort of image that Pinterest users share. Then you might actually find where it came from.
Of course there are scammers, that’s part of what makes organizing so hard.
Cynically, I think Google is worse at filtering scammers because they care less now. Half the page is ads, so they make money either way.
He was a failed military commander pre-PM, even ridiculed after the failed landing at Gallipoli:
https://en.wikipedia.org/wiki/Gallipoli_campaign
Yet we mostly remember him as a fierce leader at the top, and he led while under extreme stress from U-boat attacks strangling shipping to Britain and puzzlement at seemingly alien-level German technology (V-2 hypersonic ballistic missiles, long-range radio navigation, etc.).
(His scientific advisor told him the above technologies were disinformation, and couldn't be real.)
It's important to realize that while French and British public opinion became soft on the value of their culture and nationhood, Churchill knew better. Just like the pernicious impact of cultural Marxism today in the US, which must be fought as a war on every front.
Google just seems to give way too much weight to domain name matches with the search keyword.
It's not a bad idea to have a curated dataset of information. But clearly there are much better ways to navigate said information, which would include search, but also dynamic filters, predictive text, sorting algorithms, context awareness, etc. All of which... is built into modern search engines.
So perhaps what we really want is a Wikipedia/OpenStreetMaps of curated, indexed, semantic content/links, that anyone can consume and write their own search interface for. Basically, an open data warehouse of website information.
Halt and Catch Fire [1] (as a nerd, I can say it's one of the few TV series that got the hacker spirit right) had a few episodes about the Google disruption.
Like some people often say here, things come and go in circles...
[1]: https://en.wikipedia.org/wiki/Halt_and_Catch_Fire_(TV_series...
I am in the pre-release program. The hardest initial thing to get used to was not immediately scrolling down to the bottom to avoid all of the spam.
I suspect that their methods are not much different than Google, but the experience has been so much better.
That's a bit harsh but I agree that it is starting to fail to live up to the expectations I had with Google when it came out and destroyed Altavista in a spectacular shower of sparks.
Could I tender: "uBlacklist" as a stop gap, amongst others as we await Google being given a right old kicking?
Despite being a staunch Arch Linux user I have to deal with rather a lot of MS Windows related stuff. Being able to filter out that bloody awful Microsoft Social thing gets me closer to decent results. The majority of the next 10-100 results will be CnP clones of someone's blog, but a human is able to sort through them reasonably quickly. I'm toying with blocking Stack Overflow and other cough stalwarts to see if results get better for me.
In my opinion: the www has hit a crossroads or perhaps a Spaghetti Junction or a Magic Roundabout for the last five years or so and continuing. However the exits are connected to the entrances on these road systems (take a look at them - they are real junctions. The MR is particularly terrifying but it works really well.)
I still won't use words like cesspool for this, but I am increasingly losing my patience with the standard of results from Google. Those featured things (not the Ads - that's fine) at the top which add #blah_blah to the URL to colour search terms yellow are not working for me. The quality of the results featured in a box is often rubbish too. It would be nice to be able to turn all that stuff off.
I understand that Google are trying to "be" the internet to try and keep the stock ticker pointing north but there seems to be a point when they have overreached themselves and I think that was passed several years ago. I also increasingly feel that Google thinks that it knows best and has removed many choices from their various UIs - that comes across as a bit arrogant.
Many years ago I left Altavista behind for Google. I will move again if I feel I have to. Of course that's not much in the grand scheme of things and I'll probably only take around 100,000 people with me but they have friends - still probably not a big deal.
Who will pay for its creation, maintenance and hosting? Who will judge ranking, disputes, hacks?
Who will have an eye on discrimination issues? Whose jurisdiction will be relevant (think GDPR or the Australian press "gag order" law in the case of that cleric accused of fondling kids)?
Who will take care that the humans who will get exposed to anything from generic violence over vore/gore to pedo content get access to counseling and be fairly paid? Facebook, the world's largest website, hasn't figured out that one ffs.
These questions are ... relatively easy to bypass with an automated engine (all issues can be explained away as "it was the algorithm" and IT-illiterate judges and politicians will accept this), but as soon as you have meaningful human interaction in the loop, you suddenly have humans that can be targeted by lawsuits, police measures and other abuse.
(I just checked and this copycat documentation site has, thankfully, now been pushed down a bit in DDG results.)
It's 2021 and, surprisingly, for all the billion-dollar A.I. it can still be gamed with a bunch of unrelated links that have little or no connection between the article and the site.
Also, it's pretty unnatural and shady how these backlinks are obtained. For my own SaaS site, almost every blogger I contacted for a review just straight up asked me for money in exchange for a link. What the software did was of no consequence to this exchange. Most sites with these "list of 10 XYZ" posts are similar money-making schemes, yet they rank highly on Google.
P.S. Likewise, I too get dozens of emails daily with "offers" ranging from free articles to actual dollar amounts just for placing a paid link. These SEO guys are relentless because such shenanigans are still working great at beating Google.
And then you get redirected to some prize-winning spam site.
I love getting a search result that includes Google Books because those are usually useful. That’s what Google was best at, bringing in things that weren’t regular web pages.
Everybody gave up trusting webrings because Google provided better results. Now that Google results are shit, there’s room for other information vendors to come along, even if it’s in narrow areas.
Actually, HN is already this for me in some respects.
If you've decided to google for a tool suggestion, at least include "open source". Even if you're searching for proprietary tools, you'll probably find the traditional "it has better X and Y compared to proprietary tool W" review.
Let’s face it... the early internet was interesting because the only people who could use it (and publish on it) were smart eccentrics. That was its charm. The technological hurdle served as the curator: you might have been a crazy white supremacist, anarchist, conspiracy theorist, or ‘expert’ in how to grow radishes or some other bizarrely eclectic field... but all of them were necessarily a bit smarter than the average bear just by virtue of knowing how to host content and access it; not a trivial task in the late 90’s.
Maybe it’s time to think up some convoluted alternate network that is a royal pain-in-the-ass to use. Perhaps there the eclectic and useful content creators will once again arise (and searching their trove will be a snap as most everything there will be fresh, unique, and interesting.) It will exist, I suppose, for a few years before tools are made to enable grandma to easily use it.
More importantly, what I've also learned is that Bing's search results are less of an affiliate-link cesspool, because fewer SEO spammers are working at gaming Bing's results.
Or paid the entity running the malware HTML editor. It's probably injecting links to a variety of sites who paid them for placement.
https://www.researchsquare.com/article/rs-8615/v1
(It's on page 24, at the bottom of the References section.)
Google (and others) keep up the narrative that they're important so that black and grey hat SEO folks keep focusing effort in the wrong places.
Source: ran the web spam detection team on a different well known search engine
You can't pretend this isn't funny as fuck lol.
Wow, embarrassing for Kaspersky as a computer security focused site to be a victim of this.
When I searched for "Rubiks" as it said to do, I couldn't find it though. Has the Kaspersky post been changed?
I think we have loads of tools to play with but fundamentally there is a problem when you are fighting with your search engine to find stuff you want to find.
My laptop (Arch) still has Chromium as default with uBlock Origin, Privacy Badger, uBlacklist and a few others running. I will be moving back to FF and running a sync server because I am that pissed off and able to do so. I'll also take a few others with me (between 2 and rather more)
When I say move back to FF, I'm talking about something like reverting a 10-15 years change.
I've always had FF available but it fell short back in the day for long enough for me to move to the Goggle thing. Now I think I'll go back.
No one at G will lament losing me; I'm not even a rounding error. I'm sure that all is fine there.
> not the Ads - that's fine
In my strongly held opinion, push advertising is not fine and it's the root cause of all the problems you are discussing. We will only exit this mess that the web has become when everyone blocks push advertising by default. People should only see advertising when they are interested in being advertised to, e.g. sites you consciously choose to go to that advertise products & services, like the old Yellow Pages phonebooks.
Anyway so how would you explain the rankings of sites in this article? I thought all that was going for these guys were just the insane amount of links pointing to their site.
In theory, you could have a curated directory whose hosting works like ThePirateBay, and whose maintainership is entirely anonymous authors operating over Tor (even though the directory itself holds nothing the average person would find all that objectionable.)
Of course, there's no business model in that...
It's a little unsettling.
Not sure how we should react :/
I bet it's that we do different types of searches.
I was just talking to my SO about this the other day when we were trying to find an air purifier for allergies. I'm the kind of person that likes to compare products a ton before dropping more than about ~$100 on anything. The way the internet has become in the last 10-15 years has made this increasingly more difficult. You really have to dig to find in-depth unbiased content on anything someone stands to make money from. For every 1 good review there are 100 'top 10 best ranked' blogspam sites..
Additionally, ads were firmly separated into a colored box away from actual results
but yea "narrative"
- best mattress 2021
- best mattress 2021 consumer reports
- best mattress 2021 reddit
- best mattress 2021 for back pain
- best mattress 2021 wirecutter
etc.
But of course Reddit is already rife with shills. Not sure about CR.
This is the same problem with something like WoW classic... you can get the game that existed 15 years ago. But even if it is the exact same game, the world itself isn't. Online walkthroughs, videos, modding knowledge, theory crafting, etc. Those things are much more fleshed out today so even if the system didn't change 1 bit, WoW Original vs WoW Classic are really two separate games.
Likewise... if you dropped Google Original down today? I'd love to see how fast it would get owned by these sorts of operations that have had a decade-plus of practice in skills like SEO that didn't exist in 2010.
You had more relevant results? That wouldn't change because companies live and die off of SEO now and didn't then. Highlighted ads are such a small thing on the website when compared to getting a full front page of the same Stack Overflow answers in 20 different websites that all have SO cloned and reskinned.
Why? Google makes money from advertisements either way, it's not in their interest to improve search results. If anything, terrible search results make users more likely to click on ads, which now look better by comparison.
I wonder how much of modern search crappiness is because much of the good content that used to be in small blogs is now locked away behind facebook’s logins.
If they're not owned by the same entity, then this blog post is rather odd: https://html-online.com/articles/scoreboard/
(To be fair, that entire blog seems odd...)
Honestly, I don't believe for a minute they "can't fix it." They do this sort of thing all the time, for instance when ML shows dark skinned people for a search for gorilla, they obviously have recourse.
We’re moving to the vision of information services that were pioneered by AOL, Prodigy, etc. Honestly, we’re there already.
https://www.theverge.com/2018/1/12/16882408/google-racist-go...
They may not give the site a manual action, though. They’d rather tweak the algorithm so it naturally doesn’t rank. Google’s algo should be able to see stuff like this.
I know that I’ve seen sites tank in the rankings because they got too many links too quickly. It could be that the link part of the algorithm hasn’t fully analyzed the links yet.
I’d be interested in seeing what the Majestic link graph says about this site, ahrefs doesn’t have tier 2 and tier 3 link data.
Users may be entrenched in other Google products-- Gmail, gcal, docs, etc-- but not search. Someone using all those other Google products could change their default search engine and have zero impact on the rest of their digital life.
I'm shopping around for a preferred alternative right now, I just haven't settled yet.
r/HailCorporate used to be about calling out stealth marketing/advertising but it's morphed into just discussing how things can inadvertently act as an advertisement aka society is full of branding and consumerism. It's a shame because it used to be a very high quality sub.
Once I discovered that everything I would ever need was better explained on MDN, my life as a web developer improved dramatically.
Makes it really difficult to find old pages about something that recently exploded in popularity, because the age filter just doesn't work.
The only time something will change is when traffic starts decreasing to their site, but it's good enough such that people won't change. Look at Facebook, I don't know anyone who uses it as much as they used to 10 years ago, but it's making the most money it ever has. Why on earth would any behavior change? From their points of view, everyone is happy with it!
>Google issues a manual action against a site when a human reviewer at Google has determined that pages on the site are not compliant with Google's webmaster quality guidelines. Most manual actions address attempts to manipulate our search index. Most issues reported here will result in pages or sites being ranked lower or omitted from search results without any visual indication to the user.
2. There is a qualitative difference between life-changing money and day-job money.
3. "I won't risk my life any amount" is a dumb ideal, because everything has a risk of death.
Yes, they can. They should simply stop measuring only positives and start measuring negatives, e.g. people who press the back button of their browser or click the second, third, or fourth result afterwards, which should hint to the ML classifiers that the first result was total crap in the first place.
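To make that concrete, here is a minimal sketch of such a negative signal over a hypothetical click log; the field names, threshold, and scoring are all invented for illustration, not anything a real search engine is known to use:

    from collections import defaultdict

    # Hypothetical click log entries: (query, result_url, returned_to_serp, dwell_seconds)
    click_log = [
        ("arm32 bootloader docs", "https://blogspam.example/arm", True, 8),
        ("arm32 bootloader docs", "https://docs.example/u-boot", False, 240),
        ("best mattress 2021", "https://affiliate.example/top10", True, 15),
    ]

    def pogo_stick_rate(log, dwell_threshold=30):
        """Fraction of clicks on each URL where the user bounced straight back to
        the results page within `dwell_threshold` seconds, a crude stand-in for
        'this result did not answer the query'."""
        clicks, bounces = defaultdict(int), defaultdict(int)
        for _query, url, returned_to_serp, dwell in log:
            clicks[url] += 1
            if returned_to_serp and dwell < dwell_threshold:
                bounces[url] += 1
        return {url: bounces[url] / clicks[url] for url in clicks}

    for url, rate in pogo_stick_rate(click_log).items():
        print(f"{url}: bounce rate {rate:.0%}")

Of course, a real ranking system would have to separate genuine dissatisfaction from queries answered at a glance, which is part of why this is harder than it looks.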
But I guess this is exactly what happens when you have a business model where leads to sites that carry your ads pay you either way: it gives you a weird set of ethics, as your company profits more from those scammers than from legit websites.
From an ML point of view google's search results are the perfect example of overfitting. Kinda ironic that they lead the data science research field and don't realize this in their own product, but teach this flaw everywhere.
> But, as a new report from Wired shows, nearly three years on and Google hasn’t really fixed anything. The company has simply blocked its image recognition algorithms from identifying gorillas altogether — preferring, presumably, to limit the service rather than risk another miscategorization.
Is that not an example of human intervention in ML?
Those fake shops are part of political discussions right now. Usually they're registered as companies in Ireland or Malta due to their specific banking laws. They make millions with those scams, and people can't tell the difference between legit online shops and fake ones, because the legit ones often actually look crappier than the fake ones when it comes to website design.
In Germany, we have at least for hardware the "geizhals" website which is kind of an index for all kinds of electronics shops and they try to verify as much as possible.
But for other online shop sectors (e.g. clothing or home goods) I wouldn't trust anything. Even on Amazon I got scammed a lot and heard absurd stories from others, like receiving packages with nothing in them and Amazon refusing to accept that the seller is a scammer, etc.
One of the things that killed them imho is when google started penalizing sites that linked to some other sites.
This was compounded by the expired-domain market..
WordPress even took out linkrolls around that time; people who had them in sidebar widgets saw them disappear unless they installed a new plugin to bring them back.
Webrings that auto-add the "nofollow" attribute could, I guess, make them okay for people again.
Might be cool to have a GitHub-type page with a list of rings to recommend: a script auto-pulls it into your page, adding nofollow, and then other people could copy your list or clone/fork it.
https://en.wikipedia.org/wiki/Accelerated_Mobile_Pages
* forgive my RAS syndrome
It's hard for me to pick a sweet spot for the internet in many ways I feel like I've grown up with it.
I can remember the web of circa 1995 to 1997, with GIFs that wouldn't render properly in Internet Explorer, HTML marquee scrolling text, and the dreaded blink tag being used everywhere. You needed to play search engine bingo with AltaVista, Metacrawler, Yahoo, Infoseek, Lycos, etc., and it was a crapshoot whether the search engines would give you useful results.
I can remember the web of 1998 to 2000, where every web developer seemed to discover HTML frames at the same time. We had good search with Google, but pop-up ads were so rife that the internet was borderline unusable. I can remember all the free webmail sites like Hotmail, Yahoo, etc. ICQ chat was massive (whatever happened to that? It was a staple of my teen internet).
In the early 2000s Firefox came along and saved the internet by virtue of its built-in pop-up blocking. But there was a mishmash of "applets" and "plugins" everywhere: Flash Player, Java applets, RealPlayer, etc. Video (and audio) on the web was terrible; half the time it would complain about missing codecs, it would buffer forever, and if something did load it would be the size of a postage stamp and look pixelated as all hell. I remember when Gmail came out and everyone went gaga over its interface.
The last period that really stands out is the mid-to-late 00s, with the rise of the big social media sites: Facebook, Twitter, YouTube, etc. The web got more and more JavaScript-heavy. Web video streaming finally became usable. Google Chrome came out, and Flash Player finally died despite Microsoft trying to keep the plugin idea alive with Silverlight.
I kind of feel like this last 10 years are a continuation with increased surveillance and tracking.
Take a look sometime at the wealth of data google serp sends back about your interactions with it
Technically: just open a Google SERP in developer tools, go to the network tab, set the preserve/persist logs option, and watch the requests flowing back; all your clicks and back navigations are reported for analysis. Same on other search engines. Only DDG doesn't collect your clicks/dwell time, but that's a distinguishing feature of their brand; they stripped themselves of this valuable data on purpose.
They ought to see humongous bounce rates with those fake SEOd pages. Normally, that would suggest shit tier quality and black-hat SEO, which is in theory punishable. Yet, they throw that data away and still rank those sites higher up.
You mean to say that no one at Google has even heard of "external SEO", which is nothing more than a fancy way of saying link farming? They do know; it is punishable according to their own rules, yet it works, because either they cannot fix it or do not care to.
If I search on mobile, often all my results are these content farms. (Google used in English from Germany)
It is sad, but nowadays I often just jump directly to page 3 of Google results or use other "tricks" to get okay-ish results.
SEO used to be extremely gameable (seniority of site, keyword stuffing, backlinks), but these levers aren't as obvious now, if at all.
I only ask because when I click on these links, I get a whole bunch of legitimate-looking text, but nothing actually useful. Am I missing something?
Nowadays, unnatural links are mostly ignored.
But that still doesn't excuse not blocking sites that don't contain anything except autogenerated content.
And it still doesn't excuse ignoring my keywords.
But in the meantime, yep... It sucks.
The format goes like this: lately people are searching for XYZ, but is it safe to search for XYZ? What do experts say about XYZ? To find out, continue reading our article.
Then it's followed by a wall of text made of keywords (in sentences that don't make sense); if you are lucky, the opening hours (which are often not accurate) appear somewhere down in the text.
But it doesn't stop there. Even actual news articles are written for the consumption of the Google bot: the sentences often don't make sense and are repeated multiple times with synonyms of one of the words, padding them into a lengthy article that has no meat beyond the title.
I argue that the problem is not SEO experts with low ethics, the problem is the way the business is structured. SEO experts don't do it for the sake of the art but because they are paid to do it. They are paid to do it because it has a positive ROI on bringing eyeballs and people pay Google for eyeballs, then Google pays those who generate the eyeballs.
Isn't it better for Google and everyone involved if you can't find what you are looking for, since continuing your search brings more eyeballs? It's not like you are going to switch to Bing. You are also not going to abandon the internet and go to a library.
I'm already here :-)
If 5 or so devs read it and change too and they start mentioning it then we have a fast chain reaction.
Just look at WhatsApp or even Microsoft or IBM: they seemed unstoppable but are very much just another alternative today.
I've noticed a rise of that as well. With some searches such spam is all I've received. But that's really a problem in all languages Google supports I think.
There's even malware that infects websites and generates such content; I'm not sure what the point of that is. Anyone know?
One day Google may introduce multiple search rankings, where one of them is SEO and another is the "useful things". But I don't hold my breath.
This touches the broader subject of systems engineering and especially validation. As far as I am aware, there are currently no tools/models for validation of machine learning models and the task gets exponentially harder with degrees of freedom given to the ML system. The more data Google collects and tries to use in ranking, the less bounded ranking task is and therefore less validatable, therefore more prone to errors.
Google is such a big player in search space that they can quantify/qualify behavior of their ranking system, publish that as SEO guidelines and have majority of good-faith actors behave in accordance, reinforcing the quality of the model - the more good-faith actors actively compete for the top spot, the more top results are of good-faith actors. However, as evidenced by the OP and other black hat SEO stories, the ranking system can be gamed and datums which should produce negative ranking score are either not weighted appropriately or in some cases contribute to positive score.
Google search results are notoriously plagued with Pinterest results, shop-looking sites that redirect to Chinese marketplaces, and the like. It looks like the only tool Google has to combat such actors is manual domain-based blacklisting, because otherwise, well, they would have done something systematic about it. It seems to me that the ranking algorithm at Google is given so many different inputs that it essentially lives its own life and changes are no longer proactive but reactive, because Google does not have sufficient tools to monitor black hat SEO activity and punish sites accordingly.
From personal experience, I switched to another tool (DDG) a couple of years ago. When I occasionally try Google, for 95% of common requests I'm appalled by the results: the top is only SEO garbage. For very specific and precise searches (where people are not trying to game the system), Google is still the best, though.
We had a competitor who spams his page full of SEO garbage words. Our software is used 100 times more than his, more people search for our software, click it, use it, and link to it, but who is in first place in the search results? Right, the SEO spammer, with the slower page full of shiny SEO words that have nothing to do with the software.
@google, I'm waiting for a working AI that detects such garbage sites!
Surely some kind of fairly trivial NN / not-very-deep-learning system can classify HTML content so that out-of-context links (like "Learn how to solve a Rubik's Cube" in a Seventh-day Adventist Sabbath lesson) and copied content are ignored or marked down.
Whilst I'm sure GPT-3 could be used to create more realistic-looking fake content, this would eliminate 99% of the script kiddies creating low-value SEO spam sites.
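As a rough illustration of the "fairly trivial" end of that idea, here is a sketch that flags out-of-context anchor text using plain TF-IDF cosine similarity rather than a neural network; the threshold and the example strings are made up:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def anchor_out_of_context(page_text, anchor_text, threshold=0.05):
        """Flag a link whose anchor text shares almost no vocabulary with the page
        hosting it. A real system would use embeddings and many more signals;
        this only illustrates the shape of the check."""
        tfidf = TfidfVectorizer(stop_words="english").fit_transform([page_text, anchor_text])
        similarity = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
        return similarity < threshold

    page = "This week's Sabbath school lesson walks through the book of Exodus and its themes."
    anchor = "learn how to solve a Rubik's Cube fast"
    print(anchor_out_of_context(page, anchor))  # True: the anchor is unrelated to the page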
Good thing /etc/hosts has no size limit.
Google's old link-based authority algorithm, PageRank, isn't analysing the same web anymore. I think there's barely any signal in links these days.
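For context, the link-authority idea itself is small enough to sketch; this is just the textbook power-iteration version of PageRank on a toy graph, not a claim about what Google runs today:

    def pagerank(links, damping=0.85, iterations=50):
        """Textbook PageRank: a page's authority is mostly the damped sum of the
        authority of pages linking to it, split across their outlinks. This is
        why buying or injecting links into other sites moves rankings at all."""
        pages = list(links)
        rank = {p: 1.0 / len(pages) for p in pages}
        for _ in range(iterations):
            new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
            for page, outlinks in links.items():
                for target in outlinks:
                    if target in new_rank:
                        new_rank[target] += damping * rank[page] / len(outlinks)
            rank = new_rank
        return rank

    # Toy web: two pages that cite each other, one page that only links to itself.
    toy_web = {"wiki": ["blog"], "blog": ["wiki"], "spamfarm": ["spamfarm"]}
    print(pagerank(toy_web))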
The first major event was Google itself. Once you use something as a metric, it becomes currency. SEO vs anti-spam became a defining cat and mouse game. This kind of stuff was born then, and antispam was meant to curb it.
The second major event was user-generated content. The old link pages and blogrolls die slowly. Comments, Twitter, and such became the way links are shared. High signal, but extremely spam prone. Google tapped out of this early, and mostly ignores user-generated content.
The third major event is facebook, and facebook like ways of doing things. This made most regular people's content unindexable. Search for esoteric keywords used to return a lot of forum results. Still does, to an extent. The thread is usually years, or decades old. What's left on the open web is a subset, a non random subset.
Wikipedia is one of the last sites that does "hypertext" the way pagerank assumes the web works.
In any case, I feel like search (or what search used to be) is in decline. There isn't as much web to search anymore, in a sense. The broad brush way of doing antispam (eg user generated content is just ignored) makes more sense. Why deal with all that noise/spam, just to search what's left of the old web.
What's left? User behaviour, a la analytics. That makes for more feedback loops and winner-takes-most dynamics. Localisation became localisation to your bubble. Meanwhile, "officialness" measures aren't against Google's ethic/aesthetic anymore. They got burned by the "fake news^" crisis, and the quick fix was officialness. In for a penny, in for a pound.
Meanwhile, web search is increasingly just another thing that google search does. It searches "your" data, content of your devices, search history and NN generated whatnot. It searches news, ads, returns answers to questions, does math... There's nothing new about seo scams, antispam just isn't Google's primary solution anymore. Just default to other ways of returning results.
I'm calling it. Web search is dead. Long live the new websearch.
^Circa 2015 usage, not the current
I think using +plus +before +keywords still works for situations when you don't want any words ignored?
Certainly agree it seems like they could do a better job of burying auto-generated sites though. (Although I'm sure it's a difficult problem!)
Now, when the website needs to not only contain content, but also be its own advertisement, writing it in a way that will maximize virality is the natural course of action to make sure the site actually gets seen.
This will likely remain true until there is a method of finding webpages that is not based on automated scraping or on the pages themselves.
IIRC with PageRank there were very specific values associated with 'toolbar PageRank', e.g. a PR7 link could be sold for $1K a month. Understandable because at that time there was no context to PageRank at all, it was simply about being linked to by an "authority". This was 20 years ago though.
Great.
Nobody cares about the content apparently. Nobody checks if the generated HTML makes sense. It's all about spinning the wheel.
Sigh.
The "fiddle with H1" or "write X amount of words" or "buy Y number of links with a % of anchor text" is silly.
Semantic HTML was created to help screen readers and browsers understand content organization; its having been hijacked by search engines is just a side effect.
Let's say, if I search for a python builtin library, I want to go to the python website, not some "Python 101" blog post about it.
e.g. Google always had problems indexing Flash websites. It historically had issues with sites heavily relying on Javascript. Nowadays it's less of a problem, at least for Googlebot.
A site that wants to be compliant to the law in the major jurisdictions (US, EU) can't operate that way, not with NetzDG, copyright and other laws in play.
2. Is one’s sense of right and wrong for all things driven by the amount of money involved? Are there some things money can’t buy?
3. Would you elaborate on this assertion? I don’t quite understand.
What about trust-based systems? You choose whom you trust and get information that they found not to be SEO garbage, like trust rings. When the system can't do it alone, user-centric feedback may work. That could give interesting inputs besides the ones Google already gets from its standard metrics.
I found uBlacklist from this thread, and the subscription functionality enables some collaborative effort.
So I've started making a list, but unfortunately there aren't many uBlacklist subscription lists out there yet.
Be interested to see how far this could go: https://github.com/rjaus/awesome-ublacklist/
Sleeping with someone isn't exactly in the realm of moral repugnancy. It's not about right and wrong, the implication is that she is a whore. But she's not, because she wouldn't take money for sex as her job.
> Would you elaborate on this assertion? I don’t quite understand.
The article you linked is claiming that "the very real possibility of being killed" should disqualify the job, no matter what the pay. But that's an extremely myopic way of evaluating risk. Would he refuse on principle to commute an extra 15 minutes, even if it adds up to the same risk after 20 years? It doesn't look like it. But to take that risk all at once in exchange for 20 years of pay or even 50 years of pay means you've been "corrupted".
He's decided that some risks to yourself are fine and some risks to yourself are unacceptable based on arbitrary measures and not what actually keeps you safest.
Overly hard stances for risk mitigation lead to a lot of really bad conclusions and contradictions.
I changed the default search engine from Google to Bing and DDG in all browsers. Google does have better results, so sometimes I still need to use them. But for 90% of generic queries such as the weather, product information, or finding a company's website, Bing is good enough.
Here's an example of one https://html-cleaner.com/
The result was astonishing: on the first page most results were similar, except for the order. Specifically, the first result in Google was only second on the first page of the company's search engine. But overall the difference was mostly in the presentation, not in the results.
There was something Spartan in Google's page UI that made it more credible and informative. At the time, for most people including academics, they were the good guys and we (telcos) were the bad guys.
I guess academics' advice was very influential on the young adults who would shape the world in the following years.
I guess the erratic management by France Telecom also played a part in the demise of Voila.fr.
It puts the "view image" button back.
I suspect this is actually one of those fundamentally hard problems.
1. Old domain names bought solely for their old SEO rank.
2. Apps on mobile app stores are sold, and updates begin to include shady privacy-invading malware.
3. Old free software projects on various registries (npm etc.) are sold, with the same result as (2).
The problem is that people will always try to game the system :/
Recipes would ultimately be a list of ingredients, concise instructions and maybe a picture or two. It should be trivial to train a classifier to detect SEO spam in this context.
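A minimal sketch of what such a classifier could look like, using made-up structural features and toy labeled data (nothing here reflects how Google actually scores recipe pages):

    from sklearn.linear_model import LogisticRegression

    def recipe_features(page):
        """Crude structural features: SEO-padded recipe pages tend to bury a short
        ingredient list and a few steps under thousands of words of filler prose."""
        words = len(page["text"].split())
        structured_items = len(page["ingredients"]) + len(page["steps"])
        return [words, words / max(structured_items, 1)]

    # Hypothetical labeled pages: 1 = SEO-padded, 0 = concise recipe.
    pages = [
        {"text": "Boil pasta. Stir in sauce. Serve.",
         "ingredients": ["pasta", "sauce"], "steps": ["boil", "stir"]},
        {"text": "When I was a child my family " * 400 + "Boil pasta.",
         "ingredients": ["pasta"], "steps": ["boil"]},
    ]
    labels = [0, 1]

    model = LogisticRegression().fit([recipe_features(p) for p in pages], labels)
    print(model.predict([recipe_features(pages[1])]))  # likely [1], i.e. flagged as padding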
I think Google doesn't really have an incentive to do this, as SEO spam typically includes ads which can contain Google ads or analytics/Google Tag Manager which helps Google, thus prioritizing better results would work against their bottom line.
Otherwise, it seems really like a cat and mouse game. Another option may be to force SEO to be indistinguishable from the best content. Is that the current goal?
Edit: Wow, this is much bigger than just those two sites. Looks like half the internet is down. https://downdetector.com/
[1] https://searchengineland.com/googles-cutts-we-dont-ban-sites...
I am not sure what you'd use this tool for, possibly for scraping work, but Beautiful Soup is probably better for that.
Entertainment/news sites are chock full of pages like "<whatever>, what we know so far, release date, cast, will it be renewed, has it been cancelled..." pages that spend many paragraphs saying "we know nothing, randomly plucking crap out of thin air we could guess something-or-other but that remains to be confirmed". A new news story, film, show, or even just a hint of something, and the pages go up to try capture early clicks. Irritatingly they are often not updated quickly when real information becomes available or that information changes (particularly over the last year that has affected release dates). I have several sites DNS blocked because that annoys me less than getting one of these useless/out-of-date pages more often than not when I follow one of their links.
I mean it's pretty reasonable, if a site has been around a long time it's going to be generally 'good'.
It's the same thing as the tweaks you have to perform for SEO optimisation, some have questionable value to the end user but you jump through the hoops anyway because it's what is done, by pleasing the robots you're rewarded with a higher search position.
A year later Google's John Mueller, a trends analyst who often also acts as a liaison between Google and the webmaster community, stated that Google might automatically apply a 'nofollow' attribute to these types of links, effectively killing their ability to siphon SEO link value to improve themselves: https://www.seroundtable.com/google-auto-nofollow-widget-lin...
In our agency research for clients we have noted several similar usages over the past few years that appear to be giving websites positive value instead of being ignored or penalized, including a WordPress plugin that injects links on government and collegiate websites. The way Google assigns value based on links has changed quite a bit over the past 5 years, and there is a chance they no longer penalize widget links (unlikely) OR that their ability to detect them has degraded significantly (my guess is the latter).
One thing is certain: Google absolutely retains the ability to manually devalue links and penalize a website for violating their guidelines. They do not enjoy negative press or community discussions on search quality like this one, and in the past have taken swift action when such issues arose in the media.
At our agency we advise clients against this type of link building as it has no long-term value for a brand and could cause long-term pain instead. SEO should be used to help new brands gain a competitive advantage against more established incumbents such as a startup taking on Amazon or a new SaaS tool providing valuable data to an industry.
The only way would be to keep finding links like Wiki Game and hoping to get closer to the intended target. Luckily there are huge robots who have done this for you and can tell you which links lead to your destination.
By the way, I develop proprietary software. I hope that someone at Google reads this and stops indexing all those pirate websites where people steal from others. Not torrents; I'm talking about the websites that even sell you paid access to stolen stuff.
Seriously, Google? You can't filter "nulled"?
BTW, the news websites in question are not doing it only for opening times but for any popular search phrase they can come up with. Would be such a shame if outlets like the BBC, WSJ, and others adopted that kind of SEO.
I also wrote a tutorial on how you can build an infecting proxy [2]. It doesn't work anymore, though, since HTTPS is everywhere. Thank god.
[1] https://blog.haschek.at/2015-analyzing-443-free-proxies [2] https://blog.haschek.at/2013/05/why-free-proxies-are-free-js...
Maybe it's just because I'm searching for technical stuff, but DDG and Google are both a big source of frustration for me.
DDG thinks I mistype most of my queries and will desperately try to correct my 'mistake' because "surely nobody is really searching for documentation about ARM32 bootloaders, they just mistyped when they were really trying to look for a webshop that sells 32 different ARMchairs and ARMy boots.".
Google will understand my input at least half of the time, but uses that power to show me websites that do some article/keyword scraping and run GPT on it, or this great new Medium blog post with two paragraphs of someone copying a Wikipedia summary of what ARM is and copy-pasting build instructions from a GitHub README.
I've tried searching github.com itself but that's just a nice way to find out that apparently most of the data they store is just scraped websites, input for ML models or dictionaries and they will happily show me all 9K forks of the one repo that contains the highest density of these keywords.
/rant
Do you think we're in the same situation now as we were fully 20 years ago? I don't. Facebook killed MySpace, but Facebook is now too big to be disrupted, same with Google. The word "google" is a verb now. This is why the quality of their search results doesn't matter, people are too entrenched to switch now, which was not true in 2001.
In that respect, not much has changed in 20 years. Switching your search bar is a very low friction activity, and if quality of results is too low then people will look elsewhere. There's only so many times someone will tolerate seeing the exact same copy/paste useless answers to questions as most of the first page of results.
-#-#-#-#-#-#-
In General:
The tech industry is filled with examples of companies that had an entrenched product end up failing very rapidly. I think Google probably understands this well enough to ensure search quality remains better than a scrappy under funded startup can accomplish, but then again Google achieved search dominance by coming up with a different way to determine results, relevancy, etc. There's no reason to believe that someone couldn't come up with something superior now either.
I think the most significant threat to that possibility is 1) FAANG companies buying up many of the most talented people. 2) If a competitor did come along, buying them up as well.
But it's also hard to predict the antitrust future. Microsoft's Internet Explorer had an extremely long run as the dominant web browser, longer than Chrome has held that crown, but it got knocked down very quickly. I doubt that would have happened as easily if not for Microsoft's antitrust issues. Of course it doesn't help that IE grew into a slow, bloated mess, but in that respect, refer back to what I said about search quality: Microsoft was entrenched, if sliding, in the browser space even after its antitrust issues, but it let its quality slip too much for users to accept. Given viable options, users switched.
That switch was truly remarkable given the much higher friction. IE still came bundled with Windows; Chrome did not. Every home computer running Chrome required a user to ignore the option right in front of them and choose Chrome instead. Now just think about how much easier it is to use a different search engine.
I'm not saying Google is doomed, but 20 years of market dominance guarantees nothing. The "big 3" US automakers owned the market for longer than Google's founders have been alive, but those days are now just another cautionary tale of poor quality and unassailable arrogance.
The best is the minus operator acting more like plus or quotes.
A decade from now, Google will have made no improvement.
Most people don't click on ads, so getting visitors to your site from organic search terms is more likely to convert them into returning users.
So, if Google altered their algorithm such that "recipe" content had to be shorter-form in order to perform better in SERPs, how would this change anything? The sites that profit from search traffic would be the ones with their fingers on the pulse of the algorithm, and the resources to instantly alter their content in order to ensure that they continued to rank for the terms that were driving traffic.
The result is that ACTUALLY USEFUL articles are buried on page 5. Any slightly helpful bit of content in the top articles are repeated (using different grammar of course) in all the other "top" articles.
This effect isn't limited to web searches, either. Social media is way worse - at least Google pays you in presumably useful web traffic. Facebook and Twitter want to trap you on platform as long as possible. Even platforms like YouTube which pay their creators have this problem. So does Amazon, which encourages dropshipping cost-optimized products from China under weird, fly-by-night brand names. Their business model is to outsource the financial risk of creating new works to someone else so they can get "content" (or in the case of Amazon, actual products) for cheaper.
In the olden days, a publisher was a corporation that took on the financial burden and legal risk of publishing your work, with the caveat that only a limited number of things would be published. Thanks to a number of 90s-era liability limitations, online service providers were given broad leeway on pretty much everything a traditional publisher would need to worry about: defamation, product liability, copyright infringement, and so on. This flipped the publisher model on its head, creating the "platform model": one where you publish everything with no up-front cost or prior restraint, monopolize your creators' audiences, and make your money by taking cuts of whatever revenue streams your creators happen to establish after the fact.
Publishers had financial incentives to make their creative works more valuable. Platforms do just the opposite: their financial incentive is to devalue content. How do they do this? First off, they call it "content", as a generic catch-all term for anything their users publish. Second, they have no quality control mechanism, allowing literally anyone to submit content and have it promoted by their platform. Third, they run their platforms off of algorithms that use user-submitted feedback (reviews, upvotes, and so on) to judge group tastes in lieu of actually having taste. And finally, sometimes they'll just outright take money away from their creators in favor of their own stuff.
The reason why people were even putting high-value content on the web for free was because nobody knew how any of this would play out. Advertisers were paying far too much for banner ads, so it made perfect sense to just put all your content online, make sure people could see it, and get a lot of money. You used to be able to run a whole YouTube channel purely off of AdSense revenue! That's all gone away, now. Advertising networks pay out a lot less than they did even a decade ago, and at least in the case of Google, are also competing against their own creators for ad space to sell.
(This also implies that we will never actually go back to "the web as it used to be" until everyone alive has died and we can repeat the mistakes of the past. Hell, if you ask the copyright maximalist nutters, we've already repeated the mistakes of the past - publishers of centuries past acted a lot more like Internet platforms do today than modern publishers did pre-Internet.)
Developers paste their data to online websites too frequently these days.
https://html-online.com/editor/
In case you cannot view it, the banner across the site now says: "Goodbye!
This site has been penalized for unnatural link building and will be removed from Google Search
Please bookmark if you wish to continue use of the site.
We are sorry and are working on fixing the problem to recover from the penalty. "
They are only sorry they got caught.
It would need an option to ignore any form of news media in search results.