I do not want to comment on number 20. I really wish I had joined CERN 10 years earlier, but then that is my parents' mistake :)
[1] https://chromewebstore.google.com/detail/google-scholar-pdf-...
More on Chester and his co-author status: https://en.wikipedia.org/wiki/F._D._C._Willard
22. Switching on sort-by-date imposes a filter restricting results to papers published within the past year, and you cannot do anything about that.
I literally get only 1-3 real spam mails per month without any filter.
!!! And here I thought it had been broken for years, and was a sign of decay due to lack of internal support.
Btw, Anurag's last name is misspelt under the picture. It reads "Achurya" instead of "Acharya"
Edit: They fixed it
Maybe I should start using random words though? Wonder if someone will go bananas seeing their brand's name on my domain.
Actually, I am surprised _any_ spammy website these days would even honor the part after the +, and not just directly send to the real mailbox name.
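(It's a one-line transformation, which is why I'd expect more of them to do it. A toy Python sketch of Gmail-style normalization; the address is made up:)

    def strip_plus_tag(address):
        # Gmail-style: everything from "+" to "@" in the local part is ignored
        local, _, domain = address.partition("@")
        return local.split("+", 1)[0] + "@" + domain

    strip_plus_tag("jane+somebrand@example.com")  # -> "jane@example.com"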
Google spanks everyone else on robustness and responsiveness
Has he still been working on it in the 10 years since this article? His name is in the byline of the new blog post, but it's not clear from that how much he's been working on it.
---
'Two-, Three-, and Four-Atom Exchange Effects in bcc 3He' by J. H. Hetherington and F. D. C. Willard [0, 1, 2]
[0] https://xkeys.com/media/wysiwyg/smartwave/porto/category/abo...
Interestingly, it highlighted the words as it read. I haven't seen that before online. Not sure how useful it is (especially for anyone interested in this particular topic), but I thought it was a neat innovation nevertheless.
https://www.theverge.com/2023/11/27/23978591/google-drive-de...
they remembered google scholar exists
it's a great product and I don't trust google at all not to break it or mess with it
I am referring to robustness at scale and every day: Google released auto-save years before MS. MS pales in comparison in the UX.
Note: I have no vested interest in Google, not ex-googler, etc.
[1] https://support.google.com/drive/thread/245861992/drive-for-...
https://youtu.be/DZ2Bgwyx3nU?t=315
I recommend you watch the rest of the video, on the subject of open/closed and enclosure of infrastructure.
An important feature request would be a view where only peer-reviewed publications (specifically, not ArXiv and other pre-print archives) are included in the citation counts, and self-citations are also excluded.
A way to download all citation sources would also be a great nice-to-have.
In most universities here in New Zealand, articles have to be published in a journal indexed by Elsevier's Scopus. If it's not in a Scopus-indexed journal, it doesn't count for any more than a reddit comment. This gives Elsevier tremendous power. But in CS/ML/AI most academics and students turn to Google Scholar first when doing searches.
My guess for a while has been that it was back down to two of them, if that!
Having pretty wide journal access through my institution means I don’t need to reach out to sci-hub.
Still, you'd think they'd do a cutoff of e.g. 500 or 1,000 items rather than filter by the past year.
So I can't help but wonder if it's a contractual limitation insisted on by publishers? Since the publishers also don't want all their papers being spidered via Scholar? It feels kind of like a limitation a lawyer came up with.
I had a domain for a while that people got spam "from" all the time. It had nothing to do with me and there was nothing I could do about it.
(the above is a joke comparing old school library work to search engines circa 2000; I didn't actually do all those steps. I'd usually just find the most recent review article and read the papers it cited).
Of the "too big to block outright" spam senders, behind Twilio Sendgrid and Weebly, Google is currently #3. Amazon is a close #4. None of the top four currently have useful abuse reporting mechanisms... Sendgrid used to be OK, but they no longer seem to take any action. Google doesn't even accept abuse reports, which is ironic because "does not accept or act upon abuse reports" is criteria for being blocked by Google.
Most spam from Google is fake invoices and 419 scams. This is trivially filtered on my end, which makes it perplexing Google doesn't choose to do so. I can guarantee that exactly 0% of Gmail users sending out renewal invoices for "N0rton Anti-Virus" are legitimate.
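The kind of rule I mean is trivial to write. A minimal Python sketch; the substitution map and brand list are made-up examples, not anything Gmail actually does:

    # Undo common digit-for-letter substitutions, then check for brand
    # names that scammers impersonate in fake renewal invoices.
    LEET = str.maketrans("013457@$", "oleastas")
    BRANDS = ("norton", "mcafee", "geek squad")

    def looks_like_fake_invoice(subject):
        normalized = subject.lower().translate(LEET)
        return any(brand in normalized for brand in BRANDS)

    looks_like_fake_invoice("Your N0rton Anti-Virus renewal")  # -> True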
AFAICT Scholar remains because Anurag built up massive cred in the early years (he was a critically important search engineer) with Larry Page and kept his infra costs and headcount really small, while also taking advantage of search infra.
Gmail is unlikely to let spam through.
But that doesn't make its spam filter great; it's also very prone to blocking personal communication on the grounds that it must actually have been spam. The principle of gmail's spam filter is just "don't let anything through".
It would be much better to get more spam and also not have my actual communications disappear.
And a "malicious" actor can get away with pretending to be another company by spoofing the username if they know your domain works like that. I don't think this has reached spammers' repertoire yet, but I wouldn't be surprised.
Eventually I'd like to have a way of generating random email addresses that accept mail on demand, and to put everything else in quarantine automatically.
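Something like this, assuming a catch-all domain; a toy Python sketch where the in-memory set stands in for real storage and a real MTA hook:

    import secrets

    minted = set()  # aliases handed out so far

    def new_alias(domain):
        # Mint a random, hard-to-guess address for one correspondent.
        alias = secrets.token_hex(6) + "@" + domain
        minted.add(alias)
        return alias

    def route(rcpt):
        # Deliver only addresses explicitly minted; quarantine everything else.
        return "inbox" if rcpt in minted else "quarantine"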
The block can take 1-2 weeks to go away before you can use it again. There's no way to get in contact with anyone. I tried the Chrome extension email and the support forums.
It's a good reality check. There's no real support behind it and it can go away just like Google Reader did.
I think the motivations behind it are laudable, but they should not be the answer to the actual problem.
One of my biggest gripes right now is that we heavily rely on Microsoft Teams. A lot of our work laptops are still stuck on 8 GB of RAM. I find Microsoft Teams can easily suck back a full gig or more of RAM, especially when in a video call. From my understanding, Teams runs essentially like an Electron app (except packaged with an Edge browser).
I have no problem with web based apps, but man, some optimization is called for.
[0] https://en.wikipedia.org/wiki/Andre_Geim
[1] https://repository.ubn.ru.nl//bitstream/handle/2066/249681/2...
It advertises itself as "from all fields of science" -- does that include fields like economics? Sociology? Political science? What about law journals? In other words, is the coverage as broad? And if it doesn't include certain fields, where is the "science" line drawn?
And I'm curious if people find it to be as useful (or more) just in terms of UX, features, etc.
It's crazy I can boot a kernel, with an entire graphics and network stack, X and a terminal in less than 200 MB but then the Teams webapp uses a massive amount of resources and grinds everything else to a halt.
Word 365 also becomes incredibly laggy on long documents with tons of comments, whereas Google Docs is just fine. But, apparently, this is also a thing on modern hardware. I guess these days Microsoft pays little attention to detail.
It seems like Scholar has an overall upward trend, although their methodology notes make it hard to compare some periods directly:
https://trends.google.com/trends/explore?date=all&q=%2Fm%2F0...
I'm basically assuming this is the rate of growth of graduate school, and no competing products have had any real effect?
Honestly, if we compare Google to Amazon, Microsoft, Apple, and Meta, isn't Google the least evil one?
Another interesting thing is the little popup form at the end of the post asking me if my opinion of Google changed for the better after reading it. I mean, maybe a bit, but the form definitely knocked the score back down.
I think overall many companies have gotten lazy/sloppy when it comes to optimization. Game dev is even worse for this. I like how Microsoft products integrate with each other, but often the whole thing feels sloppy and unoptimized.
As for coverage, I think it focuses more on the life sciences, but I'm not positive about that.
The Google of today is far more boring and less helpful.
Now for the less great.
They are pushing the concept of "Highly Influential Citations" [1] as their default metric, which to the best of my knowledge is based on a single workshop publication that produced a classifier trained on about 500 training samples to classify citations. I am a very harsh critic of any metrics for scientific impact, but this is just utter madness. Guaranteeing that this metric is not grossly misleading is nearly impossible, and it feels like the only reason they picked it is that Etzioni (AI2 head) is the last author of the workshop paper. It should have been at best a novelty metric and certainly not the default one.
[1]: https://webflow.semanticscholar.org/faq/influential-citation...
Recently, they introduced their Semantic Reader functionality and are now pushing it as the default way to access PDFs on the website, forcing you to click a drop-down to get the plain PDF. It may or may not be a great tool, but it feels somewhat obvious that they are using shady patterns to push you in the direction they want.
Lastly, they have started using Google Analytics. Which is not great, but I can understand why they go for the industry default.
Overall, I use them nearly daily and they are the best offering out there for my area of research. Although, I at times feel tempted to grab the data and create an alternative (simpler) frontend with fewer distractions and "modern" web nonsense.
Google has done a decent job of not turning fully into an Oracle, for example.
Unironically the plot of MGS5 the Phantom Pain literally happened IRL. Skullface would be proud!
- Google Search
- YouTube (more debatable, but I think it's a marvel)
- Google Books
- ChromeBooks
- Android
- Google Calendar
- Google Earth
- Google Drive
- Google Docs
- Waze
- Android Auto
- Google Pay
- Kubernetes
- Go
- VP8 / VP9
I'd rather take all those products than leave them.
A simple way to take a step away from encouraging bibliometrics (which would be a step in the right direction) would be to list publications by date (most recent first) on author pages rather than by citation count, or at least to let users and/or authors choose the default sorting (for users, when visiting a page; for authors, as the default for their own page).
https://people.cs.rutgers.edu/~watrous/plus-signs-in-email-a...
My uni (Northampton) has access to a LOT of journals... but has a blind spot in management, specifically accountancy-focused journals; I'm doing my lit review for my MSc dissertation and the number of times I hit a dead end is frustrating.
Sci-Hub and Anna's Archive are also not interested in that segment, so it's a double whammy.
But surprisingly Archive.org was able to help me out a bit, so thanks for that.
https://en.wikipedia.org/wiki/University_of_Oxford_v._Ramesh...
Bibliometrics, in use for over 150 years now, is not a game. That's like arguing there is no value in the PageRank algorithm, and no validity to trying to find out which journals or researchers or research teams publish better content using evidence to do so.
> which benefits the big publishers
Ignoring that it helps small researchers seems short-sighted.
> A simple way to make a step ... would be to list publications by date
Is it really that hard to click "year" and have it sorted?
It's almost a certainty that when someone looks up a scholar, they are looking for the more highly cited work, so the default is probably the best use of readers' time. I absolutely know that when I look up an author, I am interested in what other work of theirs is highly regarded, more than any other factor. Once in a while I look to see what they did recently, which is exactly one click away.
The friction is tremendously higher than on-demand downloadable options: LibGen, SciHub, ZLibrary, Anna's Archive, or even sources such as ArXiv, SocArXiv, SSRN, which are far more fragmentary and limited.
I actually respect this style a lot. There is a firehose of papers coming onto Google Scholar each day. You type in some keyword and you get 500 hits. This cut that down substantially for him in a way where he never missed anything big (reading Nature and Science), kept up with what the field was doing (reading the more niche field-specific journals and keeping up with the labs who put out this niche work), and saw what was coming up in the pipeline from the conferences or what sort of research new grants were requesting. I'm not sure that Scholar would have helped much.
The only ones I would take from your list are Kubernetes and Google Earth, and Kubernetes, being more of a dev tool, wouldn't really count as far as impact and usefulness to society (Go would fit there too).
Google Books _could_ have been great, but Google didn't take care of it. Same with Google Reader.
BUT... I'm not in formal academia, and I care very little about publishing research myself (at least not from a bibliometric perspective; for me "publishing" might be writing a blog post or maybe submitting a pre-print somewhere), so I'm just not part of that whole (racket|game|whatever-you-want-to-call-it).
The problem is my researchgate account was connected to my academic account. It’s been a while since I graduated so I’ve lost access to my own publications and page.
But I used to use researchgate and requests in researchgate quite a bit.
But Google remains focused on popularity because that is optimal for advertising, where large audiences are the only ones that matter and there is this insidious competition for top ranking (no expectation that anyone would ever want to dig deep into search results). That sort of focus is not ideal for non-commercial research, IMHO.
It works great for its audience, likely better than any other product. Do you think your desire for rare finds outweighs the masses who don't share it? If you want rare, why even use a tool designed for relevance? Go dig through the stacks at your favorite old library, bookstore, cellar, wherever.
I’d suspect if you were handed random low citation count articles you’d soon find they are not gems. They’re not cited for a reason.
Heck, want low citation count items? Go find a list of journal rankings (well crap, more rankings…) in the field you’re interested in, take the lowest rated ones, and go mine those crap journals for gems. Voila! Problem solved.
And I bet you find why they’re low ranked searching for gems in slop.
That said, I personally don't have any problem with Google Scholar since you can, as you say, trivially sort by date.
My conclusion is that any such system needs to be "complete" or almost complete to be useful. By system, I mean a service or some handcrafted system where I could track anything. In all fairness, Sci-Hub partially fits the bill here and it's a big plus to society.
But the point is Google Scholar is complete in the sense that with a high probability I will find any paper I'm looking for along with reliable metadata. That's great, but the fact that they go above and beyond to prevent sharing that data is IMO backwards, against all academic research principles and this should raise questions within the research communities that rely on it.
One of the nice things about Openstreetmap is that it doesn't do that weird behind the scenes manipulation.
This makes little sense to me. The citation count gives you an idea of what others are looking at and building upon. As far as I've seen, having a low citation count isn't an uncommon phenomenon, but having a high citation count is. In terms of information gained while triaging papers to read, a low citation count gives you almost no information.
To think that as an outsider to a field you are qualified to discover 'gems' (and between the lines here is a bit of an assumption that one is more qualified than researchers in the field, who are of course trying to discover 'gems') seems misguided.
But I am educated in my chosen field and I read the same books and journals and attend the same conferences, as the people you're referring to. The biggest difference is only in incentives and imposed constraints. I have a lot more freedom since I'm not operating within the "publish or perish" paradigm.
Assuming there’s some “incentives and imposed constraints” anywhere uniform to academics that you’re magically free from that lets you turn low cited papers into gems at a higher rate than all of academia combined is the most self delusional, simplistic, aggrandizing belief I’ve heard in a long time.