20 years of Google Scholar

1. random3 ◴[18 Nov 24 22:08 UTC] No.42177658[source]▶

Fun fact about Google Scholar: it’s "free", but it’s just another soulless Google product - no clear strategy, no support, and a fragile proprietary dependency in what should be an open ecosystem. This creates inherent risks for the academic community. We need the equivalent of arXiv for Google Scholar.

replies(8): >>42177738 #>>42178221 #>>42178675 #>>42179796 #>>42180759 #>>42181058 #>>42181064 #>>42183137 #

2. afandian ◴[18 Nov 24 22:20 UTC] No.42177738[source]▶

>>42177658 (TP) #

The Invest in Open site has a good directory of open tools.

https://infrafinder.investinopen.org/solutions

3. kergonath ◴[18 Nov 24 23:14 UTC] No.42178221[source]▶

>>42177658 (TP) #

Yes. On one hand I’d like Google to improve things a bit. There are some rough edges, which is a shame because it indexes some things that are not in Scopus or Web of Knowledge, like theses and preprint repositories. On the other hand I worry that some manager somewhere would kill it if they realised that it is still around.

replies(2): >>42178417 #>>42178860 #

4. random3 ◴[18 Nov 24 23:32 UTC] No.42178417[source]▶

>>42178221 #

Every 1-2 months when Chrome updates I get banned by their throttling mechanism because their extension makes too many requests and they see "unusual traffic".

It can take 1-2 weeks to go away and be able to use it. There's no way to get in contact with anyone. Tried the Chrome extension email, support forums.

It's a good reality check. There's no real support behind it and it can go away just like Google Reader did.

I think the motivations behind it are laudable, but they should not be the answer to the actual problem.

replies(1): >>42191443 #

5. sitkack ◴[19 Nov 24 00:04 UTC] No.42178675[source]▶

>>42177658 (TP) #

And that is Semantic Scholar, https://www.semanticscholar.org/

replies(4): >>42178841 #>>42179369 #>>42181081 #>>42189606 #

6. mapmeld ◴[19 Nov 24 00:24 UTC] No.42178841[source]▶

>>42178675 #

For people unfamiliar, Semantic Scholar is run by the Allen Institute and has been researching accurate AI summarization and semantic search for years. Also they have support for author name changes.

replies(1): >>42179115 #

7. griomnib ◴[19 Nov 24 00:27 UTC] No.42178860[source]▶

>>42178221 #

I’m fairly sure they only exist because Larry/Sergey might give half a fuck if they killed it outright, and it has a small enough team that the cost savings for killing aren’t enough for Ruth to want to make that argument.

8. crazygringo ◴[19 Nov 24 01:07 UTC] No.42179115{3}[source]▶

>>42178841 #

How does it compare with Google Scholar?

It advertises itself as "from all fields of science" -- does that include fields like economics? Sociology? Political science? What about law journals? In other words, is the coverage as broad? And if it doesn't include certain fields, where is the "science" line drawn?

And I'm curious if people find it to be as useful (or more) just in terms of UX, features, etc.

replies(2): >>42179753 #>>42179807 #

9. bugglebeetle ◴[19 Nov 24 01:49 UTC] No.42179369[source]▶

>>42178675 #

OpenAlex is really good here too, including their API. They’re also the inheritors of the Microsoft Academic Graph, fully open source and open data:

https://openalex.org

replies(1): >>42223641 #

10. Onawa ◴[19 Nov 24 03:08 UTC] No.42179753{4}[source]▶

>>42179115 #

Semantic Scholar's search is pretty good, but there are also a variety of other (paid) projects that expand on its API. Look at tools like Scite and LitMaps for what's possible with the semantic scholar dataset.

As for coverage, I think it focuses more on the life sciences, but I'm not positive about that.

11. kettlecorn ◴[19 Nov 24 03:16 UTC] No.42179796[source]▶

>>42177658 (TP) #

I miss the Google of yesteryear which had an altruistic streak and felt that enriching the world's ability to share and process information would ultimately accrue benefit to Google as well.

The Google of today is far more boring and less helpful.

replies(1): >>42180043 #

12. ninjin ◴[19 Nov 24 03:19 UTC] No.42179807{4}[source]▶

>>42179115 #

They are substantially smaller in coverage, but have higher quality in my experience. Remarkably, they are also willing to correct their data if you notify them. This of course is in stark contrast to Google Scholar where the metadata of papers is frequently wildly inaccurate. On top of this, Semantic Scholar shares their underlying data (although you need to request an API key). Overall, they have been growing slowly and steadily over the years and I have a lot of respect for what their team is doing for researchers such as myself.

Now for the less great.

They are pushing the concept of "Highly Influential Citations" [1] as their default metric, which to the best of my knowledge is based on a singular workshop publication that produced a classifier trained on about 500 training samples to classify citations. I am a very harsh critic of any metrics for scientific impact. But this is just utter madness. Guaranteeing that this metric is not grossly misleading is nearly impossible and it feels like the only reason they picked it is because Etzioni (AI2 head) is the last author of the workshop paper. It should have been at best a novelty metric and certainly not the default one.

[1]: https://webflow.semanticscholar.org/faq/influential-citation...

Recently, they introduced their Semantic Reader functionality and are now pushing it as the default way to access PDFs on the website, forcing you to click on a drop-down to access plain PDFs. It may or may not be a great tool, but it feels somewhat obvious that they are attempting to use shady patterns to push you in the direction they want.

Lastly, they have started using Google Analytics, which is not great, but I can understand why they go for the industry default.

Overall, I use them nearly daily and they are the best offering out there for my area of research. Although, I at times feel tempted to grab the data and create an alternative (simpler) frontend with fewer distractions and "modern" web nonsense.

replies(1): >>42182914 #

13. smgit ◴[19 Nov 24 04:10 UTC] No.42180043[source]▶

>>42179796 #

It's a hard job to maintain systems in an altruistic state, because opportunists and parasites are drawn in larger and larger numbers to wherever resources accumulate.

Google has done a decent job of not turning fully into an Oracle, for example.

replies(1): >>42180075 #

14. insane_dreamer ◴[19 Nov 24 04:20 UTC] No.42180075{3}[source]▶

>>42180043 #

That’s a really really low bar

15. BlindEyeHalo ◴[19 Nov 24 08:17 UTC] No.42181064[source]▶

>>42177658 (TP) #

computer science has dblp.org which indexes all the relevant journals.

16. valusson ◴[19 Nov 24 08:19 UTC] No.42181081[source]▶

>>42178675 #

It's nice, but OpenAlex is better. https://explore.openalex.org/ It also has a free API and people have built python libraries to access it. https://pypi.org/project/pyalex/
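
For anyone curious what using the free API looks like, here is a minimal sketch that uses only the Python standard library rather than pyalex. It assumes the public `https://api.openalex.org/works` endpoint with its `search`, `per-page` and `mailto` query parameters, and that each result carries `display_name`, `publication_year` and `doi` fields. The function names are made up for illustration, so treat this as a starting point and check the OpenAlex docs before relying on it.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint for works (papers, preprints, theses, ...).
OPENALEX_WORKS = "https://api.openalex.org/works"


def build_search_url(query, per_page=5, mailto=None):
    """Build a full-text search URL for the works endpoint.

    `mailto` is optional; OpenAlex suggests sending a contact email
    so that they can reach heavy users.
    """
    params = {"search": query, "per-page": per_page}
    if mailto:
        params["mailto"] = mailto
    return OPENALEX_WORKS + "?" + urllib.parse.urlencode(params)


def summarize(payload):
    """Reduce a decoded JSON response to (title, year, doi) tuples."""
    return [
        (work.get("display_name"), work.get("publication_year"), work.get("doi"))
        for work in payload.get("results", [])
    ]


def search_works(query, per_page=5, mailto=None):
    """Run the search over the network and return summarized results."""
    url = build_search_url(query, per_page, mailto)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return summarize(json.load(resp))
```

Calling `search_works("citation analysis")` should then give a short list of tuples you can print or dump to CSV. pyalex wraps the same API if you would rather not build URLs by hand.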

17. crazygringo ◴[19 Nov 24 12:57 UTC] No.42182914{5}[source]▶

>>42179807 #

Thank you so much!

18. ◴[19 Nov 24 13:23 UTC] No.42183137[source]▶

>>42177658 (TP) #

19. random3 ◴[20 Nov 24 00:30 UTC] No.42189606[source]▶

>>42178675 #

I did a test across all Google Scholar alternatives I could find a few months ago. I got the same feeling as after Google Reader ceased to exist. Literally nothing filled the gap.

My conclusion is that any such system needs to be "complete" or almost complete to be useful. By system, I mean a service or some handcrafted system where I could track anything. In all fairness, Sci-Hub partially fits the bill here and it's a big plus to society.

But the point is Google Scholar is complete in the sense that with a high probability I will find any paper I'm looking for along with reliable metadata. That's great, but the fact that they go above and beyond to prevent sharing that data is IMO backwards, against all academic research principles and this should raise questions within the research communities that rely on it.

replies(1): >>42189645 #

20. sitkack ◴[20 Nov 24 00:36 UTC] No.42189645{3}[source]▶

>>42189606 #

The biggest use of Google Scholar is in finding multiple sources or academic troves for a paper that is already accessible on semanticscholar.

There is no one stop shopping, you need to use all of them.

21. kergonath ◴[20 Nov 24 07:09 UTC] No.42191443{3}[source]▶

>>42178417 #

I agree entirely.

22. sitkack ◴[23 Nov 24 20:27 UTC] No.42223641{3}[source]▶

>>42179369 #

I have been trying out openalex, it is very very good!