
121 points by artski | 18 comments

When I came across a study that traced 4.5 million fake GitHub stars, it confirmed a suspicion I’d had for a while: stars are noisy. The issue is they’re visible, they’re persuasive, and they still shape hiring decisions, VC term sheets, and dependency choices—but they say very little about actual quality.

I wrote StarGuard to put that number in perspective, using my own methodology inspired by theirs, and to fold a broader supply-chain check into one command-line run.

It starts with the simplest raw input: every starred_at timestamp GitHub will give. It applies a median-absolute-deviation test to locate sudden bursts. For each spike, StarGuard pulls a random sample of the accounts behind it and asks: how old is the user? Any followers? Any contribution history? Still using the default avatar? From that, it computes a Fake Star Index, between 0 (organic) and 1 (fully synthetic).
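
The spike test boils down to something like this simplified sketch (the function names and threshold here are illustrative, not the exact code in the repo):

    from collections import Counter
    from statistics import median

    def find_star_bursts(starred_at_dates, threshold=5.0):
        """starred_at_dates: one datetime.date per star; returns burst days."""
        daily = Counter(starred_at_dates)                 # stars per day
        counts = list(daily.values())
        med = median(counts)
        mad = median(abs(c - med) for c in counts) or 1   # guard against MAD == 0
        # modified z-score: how far each day sits above the median, in MAD units
        return [day for day, c in daily.items()
                if 0.6745 * (c - med) / mad > threshold]

Days that clear the threshold are the ones whose star accounts get sampled.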

But inflated stars are just one issue. In parallel, StarGuard parses dependency manifests or SBOMs and flags common risk signs: unpinned versions, direct Git URLs, lookalike package names. It also scans licences—AGPL sneaking into a repo claiming MIT, or other inconsistencies that can turn into compliance headaches.
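
The manifest side is mostly pattern checks. A stripped-down example of the kind of rule applied to a requirements.txt-style file (again simplified; real manifests and SBOMs need far more parsing):

    def flag_requirements(lines):
        """Flag two of the risk signs above in requirements.txt-style lines."""
        findings = []
        for line in lines:
            req = line.split("#", 1)[0].strip()           # drop comments and blanks
            if not req:
                continue
            if req.startswith("git+") or "://" in req:
                findings.append((req, "direct Git/URL dependency"))
            elif "==" not in req and "~=" not in req:
                findings.append((req, "unpinned version"))
        return findings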

It checks contributor patterns too. If 90% of commits come from one person who hasn’t pushed in months, that’s flagged. It skims for obvious code red flags: eval calls, minified blobs, sketchy install scripts—because sometimes the problem is hiding in plain sight.
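
The plain-sight scan is likewise just grep-style patterns, in the spirit of the sketch below (this rule list is illustrative, not the full set):

    import re

    RED_FLAGS = {
        "eval call": re.compile(r"\beval\s*\("),
        "exec call": re.compile(r"\bexec\s*\("),
        "curl piped to shell": re.compile(r"curl[^\n]*\|\s*(ba)?sh"),
    }

    def scan_source(text):
        """Return the names of any red-flag patterns found in a source file."""
        return [name for name, pat in RED_FLAGS.items() if pat.search(text)]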

All of this feeds into a weighted scoring model. The final Trust Score (0–100) reflects repo health at a glance, with direct penalties for fake-star behaviour, so a pretty README badge can’t hide inorganic hype.
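
The aggregation has roughly this shape; the component names, weights, and penalty factor below are placeholders, not the tuned values:

    WEIGHTS = {"stars": 0.30, "dependencies": 0.25, "license": 0.15,
               "contributors": 0.15, "code": 0.15}

    def trust_score(components, fake_star_index):
        """components: sub-scores in [0, 1] keyed like WEIGHTS;
        fake_star_index: 0 (organic) to 1 (fully synthetic)."""
        base = sum(w * components.get(k, 0.0) for k, w in WEIGHTS.items())
        penalty = 0.5 * fake_star_index                   # direct hit for fake stars
        return round(max(0.0, base - penalty) * 100)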

For the fun of it, I also added a cool little badge it generates for the trust score lol.

Under the hood, it's all heuristics and a lot of GitHub API paging. Run it on any public repo with:

    python starguard.py owner/repo --format markdown

It works without a token, but you’ll hit rate limits sooner.

Please provide any feedback you can.

1. the__alchemist ◴[] No.43964589[source]
> It checks contributor patterns too. If 90% of commits come from one person who hasn’t pushed in months, that’s flagged.

IMO this is a slight green flag, not a red one.

replies(5): >>43964616 #>>43964685 #>>43964713 #>>43970992 #>>43971728 #
2. lispisok ◴[] No.43964616[source]
It's gonna flag most of the Clojure ecosystem
replies(1): >>43966836 #
3. sethops1 ◴[] No.43964685[source]
I have to agree - the highest-quality libraries in my experience are the ones maintained by that one dedicated person as their pet project. There's no glory, no money, no large community, no Twitter followers - just a person with a problem to solve who makes the solution open source for the benefit of others.
4. artski ◴[] No.43964713[source]
Fair take—it's definitely context-dependent. In some cases, solo-maintainer projects can be great, especially if they’re stable or purpose-built. But from a trust and maintenance standpoint, it’s worth flagging as a signal: if 90% of commits are from one person who’s now inactive, it could mean slow responses to bugs or no updates for security issues. Doesn’t mean the project is bad—just something to consider alongside other factors.

Heuristics are never perfect and it's all iterative, but the point is understanding the underlying assumptions and interpreting the results in your own context. I could probably enhance it slightly with a pass through an LLM and a prompt, but I prefer to keep things purely statistical for now.

replies(3): >>43964778 #>>43964815 #>>43965473 #
5. delfinom ◴[] No.43964778[source]
The problem is your audience is:

> CTOs, security teams, and VCs automate open-source due diligence in seconds.

The people who probably have fewer brain cells than the average programmer to understand the nuance in the flagging.

replies(1): >>43964890 #
6. 85392_school ◴[] No.43964815[source]
It could also mean that the project is stable. Since you only look at the one repository's commit activity, a stable project with a maintainer who's still active on GitHub in other places would be "less trustworthy" than a project that's a work in progress.
replies(3): >>43965467 #>>43966659 #>>43966930 #
7. artski ◴[] No.43964890{3}[source]
Lol yeah tbh - I just made it without really thinking of an audience; I was just looking for a project to work on until I saw the paper and figured it would be cool to try it on some repositories out there. That part is just me asking GPT to make the README better.
8. ◴[] No.43965467{3}[source]
9. mlhpdx ◴[] No.43965473[source]
The signal here is how many unpatched vulnerabilities there are maybe multiplied by how long they’ve been out there. Purely statistical. And an actual signal.
10. artski ◴[] No.43966659{3}[source]
Not a bad idea tbh - adding how long issues are left open as a signal would be a good idea. Though yeah, that's why I was contemplating not highlighting the actual number and instead showing a range, e.g. 80-100 is good, 50-70 is moderate, and so on.
replies(1): >>43968225 #
11. throwaway150 ◴[] No.43966836[source]
Yep, and it's not just Clojure. This will end up flagging projects across all non-mainstream ecosystems. Whether it's Vim plugins, niche command-line tools, academic research code, or hobbyist libraries for things like game development or creative coding, they'll likely get flagged simply because they're often maintained by individual developers. These devs build the projects, iterate quickly in the early stages, and eventually reach a point where the code is stable and no longer needs frequent updates.

It's a shame that this tool penalizes such projects, which I think are vital to a healthy open source ecosystem.

It's a nice project otherwise. But flagging stable projects from solo developers really sticks out like a sore thumb. :(

replies(1): >>43967278 #
12. kstrauser ◴[] No.43966930{3}[source]
I agree. I have a popular-ish project on GitHub that I haven't touched in like a decade. I would if needed, but it's basically "done". It works. It does everything it needs to, and no one's reported a bug in many, many years.

You could etch that thing into granite as far as I can tell. The only thing left to do is rewrite it in Rust.

13. artski ◴[] No.43967278{3}[source]
It would still count as "trustworthy", it just wouldn't come out to 100/100 :(.
replies(1): >>43968454 #
14. InvisGhost ◴[] No.43968225{4}[source]
Be careful with this. Each project has different practices which could lead to false positives and false negatives. You may also create the wrong incentives, depending on how you measure and report things.
replies(1): >>43992205 #
15. mattgreenrocks ◴[] No.43968454{4}[source]
Ironically, your chance of getting a PR through is about 10x higher on smaller one-man-show repos than on more heavily trafficked corporate repos that require all manner of hoops to be jumped through for a PR.
16. j45 ◴[] No.43970992[source]
Not sure if this is a red flag.
17. 255kb ◴[] No.43971728[source]
Also, isn't that just 99% of OSS projects out there? I maintained a project for the past 7+ years, and despite 1 million downloads, tens of thousands of monthly active users, it's still mostly me, maintaining and committing. Yes, there is a bus factor, but it's a common and known problem in open-source. It would be better to try to improve the situation instead of just flagging all the projects. It's hard enough to find people ready to help and work on something outside their working hours on a regular basis...
18. mary-ext ◴[] No.43992205{5}[source]
it seems worthwhile to only mention it as a sidenote rather than a negative score