←back to thread

121 points artski | 1 comments | | HN request time: 0.335s | source

When I came across a study that traced 4.5 million fake GitHub stars, it confirmed a suspicion I’d had for a while: stars are noisy. The issue is they’re visible, they’re persuasive, and they still shape hiring decisions, VC term sheets, and dependency choices—but they say very little about actual quality.

I wrote StarGuard to put that number in perspective based on my own methodology inspired with what they did and to fold a broader supply-chain check into one command-line run.

It starts with the simplest raw input: every starred_at timestamp GitHub will give. It applies a median-absolute-deviation test to locate sudden bursts. For each spike, StarGuard pulls a random sample of the accounts behind it and asks: how old is the user? Any followers? Any contribution history? Still using the default avatar? From that, it computes a Fake Star Index, between 0 (organic) and 1 (fully synthetic).

But inflated stars are just one issue. In parallel, StarGuard parses dependency manifests or SBOMs and flags common risk signs: unpinned versions, direct Git URLs, lookalike package names. It also scans licences—AGPL sneaking into a repo claiming MIT, or other inconsistencies that can turn into compliance headaches.

It checks contributor patterns too. If 90% of commits come from one person who hasn’t pushed in months, that’s flagged. It skims for obvious code red flags: eval calls, minified blobs, sketchy install scripts—because sometimes the problem is hiding in plain sight.

All of this feeds into a weighted scoring model. The final Trust Score (0–100) reflects repo health at a glance, with direct penalties for fake-star behaviour, so a pretty README badge can’t hide inorganic hype.

I added for the fun of it it generating a cool little badge for the trust score lol.

Under the hood, its all uses, heuristics, and a lot of GitHub API paging. Run it on any public repo with:

python starguard.py owner/repo --format markdown It works without a token, but you’ll hit rate limits sooner.

Please provide any feedback you can.

Show context
the__alchemist ◴[] No.43964589[source]
> It checks contributor patterns too. If 90% of commits come from one person who hasn’t pushed in months, that’s flagged.

IMO this is a slight green flag; not red.

replies(5): >>43964616 #>>43964685 #>>43964713 #>>43970992 #>>43971728 #
1. sethops1 ◴[] No.43964685[source]
I have to agree - the highest quality libraries in my experience are the ones maintained that one dedicated person as their pet project. There's no glory, no money, no large community, no Twitter followers - just a person with a problem to solve and making the solution open source for the benefit of others.