(github.com)

136 points d-yoda | 2 comments | 05 Oct 25 13:22 UTC | HN request time: 0.457s | source

Hi HN! I built pyscn for Python developers in the vibe coding era. If you're using Cursor, Claude, or ChatGPT to ship Python code fast, you know the feeling: features work, tests pass, but the codebase feels... messy.

Common vibe coding artifacts:

• Code duplication (from copy-pasted snippets)

• Dead code from quick iterations

• Over-engineered solutions for simple problems

• Inconsistent patterns across modules

pyscn performs structural analysis:

• APTED tree edit distance + LSH

• Control-Flow Graph (CFG) analysis

• Coupling Between Objects (CBO)

• Cyclomatic Complexity

Try it without installation:

  uvx pyscn analyze .          # Using uv (fastest)
  pipx run pyscn analyze .     # Using pipx
  (Or install: pip install pyscn)

Built with Go + tree-sitter. Happy to dive into the implementation details!

1. brynary ◴[05 Oct 25 14:59 UTC] No.45481994[source]▶

>>45481298 (OP) #

This looks great! Duplication and dead code are especially tricky to catch because they are not visible in diffs.

Since you mentioned the implementation details, a couple questions come to mind:

1. Are there any research papers you found helpful or influential when building this? For example, I need to read up on using tree edit distance for code duplication.

2. How hard do you think this would be to generalize to support other programming languages?

I see you are using tree-sitter which supports many languages, but I imagine a challenge might be CFGs and dependencies.

I’ll add a Qlty plugin for this (https://github.com/qltysh/qlty) so it can be run with other code quality tools and reported back to GitHub as pass/fail commit statuses and comments. That way, the AI coding agents can take action based on the issues that pyscn finds directly in a cloud dev env.

replies(1): >>45482211 #

2. d-yoda ◴[05 Oct 25 15:24 UTC] No.45482211[source]▶

>>45481994 (TP) #

Thank you! 1.For tree edit distance, I referred to "APTED: A Fast Tree Edit Distance Algorithm" (Pawlik & Augsten, 2016), but the algorithm works as O(n²) so I also implemented LSH (classic one) for large codebases.The other analyses also use classical compiler theory and techniques. 2. Should be straightforward! tree-sitter gives us parsers for 40+ languages. CFG construction is just tracking control flow, and the core algorithm stays the same.

I focused on Python first because vibe coding with Python tends to accumulate more structural issues. But the same techniques should apply to other languages as well.

Excited about the Qlty integration - that would make pyscn much more accessible and would be amazing!

↑

Show HN: Pyscn – Python code quality analyzer for vibe coders