(github.com)

990 points pierre | 1 comment | 20 Oct 25 06:26 UTC

1. vladpowerman [21 Oct 25 20:46 UTC] No.45661484

The compression framing is super interesting. It makes me wonder if there’s an equivalent notion for source code - like how much “information” or entropy a commit contains vs. boilerplate churn.

I’ve been exploring Git activity analysis recently and ran into similar trade-offs: how do you tokenize real-world code and avoid counting noise?
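To make the question concrete, here is a rough sketch of one way to put a number on it, not an established metric. It assumes a commit arrives as unified diff text, uses a deliberately crude regex tokenizer, and treats two quantities as stand-ins for "information": the Shannon entropy of the added tokens and the zlib compression ratio of the added lines. The helper names (`added_lines`, `token_entropy`, `compression_ratio`) are made up for illustration.

```python
import math
import re
import zlib
from collections import Counter

# Crude tokenizer (an assumption, not a real lexer):
# identifiers, runs of digits, or any single non-space symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def added_lines(diff_text):
    """Return lines added by a unified diff, skipping '+++' file headers.

    Caveat: an added line whose own content starts with '++' is also skipped.
    """
    return [
        line[1:]
        for line in diff_text.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]


def token_entropy(lines):
    """Shannon entropy, in bits per token, of the empirical token distribution."""
    tokens = [tok for line in lines for tok in TOKEN_RE.findall(line)]
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def compression_ratio(lines):
    """Compressed size divided by raw size; lower means more repetitive text."""
    raw = "\n".join(lines).encode("utf-8")
    if not raw:
        return 0.0
    return len(zlib.compress(raw, 9)) / len(raw)
```

Boilerplate churn (generated files, mass renames, vendored code) should tend to show a low compression ratio because it repeats itself, but neither number says anything about whether a change matters semantically, so this only addresses the "avoid counting noise" half of the question.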

DeepSeek OCR