
923 points | zh2408 | 2 comments
ryao | No.43740101
I would find this more interesting if it made tutorials out of the Linux, LLVM, OpenZFS and FreeBSD codebases.
zh2408 | No.43740152
The Linux repository has ~50M tokens, which far exceeds the 1M-token context limit of Gemini 2.5 Pro. I think there are two paths forward: (1) decompose the repository into smaller parts (e.g., kernel, shell, file system, etc.), or (2) wait for models with a context window of 50M+ tokens.
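Option (1) could look roughly like the following Python sketch, which estimates tokens per top-level directory and greedily packs directories into chunks that fit a context budget (first-fit decreasing bin packing). It assumes about 4 characters per token, a rough heuristic rather than Gemini's actual tokenizer, and `estimate_tokens`, `directory_token_counts` and `pack_into_chunks` are hypothetical helpers, not part of any tool mentioned in this thread.

```python
from pathlib import Path

# Assumption: ~4 characters per token, a rough heuristic for code.
# A real pipeline would use the target model's own tokenizer instead.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count (rounded up)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def directory_token_counts(root: str) -> dict[str, int]:
    """Estimate tokens for each top-level subdirectory of `root`."""
    counts: dict[str, int] = {}
    for sub in sorted(Path(root).iterdir()):
        if not sub.is_dir():
            continue
        total = 0
        for f in sub.rglob("*"):
            if f.is_file():
                # Undecodable bytes are dropped; this is only an estimate.
                total += estimate_tokens(f.read_text(encoding="utf-8", errors="ignore"))
        counts[sub.name] = total
    return counts


def pack_into_chunks(counts: dict[str, int], budget: int) -> list[list[str]]:
    """Greedy first-fit-decreasing packing of directories into chunks whose
    estimated size stays within `budget` tokens. A directory larger than the
    budget gets its own oversized chunk and would need splitting further."""
    chunks: list[list[str]] = []
    sizes: list[int] = []
    for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        for i, used in enumerate(sizes):
            if used + n <= budget:
                chunks[i].append(name)
                sizes[i] += n
                break
        else:
            # No existing chunk has room: start a new one.
            chunks.append([name])
            sizes.append(n)
    return chunks
```

For example, `pack_into_chunks(directory_token_counts("linux"), 1_000_000)` would group directories into 1M-token buckets; a directory that alone exceeds the budget (drivers/ likely would) ends up in its own oversized chunk and has to be decomposed again one level down.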
1. achierius | No.43741788
A huge percentage of that is just drivers. The kernel is likely what would interest someone here, and much of it is architecture-specific. IIRC the x86 kernel is <1M lines, though probably not <1M tokens.
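To make the "kernel minus drivers, one architecture" idea concrete, here is a hedged Python sketch that filters per-file line counts (e.g. gathered with `wc -l`) down to a single-architecture core and converts lines to tokens. The excluded top-level directories and the 8 tokens-per-line ratio are assumptions chosen for illustration, not measurements, and `core_kernel_lines` and `estimate_tokens_from_lines` are hypothetical names.

```python
# Assumption: top-level directories treated as "not core kernel" here.
# These names exist in the Linux tree, but the selection is illustrative.
EXCLUDED_TOP_LEVEL = {"drivers", "sound", "Documentation", "tools", "samples"}

# Assumption: average tokens per line of kernel C; not a measured value.
TOKENS_PER_LINE = 8.0


def core_kernel_lines(file_lines: dict[str, int], arch: str = "x86") -> int:
    """Sum line counts for files outside the excluded directories,
    keeping only the arch/<arch>/ subtree under arch/."""
    total = 0
    for path, n in file_lines.items():
        parts = path.split("/")
        if parts[0] in EXCLUDED_TOP_LEVEL:
            continue
        # Drop every architecture except the requested one.
        if parts[0] == "arch" and (len(parts) < 2 or parts[1] != arch):
            continue
        total += n
    return total


def estimate_tokens_from_lines(lines: int, tokens_per_line: float = TOKENS_PER_LINE) -> int:
    """Convert a line count to an approximate token count."""
    return round(lines * tokens_per_line)
```

At the assumed ratio, anything above roughly 125k lines already exceeds 1M tokens, which is why a sub-1M-line subset can still be well over a 1M-token context.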
2. throwup238 | No.43743038
The AMDGPU driver alone is 5 million lines, out of about 37 million lines total. Over 10% of the codebase is a driver for a single vendor, although most of it is auto-generated per-product headers.
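The "over 10%" figure is just 5/37. The short Python sketch below reproduces it and adds a hypothetical `header_share` helper for checking how much of a directory's line count sits in `.h` files. The line totals are the comment's approximate figures, and the `drivers/gpu/drm/amd/` prefix in the usage note is an assumption about where AMDGPU lives in the tree.

```python
# Approximate figures quoted in the comment above (not measured here).
AMDGPU_LINES = 5_000_000
TOTAL_LINES = 37_000_000


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole` as a percentage, rounded to one decimal."""
    return round(100 * part / whole, 1)


def header_share(file_lines: dict[str, int], prefix: str) -> float:
    """Fraction of the lines under `prefix` that live in .h files.
    Hypothetical helper; it cannot tell generated headers from hand-written ones."""
    under = {p: n for p, n in file_lines.items() if p.startswith(prefix)}
    total = sum(under.values())
    if total == 0:
        return 0.0
    headers = sum(n for p, n in under.items() if p.endswith(".h"))
    return headers / total


print(percent(AMDGPU_LINES, TOTAL_LINES))  # 13.5
```

Fed per-file line counts for the whole tree, `header_share(counts, "drivers/gpu/drm/amd/")` would give a rough check of the "mostly generated headers" point.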