The highest quality codebase

(gricha.dev)

559 points Gricha | 2 comments | 08 Dec 25 21:33 UTC | HN request time: 0.586s | source

Show context

elzbardico ◴[11 Dec 25 16:46 UTC] No.46233707[source]▶

>>46197930 (OP) #

LLMs have this strong bias towards generating code, because writing code is the default behavior from pre-training.

Removing code, renaming files, condensing, and other edits is mostly a post-training stuff, supervised learning behavior. You have armies of developers across the world making 17 to 35 dollars an hour solving tasks step by step which are then basically used to generate prompt/responses pairs of desired behavior for a lot of common development situations, adding desired output for things like tool calling, which is needed for things like deleting code.

A typical human working on post-training dataset generation task would involve a scenario like: given this Dockerfile for a python application, when we try to run pytest it fails with exception foo not found. The human will notice that package foo is not installed, change the requirements.txt file and write this down, then he will try pip install, and notice that the foo package requires a certain native library to be installed. The final output of this will be a response with the appropriate tool calls in a structured format.

Given that the amount of unsupervised learning is way bigger than the amount spent on fine-tuning for most models, it is not surprise that given any ambiguous situation, the model will default to what it knows best.

More post-training will usually improve this, but the quality of the human generated dataset probably will be the upper bound of the output quality, not to mention the risk of overfitting if the foundation model labs embrace SFT too enthusiastically.

replies(1): >>46235033 #

hackernewds ◴[11 Dec 25 18:27 UTC] No.46235033[source]▶

>>46233707 #

> Writing code is the default behavior from pre-training

what does this even mean? could you expand on it

replies(2): >>46235667 #>>46239575 #

bongodongobob ◴[11 Dec 25 19:06 UTC] No.46235667[source]▶

>>46235033 #

He means that it is heavily biased to write code, not remove, condense, refactor, etc. It wants to generate more stuff, not less.

replies(2): >>46236717 #>>46240123 #

1. elzbardico ◴[12 Dec 25 02:13 UTC] No.46240123[source]▶

>>46235667 #

Because there are not a lot of high quality examples of code edition on the training corpora other than maybe version control diffs.

Because editing/removing code requires that the model output tokens for tools calls to be intercepted by the coding agent.

Responses like the example below are not emergent behavior, they REQUIRE fine-tuning. Period.

  I need to fix this null pointer issue in the auth module.
  <|tool_call|>
  {"id": "call_abc123", "type": "function", "function": {"name": "edit_file",     "arguments": "{"path": "src/auth.py", "start_line": 12, "end_line": 14, "replacement": "def authenticate(user):\n    if user is None:\n        return   False\n    return verify(user.token)"}"}}
  <|end_tool_call|>

replies(1): >>46242573 #

2. bongodongobob ◴[12 Dec 25 09:59 UTC] No.46242573[source]▶

>>46240123 (TP) #

I'm not disagreeing with any of this. Feels kind of hostile.

↑