
Using LLMs at Oxide

(rfd.shared.oxide.computer)
694 points steveklabnik | 7 comments
john01dav No.46178567
> Wherever LLM-generated code is used, it becomes the responsibility of the engineer. As part of this process of taking responsibility, self-review becomes essential: LLM-generated code should not be reviewed by others if the responsible engineer has not themselves reviewed it. Moreover, once in the loop of peer review, generation should more or less be removed: if code review comments are addressed by wholesale re-generation, iterative review becomes impossible.

My general procedure for using an LLM to write code, which is in the spirit of what is advocated here, is:

1) First, feed the existing relevant code into an LLM. This is usually just a few source files in a larger project.

2) Describe what I want to do, either giving an architecture or letting the LLM generate one. I tell it not to write code at this point.

3) Let it describe the plan, and make sure that I like it. I discuss it with the LLM to address any deficiencies that I see, and I almost always find some.

4) I then tell it to generate the code.

5) I skim and test the code to see if it's generally correct, and have it make corrections as needed.

6) Closely read the entire generated artifact at this point, and make manual corrections (occasionally automatic ones, such as "replace all C-style casts with the appropriate C++-style casts", followed by a review of the diff).

The hardest part for me is #6, where I feel a strong emotional bias against doing it, since at that point I am not aware of any errors that would compel me to.

This allows me to operate at a higher level of abstraction (architecture) and removes the drudgery of turning an architectural idea into written, precise code. But when doing so, you are abandoning those details to a non-deterministic system. This is different from, for example, using a compiler or a higher-level VM language. With those tools, you can understand how they work, quickly form a good idea of what you're going to get, and rely on robust assurances. Understanding LLMs helps, but not to the same degree.

replies(4): >>46179058 >>46179214 >>46181793 >>46182160
1. ryandrake No.46179058
I've found that your step 6 takes the vast majority of the time I spend programming with LLMs: 10x or more the combined time that steps 1-5 take. And that's if the code the LLM produced actually works. If it doesn't work (which happens quite often), even more handholding and correction is needed. It's really a grind. I'm still not sure whether I am saving time overall using these tools.

I always wonder about the people who say LLMs save them so much time: Do you just accept the edits they make without reviewing each and every line?

replies(3): >>46179291 >>46179296 >>46179610
2. Jaygles No.46179291
I exclusively use the autocomplete in Cursor. I hate reviewing huge chunks of LLM code at one time. With the autocomplete, I'm in full control of the larger design and am able to quickly review each piece of LLM code. Very often it generates what I was going to type myself.

Anything that involves math or complicated conditions I take extra time on.

I feel I'm getting code written 2 to 3 times faster this way while maintaining high quality and confidence.

replies(2): >>46179555 >>46183092
3. hedgehog No.46179296
You can have the tool start by writing an implementation plan describing the overall approach and key details, including references, snippets of code, a task list, etc. That is much faster to review and refine than a raw diff when making sure it matches your intent. Once the plan is acceptable, the changes are quick, and having the machine do a few rounds of refinement to make sure the diff vs. HEAD matches the plan helps iron out some of the easy issues before human eyes show up. The final review is then easier because you are only checking for smaller issues and consistency with a plan you have already signed off on.

It's not magic, though; this still takes some time.

4. zeroonetwothree No.46179555
Maybe it subjectively feels 2-3x faster, but studies that measure it tend to show smaller improvements, in the range of 20-30% faster. It could be that you are an outlier, of course.
replies(1): >>46181994
5. mythrwy No.46179610
If it's stuff I have been doing for years and isn't terribly complex, I've found it's generally quick to skim-review. I don't need to read every line; I can glance at it and know it's a loop and why, a function call, or whatever. If I see something unusual, I take that as an opportunity to learn.

I've seen LLMs write some really bad code a few times lately; it seems almost worse than what they were producing 6 or 8 months ago. Could be my imagination, but it seems that way.

6. Jaygles No.46181994
2-3x faster at getting the code written. Fully completing a coding task is maybe only 20-30% faster, if we count chasing down requirements, reviews, waiting for CI to pass so I can merge, etc.
7. NKjNkaka No.46183092
This is my preferred way as well. And when you think about it, it makes sense. With advanced autocomplete you are:

1. Keeping the context very small
2. Keeping the scope of the output very small

With the added benefit of keeping you in the flow state (and in my experience making it more enjoyable).

To anyone who hates LLMs: give autocomplete a shot (with a keybinding to toggle it if it annoys you; sometimes it's awful). It's really no different from typing it manually with respect to quality, so the speedup isn't huge, but it feels a lot nicer.