858 points cryptophreak | 103 comments
1. wiremine ◴[] No.42936346[source]
I'm going to take a contrarian view and say it's actually a good UI, but it's all about how you approach it.

I just finished a small project where I used o3-mini and o3-mini-high to generate most of the code. I averaged around 200 lines of code an hour, including the business logic and unit tests. Total was around 2200 lines. So, not a big project, but not a throw away script. The code was perfectly fine for what we needed. This is the third time I've done this, and each time I get faster and better at it.

1. I find a "pair programming" mentality is key. I focus on the high-level code, and let the model focus on the lower level code. I code review all the code, and provide feedback. Blindly accepting the code is a terrible approach.

2. Generating unit tests is critical. After I like the gist of some code, I ask for some smoke tests. Again, peer review the code and adjust as needed.

3. Be liberal with starting a new chat: the models can get easily confused with longer context windows. If you start to see things go sideways, start over.

4. Give it code examples. Don't prompt with English only.
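
For point 4, a rough sketch of what I mean (hypothetical names, not from the actual project): instead of describing the style I want, I paste an existing helper and ask the model to mimic it.

  interface Customer {
    id: string;
    name: string;
  }

  // Prompt: "Write fetchOrder in the same style as this existing helper:
  // same error handling, same return shape."
  async function fetchCustomer(id: string): Promise<Customer | null> {
    const res = await fetch(`/api/customers/${id}`);
    if (!res.ok) return null;
    return (await res.json()) as Customer;
  }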

FWIW, o3-mini was the best model I've seen so far; Sonnet 3.5 New is a close second.

replies(27): >>42936382 #>>42936605 #>>42936709 #>>42936731 #>>42936768 #>>42936787 #>>42936868 #>>42937019 #>>42937109 #>>42937172 #>>42937188 #>>42937209 #>>42937341 #>>42937346 #>>42937397 #>>42937402 #>>42937520 #>>42938042 #>>42938163 #>>42939246 #>>42940381 #>>42941403 #>>42942698 #>>42942765 #>>42946138 #>>42946146 #>>42947001 #
2. ikety ◴[] No.42936382[source]
do you use pair programming tools like aider?
3. shmoogy ◴[] No.42936605[source]
Have you tried cursor? I really like the selecting context -> cmd+l to make a chat with it - explain requirement, hit apply, validate the diff.

Works amazingly well for a lot of what I've been working on the past month or two.

replies(1): >>42936971 #
4. ryandrake ◴[] No.42936709[source]
I guess the things I don't like about Chat are the same things I don't like about pair (or team) programming. I've always thought of programming as a solitary activity. You visualize the data structures, algorithms, data paths, calling flow and stack, and so on, in your mind, with very high throughput "discussions" happening entirely in your brain. Your brain is high bandwidth, low latency. Effortlessly and instantly move things around and visualize them. Figure everything out. Finally, when it's correct, you send it to the slow output device (your fingers).

The minute you have to discuss those things with someone else, your bandwidth decreases by orders of magnitude and now you have to put words to these things and describe them, and physically type them in or vocalize them. Then your counterpart has to input them through his eyes and ears, process that, and re-output his thoughts to you. Slow, slow, slow, and prone to error and specificity problems as you translate technical concepts to English and back.

Chat as an interface is similarly slow and imprecise. It has all the shortcomings of discussing your idea with a human and really no upside besides the dictionary-like recall.

replies(9): >>42936954 #>>42937127 #>>42938119 #>>42938564 #>>42939410 #>>42943038 #>>42944645 #>>42946579 #>>42946796 #
5. dataviz1000 ◴[] No.42936731[source]
I agree with you.

Yesterday, I asked o3-mini to "optimize" a block of code. It produced very clean, functional TypeScript. However, because the code is reducing stock option chains, I then asked o3-mini to "optimize for speed." In the JavaScript world, this is usually done with for loops, and it even considered aspects like array memory allocation.

This shows that using the right qualifiers is important for getting the results you want. Today, I use both "optimize for developer experience" and "optimize for speed" when they are appropriate.

Although declarative code is just an abstraction, moving from imperative jQuery to declarative React was a major change in my coding experience. My work went from telling the system how to do something to simply telling it what to do. Of course, in React—especially at first—I had to explain how to do things, but only once to create a component. After that, I could just tell the system what to do. Now, I can simply declare the desired outcome, the what. It helps to understand how things work, but that level of detail is becoming less necessary.
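
To make that concrete, here's a rough sketch of the kind of rewrite I'm describing, with a made-up option chain shape (the real code is more involved):

  // Hypothetical shape; the real chain has far more fields.
  interface OptionQuote {
    strike: number;
    bid: number;
    ask: number;
  }

  // "Optimize for developer experience": clean and functional, but it
  // allocates intermediate arrays on every call.
  function midpointsFunctional(chain: OptionQuote[]): number[] {
    return chain
      .filter(q => q.bid > 0 && q.ask > 0)
      .map(q => (q.bid + q.ask) / 2);
  }

  // "Optimize for speed": single pass, pre-allocated output, no
  // intermediate arrays. Same result, less garbage for the GC.
  function midpointsFast(chain: OptionQuote[]): number[] {
    const out: number[] = new Array(chain.length);
    let n = 0;
    for (let i = 0; i < chain.length; i++) {
      const q = chain[i];
      if (q.bid > 0 && q.ask > 0) {
        out[n++] = (q.bid + q.ask) / 2;
      }
    }
    out.length = n; // trim to the number of quotes kept
    return out;
  }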

6. ◴[] No.42936768[source]
7. bongodongobob ◴[] No.42936787[source]
To add to that, I always add some kind of debug function wrapper so I can hand off the state of variables and program flow to the LLM when I need to debug something. Sometimes it's really hard to explain exactly what went wrong so being able to give it a chunk of the program state is more descriptive.
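
Roughly the kind of wrapper I mean, as a minimal sketch (hypothetical names, not my actual code):

  // Collects call traces so a chunk of program state can be pasted into the chat.
  const debugLog: string[] = [];

  function traced<A extends unknown[], R>(
    name: string,
    fn: (...args: A) => R
  ): (...args: A) => R {
    return (...args: A): R => {
      try {
        const result = fn(...args);
        debugLog.push(`${name}(${JSON.stringify(args)}) -> ${JSON.stringify(result)}`);
        return result;
      } catch (err) {
        debugLog.push(`${name}(${JSON.stringify(args)}) threw ${String(err)}`);
        throw err;
      }
    };
  }

  // Usage: wrap the suspect function, reproduce the bug, then hand
  // debugLog.join("\n") to the LLM instead of describing the failure.
  const parsePrice = traced("parsePrice", (raw: string) => Number(raw.trim()));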
replies(1): >>42937006 #
8. jacob019 ◴[] No.42936868[source]
Totally agree. Chat is a fantastic interface because it stays out of my way. For me it's much more than a coding assistant. I get live examples of how to use tools, and help with boilerplate, which is a time saver and improvement over legacy workflows, but the real benefit is all the spitballing I can do with it to refine ideas and logic and help getting up to speed on tooling way outside of my domain. I spent about 3.5 hours chatting with o1 about RL architecture to solve some business problems. Now I have a crystal clear plan and the confidence to move forward in an optimal way. I feel a little weird now, like I was just talking to myself for a few hours, but it totally helped me work through the planning. For actual code, I find myself being a bit less interactive with LLMs as time goes, sometimes it's easier to just write the logic the way I want rather than trying to explain how I want it but the ability to retrieve code samples for anything with ease is like a superpower. Not to mention all the cool stuff LLMs can do at runtime via API. Yeah, chat is great, and I'll stick with writing code in Vim and pasting as needed.
9. throwup238 ◴[] No.42936954[source]
At the same time, putting your ideas to words forces you to make them concrete instead of nebulous brain waves. I find that the chat interface gets rid of the downsides of pair programming (that the other person is a human being with their own agency*) while maintaining the “intelligent” pair programmer aspect.

Especially with the new r1 thinking output, I find it useful to iterate on the initial prompt as a way to make my ideas more concrete as much as iterating through the chat interface which is more hit and miss due to context length limits.

* I don’t mean that in a negative way, but in a “I can’t expect another person to respond to me instantly at 10 words per second” way.

replies(1): >>42938001 #
10. gnatolf ◴[] No.42936971[source]
I haven't tried cursor yet, but how is this different from the copilot plugin in vscode? Sounds pretty similar.
replies(3): >>42938137 #>>42938200 #>>42939231 #
11. throwup238 ◴[] No.42937006[source]
I do the same for my QT desktop app. I’ve got an “Inspector” singleton that allows me to select a component tree via click, similar to browser devtools. It takes a screenshot, dumps the QML source, and serializes the state of the components into the clipboard.

I paste that into Claude and it is surprisingly good at fixing bugs and making visual modifications.

replies(2): >>42938390 #>>42939632 #
12. ◴[] No.42937019[source]
13. sdesol ◴[] No.42937109[source]
> 1. I find a "pair programming" mentality is key. I focus on the high-level code, and let the model focus on the lower level code. I code review all the code, and provide feedback. Blindly accepting the code is a terrible approach.

This is what I've found to be key. If I start a new feature, I will work with the LLM to do the following:

- Create problem and solution statement

- Create requirements and user stories

- Create architecture

- Create skeleton code. This is critical since it lets me understand what it wants to do.

- Generate a summary of the skeleton code

Once I have done the above, I will have the LLM generate a reusable prompt that I can use to start LLM conversations with. Below is an example of how I turn everything into a reusable prompt.

https://beta.gitsense.com/?chat=b96ce9e0-da19-45e8-bfec-a3ec...

As I make changes, like adding new files, I will need to generate a new prompt, but it is worth the effort. And you can see it in action here.

https://beta.gitsense.com/?chat=b8c4b221-55e5-4ed6-860e-12f0...

The first message is the reusable prompt message. With the first message in place, I can describe the problem or requirements and ask the LLM what files it will need to better understand how to implement things.

What I am currently doing highlights how I think LLMs are a game changer. VCs are going for moonshots instead of home runs. The ability to gather requirements and talk through a solution before even coding is how I think LLMs will revolutionize things. It is great that they can produce usable code, but what I've found to be invaluable is that they help you organize your thoughts.

In the last link, I am having a conversation with both DeepSeek v3 and Sonnet 3.5, and the LLMs legitimately saved me hours of work, without even writing a single line of code. In the past, I would have just implemented the feature and been done with it, and then I would have to fix something if I didn't think of an edge case. With LLMs, it literally takes minutes to develop a plan that is extremely well documented and can be shared with others.

This ability to generate design documents is how I think LLMs will ultimately be used. The bonus is producing code, but the reality is that documentation (which can be tedious and frustrating) is a requirement for software development. In my opinion, this is where LLMs will forever change things.

14. frocodillo ◴[] No.42937127[source]
I would argue that is a feature of pair programming, not a bug. By forcing you to use the slower I/O parts of your brain (and that of your partner) the process becomes more deliberate, allowing you to catch edge cases, bad design patterns, and would-be bugs before even putting pen to paper so to speak. Not to mention that it immediately reduces the bus factor by having two people with a good understanding of the code.

I’m not saying pair programming is a silver bullet, and I tend to agree that working on your own can be vastly more efficient. I do however think that it’s a very useful tool for critical functionality and hard problems and shouldn’t be dismissed.

replies(3): >>42938206 #>>42938721 #>>42942166 #
15. javier2 ◴[] No.42937172[source]
Nah, a chat is terrible for development. In my tears of working, I have only had the chance to start a new codebase 3-4 times. 90% of the time is spent modifying large existing systems, constantly changing them. The chat interface is terrible for this. It would be much better if it were more integrated with the codebase and editor.
replies(2): >>42938082 #>>42941412 #
16. nonrandomstring ◴[] No.42937188[source]
> it's actually a good UI

Came to vote good too. I mean, why do we all love a nice REPL? That's chat right? Chat with an interpreter.

17. rpastuszak ◴[] No.42937209[source]
I've changed my mind on that as well. I think that, generally, chat UIs are lazy and not very user friendly. However, when coding I keep switching between two modes:

1. I need a smart autocomplete that can work backwards and mimic my coding patterns

2. I need a pair programming buddy (of sorts, this metaphor doesn't completely work, but I don't have a better one)

Pair development, even a butchered version of the so called "strong style" (give the driver the highest level of abstraction they can use/understand) works quite well for me. But, the main reason this works is that it forces me to structure my thinking a little bit, allows me to iterate on the definition of the problem. Toss away the sketch with bigger parts of the problem, start again.

It also helps me to avoid yak shaving, getting lost in the detail or distracted because the feedback loop between me seeing something working on the screen vs. the idea is so short (even if the code is crap).

I'd also add 5.: use prompts to generate (boring) prompts. For instance, I needed a simple #tag formatter for one of my markdown sites. I am aware that there's a not-so-small list of edge cases I'd need to cover. In this case I'd write a prompt with a list of basic requirements and ask the LLM to: a) extend it with good practice, common edge cases b) format it as a spec with concrete input / output examples. This works a bit similar to the point you made about generating unit tests (I do that too, in tandem with this approach).
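
To illustrate, the input/output examples in the generated spec end up looking something like this (hypothetical cases, not my actual formatter):

  // Edge cases for a #tag formatter, in the input/output style I ask
  // the LLM to spell out in the spec.
  const tagCases: Array<{ input: string; expected: string[] }> = [
    { input: "notes on #typescript", expected: ["typescript"] },
    { input: "#a #a duplicate tags", expected: ["a"] },               // de-duplicate
    { input: "code like `#not-a-tag` in spans", expected: [] },       // ignore inline code
    { input: "trailing punctuation #llm.", expected: ["llm"] },       // strip punctuation
    { input: "hex colors like #fff should not match", expected: [] },
  ];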

In a sense 1) is autocomplete 2) is a scaffolding tool.

replies(3): >>42937293 #>>42937450 #>>42938382 #
18. echelon ◴[] No.42937293[source]
I work on GenAI in the media domain, and I think this will hold true with other fields as well:

- Text prompts and chat interfaces are great for coarse grained exploration. You can get a rough start that you can refine. "Knight standing in a desert, rusted suit of armor" gets you started, but you'll want to take it much further.

- Precision inputs (mouse or structure guided) are best for fine tuning the result and honing in on the solution itself. You can individually plant the cacti and pose the character. You can't get there with text.

19. rafaelmn ◴[] No.42937341[source]
This only works for small self-contained problems with narrow scope/context.

Chat sucks for pulling in context, and the only worse thing I've tried is the IDE integrations that supposedly pull the relevant context for you (and I've tried quite a few recently).

I don't know if naive fine-tuning on the codebase would work. I suspect there are going to be tools that let you train the AI on your code, in the sense that it can have some references in the model and it knows how you want your project code/structure to look (which is often quite different from what it looks like in most areas).

20. godelski ◴[] No.42937346[source]

  > I focus on the high-level code, and let the model focus on the lower level code.
Tbh the reason I don't use LLM assistants is because they suck at the "low level". They are okay at mid level and better at high level. I find its actual coding very mediocre and fraught with errors.

I've yet to see any model understand nuance or detail.

This is especially apparent in image models. Sure, it can do hands but they still don't get 3D space nor temporal movements. It's great for scrolling through Twitter but the longer you look the more surreal they get. This even includes the new ByteDance model also on the front page. But with coding models they ignore context of the codebase and the results feel more like patchwork. They feel like what you'd be annoyed at with a junior dev for writing because not only do you have to go through 10 PRs to make it pass the test cases but the lack of context just builds a lot of tech debt. How they'll build unit tests that technically work but don't capture the actual issues and usually can be highly condensed while having greater coverage. It feels very gluey, like copy pasting from stack overflow when hyper focused on the immediate outcome instead of understanding the goal. It is too "solution" oriented, not understanding the underlying heuristics and is more frustrating than dealing with the human equivalent who says something "works" as evidenced by the output. This is like trying to say a math proof is correct by looking at just the last line.

Ironically, I think in part this is why chat interface sucks too. A lot of our job is to do a lot of inference in figuring out what our managers are even asking us to make. And you can't even know the answer until you're part way in.

replies(3): >>42937928 #>>42938207 #>>42938298 #
21. ic4l ◴[] No.42937397[source]
The o models consistently make more mistakes for me than Claude 3.5 Sonnet.
replies(1): >>42938094 #
22. gamedever ◴[] No.42937402[source]
What did you create? In my field, so far, I've found the chat bots not doing so well. My guess is that the more likely you are to be making something other people make often, the more likely the bot is to help.

Even then though, I asked o1-cursor to start a React app. It failed, mostly because it's out of date. Its instructions were for React two versions ago.

This seems like an issue. If the statistically most likely answer is old, that's not helpful.

replies(1): >>42938407 #
23. ryandrake ◴[] No.42937450[source]
> I've changed my mind on that as well. I think that, generally, chat UIs are lazy and not very user friendly. However, when coding I keep switching between two modes:

> 1. I need a smart autocomplete that can work backwards and mimic my coding patterns

> 2. I need a pair programming buddy (of sorts, this metaphor doesn't completely work, but I don't have a better one)

Thanks! This is the first time I've seen it put this clearly. When I first tried out CoPilot, I was unsure of how I was "supposed" to interact with it. Is it (as you put it) a smarter autocomplete, or a programming buddy? Is it both? What was the right input method to use?

After a while, I realized that for my personal style I would pretty much entirely use method 1, and never method 2. But, others might really need that "programming buddy" and use that interface instead.

24. ls_stats ◴[] No.42937520[source]
>it's actually a good UI
>I just finished a small project
>around 2200 lines

Why are the top comments on HN always from people who have not read the article?

replies(1): >>42938134 #
25. lucasmullens ◴[] No.42937928[source]
> But with coding models they ignore context of the codebase and the results feel more like patchwork.

Have you tried Cursor? It has a great feature that grabs context from the codebase, I use it all the time.

replies(3): >>42938048 #>>42939067 #>>42939425 #
26. cortesoft ◴[] No.42938001{3}[source]
> At the same time, putting your ideas to words forces you to make them concrete instead of nebulous brain waves.

I mean, isn’t typing your code also forcing you to make your ideas concrete

replies(1): >>42938229 #
27. larodi ◴[] No.42938042[source]
I would actually join you, as my longstanding view on coding is that it is best done in pairs. Sadly humans, and programmers in particular, are not so ready to work arm-in-arm, and it is even more depressing that it now turns out AI is pairing with us.

Perhaps there's gonna be post-AI programming movement where people actually stare at the same monitor and discuss while one of them is coding.

As a sidenote: we've done experiments with FOBsters, and when paired this way, they multiply their output. There's something about the psychology of groups and how one can only provide maximum output when teaming.

Even for solo activities, and non-IT activities, such as skiing/snowboard, it is better to have a partner to ride with you and discuss the terrain.

28. pc86 ◴[] No.42938048{3}[source]
I can't get the prompt because I'm on my work computer but I have about a three-quarter-page instruction set in the settings of cursor, it asks clarifying questions a LOT now, and is pretty liberal with adding in commented pseudo-code for stuff it isn't sure about. You can still trip it up if you try, but it's a lot better than stock. This is with Sonnet 3.5 agent chats (composer I think it's called?)

I actually cancelled my Anthropic subscription when I started using Cursor, because I only ever used Claude for code generation anyway, so now I just do it within the IDE.

replies(2): >>42947006 #>>43012127 #
29. pc86 ◴[] No.42938082[source]
Cursor does all of this, and agent chats let you describe a new feature or an existing bug and it will search the entire codebase and add relevant code to its context automatically. You can optionally attach files for the context - code files that you want to add to the context up front, documentation for third-party calls, whatever you want.

As a side note, "No, you're wrong" is not a great way to have a conversation.

replies(1): >>42938533 #
30. pc86 ◴[] No.42938094[source]
Same for me. I wonder if Claude is better at some languages than others, and o models are better at those weaker languages. There are some devs I know who insist Claude is garbage for coding and o3-* or o4-* are tier 1.
replies(2): >>42938222 #>>42940432 #
31. yarekt ◴[] No.42938119[source]
That's such a mechanical way of describing pair programming. I'm guessing you don't do it often (understandable if it's not working for you).

For me pair programming accelerates development to much more than 2x. Over time the two of you figure out how to use each other's strengths, and as both of you immerse yourselves in the same context you begin to understand what's needed without speaking every bit of syntax between each other.

In the best cases, as a driver you end up producing high quality code on the first pass, because you know that your partner will immediately catch anything that doesn't look right. You also go fast because you can sometimes skim over complexities, letting your partner think ahead and share that context load.

I'll leave readers to find all the caveats here

Edit: I should probably mention why I think Chat Interface for AI is not working like Pair programming: As much as it may fake it, AI isn't learning anything while you're chatting to it. It's pointless to argue your case or discuss architectural approaches. An approach that yields better results with Chat AI is to just edit/expand your original prompt. It also feels less like a waste of time.

With Pair programming, you may chat upfront, but you won't reach that shared understanding until you start trying to implement something. For now Chat AI has no shared understanding, just the "what I asked you to do" thing, and that's not good enough.

replies(7): >>42938201 #>>42939091 #>>42939986 #>>42942735 #>>42944074 #>>42947003 #>>42954463 #
32. pc86 ◴[] No.42938134[source]
It's not clear to me in the lines you're quoting that the GP didn't read the article.
replies(1): >>42938429 #
33. cheema33 ◴[] No.42938137{3}[source]
> copilot plugin in vscode

Copilot, back when I used it, completely ignored context outside of the file I was working in. Copilot, as of a few weeks ago, was the absolute dumbest assistant of all the various options available.

With cursor, I can ask it to make a change to how the app generates a JWT without even knowing which file or folder the relevant code is in. For very large codebases, this is very very helpful.

34. bboygravity ◴[] No.42938163[source]
Interesting to see the narrative on here slowly change from "LLMs will forever be useless for programming" to "I'm using it every day" over the course of the past year or so.

I'm now bracing for the "oh sht, we're all out of a job next year" narrative.

replies(2): >>42938264 #>>42938345 #
35. cruffle_duffle ◴[] No.42938200{3}[source]
Similar flow but much better user experience. At least that is how I’d describe it.
36. RHSeeger ◴[] No.42938201{3}[source]
I think it depends heavily on the people. I've done pair programming at a previous job and I hated it. It wound up being a lot slower overall.

For me, there's

- Time when I want to discuss the approach and/or code to something (someone being there is a requirement)

- Time when I want to rubber duck, and put things to words (someone being there doesn't hurt, but it doesn't help)

- Time when I want to write code that implements things, which may be based on the output of one of the above

That last bucket of time is generally greatly hampered by having someone else there and needing to interact with them. Being able to separate them (having people there for the first one or two, but not the third) is, for me, optimal.

replies(1): >>42945860 #
37. RHSeeger ◴[] No.42938206{3}[source]
You can do that without pair programming, though. Both through actual discussions and through rubber ducking.
38. wiremine ◴[] No.42938207[source]
> Tbh the reason I don't use LLM assistants is because they suck at the "low level". They are okay at mid level and better at high level. I find its actual coding very mediocre and fraught with errors.

That's interesting. I found assistants like Copilot fairly good at low level code, assuming you direct it well.

replies(1): >>42940634 #
39. kristofferR ◴[] No.42938222{3}[source]
o4 doesn't exist (in public at least) yet.
replies(1): >>42942633 #
40. RHSeeger ◴[] No.42938229{4}[source]
Doing it in your native language can add an extra dimension to it, though. In a way, I would consider it like double checking your work on something like a math problem by solving it a different way. By having to express the problem and solution in clear language, it can really help you make sure your solution is a good one, and considers all the angles.
41. RHSeeger ◴[] No.42938264[source]
I think a lot of people have always thought of it as a tool that can help.

I don't want an LLM to generate "the answer" for me in a lot of places, but I do think it's amazing for helping me gather information (and cite where that information came from) and pointers in directions to look. A search engine that generates a concrete answer via LLM is (mostly) useless to me. One that gives me an answer and then links to the facts it used to generate that answer is _very_ useful.

It's the same way with programming. It's great helping you find what you need. But it needs to be in a way that you can verify it's right, or take its answer and adjust it to what you actually need (based on the context it provides).

42. yarekt ◴[] No.42938298[source]
> A lot of our job is to do a lot of inference in figuring out what our managers are even asking us to make

This is why I think LLMs can't really replace developers. 80% of my job is already trying to figure out what's actually needed, despite being given lots of text detail, maybe even spec, or prototype code.

Building the wrong thing fast is about as useful as not building anything at all. (And before someone says "at least you now know what not to do"? For any problem there are infinite number of wrong solutions, but only a handful of ones that yield success, why waste time trying all the wrong ones?)

replies(2): >>42939668 #>>42940612 #
43. wiremine ◴[] No.42938345[source]
> "oh sht, we're all out of a job next year"

Maybe. My sense is we'd need to see 3 to 4 orders of magnitude improvements on the current models before we can replace people outright.

I do think we'll see a huge productivity boost per developer over the next few years. Some companies will use that to increase their throughput, and some will use it to reduce overhead.

replies(1): >>42940083 #
44. yarekt ◴[] No.42938382[source]
Oh yea, point 1 for sure. I call copilot regex on steroids.

Example:

- copy paste a table from a pdf datasheet into a comment (it'll be badly formatted with newlines and whatnot, doesn't matter)

- show it how to do the first line

- autocomplete the rest of the table

- check every row to make sure it didn't invent fields/types

For this type of workflow the tools are a real time saver. I've yet to see any results for the other workflows. They usually just frustrate me by either starting to suggest nonsense code without full understanding, or because it's far too easy to bias the results and get them stuck in a pattern of thinking.
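
Concretely, the table workflow looks something like this (hypothetical register table, but that's the shape of it):

  // Pasted from the datasheet (formatting mangled, doesn't matter):
  // REG_CTRL   0x00  rw  Control register
  // REG_STATUS 0x04  ro  Status flags
  // REG_BAUD   0x08  rw  Baud rate divisor

  // I write the first entry by hand...
  const registers = [
    { name: "REG_CTRL",   addr: 0x00, access: "rw", desc: "Control register" },
    // ...let the autocomplete fill in the rest, then check every row.
    { name: "REG_STATUS", addr: 0x04, access: "ro", desc: "Status flags" },
    { name: "REG_BAUD",   addr: 0x08, access: "rw", desc: "Baud rate divisor" },
  ] as const;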

45. acrophiliac ◴[] No.42938390{3}[source]
That sounds cool. I could use that. Care to share your Inspector code?
46. wiremine ◴[] No.42938407[source]
The most recent one was a typescript project focused on zod.

I might be reading into your comment, but I agree "top-down" development sucks: "Give me a React app that does X". I've had much more success going bottom-up.

And I've often seen models getting confused on versions. You need to be explicit, and even then they forget.

47. wiremine ◴[] No.42938429{3}[source]
Just confirming I did read the article in its entirety. Not reading it is like HN sin #1.
48. javier2 ◴[] No.42938533{3}[source]
Yeah, that is right. I'll give Cursor a try, because I believe we can do much better than these hopeless chat windows!
replies(1): >>42938814 #
49. bobbiechen ◴[] No.42938564[source]
I agree, chat is only useful in scenarios that are 1) poorly defined, and 2) require a back-and-forth feedback loop. And even then, there might be better UX options.

I wrote about this here: https://digitalseams.com/blog/the-ideal-ai-interface-is-prob...

50. TeMPOraL ◴[] No.42938721{3}[source]
I guess it depends on the person. My experience is close to that of 'ryandrake.

I've been coding long enough to notice there are times where the problem is complex and unclear enough that my own thought process will turn into pair programming with myself, literally chatting with myself in a text file; this process has the bandwidth and latency on the same order as talking to another person, so I might just as well do that and get the benefit of an independent perspective.

The above is really more of a design-level discussion. However, there are other times - precisely those times that pair programming is meant for - when the problem is clear enough I can immerse myself in it. Using the slow I/O mode, being deliberate is exactly the opposite of what I need then. By moving alone and focused, keeping my thoughts below the level of words, I can explore the problem space much further, rapidly proposing a solution, feeling it out, proposing another, comparing, deciding on a direction, noticing edge cases and bad design up front and dealing with them, all in a rapid feedback loop with test. Pair programming in this scenario would truly force me to "use the slower I/O parts of your brain", in that exact sense: it's like splitting a highly-optimized in-memory data processing pipeline in two, and making the halves communicate over IPC. With JSON.

As for the bus factor, I find the argument bogus anyway. For that to work, pair programming would have to be executed with the same partner or small group of partners, preferably working on the same or related code modules, daily, over the course of weeks at least - otherwise neither they nor I are going to have enough exposure to understand what the other is working on. But that's not how pair programming worked when I experienced it.

It's a problem with code reviews, too: if your project has depth[0], I won't really understand the whole context of what you're doing, and you won't understand the context of my work, so our reviews of each others' code will quickly degenerate to spotting typos, style violations, and peculiar design choices; neither of us will have time or mental capacity to fully understand the changeset before "+2 LGTM"-ing it away.

--

[0] - I don't know if there's a better, established term for it. What I mean is depth vs. breadth in the project architecture. Example of depth: you have a main execution orchestrator, you have an external data system that handles integrations with a dozen different data storage systems, then you have math-heavy business logic on data, then you have RPC for integrating with GUI software developed by another team, then you have an extensive configuration system, etc. - each of those areas is full of design and coding challenges that don't transfer to any other. Contrast that with an example of breadth: a typical webapp or mobile app, where 80% of the code is just some UI components and a hundred different screens, with very little unique or domain-specific logic. In those projects, developers are like free electrons in metal: they can pick any part of the project at any given moment and be equally productive working on it, because every part is basically the same as every other part. In those projects, I can see both pair programming and code reviews deliver on their promises in full.

replies(2): >>42941712 #>>42943013 #
51. pc86 ◴[] No.42938814{4}[source]
I've tried every LLM+IDE combo that I've heard about and Cursor is by far the best.
52. godelski ◴[] No.42939067{3}[source]
I have not. But I also can't get the general model to work well in even toy problems.

Here's a simple example with GPT-4o: https://0x0.st/8K3z.png

It probably isn't obvious in a quick read, but there are mistakes here. Maybe the most obvious is that, given how `replacements` is built, we need to order it intelligently. This could be fixed by sorting. But is this the right data structure? Not to mention that the algorithm itself is quite... odd

To give a more complicated example, I passed the same prompt from this famous code golf problem[0]. Here are the results; I'll save you the time: the output is wrong https://0x0.st/8K3M.txt (note, I started command lines with "$" and added some notes for you)

Just for the heck of it, here's the same thing but with o1-preview

Initial problem: https://0x0.st/8K3t.txt

Codegolf one: https://0x0.st/8K3y.txt

As you can see, o1 is a bit better on the initial problem but still fails at the code golf one. It really isn't beating the baseline naive solution. It does 170 MiB/s compared to 160 MiB/s (baseline with -O3). This is something I'd hope it could do really well on, given that this problem is rather famous and so many occurrences of it should show up. There are tons of variations out there, and it is common to see parallel fizzbuzz in a class on parallelization, as it can teach important concepts like keeping the output in the right order.

But hey, at least o1 has the correct output... It's just that that's not all that matters.

I stand by this: evaluating code based on output alone is akin to evaluating a mathematical proof based on the result. And I hope these examples make the point why that matters, why checking output is insufficient.

[0] https://codegolf.stackexchange.com/questions/215216/high-thr...

Edit: I want to add that there's also an important factor here. The LLM might get you a "result" faster, but you are much more likely to miss the learning process that comes with struggling. Because that makes you much faster (and more flexible) not just next time but in many situations where even a subset is similar. Which yeah, totally fine to glue shit together when you don't care and just need something, but there's a lot of missed value if you need to revisit any of that. I do have concerns that people will be plateaued at junior levels. I hope it doesn't cause seniors to revert to juniors, which I've seen happen without LLMs. If you stop working on these types of problems, you lose the skills. There's already an issue where we rush to get output and it has clear effects on the stagnation of devs. We have far more programmers than ever but I'm not confident we have a significant number more wizards (the percentage of wizards is decreasing). There's fewer people writing programs just for fun. But "for fun" is one of our greatest learning tools as humans. Play is a common trait you see in animals and it exists for a reason.

53. ionwake ◴[] No.42939091{3}[source]
this is so far removed from anything I have ever heard or experienced. But I know not everyone is the same and it is refreshing to view this comment.
54. RugnirViking ◴[] No.42939231{3}[source]
ya know what, after a couple times hearing this comment, I downloaded it literally yesterday. It does feel pretty different, at least the composer module and stuff. A big improvement in AI tooling imo
55. AutistiCoder ◴[] No.42939246[source]
ChatGPT itself is great for coding.

GitHub Copilot is...not. It doesn't seem to understand how to help me as well as ChatGPT does.

56. nick238 ◴[] No.42939410[source]
Someone else (future you being a distinct person) will also need to grok what's going on when they maintain the code later. Living purely in a high-dimensional trans-enlightenment state and coding that way means you may as well be building a half-assed organic neural network to do your task, rather than something better "designed".

Neural networks and evolved structures and pathways (e.g. humans make do with ~20k genes and about that many more in regulatory sequences) are absolutely more efficient, but good luck debugging them.

57. troupo ◴[] No.42939425{3}[source]
> It has a great feature that grabs context from the codebase, I use it all the time.

If only this feature worked consistently, or reliably even half of the time.

It will casually forget or ignore any and all context and any and all files in your codebase at random times, and you never know what set of files and docs it's working with at any point in time

58. rubymamis ◴[] No.42939632{3}[source]
Sounds awesome. I would love to hear more about this. Any chance you can share this or at least more details?
59. TeMPOraL ◴[] No.42939668{3}[source]
> For any problem there are infinite number of wrong solutions, but only a handful of ones that yield success, why waste time trying all the wrong ones?

Devil's advocate: because unless you're working in a heavily dysfunctional organization, or are doing a live coding interview, you're not playing "guess the password" with your management. Most of the time, they have even less of a clue about what the right solution looks like! "Building the wrong thing" lets them diff something concrete against what they imagined and felt like it would be, forcing them to clarify their expectations and give you more accurate directions (which, being a diff against a concrete thing, are less likely to then be misunderstood by you!). And, the faster you can build that wrong thing, the less money and time is burned to buy that extra clarity.

replies(3): >>42943092 #>>42943114 #>>43068049 #
60. freehorse ◴[] No.42939986{3}[source]
Pair programming is imo great when there is some sort of complementarity between the programmers. It may or may not accelerate output, but it can definitely accelerate learning which is often harder. But as you say, this is not what working with llms is about.
61. mirkodrummer ◴[] No.42940083{3}[source]
Whenever I read "huge productivity boost" for developers or companies, I shiver. Software was getting worse even before LLMs; I don't see it getting better, just getting out faster, maybe. I'm afraid in most cases it will be a disaster.
62. knes ◴[] No.42940381[source]
IMHO, I would agree with you.

I think chat is a nice intermediary evolution between the CLI (that we use every day) and whatever comes next.

I work at Augment (https://augmentcode.com), which, surprise surprise, is an AI coding assistant. We think about the new modality required to interact with code and AI on a daily basis.

Besides increased productivity (and happiness, as you don't have to do mundane tasks like tests, documentation, etc.), I personally believe that what AI can open up is actually more of a way for non-coders (think PMs) to interact with a codebase. AI is really good at converting specs, user stories, and so on into tasks—which today still need to be implemented by software engineers (with the help of AI for the more tedious work). Think of what Figma did between designers and developers, but applied to coding.

What’s the actual "new UI/UX paradigm"? I don’t know yet. But like with Figma, I believe there’s a happy ending waiting for everyone.

63. svachalek ◴[] No.42940432{3}[source]
I think Claude is incredible on JS/TS coding while GPT is highly python focused.
64. godelski ◴[] No.42940612{3}[source]

  > Building the wrong thing fast is about as useful as not building anything at all.
SAY IT LOUDER

Fully agree. Plus, you may be faster in the short term but you won't be in the long run. The effects of both good code and bad code compound. "Tech debt" is just a fancy term for "compounding shit". And it is true, all code is shit, but it isn't binary; there is a big difference between stepping in shit and being waist deep in shit.

I can predict some of the responses

  Premature optimization is the root of all evil
There's a grave misunderstanding in this adage[0], and I think many interpret it as "don't worry about efficiency, worry about output." But the context is that you shouldn't optimize without first profiling the code, not that you shouldn't optimize![1] I find it also funny revisiting this quote, because it seems like it is written by a stranger in a strange land, where programmers are overly concerned with optimizing their code. These days, I hear very little about optimization (except when I work with HPC people) except when people are saying to not optimize. Explains why everything is so sluggish...

[0] https://softwareengineering.stackexchange.com/a/80092

[1] Understanding the limitations of big O analysis really helps in understanding why this point matters. Usually when n is small, you can have worse big O and still be faster. But the constants we drop off often aren't a rounding error. https://csweb.wooster.edu/dbyrnes/cs200/htmlNotes/qsort3.htm
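
To put rough numbers on [1], with made-up constants: a routine costing 2n² operations beats one costing 100·n·log₂(n) operations for every n below roughly 400 (at n = 100 it's 20,000 operations versus about 66,000), and small n is exactly the regime most real-world calls live in.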

65. godelski ◴[] No.42940634{3}[source]
I have a response to a sibling comment showing where GPT 4o and o1-preview do not yield good results.

  > assuming you direct it well.
But hey, I admit I might not be good at this. But honestly, I've found greater value in spending my time reading the docs than in trying to prompt engineer my way through. And I've given a fair amount of time to trying to get good at prompting. I just can't get it to work.

I do think that when I'm coding with an LLM it _feels_ faster, but when I've timed myself, it doesn't seem that way. It just seems to be less effort (I don't mind the effort, especially because of the compounding rewards).

66. zahlman ◴[] No.42941403[source]
LoC per hour seems to me like a terrible metric.
replies(1): >>42942607 #
67. zahlman ◴[] No.42941412[source]
>In my tears of working

Sometimes typos are eerily appropriate ;)

(I almost typed "errily"...)

replies(1): >>42944970 #
68. skydhash ◴[] No.42941712{4}[source]
As I work, I pepper the files with TODO comments, then do a quick rgrep to find action items.
69. hinkley ◴[] No.42942166{3}[source]
Efficient, but not always more effective.
70. esafak ◴[] No.42942607[source]
Why? Since you are vetting the code it generates, the rate at which you end up with code you accept seems like a good measure of productivity.
replies(1): >>42945471 #
71. esafak ◴[] No.42942633{4}[source]
OP means 4o
72. bandushrew ◴[] No.42942698[source]
Producing 200 lines of usable code an hour is genuinely impressive.

My experiments have been nowhere near that successful.

I would love, love, love to see a transcript of how that process worked over an hour, if that was something you were willing to share.

73. taneq ◴[] No.42942735{3}[source]
> As much as it may fake it, AI isn't learning anything while you're chatting to it.

What's your definition of 'learn'? An LLM absolutely does extract and store information from its context. Sure, it's only short term memory and it's gone the next session, but within the session it's still learning.

I like your suggestion to update your original prompt instead of continuing the conversation.

replies(1): >>43069485 #
74. protocolture ◴[] No.42942765[source]
100%.

I do all this + rubber ducky the hell out of it.

Sometimes I just discuss concepts of the project with the thing and it helps me think.

I dont think chat is going to be right for everyone but it absolutely works for me.

75. bcoates ◴[] No.42943013{4}[source]
Agreed, particularly on code reviews: the only useful code reviews I've had were either in an outright trainee/expert relationship, or when the reviewer is very experienced in the gotchas of the project being modified and the author of the change is new.

Peer and near-peer reviews have always wound up being nitpicking or perfunctory.

An alternative that might work, if you want two hands on every change for process reasons, is to have the reviewer do something closer to formal QA, building and running the changed code to verify it has the expected behavior. That has a lot of limitations too, but at least it doesn't degrade to bikeshedding about variable name aesthetics.

76. cjonas ◴[] No.42943038[source]
I find it's exactly the opposite. With AI chat, I can define signatures, write technical requirements and validate my approach in minutes. I'm not talking with the AI like I would a human... I'm writing a blend of stubs and concise requirements, providing documentation, reviewing, validating and repeating. When it goes in the wrong direction, I add additional details and regenerate from scratch. I focus on small, composable chunks of functionality and then tie it all together at the end.
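
A minimal sketch of what one of those chunks looks like before I hand it off (hypothetical function, but that's the level of detail I give):

  /**
   * Requirements for the model:
   * - group line items by `sku`, summing quantities
   * - throw a descriptive Error on negative quantities
   * - preserve the first-seen order of SKUs in the output
   */
  export function consolidateLineItems(
    items: Array<{ sku: string; quantity: number }>
  ): Array<{ sku: string; quantity: number }> {
    // Body generated by the model, then reviewed against the requirements above.
    throw new Error("not implemented");
  }
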
77. bcoates ◴[] No.42943092{4}[source]
It's just incredibly inefficient if there's any other alternative.

Doing 4 sprints over 2 months to make a prototype in order to save three 60-minute meetings over a week where you do a few requirements analysis/proposal review cycles.

replies(2): >>42946073 #>>42946781 #
78. godelski ◴[] No.42943114{4}[source]
I don't think you're disagreeing; in fact, I think you're agreeing. Ironically, the fact that one of us is wrong demonstrates the difficulty of chat-based UI communication. I believe yarekt would be in agreement with me that

  > you can't even know the answer until you're part way in.
Which it seems you do too. But for clarity, there's a big difference between building the /wrong/ thing and /not the right thing/. The underlying point of my comment is that not only is communication difficult, but the overall goals are ambiguous. That a lot of time should be dedicated to getting this right. Yes, that involves finding out what things are wrong and that is the sentiment behind the original meaning of "fail fast" but I think that term has come to mean something a bit different now. Moreover, I believe that there's just people not looking at details.

It is really hard to figure out what the right thing is. We humans don't do this just through chat. We experiment, discuss, argue, draw, and there's tons of inference and reliance upon shared understandings. There's a lot of associated context. You're right that a dysfunctional organization (not uncommon) is worse, but these things are still quite common in highly functioning organizations. Your explicit statement about management having even less of an idea of what the right solution is, is explicitly what we're pushing back against. Saying that that is a large part of a developer's job. I would argue that the main reason we have a better idea is due to our technical skills, our depth of knowledge, our experience. A compression machine (LLM) will get some of this, but there's a big gap when trying to get to the end point. Pareto is a bitch. We all know there is a huge difference between a demonstrating prototype and an actual product. That the amount of effort and resources are exponentially different. ML systems specifically struggle with detail and nuance, but this is the root of those resource differences.

I'll give an example for clarity. Consider the iPad: the existence of third party note taking apps can be interpreted as nothing short of Apple's failure. I mean for the love of god, you got the pencil and you can't pull up notes and interact with it like it is a piece of paper? It's how the damned thing is advertised! A third party note taking app should be interpreted by Apple as showing their weak points. But you can't even zoom on the notes app?! Sure, you can turn on the accessibility setting and zoom with triple tap (significantly diverging from the standard pinching gesture used literally everywhere else) but if you do this (assuming full screen) you are just zooming in on a portion of the actual screen and not zooming in on the notes. So you get stupid results like not having access to your pen's settings. Which is extra important here given that the likely reason someone would zoom is to adjust details, and certainly you're going to want to adjust the eraser size. What I'm trying to say is that there's a lot of low hanging fruit here that should be incredibly obvious were you to actually use the application, dog-fooding. Instead, Apple is dedicating time to handwriting recognition and equation solving, which in practice (at least in my experience) end up creating a more jarring experience and cause more editing. Though it is cool when it does work. I'd say that here, Apple is not building the right thing. They are completely out of touch with the actual goals and experiences of the users. It's not that they didn't build a perfect app, it is that they fail to build basic functionality.

But of course, Apple probably doesn't care, because they significantly prioritize profits over building a quality product. These are orthogonal aspects and they can be simultaneously optimized. One should not need to pick one over the other, and the truth is that our economics should ensure alignment: that quality begets profits and that one can't "hack" the metrics.

Apple is far from alone here though. I'd say this "low hanging infuriating bullshit" is actually quite common. In fact, I think it is becoming more common. I have argued here before about the need for more "grumpy developers." I think if you're not grumpy, you should be concerned. Our job is to find problems, break them down into a way that can be addressed, and to resolve them. The "grumpiness" here is a dissatisfaction with the state of things. Given that nothing is perfect, there should always be reason to be "grumpy." A good developer should be able to identify and fix problems without being asked. But I do think there's a worrying decline of (lack of!) "grumpy" types, and I have no doubt this is connected to the rapid rise of vaporware and shitty products.

Also, I notice you play Devil's advocate a lot. While I think it can be useful, I think it can be overused. It needs to drive home the key limitations to an argument, especially when they are uncomfortable. Though I think in our case, I'm the one making the argument that diverges from the norm.

79. skue ◴[] No.42944074{3}[source]
> I'm guessing you don't do it often (understandable if its not working for you). For me pair programming accelerates development to much more than 2x.

The value of pair programming is inversely proportional to the expertise of the participant. Junior devs who pair with senior devs get a lot out of it, senior devs not so much.

GP is probably a more experienced dev, whereas you are the type of dev who says things like “I’m guessing that you…”.

replies(2): >>42948827 #>>42950219 #
80. hmcdona1 ◴[] No.42944645[source]
This is going to sound out of left field, but I would venture to guess you have very high spatial reasoning skills. I operate much this same way and only recently connected the dots that that skill might be what my brain leans on so heavily while programming and debugging.

Pair programming is endlessly frustrating beyond just rubber ducking, because I'm having to exit my mental model, communicate it to someone else, and then translate and relate their inputs back into my mental model, which is not exactly rooted in language in my head.

81. javier2 ◴[] No.42944970{3}[source]
I’ll leave it!
82. 59nadir ◴[] No.42945471{3}[source]
1000 lines of perfectly inoffensive and hard to argue against code that you don't need because it's not the right solution is negative velocity. Granted, I don't think that's much worse with LLMs but I do think it's going to be a growing problem caused by the cost of creating useless taxonomies and abstractions going down.

That is to say: I think LLMs are going to make a problem we already had (much) worse.

83. Tempest1981 ◴[] No.42945860{4}[source]
You could try setting some quiet hours. Or headphones.

Maybe collaborate the first hour each morning, then the first hour after lunch.

replies(1): >>42946723 #
84. TeMPOraL ◴[] No.42946073{5}[source]
Yeah, that would be stupid. I was thinking one order of magnitude less in terms of effort. If you can make a prototype in a day, it might deliver way more value than 3x 60 minute meetings. If you can make it in a week, where the proper implementation would take more than a month, that could still be a huge win.

I see this not as opposed, but as part of requirements analysis/review - working in the abstract, with imagination and prose and diagrams, it's too easy to make invalid assumptions without anyone realizing it.

85. ◴[] No.42946138[source]
86. Syzygies ◴[] No.42946146[source]
An environment such as Cursor supports many approaches for working with AI. "Chat" would be the instructions printed on the bottom, perhaps how their developers use it, but far from the only mode it actually supports.

It is helpful to frame this in the historical arc described by Yuval Harari in his recent book "Nexus" on the evolution of information systems. We're at the dawn of history for how to work with AI, and actively visualizing the future has an immediate ROI.

"Chat" is cave man oral tradition. It is like attempting a complex Ruby project through the periscope of an `irb` session. One needs to use an IDE to manage a complex code base. We all know this, but we haven't connected the dots that we need to approach prompt management the same way.

Flip ahead in Harari's book, and he describes rabbis writing texts on how to interpret [texts on how to interpret]* holy scriptures. Like Christopher Nolan's movie "Inception" (his second most relevant work after "Memento"), I've found myself several dreams deep collaborating with AI to develop prompts for [collaborating with AI to develop prompts for]* writing code together. Test the whole setup on multiple fresh AI sessions, as if one is running a business school laboratory on managerial genius, till AI can write correct code in one shot.

Duh? Good managers already understand this, working with teams of people. Technical climbers work cliffs this way. And AI was a blithering idiot until we understood how to simulate recursion in multilayer neural nets.

AI is a Rorschach inkblot test. Talk to it like a kindergartner, and you see the intelligence of a kindergartner. Use your most talented programmer to collaborate with you in preparing precise and complete specifications for your team, and you see a talented team of mature professionals.

We all experience degradation of long AI sessions. This is not inevitable; "life extension" needs to be tackled as a research problem. Just as old people get senile, AI fumbles its own context management over time. Civilization has advanced by developing technologies for passing knowledge forward. We need to engineer similar technologies for providing persistent memory to make each successive AI session smarter than the last. Authoring this knowledge helps each session to survive longer. If we fail to see this, we're condemning ourselves to stay cave men.

Compare the history of computing. There was a lot of philosophy and abstract mathematics about the potential for mechanical computation, but our worldview exploded when we could actually plug the machines in. We're at the same inflection point for theories of mind, semantic compression, structured memory. Indeed, philosophy was an untestable intellectual exercise before; now we can plug it in.

How do I know this? I'm just an old mathematician, in my first month trying to learn AI for one final burst of productivity before my father's dementia arrives. I don't have time to wait for anyone's version of these visions, so I computed them.

In mathematics, the line in the sand between theory and computation keeps moving. Indeed, I helped move it by computerizing my field when I was young. Mathematicians still contribute theory, and the computations help.

A similar line in the sand is moving, between visionary creativity and computation. LLMs are association engines of staggering scope, and what some call "hallucinations" can be harnessed to generalize from all human endeavors to project future best practices. Like how to best work with AI.

I've tested everything I say here, and it works.

87. knighthack ◴[] No.42946579[source]
I mostly agree with you, but I have to point out something to the contrary of this part you said: "...The minute you have to discuss those things with someone else, your bandwidth decreases by orders of magnitude and now you have to put words to these things and describe them, and physically type them in or vocalize them."

Subvocalization/explicit vocalization of what you're doing actually improves your understanding of the code. Doing so may 'decrease bandwidth', but it improves comprehension, because it's basically inline rubber duck debugging.

It's actually easy to write code which you don't understand and cannot explain what it's doing, whether at the syntax, logic or application level. I think the analogue is to writing well; anyone can write streams of consciousness amounting to word salad garbage. But a good writer can cut things down and explain why every single thing was chosen, right down to the punctuations. This feature of writing should be even more apparent with code.

I've coded tons of things where I can get the code working in a mediocre fashion, and yet find great difficulty in trying to verbally explain what I'm doing.

In contrast there's been code where I've been able to explain each step of what I'm doing before I even write anything; in those situations what generally comes out tends to be superior maintainable code, and readable too.

88. andreasmetsala ◴[] No.42946723{5}[source]
I think you missed the point. AI chat is not compatible with the solitary focused programming session.
89. andreasmetsala ◴[] No.42946781{5}[source]
> Doing 4 sprints over 2 months to make a prototype

That’s a lot of effort for a prototype that you should be throwing away even if it does the right thing!

Are you sure you’re not gold plating your prototypes?

90. alickz ◴[] No.42946796[source]
in my experience, if you can't explain something to someone else then you don't fully understand it

our brains like to jump over inconsistencies or small gaps in our logic when working by themselves, but try to explain that same concept to someone else and those inconsistencies and gaps become glaringly obvious (doubly so if the other person starts asking questions you never considered)

it's why pair programming and rubber duck debugging work at all, at least in my opinion

replies(1): >>42946852 #
91. frizlab ◴[] No.42946852{3}[source]
Or maybe your in the process of building it and that’s why you cannot understand it: it does not exist yet.
replies(2): >>42948423 #>>42982070 #
92. renegat0x0 ◴[] No.42947001[source]
One thing I would keep in mind: there are some parts of the project that you really cannot fill in with chat output.

I had a crucial area with threads. Code generated by chat seemed to be ok, but had one flaw. My initial code, written manually, was bug free; the chat-generated output was not. It was difficult to catch via inspection.

93. viraptor ◴[] No.42947003{3}[source]
> but you won't reach that shared understanding until you start trying to implement something.

That's very much not my experience. Pairing on design and diagrams is as or more useful than on the code itself. Once you have a good design, the code is pretty simple.

94. slig ◴[] No.42947006{4}[source]
I'm very interested in your prompt. Could you be so kind as to paste it somewhere and link it in your comment, please?
95. alickz ◴[] No.42948423{4}[source]
should you build something you don't understand?

it would seem to me that would cause a lot of issues

replies(1): >>42950849 #
96. balp ◴[] No.42948827{4}[source]
As a senior dev, when pairing with juniors I get a more skilled team. Then I can keep passing those skills on to new teams, and we all grow as people and as companies.
97. kybernetikos ◴[] No.42950219{4}[source]
I don't agree with this at all. Pairing where there's a big skill gap isn't proper pairing, it's more like mentoring or hands on training.

In pair programming as I learned it and as I have occasionally experienced it, two individuals challenge each other to be their best selves while also handing off tasks that break flow so that the pair as a whole is in constant flow. When it works this is a fantastic, productive, intense experience. I would agree that it is more than 2x as productive. I don't believe it's possible to achieve this state at all with a mismatched pair.

If this is the experience people have in mind, then it's not surprising that they think that those who think it's only for training juniors haven't actually tried it very much.

replies(1): >>43068220 #
98. frizlab ◴[] No.42950849{5}[source]
So you should never build anything new? We should not use AI either, nobody truly understands how it works currently, it should not even have been built!
99. frizlab ◴[] No.42982070{4}[source]
*you are in the process, sorry
100. justneedaname ◴[] No.43012127{4}[source]
Also interested to see this
101. yarekt ◴[] No.43068049{4}[source]
Sure, but you’re always going to get better results by actively looking for the good solution rather than building something and hoping it’s right. (building a prototype is one tool in your toolbox)

We as developers are tasked with figuring out what the problem is, especially in cases where the client is wrong about their assumptions

102. yarekt ◴[] No.43068220{5}[source]
Exactly this. And I did mean equal skill when I said "more than 2x", implying that you get more done together than if you were working separately.

One interesting thing is skill levels aren’t really comparable or equatable. I find pairing is productive where skills only partially overlap, meaning the authority on parts of implementation flows between participants, depending on the area you’re in.

I have some examples where I recently paired with a colleague for about 2-3 weeks nearly every day. I'm confident that what took us a month would have been near impossible working on our own.

103. yarekt ◴[] No.43069485{4}[source]
When working together with a colleague, after some weeks each of you understands more about how the other works, what they need to be productive, and how to communicate better and achieve good results. Same with the domain context you're working on: as your mental model grows more detailed, your performance goes up.

With LLMs that learning is only context, and in the case of "chat AI" it's the chat backlog. It's easy for the LLM to get stuck, and there aren't many tools yet that help you change and morph that context into something more like the shared understanding between two colleagues.

There is research going on in this area, eg “chain of thought”, so we’ll see, maybe things will get better