858 points by cryptophreak | 29 comments
1. themanmaran ◴[] No.42935503[source]
I'm surprised that the article (and comments) haven't mentioned Cursor.

Agreed that copy-pasting context in and out of ChatGPT isn't the fastest workflow. But Cursor has been a major speed-up in the way I write code. And it's primarily through a chat interface, but with a few QOL hacks that make it way faster:

1. Output gets applied to your file in a git-diff style. So you can approve/deny changes.

2. It (kinda) has context of your codebase so you don't have to specify as much. Though it works best when you explicitly tag files ("Use the utils from @src/utils/currency.ts")

3. Directly inserting terminal logs or type errors into the chat interface is incredibly convenient. Just hover over the error and click "add to chat".

replies(8): >>42935579 #>>42935604 #>>42935621 #>>42935766 #>>42935845 #>>42937616 #>>42938713 #>>42939579 #
2. dartos ◴[] No.42935579[source]
I think the wildly different experiences we all seem to have with AI code tools speaks to the inconsistency of the tools and our own lack of understanding of what goes into programming.

I’ve only been slowed down with AI tools. I tried for a few months to really use them and they made the easy tasks hard and the hard tasks opaque.

But obviously some people find them helpful.

Makes me wonder if programming approaches differ wildly from developer to developer.

For me, if I have an automated tool writing code, it’s bc I don’t want to think about that code at all.

But since LLMs don’t really act deterministically, I feel the need to double check their output.

That’s very painful for me. At that point I’d rather just write the code once, correctly.

replies(3): >>42935622 #>>42936378 #>>42936638 #
3. lolinder ◴[] No.42935604[source]
I like Cursor, but I find the chat to be less useful than the super advanced auto complete.

The chat interface is... fine. Certainly better integrated into the editor than GitHub Copilot's, but I've never really seen the need to use it as chat—I ask for a change and then it makes the change. Then I fix what it did wrong and ask for another change. The chat history aspect is meaningless and usually counterproductive, because it's faster for me to fix its mistakes than to keep everything in the chat window while prodding it the last 20% of the way.

replies(3): >>42935837 #>>42935991 #>>42938078 #
4. mholm ◴[] No.42935621[source]
Yeah, the OP has a great idea, but models as-is can't handle that kind of workflow reliably. The article is both a year behind and a year ahead at the same time. The user must iterate with the chatbot, and you can't do that by just doing a top-down 'here's a list of all features, get going, ping me when finished' prompt. AI is a junior engineer, so you have to treat it like a junior engineer, and that means looking through your chat logs, and perhaps backing up to a restore point and going in a different direction.
replies(1): >>42935833 #
5. aprilthird2021 ◴[] No.42935622[source]
I think it's about what you're working on. It's great for greenfield projects, etc. Terrible for complex projects that plug into a lot of other complex projects (like most of the software those of us not at startups work on day to day)
replies(1): >>42935644 #
6. dartos ◴[] No.42935644{3}[source]
It’s been a headache for my greenfield side projects and for my day to day work.

Leaning on these tools just isn’t for me rn.

I like them most for one off scripts or very small bash glue.

7. patrickaljord ◴[] No.42935766[source]
Instead of Cursor I would recommend two open source alternatives that you can combine: https://www.continue.dev/ and https://github.com/cline/cline
replies(2): >>42935869 #>>42939810 #
8. mttrms ◴[] No.42935833[source]
I've started using Zed on a side project and I really appreciate that you can easily manipulate the chat / context and continue making requests

https://zed.dev/docs/assistant/assistant-panel#editing-a-con...

It's still a "chat" but it's just text at the end of the day. So you can edit as you see fit to refine your context and get better responses.

9. themanmaran ◴[] No.42935837[source]
Agreed, the autocomplete definitely gets more mileage than the chat. But I frequently use it for terminal commands as well, especially AWS CLI work.

"how do I check the cors bucket policies on [S3 bucket name]"

10. stitched2gethr ◴[] No.42935845[source]
I think this misses the point. It seems like the author is saying we should move from imperative instructions to a declarative document that describes what the software should do.

Imperative:
- write an HTTP server that serves jokes
- add a healthcheck endpoint
- add TLS and change the serving port to 443

Declarative:
- an HTTP server that serves jokes
- contains a healthcheck endpoint
- supports TLS on port 443

The differences here seem minimal because you can see all of it at once, but in the current chat paradigm you'd have to search through everything you've said to the bot to get the full context, including the side roads that never materialized.

In the document approach you're constantly refining the document. It's better than reviewing the code because (in theory) you're looking at "support TLS on port 443" instead of a lot of code, which means it can be used by a wider audience. And ideally I can give the same high level spec to multiple LLMs and see which makes the best application.

replies(2): >>42936112 #>>42938101 #
11. freeone3000 ◴[] No.42935869[source]
It’s not nearly as slick. Cursor’s indexing and integration are significant value-adds.
12. fragmede ◴[] No.42935991[source]
> while prodding it the last 20% of the way.

hint: you don't get paid to get the LLM to output perfect code, you get paid by PRs submitted and landed. Generate the first 80% or whatever with the LLM, and then finish the last 20% that you can write faster than the LLM yourself, by hand.

replies(2): >>42936325 #>>42937373 #
13. ygouzerh ◴[] No.42936112[source]
Good explanation! As an open reflection: will a declarative document be as detailed as the imperative version? Often, between the specs the product team provides (which we can consider the "declarative" document) and the implementation, the tech team ends up creating many sub-specs that uncover important implementation details. It's like a rabbit hole.

For example, for a signup page, we could have:

- Declarative: sign up the user using their email address
- Imperative: to do the same, we will need to integrate an SMTP library, which means discovering that we need an SMTP server, so now we need to choose one. And when purchasing an SMTP server plan, we discover that there are rate limits, so now we need to add some bot protection to our signup page (IP rate limiting only? reCAPTCHA? Cloudflare bot protection?), etc.

Which means that, in the end, the imperative code is kind of like the ultimate implementation spec.

replies(1): >>42943644 #
14. jeremyjh ◴[] No.42936325{3}[source]
That is exactly what GP was pointing out, and why they said they do not prod it for the last 20%.
15. sangnoir ◴[] No.42936378[source]
> But since LLMs don’t really act deterministically, I feel the need to double check their output.

I feel the same

> That’s very painful for me. At that point I’d rather just write the code once, correctly.

I use AI tools augmentatively, and it's not painful for me, perhaps slightly inconvenient. But for boilerplate-heavy code like unit tests or easily verifiable refactors[1], adjusting AI-authored code on a per-commit basis is still faster than me writing all the code.

1. Like switching between unit-test frameworks

16. kenjackson ◴[] No.42936638[source]
I use LLMs several times a day, and I think for me the issue is that verification is typically much faster than learning/writing. For example, I've never spent much time getting good at scripting. Sure, probably a gap I should resolve, but I feel like LLMs do a great job at it. And what I need to script is typically easy to verify; I don't need to spend time learning how to do things like "move the files of this extension to this folder, but rewrite them so that the name begins with a three-digit number based on the date when it was created, with the oldest starting with 001" -- or stuff like that. Sometimes it'll have a little bug, but one that I can debug quickly.
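
A rough sketch of what that kind of one-off script tends to look like (my own approximation, not actual LLM output; it assumes *.log files and a ./sorted target folder, and uses modification time as a stand-in since creation time isn't exposed on every filesystem):

    #!/bin/sh
    # Move *.log files into ./sorted, prefixing each name with 001, 002, ...
    # ordered oldest-first by modification time.
    # (Glob-in-for-loop breaks on names with spaces; fine for a quick one-off.)
    mkdir -p sorted
    i=1
    for f in $(ls -tr *.log); do
      mv "$f" "sorted/$(printf '%03d' "$i")-$f"
      i=$((i+1))
    done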

Scripting assistance by itself is worth the price of admission.

The other thing I've found it good at is giving me an English description of code I didn't write... I'm sure it sometimes hallucinates, but never in a way that has been so wrong that it's been apparent to me.

replies(2): >>42937921 #>>42938049 #
17. reustle ◴[] No.42937373{3}[source]
Depends on the company. Most of the time, you get paid to add features and fix bugs, while maintaining reliability.

End users don’t care where the code came from.

18. notShabu ◴[] No.42937616[source]
chat is the best way to orchestrate and delegate. whether or not this is considered "ME writing MY code" is imo a philosophical debate

e.g. executives treat the org as a blackbox LLM and chat w it to get real results

19. shaan7 ◴[] No.42937921{3}[source]
I think you and the parent comment are onto something. I also feel like the parent, since I find it relatively difficult to read code that someone else wrote. My brain easily gets biased into thinking that the cases the code is covering are the only possible ones. On the flip side, if I were writing the code, I am more likely to catch the corner cases. In other words, writing code helps me think; reading it just biases me. This makes it extremely slow to review an LLM's code, at which point I'd just write it myself.

Very good for throwaway code though, for example a PoC which won't really be going to production (hopefully xD).

replies(1): >>42963488 #
20. skydhash ◴[] No.42938049{3}[source]
Your script example is a good one, but the nice thing about scripting is when you learn the semantics of it. Like the general pattern of find -> filter/transform -> select -> action. It’s very easy to come up with a one-liner that can be trivially modified to adapt it to another context. More often than not, I find LLMs generate overly complicated scripts.
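
For anyone unfamiliar, that pattern is the kind of composable one-liner that's trivial to re-aim at a new context; an illustrative example of mine, not from the thread:

    # find -> filter -> action: delete *.tmp files older than 30 days
    find . -name '*.tmp' -mtime +30 -print0 | xargs -0 rm -f --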
replies(1): >>42939137 #
21. tarsinge ◴[] No.42938078[source]
I was very skeptical of AI-assisted coding until I tried Cursor and experienced the super autocomplete. It is ridiculously productive. For me it’s to the point it makes Vim obsolete, because pressing tab correctly finishes the line or code block 90% of the time. Every developer with an opinion on AI assistance should at least try downloading Cursor and editing a file.
22. skydhash ◴[] No.42938101[source]
The issue is that there’s no execution platform for declarative specs, so something will be translated to imperative, and that is where the issue lies. There’s always an imperative core, which either needs to be deterministic or needs its output verified. LLMs are not the former, and the latter option can take more time than just writing the code.
23. mkozlows ◴[] No.42938713[source]
Windsurf is even more so this way -- it'll look through your codebase trying to find the right files to inspect, and it runs the build/test steps and examines the output to see what went wrong.

I found interacting with it via chat to be super-useful and a great way to get stuff done. Yeah, sometimes you just have to drop into the code, and tag a particular line and say "this isn't going to work, rewrite it to do x" (or rewrite it yourself), but the ability to do that doesn't vitiate the value of the chat.

24. lukeschlather ◴[] No.42939137{4}[source]
It's astounding how often I ask an LLM to generate something, do a little more research, come back ready to use the code it generated, and realize: no, it's selected the wrong flags entirely.

Although most recently I caught it because I fed it into both gpt-4o and o1 and o1 had the correct flags. Then I asked 4o to expand the flags from the short form to the long form and explain them so I could double-check my reasoning as to why o1 was correct.

replies(1): >>42991393 #
25. koito17 ◴[] No.42939579[source]
I'm not familiar with Cursor, but I've been using Zed with Claude 3.5 Sonnet. For side projects, I have found it extremely useful to provide the entire codebase as context and send concise prompts focusing on a single requirement. Claude handles "junior developer" tasks well when each unit of work is clearly separated.

Zed makes it trivial to attach documentation and terminal output as context. To reduce risk of hallucination, I now prefer working in static, strongly-typed languages and use libraries with detailed documentation, so that I can send documentation of the library alongside the codebase and prompt. This sounds like a lot of work, but all I do is type "/f" or "/t" in Zed. When I know a task only modifies a single file, then I use the "inline assist" feature and review the diffs generated by the LLM.

Additionally, I have found it extremely useful to actually comment a codebase. LLMs are good at unstructured human language, it's what they were originally designed for. You can use them to maintain comments across a codebase, which in turn helps LLMs since they get to see code and design together.

Last weekend, I was able to re-build a mobile app I made a year ago from scratch with a cleaner code base, better UI, and implement new features on top (making the rewrite worth my time). The app in question took me about a week to write by hand last year; the rewrite took exactly 2 days.

---

As a side note: a huge advantage of Zed with locally-hosted models is that one can correct the code emitted by the model and force the model to re-generate its prior response with those corrections. This is probably the "killer feature" of models like qwen2.5-coder:32b. Rather than sending extra prompts and bloating the context, one can just delete all output from where the first mistake was made, correct the mistake, then resume generation.

26. coder543 ◴[] No.42939810[source]
I used Continue before Cursor. Cursor’s “agent” composer mode is so much better than what Continue offered. The agent can automatically grep the codebase for relevant files and then read them. It can create entirely new files from scratch. I can still manually provide some files as context, but it’s not usually necessary. With Continue, everything was very manual.

Cursor also does a great job of showing inline diffs of what composer is doing, so you can quickly review every change.

I don’t think there’s any reason Continue can’t match these features, but it hadn’t, last I checked.

Cursor also focuses on sane defaults, which is nice. The tab completion model is very good, and the composer model defaults to Claude 3.5 Sonnet, which is arguably the best non-reasoning code model. (One would hope that Cursor gets agent-composer working with reasoning models soon.) Continue felt much more technical… which is nice for power users, but not always the best starting place.

27. bze12 ◴[] No.42943644{3}[source]
I could imagine a hybrid where declarative statements drive the high-level, and lower-level details branch off and are hashed out imperatively (in chat). Maybe those detail decisions then revise the declarative statements.

The source of truth would still be the code though, otherwise the declarative statements would get so verbose that they wouldn't be any more useful than writing the code itself.

28. dartos ◴[] No.42963488{4}[source]
Yes! It’s the same for me.

Maybe it’s bc I’ve been programming since I was young or because I mainly learned by doing code-along books, but writing the code is where my thinking gets done.

I don’t usually plan, then write code. I write code, understand the problem space, then write better code.

I’ve known friends and coworkers who liked to plan out a change in pseudocode or some notes before getting into coding.

Maybe these different approaches benefit from AI differently.

29. dartos ◴[] No.42991393{5}[source]
At that point, wouldn’t the man pages be better than asking 4o?

It was already wrong once.