The idea here is: if you have a substantive point, make it thoughtfully. If not, please don't comment until you do.
I really like a lot of what Google produces, but they can't seem to keep a product alive without eventually shutting it down, and they can be pretty ham-fisted, both with corporate control (Chrome and corrupt practices) and censorship.
Is this why HN is so dang pro-AI? The negative comments, even small ones, get moderated away? Explains a lot, TBH.
But realistically, lots of RAG systems have LLM calls interleaved for various reasons, so what they probably mean is not doing the usual chunking + embeddings thing.
Either I'm worse than them at programming, to the point that I find an LLM useful and they don't, or they don't know how to use LLMs for coding.
I guess most people are not paying and therefore can't use the project space (which is one of the best features), which unleashes its full magic.
Even though I'm currently without a job, I'm still paying because it helps me.
If people are getting faster responses than this regularly, it could account for a large amount of the difference in experiences.
If you use GitHub Copilot - which has its own system level prompts - you can hotswap between models, and Claude outperforms OpenAI’s and Google’s models by such a large margin that the others are functionally useless in comparison.
Despite the persistent memes here and elsewhere, it doesn't depend very much on the particular tool you use (with the exception of model choice), how you hold it, or your experience prompting (beyond a bare minimum of competence). People who jump into any conversation with "use tool X" or "you just don't understand how to prompt" are the noise floor of any conversation about AI-assisted coding. Folks might as well be talking about Santeria.
Even for projects that I initiate with LLM support, I find that the usefulness of the tool declines quickly as the codebase increases in size. The iron law of the context window rules everything.
Edit: one thing I'll add, which I only recently realized exists (perhaps stupidly) is that there is a population of people who are willing to prompt expensive LLMs dozens of times to get a single working output. This approach seems to me to be roughly equivalent to pulling the lever on a slot machine, or blindly copy-pasting from Stack Overflow, and is not what I am talking about. I am talking about the tradeoffs involved in using LLMs as an assistant for human-guided programming.
It’s not consistent, though. I haven’t figured out what they are but it feels like there are circumstances where it’s more prone to doing ugly hacky things.
Error: kill EPERM
at process.kill (node:internal/process/per_thread:226:13)
at Ba2 (file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:19791)
at file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:19664
at Array.forEach (<anonymous>)
at file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:19635
at Array.forEach (<anonymous>)
at Aa2 (file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:19607)
at file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:19538
at ChildProcess.W (file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:20023)
at ChildProcess.emit (node:events:519:28) {
errno: -1,
code: 'EPERM',
syscall: 'kill'
}
I'm guessing one of the scripts it runs kills Node.js processes, and that inadvertently kills Claude as well. Or maybe it feels bad that it can't solve my problem and commits suicide. In any case, I wish it would stay alive and help me lol.
For those who’ve built coding agents: do you think LLMs are better suited for generating structured config vs. raw code?
My theory is that agents producing YAML/JSON that conforms to a schema could be more reliable than code generation. The output is constrained, easier to validate, and when it breaks, you can actually debug it.
I keep seeing people creating apps with vibe-coding tools but then getting stuck when they need to modify the generated code.
Curious if others think config-based approaches are more practical for AI-assisted development.
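To make the "easier to validate" point concrete, here's a rough sketch of the loop I'm imagining (the schema, field names, and check_config helper are all made up for illustration, and it assumes the jsonschema package):

import json

from jsonschema import ValidationError, validate  # pip install jsonschema

# Hypothetical schema for a model-generated pipeline config
SCHEMA = {
    "type": "object",
    "properties": {
        "steps": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "action": {"enum": ["fetch", "transform", "store"]},
                },
                "required": ["name", "action"],
            },
        }
    },
    "required": ["steps"],
}

def check_config(raw: str) -> str | None:
    """Return an error message to feed back to the model, or None if the config is valid."""
    try:
        validate(json.loads(raw), SCHEMA)
        return None
    except (json.JSONDecodeError, ValidationError) as e:
        return str(e)

When validation fails, the error string goes straight back into the next prompt, which is a much tighter loop than "the generated code crashed somewhere."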
Then add a grader step to your agentic loop that is triggered after the files are modified. Give feedback to the model if there are any errors and it will fix them.
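A minimal sketch of what that grader step can look like (assuming a Python project and that ruff is installed; swap in whatever compiler, linter, or test runner fits your stack):

import subprocess

def grade(files: list[str]) -> str:
    """Run checks after the agent edits files; return error text, or "" if everything is clean."""
    errors = []
    for cmd in (["ruff", "check", *files], ["python", "-m", "py_compile", *files]):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            errors.append(result.stdout + result.stderr)
    return "\n".join(errors)

# Inside the agent loop, roughly:
# feedback = grade(modified_files)
# if feedback:
#     messages.append({"role": "user", "content": f"Fix these errors:\n{feedback}"})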
A lot of programmers work on maintaining huge monolith codebases, built on top of 10-year-old tech with obscure proprietary dependencies. Usually they don't have most of the code to begin with, and the APIs are often not well documented.
Hype. There's nothing wrong with using, e.g., full-text search for RAG.
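For instance, SQLite's built-in FTS5 gets you a perfectly serviceable retriever with zero extra infrastructure. A rough sketch (assumes your SQLite build ships with FTS5; the chunking and the example query are placeholders):

import sqlite3

conn = sqlite3.connect("docs.db")
conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS chunks USING fts5(path, body)")
# ... insert (path, chunk_text) rows for your documents here ...

def retrieve(query: str, k: int = 5) -> list[tuple[str, str]]:
    # bm25() is FTS5's built-in ranking function (lower scores rank better)
    return conn.execute(
        "SELECT path, body FROM chunks WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT ?",
        (query, k),
    ).fetchall()

# Stuff the top hits into the prompt instead of embedding-based neighbors
context = "\n\n".join(body for _, body in retrieve("session timeout handling"))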
With a subscription plan, Anthropic is highly incentivized to be efficient in their loops beyond just making it a better experience for users.
And you start from scratch every time, so you can generate all the documentation before you ever start to generate code. And when the LLM slop becomes overwhelming, you just drop it and go check out the next idea.
If I only ever wrote small Python scripts, did small to medium JavaScript front end or full stack websites, or a number of other generic tasks where LLMs are well trained I’d probably have a different opinion.
Drop into one of my non-generic Rust codebases that does something complex and I could spend hours trying to keep the LLM moving in the right direction and away from all of the dead ends and thought loops.
It really depends on what you’re using them for.
That said, there are a lot of commenters who haven’t spent more than a few hours playing with LLMs and see every LLM misstep as confirmation of their preconceived ideas that they’re entirely useless.
It dumps out a JSON file as well as a very nicely formatted HTML file that shows you every single tool and all the prompts that were used for a session.
I think Claude is much more predictable and follows instructions better- the todo list it manages seems very helpful in this respect.
Edit: bonus points if this gets me banned.
Had similar problems until I saw the advice "Don't say what it shouldn't do; focus on what it should do."
I.e., make sure that when it reaches for the 'thing', it has the alternative in context.
Haven't had those problems since then.
(Though now that I think of it, I might start interrupting people with “SUMMARIZING CONVERSATION HISTORY!” whenever they begin to bore me. Then I can change the subject.)
This is essential to productivity for humans and LLMs alike. The more reliable your edit/test loop, the better your results will be. It doesn't matter if it's compiling code, validating yaml, or anything else.
To your broader question. People have been trying to crack the low-code nut for ages. I don't think it's solvable. Either you make something overly restrictive, or you are inventing a very bad programming language which is doomed to fail because professional coders will never use it.
Might sound crazy, but we build full web apps in just YAML. Been doing this for about 5 years now, and it helps us scale to build many web apps, fast, that are easy to maintain. We at Resonancy[1] have found many benefits in doing so. I should write more about this.
[1] - https://resonancy.io
First of all, keep in mind that research has shown that people generally overestimate the productivity gains of LLM coding assistance. Even when using a coding assistant makes them less productive, they feel like they are more productive.
Second, yeah, experience matters, both with programming and with LLM coding assistants. The better you are, the less helpful the coding assistant will be; it can take less work to just write what you want than to convince an LLM to do it.
Third, some people are more sensitive to the kinds of errors or style that LLMs tend to produce. I frequently can't stand the output of LLMs, even if it technically works; it doesn't live up to my personal standards.
I’m in the middle of some refactoring/bug fixing/optimization, but it’s constantly running into issues, making half-baked changes, not able to fix regressions, etc. Still trying to figure out how to make it do a better job. Might have to break the work into smaller chunks or something. It's been a pretty frustrating couple of weeks.
If anyone has pointers, I’m all ears!!
There are various hacks these tools take to cram more crap into a fixed-size bucket, but it’s still fundamentally different than how a person thinks.
I've noticed the stronger my opinions are about how code should be written or structured, the less productive LLMs feel to me. Then I'm just fighting them at every step to do things "my way."
If I don't really have an opinion about what's going on, LLMs churning out hundreds of lines of mostly-working code is a huge boon. After all, I'd rather not spend the energy thinking through code I don't care about.
Every time I tried LLMs, I had the feeling of talking with an ignoramus trying to sound VERY CLEVER: terrible mistakes on every line, surrounded by punchlines, rocket emojis, and tons of bullshit. (I'm partly kidding).
Maybe there are situations where LLMs are useful, e.g. if you can properly delimit and isolate your problem; but when you have to write code that is meant to mess with the internals of some piece of software, they don't do well.
It would be nice to hear from both the "happy users" and the "discontented users" of LLMs in what context they experimented with them, to be better informed on this question.
But the situation is very different if you’re coding slop in the first place (front end stuff, small repo simple code). The LLMs can churn that slop out at a rapid clip.
I don’t think this research is fully baked. I don’t see a story in these results that aligns with my experience and makes me think “yeah, that actually is what I’m doing”. I get that at this point I’m supposed to go “the effect is so subtle that even I don’t notice it!” But experience tells me that’s not normally how this kind of thing works.
Perhaps we’re still figuring out how to describe the positive effects of these tools or what axes we should really be measuring on, but the idea that there’s some sort of placebo effect going on here doesn’t pass muster.
You can see the system prompts too.
It's all how the base model has been trained to break tasks into discrete steps and work through them patiently, with some robustness to failure cases.
That it authored in the first place?
My tactic is to work with Gemini to build a dense summary of the project and create a high level plan of action, then take that to gpt5 and have it try to improve the plan, and convert it to a hyper detailed workflow xml document laying out all the steps to implement the plan, which I then hand to claude.
This avoids pretty much all of Claude's unplanned bumbling.
That repository does not contain the code. It's just used for the issue tracker and some example hooks.
A few takeaways for me from this: (1) Long prompts are good - and don't forget basic things like explaining in the prompt what the tool is, how to help the user, etc. (2) Tool calling is basic af; you need more context (when to use, when not to use, etc.). (3) Using messages as the state of the memory for the system is OK; I've thought about fancier approaches (e.g., persisting dataframes, passing variables between steps, etc.), but it seems like, as context windows grow, messages should be fine.
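On (2), what I mean by "more context" is baking the when-to-use / when-not-to-use guidance directly into the tool definition. A hypothetical example (the tool name, description text, and schema are invented; the overall shape follows the Anthropic-style tool spec with name / description / input_schema, as I understand it):

search_codebase_tool = {
    "name": "search_codebase",
    "description": (
        "Search the repository for symbols, strings, or file names.\n"
        "When to use: locating where a function is defined or referenced.\n"
        "When NOT to use: reading a file whose path you already know "
        "(use the read_file tool instead), or answering questions that "
        "don't require looking at the code."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search term or regex"},
        },
        "required": ["query"],
    },
}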
That's been my experience, anyway. Maybe it hates me? I sure hate it.
[1]: https://github.com/badlogic/lemmy/tree/main/apps/claude-brid...
Actually, no. When LLMs produce good, working code, it also tends to be efficient (in terms of lines, etc).
May vary with language and domain, though.
I've yet to find the "forgets everything" issue to be a limiting factor. In fact, when using Aider, I aggressively make it forget everything several times per session.
To me, it's a feature, not a drawback.
I've certainly had coworkers I've had to tell: "Look, will you forget about X? That use case, while it looks similar, is actually quite different in its assumptions, etc. Stop invoking your experiences there!"
I know it, because I recently learned jj, with a lot of struggling.
If a human struggles learning it, I wouldn't expect LLMs to be much better.
Do you even understand what you just said? A file is, by definition, a way to organize data in a computer's memory. When you write instructions for an LLM, they persistently modify your prompts, making the LLM "remember" certain stuff like coding conventions or explanations of your architectural choices.
> particularly if I have to do it
You have to communicate with the LLM about the code. You either do it persistently (it must remember) or contextually (it should know something only in the context of the current session). So the word "particularly" is out of place here. You choose one way or the other instead of being able to just say whether some piece of information is important long-term. This communication would happen with humans too. LLMs just have a different interface for it, a more explicit one (giving the perception of more effort, when it is in fact the same; and let's not forget that an LLM is able to decide for itself whether to remember something or not).
> and in any case, it consumes context
So what? Generalization is an effective way to compress information. Because of it, persistent instructions consume only a tiny fraction of the context, yet they reduce the need for the LLM to do a full analysis of your code.
> but it’s still fundamentally different than how a person thinks.
Again, so what? Nobody can keep an entire codebase in short-term memory. Having that ability should not be the expectation, nor should lacking it be considered a major disadvantage. Yes, we use our "context windows" differently in the thinking process. What matters is what information we pack in there and what we make of it.
Nothing in the world is simply outright garbage. Even the seemingly worst products exist for a reason and are used for a variety of use cases.
So take a step back and reevaluate whether your reply could have been better. Because, as it stands, it simply "just sucks".
Let the LLM do the boring stuff, and focus on writing the fun stuff.
Also, setting up logging in Python is never fun.
Also, if the task runs out of context, it will get progressively worse rather than refreshing its own context from time to time.
I know, thus the :trollface:
> Happen to know where I can find a fork?
I don't know where you can find a fork, but even if there is a fork somewhere that's still alive, which is unlikely, it would be for a really old version of Claude Code. You would probably be better off reverse engineering the minified JavaScript or whatever that ships with the latest Claude Code.
But I do think there is a qualitative difference between retrieving candidates and adding them to the context before generating (retrieval-augmented generation) vs. the LLM searching for context until it is satisfied.
I've now gone back to just using vanilla CC with a really really rich claude.md file.
It makes me think that the language/platform/architecture that is "most known" by LLMs will soon be the preferred -- sort of a homogenization of technologies by LLM usage. Because if you can be 10x as successfully vibey in, say, nodejs versus elixir or go -- well, why would you opt for those in a greenfield project at all? Particularly if you aren't a tech shop and that choice allows you to use junior coders as if they were midlevel or senior.
For context, I want to build a Claude Code-like agent in a WYSIWYG markdown app. That's how I stumbled on your blog post :)
The thing is, a lot of the code that people write is cookie-cutter stuff. Possibly the entirety of frontend development. It's not copy-paste per se, but it is porting and adapting common patterns on differently-shaped data. It's pseudo-copy-paste, and of course AI's going to be good at it, this is its whole schtick. But it's not, like, interesting coding.
If it's a new, non-trivial algorithm, I enjoy writing it.
If it felt like a waste of time and energy to post something substantive, rather than the GP comment (https://news.ycombinator.com/item?id=44998577), then you should have just posted nothing. That comment was obviously neither substantive nor thoughtful. This is hardly a borderline call!
We want substantive, thoughtful comments from people who do have the time and energy to contribute them.
Btw, to avoid a misunderstanding that sometimes shows up: it's fine for comments to be critical; that is, it's possible to be substantive, thoughtful, and critical all at the same time. For example, I skimmed through your account's most recent comments and saw several of that kind, e.g. https://news.ycombinator.com/item?id=44299479 and https://news.ycombinator.com/item?id=42882357. If your GP comment had been like that, it would have been fine; you don't have to like Claude Code (or whatever the $thing is).
I hear people say things like, “AI isn’t coming for my job because LLMs suck at [language or tech stack]!”
And I wonder, does that just mean that other stacks have an advantage? If a senior engineer with Claude Code can solve the problem in Python/TypeScript in significantly less time than you can solve it in [tech stack] then are you really safe? Maybe you still stack up well against your coworkers, but how well does your company stack up against the competition?
And then the even more distressing thought accompanies it: I don’t like the code that LLMs produce because it looks nothing like the code I write by hand. But how relevant is my handwritten code becoming in a world where I can move 5x faster with coding agents? Is this… shitty style of LLM generated code actually easier for code agents to understand?
Like I said, I don’t endorse either of these ideas. They’re just questions that make me uncomfortable because I can’t definitively answer them right now.
If true, this seems like a bloated approach, but TBH I wouldn't claim to totally know how to use Claude like the author here...
I find you can get a lot of mileage out of "regular" prompts, I'd call them?
Just asking for what you need one prompt at a time?
I still can't visualize how any of the complexity on top of that like discussed in the article adds anything to carefully crafted prompts one at a time
I also still can't really visualize how claude works compared to simple prompts one at a time.
Like, wouldn't it be more efficient to generate a prompt and then check it by looping through the appendix sections ("Main Claude Code System Prompt" and "All Claude Code Tools"), or is that basically what the LLM does somewhat mysteriously (it just works)? So like "give me while loop equivalent in [new language I'm learning]" is the entirety of the prompt... then if you need to you can loop through the appendix section? Otherwise isn't that a massive over-use of tokens, and the requests might even be ignored because they're too complex?
The control flow eludes me a bit here. I otherwise get the impression that the LLM does not use the appendix sections correctly by adding them to prompts (like, couldn't it just ignore them at times)? It would seem like you'd get more accurate responses by separating that from whatever you're prompting and then checking the prompt through looping over the appendix sections.
Does that make any sense?
I'm visualizing coding an entire program as prompting discrete pieces of it. I have not needed elaborate .md files to do that, you just ask for "how to do a while loop equivalent in [new language I'm learning]" for example. It's possible my prompts are much simpler for my uses, but I still haven't seen any write-ups on how people are constructing elaborate programs in some other way.
Like how are people stringing prompts together to create whole programs? (I guess is one question I have that comes to mind)
I guess maybe I need to find a prompt-by-prompt breakdown of some people building things to get a clearer picture of how LLMs are being used
I should mention I made that one for my research/stats workflow, so there's some specific stuff in there for that, but you can prompt chat gpt to generalize it.
Another thing is that before, you were in a greenfield project, so Claude didn't need any context to do new things. Now, your codebase is larger, so you need to point out to Claude where it should find more information. You need to spoon-feed the relevant files with "@" where you want it to look up things and make changes.
If you feel Claude is lazy, force it to use a bigger thinking budget: "think" < "think hard" < "think harder" < "ultrathink". Sometimes I like to throw in "ultrathink" and do something else while it codes. [1]
[1]: https://www.anthropic.com/engineering/claude-code-best-pract...
How do we get the LLM to gain knowledge on this new language that we have no example usage of?
Honestly I don’t think customers care.
So if you need to avoid GC issues, or have robust type safety, or whatever it is, to gain an edge in a certain industry or scenario, you can't just switch to the vibe tool of choice without (best case) giving up $$$ to pay to make up for the inefficiency or (worst case) having more failures that your customers won't tolerate.
But this means the gap between the "hard" work and the "easy" work may become larger - compensation included. Probably most notably in FAANG companies where people are brought in expected to be able to do "hard" work and then frequently given relatively-easy CRUD work in low-ROI ancillary projects but with higher $$$$ than that work would give anywhere else.
And the places currently happy to hire disaffected ex-FAANG engineers who realized they were being wasted on polishing widgets may start having more hiring difficulty as the pipeline dries up. Like trying to hire for assembly or COBOL today.
For now, LLMs still suffer from hallucination and a lack of generalizability. The large amount of generated code is sometimes not a benefit but a technical debt.
LLMs are good for quick, open-ended prototype web applications, but if we need a stable, consistent, maintainable, secure framework, or scientific computing, pure LLMs are not enough; one can't vibe everything without checking the details.
I get lost a bit at things like this, from the link. The lessons in the article match my experience with LLMs and tools around them (see also: RAG is a pain in the ass and vector embedding similarity is very far from a magic bullet), but the takeaway - write really good prompts instead of writing code - doesn't ring true.
If I need to write out all the decision points and steps of the change I'm going to make, why am I not just doing it myself?
Especially when I have an editor that can do a lot of automated changes faster/safer than grep-based text-first tooling? If I know the language the syntax isn't an issue; if I don't know the language it's harder to trust the output of the model. (And if I 90% know the language but have some questions, I use an LLM to plow through the lines I used to have to go to Google for - which is a speedup, but a single-digit-percentage one.)
My experience is that the tools fall down pretty quickly because I keep trying to get them to let me skip the details of every single task. That's how I work with real human coworkers. And then something goes sideways. When I try to pseudocode the full flow vs. actually writing the code, I lose the speed advantage, and I often end up with a nasty 80%-there-but-I-don't-really-know-how-to-fix-the-other-20%-without-breaking-the-80% situation because I noticed a case I didn't explicitly talk about that it guessed wrong on. So then it's either slow and tedious or `git reset` and try again.
(99% of these issues go away when doing greenfield tooling or scripts for operations or prototyping, which is what the vast majority of compelling "wow" examples I've seen have been, but only applies to my day job sometimes.)
I also have a code reviewer agent in CC that writes all my unit and integration tests, which feeds into my CI/CD pipeline. I use the "/security" command that Claude recently released to review my code for security vulnerabilities, while also leveraging a red team agent that tests my codebase for vulnerabilities to patch.
I'm starting to integrate Claude into Linear so I can assign Linear tickets to Claude to start working on while I tackle core stuff. Hope that helps!
I use some AI tools and sometimes they're fine, but I won't, in my lifetime anyway, hand over everything to an AI. Not out of some fear or anything, but purely as a hobby: I like creating things from scratch, I like working out problems, so why would I need to let that go?
(I don’t use any clients that answer coding questions by using the context of my repos).
In my case it was exactly the kind of situation where I would also run into trouble on my own - trying to change too many things at once.
It was doing superbly for smaller, more contained tasks.
I may have to revert and approach each task on its own.
I find I need to know better than Claude what is going on, and guide it every step. It will figure out the right code if I show it where it should go, that kind of thing.
I think people may be underestimating / underreporting how much they have to be in the loop, guiding it.
It’s not really autonomous or responsible. But it can still be very useful!
Would you mind linking to your startup? I’m genuinely curious to see it.
(I won’t reply back with opinions about it. I just want to know what people are actually building with these tools!)
Oh, and the chatbot is cheap. I pay for API usage. On average I'm paying less than $5 per month.
> and I don't have to worry about random hallucinations.
For boilerplate code, I don't think I've ever had to fix anything. It's always worked the first time. If it didn't, my prompt was at fault.
Am I missing something here? Or is this just Anthropic shilling?
Nice, do share a link, would love to check out your agent!
Claude can run commands to search code, test compilation, and perform various other operations.
Unix is great because its commands are well-documented, and the training data is abundant with examples.
It's really freeing to say "Well, if the linter and the formatter don't catch it, it doesn't matter". I always update lint settings (writing new rules if needed) based on nit PR feedback, so the codebase becomes easier to review over time.
It's the same principle as any other kind of development - let the machine do what the machine does well.
Make sure you read it first though... I believe it expected Req to be present as a dependency when generating code that makes HTTP requests.
I'm deliberately trying not to do too much manual coding right now so I can figure out these (infuriating/wonderful) tools.
Unfortunately I can't always share all of my work, but everything on github after perhaps 2025-06-01 is as vibe-coded as I can get it to be. (I manually review commits before they're pushed, and PRs once in a complete state, but I always feed those reviews back into the tooling, not fix them manually, unless I get completely fed up.)
I run into bugs that are not documented anywhere except GitHub issues.
Is it legal to search GitHub issues using an LLM? If yes, how?
Long term memory is its training data.
`Tool name: WebFetch
Tool description:
- Fetches content from a specified URL and processes it using an AI model
- Takes a URL and a prompt as input
- Fetches the URL content, converts HTML to markdown
- Processes the content with the prompt using a small, fast model
- Returns the model's response about the content
- Use this tool when you need to retrieve and analyze web content`
I came up with this one:
import asyncio
from playwright.async_api import async_playwright
from readability import Document  # readability-lxml package
from markdownify import markdownify as md

async def web_fetch_robust(url: str, prompt: str) -> str:
    """
    Fetches content from a URL using a headless browser to handle JS-heavy
    sites, processes it, and returns a summary.
    """
    try:
        async with async_playwright() as p:
            # Launch a headless browser (Chromium is a good default)
            browser = await p.chromium.launch()
            page = await browser.new_page()

            # --- Avoiding Blocks ---
            # Set a realistic User-Agent to mimic a real browser
            await page.set_extra_http_headers({
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
            })

            # Navigate to the URL; wait_until='networkidle' is key for JS-heavy pages
            await page.goto(url, wait_until='networkidle', timeout=15000)

            # --- Extracting Content ---
            # Get the fully rendered HTML content
            html_content = await page.content()
            await browser.close()

        # --- Processing for Token Minimization ---
        # 1. Extract the main content using Readability
        doc = Document(html_content)
        main_content_html = doc.summary()

        # 2. Convert to clean Markdown, stripping links/images to save tokens
        markdown_content = md(main_content_html, strip=['a', 'img'])

        # 3. Process the clean content with the small, fast model
        # summary = small_model.process(prompt, markdown_content)  # Placeholder for your model call
        # For demonstration, we'll just return a message
        summary = f"A summary of the JS-rendered content from {url} would be generated here."

        return summary

    except Exception as e:
        return f"Error fetching or processing URL with headless browser: {e}"

# To run this async function:
# result = asyncio.run(web_fetch_robust("https://example.com", "Summarize this."))
# print(result)
Do you just let it run rampant on your system and do whatever it thinks it should, installing whatever it wants and sucking all your config files into the cloud or what?
What AI can definitely not do is launch or sell anything.
I can write some arbitrary SaaS in a few hours with my own framework, too - and know it's much more secure than anything written by AI. I also know how to launch it. (I'm not so good at the "selling" part).
But if anyone can do all of this - including the launching the selling - then they would not be selling themselves on Reddit or Youtube. Once you see someone explaining to you how to get rich quickly, you must assume that they have failed or else they would not be wasting their time trying to sell you something. And from that you should deduce that it's not wise to take their advice.
Sure but he was particularly talking about the technical side of things.
> (I'm not so good at the "selling" part).
In person I am, but this newfangled 'influencer' selling or whatnot I do not understand and cannot do (yet) (I'm in my 50s, so I can still learn).
> But if anyone can do all of this - including the launching the selling - then they would not be selling themselves on Reddit or Youtube
Yeah but most don't actually name the url of the product and he does. So that's a difference.
Without these premises, one could state that the 1996 Yugo was so damn good. I mean, it was better than a horse.
It's Coke vs. Pepsi.
Because it's twice the price and doesn't even have a trial.
I feel like if it were a game changer, like Cursor once was vs Ask mode with GPT, it would be worth it, but CoPilot has come a long way and the only up-to-date comparisons I've read point to it being marginally better or the same, but twice the price.
You could set up a docker image and run it in that if you wanted.
Do they also allow you to view the thinking process and planning, and hit ESC to correct it if it’s going down a wrong path? I’ve found that to be one of my favorite features of Claude Code. If it says "ah, the implementation isn’t complete, I’ll update the test to use mocks," I can interrupt it and say no, it’s fine for the test to fail until the implementation is finished, so don’t mock anything. Etc.
It may be that I just discovered this after switching, but I don’t recall that being an interaction pattern on cursor or copilot. I was always having to revert after the fact (which might have been me not seeing the option).
I promise if someone posted human made code and said it was LLM generated, it would still be nit-picked to death. I swear 75% of developers ride around on a high horse that their style of doing things is objectively the best and everyone else is a knuckle dragger.
Overall, it has been working pretty well. I did make a tweak I haven't pushed yet to make it always write the outline to a file first (instead of just to the terminal). And I've also started adding slash commands to the instructions so I can type things like "/create some flow" and then just "/refresh" (instead of "pardon me, would you mind refreshing that flow now?").
Go back to school Anthropic.
In all seriousness, in the age of LLMs I am not surprised to see an article that can basically be summarized as: "This product is good because it's good, and I am not gonna compare it to others, because why do you expect critical thinking in the era of LLMs?"
It wasn't something I considered at first, but it makes sense if you think about text prediction models, infilling, and training by reading code: it's the statistics of style, matching what you are doing against similar things. You're not going to paint a photorealistic chunk into a hole of an impressionist painting, ya know?
So in my experience if you give it "code that avoids the common issues" that works like a style it will follow. But if you're working with a codebase that looks like it doesn't "avoid those common issues" I would expect it to follow suit and suggest code that you would expect from codebases that don't "avoid those common issues". If the input code looks like crappy code, I would expect it to statistically predict output code that looks like crappy code. And I'm not talking about formatting (formatting is for formatters), it's things like which functions and steps are used to accomplish whatever. That sort of thing. At least without some sort of specific prompting it's not going to jump streams.
Edit: one amusing thing you can do is ask Claude to predict attributes of the developers of the code and their priorities and development philosophy (i.e. ask Claude to write a README that includes these cultural things). I have a theory it gives you an idea about the overall codesmell Claude is assigning to the project.
Again I am very new to these tools and have only used claude-code because the command line interface and workflow didn't make me immediately run for the hills the way other things have. So no idea how other systems work, etc because I immediately bounced on them in the past. My use of claude-code started as an "okay fine why not give these things the young guns can't shut up about a shot on the boring shit and maybe clear out some backlog" for making chores in projects that I usually hate doing at least a little interesting but I've expanded my use significantly after gaining experience with it. But I have noticed it behave very differently in different code bases and the above is how I currently interpret that.
Raw code. The use case was configuring a mapping of health data JSON from heterogeneous sources to a standard (also JSON) format. The initial prototype was a YAML DSL, based on the same theory as yours. LLMs had difficulty using the DSL’s semantics correctly, or even getting its syntax right (not YAML-level syntax, but the schema: nesting levels for different constructs, and so on). It’s possible that better error loops or something would have cracked it, but a second prototype generating jq worked so much better out of the box that we basically never looked back.
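To give a flavor of the second approach (the filter and field names below are invented for illustration, and it assumes the jq binary is on PATH; in practice the LLM writes the filter and we just execute it):

import json
import subprocess

# Hypothetical LLM-generated filter mapping one vendor's payload to the standard shape
JQ_FILTER = (
    '{patient_id: .subject.id, '
    'heart_rate: (first(.observations[] | select(.code == "HR")) | .value)}'
)

def apply_mapping(payload: dict) -> dict:
    """Run the generated jq filter over a source payload and parse the mapped result."""
    result = subprocess.run(
        ["jq", JQ_FILTER],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)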
FWIW: “Infuriating/wonderful” is exactly how I feel about LLM copilots, too! Like you, I also use them extensively. But nothing I’ve built (yet?) has crossed the threshold into salable web services and every time someone makes the claim that they’ve primarily used AI to launch a new business with paid customers, links are curiously absent from the discussion… too bad, since they’d be great learning material too!
I know this is a new "space" so I've just been going off what I can find on here and other places and...
it all seems a little confusing to me besides what I otherwise tried to describe (and which apparently resonates with you, which is good to see)
> - IMPORTANT: DO NOT ADD ***ANY*** COMMENTS unless asked
> - VERY IMPORTANT: You MUST avoid using search commands like `find` and `grep`.
Does using caps, or the stars, really carry meaning through the tokenization process?
1. I code with LLMs (Copilot, Claude Code). Like anyone who has done so, I know a lot about where these tools are useful and where they're hopeless. They can't do it all, claims to the contrary aside.
2. I've built a couple businesses (and failed tragicomically at building a couple more). Like anyone who has done so, I know the hard parts of startups are rarely the tech itself: sales, marketing, building a team with values, actually listening to customers and responding to their needs, making forward progress in a sea of uncertainty, getting anyone to care at all... sheesh, those are hard! Last I checked, AI doesn't singlehandedly solve any of that.
Which is not to say LLMs are useless; on the contrary, used well and aimed at the right tasks, my experience is that they can be real accelerants. They've undoubtedly changed the way I approach my own new projects. But "LLMs did it all and I've got a profitable startup"... I mean, if that's true, link to it because we should all be celebrating the achievement.
Overall "meta" commands seem to work much more effectively that I expected. I'm still getting used to it and letting it run more freely lately but there's some sort of a loop you can watch as it runs where it will propose code given logic that is dumb and makes you want to stop it and intervene... but on the next step it evaluates what it just wrote and rejects for the same reason I would have rejected it and then tries something else. It's somewhat interesting to watch.
If you asked a new "I need you to write XYZ stat!" vs "We care a lot about security, maintainability and best practices. Create a project that XYZ." you would expect different product from the new hire. At least that's how I am treating it.
Basically I would give it a sort of job description. And you can even do things like pick a project you like as a model and have it write a file describing development practices used in that project. Then in the new project ask it to refer to that file as guidance and design a plan for writing the program. And then let it implement that plan. That would probably give a good scaffold, but I haven't tried. It seems like how I would approach that right now as an experiment. It's all speculation but I can see how it might work.
Maybe I'll get there and try that, but at the moment I'm just doing things I have wanted to do forever but that represented massive amounts of my time that I couldn't justify. I'm still learning to trust it and my projects are not large. Also I am not primarily a programmer (physicist who builds integrations, new workflows and tools for qc and data handling at a hospital).
I still think you'll be at a significant disadvantage, since the LLM has been trained on millions of lines of all mainstream languages and zero lines of gervwyk's funny YAML lang.
It's fine, of course, to make your substantive points thoughtfully, but that is a very different kind of comment.