514 points mfiguiere | 16 comments
gklitt ◴[] No.43710093[source]
I tried one task head-to-head with Codex o4-mini vs Claude Code: writing documentation for a tricky area of a medium-sized codebase.

Claude Code did great and wrote pretty decent docs.

Codex didn't do well. It hallucinated a bunch of stuff that wasn't in the code, and completely misrepresented the architecture - it started talking about server backends and REST APIs in an app that doesn't have any of that.

I'm curious what went so wrong - feels like possibly an issue with loading in the right context and attending to it correctly? That seems like an area that Claude Code has really optimized for.

I have high hopes for o3 and o4-mini as models so I hope that other tests show better results! Also curious to see how Cursor etc. incorporate o3.

replies(7): >>43710162 #>>43710290 #>>43711286 #>>43713258 #>>43714390 #>>43714966 #>>43716635 #
strangescript ◴[] No.43711286[source]
Claude Code still feels superior. o4-mini has all sorts of issues. o3 is better but at that point, you aren't saving money so who cares.

I feel like people are sleeping on Claude Code for one reason or another. It's not cheap, but it's by far the best, most consistent experience I have had.

replies(3): >>43711411 #>>43711764 #>>43712470 #
1. Aeolun ◴[] No.43711411[source]
> It's not cheap, but it's by far the best, most consistent experience I have had.

It’s too expensive for what it does though. And it starts failing rapidly when it exhausts the context window.

replies(2): >>43711801 #>>43712547 #
2. jasonjmcghee ◴[] No.43711801[source]
If you get the hang of controlling costs, it's much cheaper. If you're exhausting the context window, I'm not surprised you're seeing high cost.

Be aware of the "cache".

Tell it to read specific files. Never use /compact: that'll bust the cache, and if you need it, you're going back and forth too much or using too many files at once.

Never edit files manually during a session (that'll bust cache). THIS INCLUDES LINT.
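A toy sketch of why an edit can be costly, assuming the cache is reused only for an unchanged leading portion of the context. That prefix behaviour is an assumption made for illustration, not something stated in this thread, and the block names below are hypothetical.

```python
def cached_prefix_len(previous: list[str], current: list[str]) -> int:
    """Count the leading context blocks that are identical in both requests."""
    n = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        n += 1
    return n


# Hypothetical context: each string stands for one block sent to the model.
turn_1 = ["system prompt", "models.py v1", "question 1", "answer 1"]

# Appending a follow-up question leaves the whole earlier prefix intact.
turn_2 = turn_1 + ["question 2"]

# An edited (or linted) file changes an early block, so nothing after it matches.
turn_2_after_edit = [
    "system prompt", "models.py v2", "question 1", "answer 1", "question 2",
]

reusable_clean = cached_prefix_len(turn_1, turn_2)
reusable_edited = cached_prefix_len(turn_1, turn_2_after_edit)
print(reusable_clean, reusable_edited)
```

Under that assumption, the untouched session reuses all four earlier blocks, while the edited one reuses only the first.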

Have a clear goal in mind and keep sessions to as few messages as possible.

Write / generate markdown files with needed documentation using claude.ai, and save those as files in the repo and tell it to read that file as part of a question.

I'm at about ~$0.5-0.75 for most "tasks" I give it. I'm not a super heavy user, but it definitely helps me (it's like having a super focused smart intern that makes dumb mistakes).

If I need to feed it a ton of docs etc. for some task, it'll be more like a few dollars rather than < $1. But I really only do this to try some prototype with a library Claude doesn't know about (or only knows an outdated version of).

For hobby stuff, it adds up - totally.

For a company, massively worth it. Insanely cheap productivity boost (if developers are responsible / don't get lazy / don't misuse it).
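As a rough sanity check on the "worth it" claim, here is a minimal break-even sketch. The $0.75 task cost is the upper figure quoted above; the $60/hour developer cost is a hypothetical number picked only to make the arithmetic easy.

```python
def break_even_minutes(task_cost_usd: float, hourly_rate_usd: float) -> float:
    """Minutes of developer time a task must save to pay for itself."""
    return task_cost_usd / (hourly_rate_usd / 60)


# Hypothetical developer cost of $60/hour, i.e. $1 per minute.
minutes_needed = break_even_minutes(task_cost_usd=0.75, hourly_rate_usd=60.0)
print(minutes_needed)
```

With those assumed numbers, a task pays for itself if it saves under a minute of developer time.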

3. Implicated ◴[] No.43712547[source]
I keep seeing this sentiment and it's wild to me.

Sure, it might cost a few dollars here and there. But what I've personally been getting from it, for that cost, is so far away from "expensive" it's laughable.

Not only does it do things I don't want to do, in a _super_ efficient manner; it also does things I don't know how to do - contextually, within my own project, such that when it's done I _do_ know how to do it.

Like others have said - if you're exhausting the context window, the problem is you, not the tool.

For example, I have a project where I've been particularly lazy and there's a handful of models that are _huge_. I know better than to have Claude read those models into context - that would be stupid. Rather - I tell it specifically what I want to do within those models, give it specific method names and tell it not to read the whole file, but rather to search for and read the area around the method definition.
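The "search for the method and read only the area around it" idea can be sketched roughly like this. It assumes Python-style `def` definitions, which the thread doesn't specify, and `read_around_method`, `recalculate_totals` and the filler file are made-up names for illustration, not anything Claude Code exposes.

```python
import re


def read_around_method(source: str, method_name: str,
                       before: int = 2, after: int = 15) -> str:
    """Return only the lines near `def method_name`, or "" if it is not found."""
    lines = source.splitlines()
    pattern = re.compile(rf"^\s*def\s+{re.escape(method_name)}\s*\(")
    for i, line in enumerate(lines):
        if pattern.match(line):
            start = max(0, i - before)
            end = min(len(lines), i + after + 1)
            return "\n".join(lines[start:end])
    return ""


# Hypothetical oversized model file, built in memory for the demo.
big_file = "\n".join(
    [f"# filler line {n}" for n in range(500)]
    + ["    def recalculate_totals(self):", "        return sum(self.items)"]
    + [f"# more filler {n}" for n in range(500)]
)

snippet = read_around_method(big_file, "recalculate_totals", before=1, after=3)
print(snippet)
```

Here only 5 of the file's 1,002 lines would need to go into context.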

If you _do_ need it to work with very large files - they probably shouldn't be that large and you're likely better off refactoring those files (with Claude, of course) to abstract out where you can and reduce the line count. Or, if anything, literally just temporarily remove a bunch of code from the huge files that isn't relevant to the task so that when it reads the file it doesn't have to pull all of that into context. (i.e., copy the file to a backup location, delete a bunch of unrelated stuff in the working file, do your work with Claude, then 'merge' the changes into the backup file and copy it back.)

If a few dollars here and there for getting tasks done is "too expensive" you're using it wrong. The amount of time I'm saving for those dollars is worth many times the cost and the number of times that I've gotten unsatisfactory results from that spending has been less than 5.

I see the same replies to these same complaints everywhere - people complaining about how it's too expensive or becomes useless with a full context. Those replies all state the same thing - if you're filling the context, you've already screwed it up. (And also, that's why it's so expensive)

I'll agree with sibling commenters - have claude build documentation within the project as you go. Try to keep tasks silo'd - get in, get the thing done, document it and get out. Start a new task. (This is dependent on context - if you have to load up the context to get the task done, you're incentivized to keep going rather than dump and reload with a new task/session, thus paying the context tax again - but you also are going to get less great results... so, lesson here... minimize context.)

100% of the time that I've gotten bad results/gone in circles/gotten hallucinations was when I loaded up the context or got lazy and didn't want to start new sessions after finishing a task and just kept moving into new tasks. If I even _see_ that little indicator on the bottom right about how much context is available before auto-compact I know I'm getting less-good functionality and I need to be careful about what I even trust it's saying.

It's not going to build your entire app in a single session/context window. Cut down your tasks into smaller pieces, be concise.

It's a skill problem. Not the tool.

replies(6): >>43712943 #>>43713102 #>>43713376 #>>43715001 #>>43717933 #>>43730027 #
4. disqard ◴[] No.43712943[source]
This comment echoes my own experience with Claude. Especially the advice about only pulling in the context you need.

I'm a paying customer and I know my time is sufficiently valuable that this kind of technology pays for itself.

As an analogy, I liken it to a scribe (author's assistant).

Your comment has lots of useful hints -- thanks for taking the time to write them up!

replies(1): >>43713829 #
5. someothherguyy ◴[] No.43713102[source]
How can it be a skill problem when the tool itself is sold as being skilled?
replies(3): >>43713289 #>>43713529 #>>43713820 #
6. mirsadm ◴[] No.43713289{3}[source]
You're using it wrong, you're using the wrong version, etc. etc. Insert all the excuses for how it's never the tool's fault but the user's.
replies(1): >>43713804 #
7. siva7 ◴[] No.43713376[source]
True. Matches my experience. It takes a lot of effort to get really proficient with AI. It's like learning to ride a wild horse. Your senior dev skills will sure come in handy on this ride, but don't expect it to work like some Google query.
8. mwigdahl ◴[] No.43713529{3}[source]
A junior developer is skilled too, but still requires a senior’s guidance to keep them focused and on track. Just because a tool has built in intelligence doesn’t mean it can read your intentions from nothing if you fail to communicate to it well.
9. Implicated ◴[] No.43713804{4}[source]
If this is truly your perspective, you've already lost the plot.

It's almost always the user's fault when it comes to tools. If you're using it and it's not doing its 'job' well - it's more likely that you're using it wrong than it is that it's a bad tool. Almost universally.

Right tool for the job, etc etc. Also important that you're using it right, for the right job.

Claude Code isn't meant to refactor entire projects. If you're trying to load up 100k token "whole projects" into it - you're using it wrong. Just a fact. That's not what this tool is designed to do. Sure.. maybe it "works" or gets close enough to make people think that is what it's designed for, but it's not.

Detailed, specific work... it excels, so wildly, that it's astonishing to me that these takes exist.

In saying all of that, there _are_ times I dump huge amounts of context into it (Claude, projects, not Claude Code - cause that's not what it's designed for) and I don't have "conversations" with it in that manner. I load it up with a bunch of context, ask my question/give it a task and that first response is all you need. If it doesn't solve your concern, it should shine enough light that you now know how you want to address it in a more granular fashion.

replies(1): >>43714059 #
10. Implicated ◴[] No.43713820{3}[source]
Serious question?

Is it a tool problem or a skill problem when a surgeon doesn't know how to use a robotic surgery assistant/robot?

replies(1): >>43714061 #
11. Implicated ◴[] No.43713829{3}[source]
I like the scribe analogy. And, just like a scribe, my primary complaint with claude code isn't the cost or the context - but the speed. It's just so slow :D
12. troupo ◴[] No.43714059{5}[source]
The unpredictable non-deterministic black box with an unknown training set, weights and biases is behaving contrary to how it's advertised? The fault lies with the user, surely.
13. troupo ◴[] No.43714061{4}[source]
https://news.ycombinator.com/item?id=43714059
14. afletcher ◴[] No.43715001[source]
Thanks for sharing. Are you able to control the context when using Claude Code, or are you using other tools that give you greater control over what context to provide? I haven't used Claude Code enough to understand how smart it is at deciding what context to load itself and if you can/need to explicitly manage it yourself.
15. threecheese ◴[] No.43717933[source]
How can one develop this skill via trial and error if the cost is unknowably high? Before reasoning models, this mattered less because tokens were cheap, but with mixed models, some of them expensive to use, and reasoning blowing up the cost, having to pay even five bucks to make a mistake sure makes the cost seem higher than the value. A little predictability here would go a long way in growing the use of these capabilities, and so one should wonder why cost predictability doesn’t seem to be important to the vendors - maybe the value isn’t there, or is only there for the select few that can intuit how to use the tech effectively.
16. Aeolun ◴[] No.43730027[source]
> It's not going to build your entire app in a single session/context window.

I mean, it was. Right up until it exhausted the context window. Then it suddenly required hand holding.

If I wanted to do that I might as well use Cursor.