
2127 points by bakugo | 1 comment
datadeft ◴[] No.43169696[source]
I am not sure how good these Exercism tasks are at measuring how good a model is at coding.

My experience is that these models can write a simple function and get it right as long as it doesn't require any out-of-the-box thinking (so essentially offloading the boilerplate part of coding). When it comes to thinking creatively and finding a much better solution to a specific task, one that requires thinking 2-3 steps ahead, they are not suitable.

replies(1): >>43169797 #
berkes ◴[] No.43169797[source]
I think many of the "AI can do coding" narratives miss what coding means in real situations.

It's finding out why "jbdoe1337" added this large if/else around the entire function body back in 2016 - it seems to be important business logic, but the commit just says "updated code". And how the h*ll the interaction between the conf.ini files, the conf/something.json and the ENV vars works. Why the ENV var sometimes overrides a value in the ini and why it's sometimes the other way around. But also finding that when you clean it up, everything falls apart.
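
To make that concrete, the precedence logic tends to end up looking something like this (file names, keys and the db_host quirk are all made up):

    import configparser
    import json
    import os

    def load_setting(key):
        ini = configparser.ConfigParser()
        ini.read("conf.ini")
        with open("conf/something.json") as f:
            overrides = json.load(f)

        if key != "db_host":
            # Normally the ENV var wins, then the JSON, then the ini...
            return os.environ.get(key) or overrides.get(key) or ini.get("main", key, fallback=None)
        # ...but for db_host somebody inverted the order back in 2016 ("updated code"),
        # and production now silently depends on it.
        return ini.get("main", key, fallback=None) or os.environ.get(key) or overrides.get(key)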

It's discussing with the stakeholders why "adding a delete button" isn't as easy as just putting a button there, but that it means designing a whole cascading deletion strategy and/or trashcan and/or soft-delete and/or garbage-collection.
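
Even the "simple" soft-delete variant already ripples through the whole codebase. A hypothetical sketch (table and column names invented):

    import sqlite3
    from datetime import datetime, timezone

    def soft_delete_account(db: sqlite3.Connection, account_id: int) -> None:
        now = datetime.now(timezone.utc).isoformat()
        # Don't remove the row; mark it, so "undelete" stays possible.
        db.execute("UPDATE accounts SET deleted_at = ? WHERE id = ?", (now, account_id))
        # Every dependent table needs an explicit decision: cascade, keep, or orphan?
        db.execute("UPDATE invoices SET deleted_at = ? WHERE account_id = ?", (now, account_id))
        db.commit()
        # And every existing read in the codebase must now filter on
        #   deleted_at IS NULL
        # plus a garbage-collection job to purge rows past the retention window.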

It's finding out why - again - the grumb pipeline crashes with the typebar checker when used through the mpm-yearn package manager. Both in containers and on an OSX machine, but not on Linux Brobuntu 22.12 LTLS.

It's moving stuff into the right abstraction layer. It's removing abstractions while introducing others. KISS vs future flexibility. It's the gut feeling for when to apply DRY and when to embrace repetition instead.

And then, if you're lucky, churning out boilerplate or new code for 120 minutes a week.

I'm glad that AI can cut those 120 minutes down to 20. Truly. But this is not what (senior?) programmers spend their time on, despite what the hyped-up AI press would have us believe. It only shows they have no idea what the "real" problems and time-consumers are for programmers.

replies(2): >>43172844 #>>43176737 #
CamperBob2 ◴[] No.43176737[source]
Systems built from scratch with AI won't have these limitations, because only the model will ever see the code. It will implement a spec that's written in English or another human language.

When the business requirements change, the spec will change. When that happens, the system will either modify its previously-written code or regenerate it from the ground up. Which strategy it chooses won't be especially interesting or important.
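
In pseudocode-ish terms, the loop might look like this (purely hypothetical; generate/modify/accepts stand in for whatever tooling emerges, not any real API):

    from typing import Callable, Optional

    def update_system(
        spec: str,
        previous_code: Optional[str],
        generate: Callable[[str], str],        # spec -> fresh codebase
        modify: Callable[[str, str], str],     # (old code, new spec) -> patched code
        accepts: Callable[[str, str], bool],   # acceptance checks derived from the spec
    ) -> str:
        if previous_code is None:
            return generate(spec)
        patched = modify(previous_code, spec)
        # Patch vs. full regeneration is an internal detail, as argued above.
        return patched if accepts(patched, spec) else generate(spec)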

The process of maintaining the English-language spec will still require great care and precision. It will be called "programming," or perhaps "coding."

A few graybearded gurus will insist on examining the underlying C or JavaScript or Python or Rust or whatever the model generates, the way they peer at compiler-generated assembly code now. Occasionally this capability will be important, even vital. But not usually. The situations where it's necessary will become less common over time.

replies(2): >>43182306 #>>43184711 #
camdenreslink ◴[] No.43184711{3}[source]
I haven't seen evidence that this will come to pass, but it's possible. English-language specs are ambiguous. Do you really think businesses with money on the line will tolerate an LLM making automated changes to a codebase and pushing them without a human in the loop? Even human programmers create outages (and humans have actual general intelligence, the supposed holy grail). If an autonomous LLM creates outages 10% more frequently than a team of humans, it is basically unusable. We would need to see a lot more improvement over the current state of the art.
I haven't seen evidence that this will come to pass. But it's possible. English-language specs are ambiguous. Do you really think businesses with money on the line will tolerate an LLM making automated changes to a codebase and pushing them without a human in the loop? Even human programmers create outages (and we have "AGI" which is the holy grail). If an autonomous LLM creates outages 10% more frequently than a team of humans it is basically unusable. We would need to see a lot more improvement from current state of the art.