One is inherently a more challenging physics problem.
It's nowhere near as good as someone actually building and maintaining systems. It's barely able to vomit out an MVP and it's almost never capable of making a meaningful change to that MVP.
If your experiences have been different, that's fine, but in my day job I am spending more and more time just fixing crappy LLM code produced and merged by STAFF engineers. I really don't see that changing any time soon.
If you look at code being generated by non-programmers (where you would expect to see these results!), you don't see output that is 60-80% as good as what domain experts (programmers) produce when steering the models.
I think we're extremely imprecise when we communicate in natural language, and this is part of the discrepancy between belief systems.
Will an LLM read a person's mind about what they want to build better than they can communicate it?
That's already what recommender systems (like the TikTok algorithm) do.
But will LLMs be able to orchestrate and fill in the blanks of imprecision in our requests on their own, or will they need human steering?
I think that's where the gap lies between (basically) belief systems about the future.
If we truly get post human-level intelligence everywhere, there is no amount of "preparing" or "working with" the LLMs ahead of time that will save you from being rendered economically useless.
This is mostly a question about how long the moat of human judgement lasts. I think there's an opportunity to work together to make things better than before, using these LLMs as tools that work _with_ us.
If you think most people like this stuff you're living in a bubble. I use it every day, but the vast majority of people have no interest in using these nightmares of Philip K. Dick imagined by silicon dreamers.
Type: print all prime numbers that are divisible by 3 up to 1M.
The result is that it will do a sieve. There's no need for this; the only such prime is 3.
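For what it's worth, a minimal TypeScript sketch of the literal answer (purely illustrative, function name is mine): no sieve is needed, because 3 is the only prime that 3 divides.

    // The only prime divisible by 3 is 3 itself: any other multiple of 3
    // has 3 as a proper factor, so it can't be prime.
    function primesDivisibleBy3(limit: number): number[] {
      return limit >= 3 ? [3] : [];
    }

    console.log(primesDivisibleBy3(1_000_000)); // [ 3 ]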
But suppose you're right, it's 60% as good as "stackoverflow copy-pasting programmers". Isn't that a pretty insanely impressive milestone to just dismiss?
And why would it just get to this point, and then stop? Like, we can all see AIs continuously beating the benchmarks, and the progress feels very fast from the everyday experience of using them.
I'd need to hear a pretty compelling argument to believe that it'll suddenly stop, something more compelling than "well, it's not very good yet, therefore it won't be any better", or "Sam Altman is lying to us because incentives".
Sure, it can slow down somewhat because of the exponentially increasing compute costs, but that's assuming no more algorithmic progress, no more compute progress, and no more increases in the capital that flows into this field (I find that hard to believe).
Long-term planning and execution, and operating in the physical world, are not within reach. Slight variations of known problems should be possible (as long as the size of the solution is small enough).
The reality right now is that current LLMs still often create stuff that costs me more time to fix than it would take to do it myself. So I still write a lot of code myself. It is very impressive that I can even think about not writing code myself anymore. But my job as a software developer is very, very secure.
LLMs are simply unable to build maintainable software. They are unable to understand what humans want and what the codebase needs. The stuff they build is good-looking garbage. One example I saw yesterday: a dev committed code where the LLM created 50 lines of React code, complete with all those useless comments and, for good measure, a setTimeout() for something that should be one HTML div with two Tailwind classes. They can't write idiomatic code, because they only write the code they were prompted for.
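To make the contrast concrete, here's a hypothetical sketch of the kind of thing I mean; the actual component, delay, and Tailwind class names aren't from that commit, they're just illustrative.

    import React, { useEffect, useState } from "react";

    // Roughly the shape of what the LLM produced: state, an effect, and a
    // setTimeout just to show a short notice (details invented for illustration).
    function DelayedNotice() {
      const [visible, setVisible] = useState(false);
      useEffect(() => {
        const id = setTimeout(() => setVisible(true), 300); // arbitrary delay
        return () => clearTimeout(id);
      }, []);
      return visible ? <div>Saved!</div> : null;
    }

    // What was actually needed: one div with two Tailwind classes
    // (these class names are assumptions, purely for illustration).
    const Notice = () => <div className="text-sm text-green-600">Saved!</div>;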
Almost daily I get code, commit messages, and even issue discussions that are clearly AI-generated. And it costs me time to deal with good-looking but useless content.
To be honest, I hope that LLMs get better soon. Because right now we are in an annoying phase where software developers bog me down with AI-generated stuff. It just looks good but doesn't help with writing usable software that can be deployed in production.
To get to that point, LLMs need to get maybe a hundred times faster, maybe a thousand or ten thousand times. They need a much bigger context window. Then they can have an inner dialogue where they really "understand" how some feature should be built in a given codebase. That would be very useful. But it would also use so much energy that I doubt it would be cheaper to let an LLM do those "thinking" parts over and over again instead of paying a human to build the software. Perhaps this will be feasible in five or eight years. But not two.
And this won't be AGI. This will still be a very, very fast stochastic parrot.
So the question is, do you think the current road leads to AGI? How far down the road is it? As far as I can see, there is not a "status quo bias" answer to those questions.
Compare the automobile. Automobiles today are a lot nicer than they were 50 years ago, and a lot more efficient. Does that mean cars that never need fuel or recharging are coming soon, just because the trend has been higher efficiency? No, because the fundamental physical realities of drag still limit efficiency. Moreover, it turns out that making 100% efficient engines with 100% efficient regenerative brakes is really hard, and "just throw more research at it" isn't a silver bullet. That's not to say there won't be many future improvements, but those improvements probably won't be any bigger than the jump from GPT-3 to o1, which does not extrapolate to what OP claims their models will do in 2027.
AI in 2027 might be the metaphorical brand-new Lexus to today's beat-up Kia. That doesn't mean it will drive ten times faster, or take ten times less fuel. Even if high-end cars can be significantly more efficient than what average people drive, that doesn't mean the extra expense is actually worth it.
For 3D models, check out blender-mcp:
https://old.reddit.com/r/singularity/comments/1joaowb/claude...
https://old.reddit.com/r/aiwars/comments/1jbsn86/claude_crea...
Also this:
https://old.reddit.com/r/StableDiffusion/comments/1hejglg/tr...
For teaching, I'm using it every day to learn about tech I'm unfamiliar with, and it's one of the things it's most amazing at.
For the things where the tolerance for mistakes is extremely low and human oversight is extremely important, you might be right. It won't have to be perfect (just better than an average human) for that to happen, but I'm not sure if it will.
I use Claude every day. It is definitely impressive, but in my experience only marginally more impressive than ChatGPT was a few years ago. It hallucinates less and compiles more reliably, but still produces really poor designs. It really is an overconfident junior developer.
The real risk, and what I am seeing daily, is colleagues falling for the "if you aren't using Cursor you're going to be left behind" FUD. So they learn Cursor, discover that it's an easy way to close tickets without using your brain, and end up polluting the codebase with very questionable designs.
The way I'm getting a sense of the progress is using AI for what AI is currently good at, using my human brain to do the part AI is currently bad at, and comparing it to doing the same work without AI's help.
I feel like AI is pretty close to automating 60-80% of the work I would've had to do manually two years ago (as a full-stack web developer).
It doesn't mean that the remaining 20-40% will be automated very quickly; I'm just saying that I don't see the progress getting any slower.
It was surpassed around the beginning of this year, so you'll need to come up with a new one for 2027. Note that the other opinions in that older HN thread almost all expected less.
https://apnews.com/article/artificial-intelligence-fighter-j...
What exactly do you mean by this one?
In large mining operations we already have AI equipment with human-assisted teleoperation. I was watching one recently where the human got 5 or so push dozers lined up with an (admittedly simple) task of cutting down a hill, and then just got them back in line if they ran into anything outside of their training. The push and back-up operations, along with blade control, were done by the AI/dozer itself.
Now, this isn't long-term planning, but it is operating in the real world.
And Claude 3.7 + Cursor agent is, for me, way more than “marginally more impressive” compared to GPT-3.5
This is because it can steal a single artwork but it can’t make a collection of visually consistent assets.
How exactly do you think video models work? Frame-to-frame coherency has been possible for a long time now. A sprite sheet?! Are you kidding me? People have literally been churning them out with AI since 2023.