I’d love to see how this compares when either the problem space is different or the language/ecosystem is different.
It was a great read regardless!
GPT-3.5 was impressive at the time, but today's SOTA models (like GPT-5 Pro) are almost a night-and-day difference, both in producing better code across a wider range of languages (I mostly do Rust and Clojure; it handles those fine now, and was awful with 3.5) and, more importantly, in following the instructions in your user/system prompts. It's easier to get higher-quality code from it now, as long as you can put into words what "higher quality code" means for you.
Multiplying two 24-bit posits in 8-bit AVR assembly, for instance. No models have succeeded yet, usually because they try to put more than 8 bits into a register. Algorithmically they seem to be on the right track, but they can't seem to hold onto the idea that registers are only 8 bits wide through the entirety of their response.
Something along the lines of:
"Can you generate 8-bit AVR assembly code to multiply two 24-bit posit numbers?"
You get some pretty funny results from the models that have no idea what a posit is. It's usually pretty easy to tell whether they know what they're supposed to be doing. I haven't had a success yet (though I haven't tried for a while). Some of them have come pretty close, but usually it's trying to squeeze more than 8 bits of data into a register that brings them down.
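For anyone curious what that constraint actually looks like: the hard part the models keep fumbling is plain multi-precision arithmetic. On an 8-bit AVR a 24-bit operand lives in three registers, and MUL (on parts that have it) only does 8x8 -> 16 into r1:r0, so the fraction multiply has to be built limb by limb with explicit carries. Here's a minimal sketch in portable C rather than AVR assembly, and only of the 24x24 -> 48-bit multiply, not the posit decode/encode around it, just to show the "everything stays 8 bits wide" discipline:

    #include <stdint.h>
    #include <stdio.h>

    /* 24x24 -> 48-bit schoolbook multiply using only 8-bit limbs,
     * mirroring how AVR's MUL (8x8 -> 16 into r1:r0) forces you to
     * structure it. Limbs are little-endian: a[0] is the low byte. */
    void mul24x24(const uint8_t a[3], const uint8_t b[3], uint8_t out[6])
    {
        for (int i = 0; i < 6; i++)
            out[i] = 0;

        for (int i = 0; i < 3; i++) {
            uint8_t carry = 0;
            for (int j = 0; j < 3; j++) {
                /* 8x8 -> 16-bit partial product plus accumulator and
                 * carry; the maximum is 0xFFFF, so it fits exactly. */
                uint16_t p = (uint16_t)a[i] * b[j] + out[i + j] + carry;
                out[i + j] = (uint8_t)p;    /* low byte stays in this limb */
                carry = (uint8_t)(p >> 8);  /* high byte carries upward    */
            }
            out[i + 3] += carry; /* limb is still zero here, cannot overflow */
        }
    }

    int main(void)
    {
        const uint8_t a[3] = {0x56, 0x34, 0x12}; /* 0x123456 */
        const uint8_t b[3] = {0xEF, 0xCD, 0xAB}; /* 0xABCDEF */
        uint8_t r[6];
        mul24x24(a, b, r);
        for (int i = 5; i >= 0; i--)
            printf("%02X", r[i]); /* prints 0C379A59BA4A */
        printf("\n");
        return 0;
    }

Nine partial products, each strictly 8x8, with the carry threaded through by hand. That's exactly the bookkeeping the models drop when they quietly promote a value into a 16- or 32-bit register that doesn't exist on the part.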
One thing you learn quickly when working with LLMs is that they have these kinds of baked-in biases, some of which are very fixed and tied to their very limited ability to engage in novel reasoning (cf. François Chollet), while others are far more loosely held and correctable. If a model sticks with the errant pattern even when provided the proper context, it probably isn’t something an off-the-shelf model can handle.
LLMs are nothing more than rubber ducking in game dev. The code they generate is often useful as a starting point, or to lighten the mood because it's so bad you get a laugh. Beyond that, it's broadly useless.
I put this down to the relatively small number of people who work in game dev, which results in a relatively small number of blogs from which to "learn" game dev.
Game dev is a conservative industry with a lot of secret sauce hidden inside companies, for VERY good reasons.