207 points todsacerdoti | 2 comments
ikari_pl ◴[] No.46004050[source]
Today, Gemini wrote a Python script for me that connects to the Fibaro API (a local home automation system) and automatically renames all the rooms and devices to English.

It worked on the first run. Well, the second: by default the first run was a dry run that printed a beautiful table. The actual run requires a CLI arg, and it also makes a backup.

It was a complete solution.
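
For readers who want to picture that shape, here is a minimal sketch of the dry-run-by-default pattern described above. It is not the script Gemini produced, and it does not talk to the Fibaro API at all: the in-memory `items` list, the `TRANSLATIONS` table, and the `--apply` and `--backup` flag names are all hypothetical stand-ins chosen for illustration.

```python
import argparse
import json
from pathlib import Path

# Hypothetical translation table; the real script's mapping is not shown in the thread.
TRANSLATIONS = {"Kuchnia": "Kitchen", "Salon": "Living room"}


def plan_renames(items, translations):
    """Return (id, old_name, new_name) for every item whose name would change."""
    plan = []
    for item in items:
        new_name = translations.get(item["name"])
        if new_name is not None and new_name != item["name"]:
            plan.append((item["id"], item["name"], new_name))
    return plan


def format_table(plan):
    """Render the planned renames as a plain-text table."""
    rows = [("ID", "Old name", "New name")] + [(str(i), o, n) for i, o, n in plan]
    widths = [max(len(row[col]) for row in rows) for col in range(3)]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)) for row in rows
    )


def run(items, translations, apply=False, backup_path=None):
    """Plan the renames; change `items` only when apply=True, after writing a backup."""
    plan = plan_renames(items, translations)
    if not apply:
        return plan  # dry run: nothing is written or modified
    if backup_path is not None:
        # Save the original state before touching anything.
        Path(backup_path).write_text(json.dumps(items, indent=2), encoding="utf-8")
    by_id = {item["id"]: item for item in items}
    for item_id, _old, new_name in plan:
        by_id[item_id]["name"] = new_name
    return plan


def main(argv=None):
    parser = argparse.ArgumentParser(description="Rename items (dry run by default).")
    parser.add_argument("--apply", action="store_true", help="actually perform the renames")
    parser.add_argument("--backup", default="backup.json", help="where to save the original state")
    args = parser.parse_args(argv)

    # Stand-in data; a real script would fetch this from the home automation system.
    items = [{"id": 1, "name": "Kuchnia"}, {"id": 2, "name": "Salon"}, {"id": 3, "name": "Garage"}]
    plan = run(items, TRANSLATIONS, apply=args.apply, backup_path=args.backup)
    print(format_table(plan))
    print("Applied." if args.apply else "Dry run only. Re-run with --apply to make changes.")


if __name__ == "__main__":
    main()
```

Run with no arguments, this only prints the table of planned renames. Passing `--apply` writes the backup file first and then performs the renames, so the destructive path is always the one that has to be asked for explicitly.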

replies(3): >>46004066 #>>46004101 #>>46004754 #
UltraSane ◴[] No.46004066[source]
I've been surprised by how often Sonnet 4.5 writes working code on the first try.
replies(3): >>46004099 #>>46004141 #>>46004176 #
troupo ◴[] No.46004176[source]
I've found it to depend on the phase of the moon.

It goes from genius to idiot and back in the blink of an eye.

replies(3): >>46004943 #>>46005171 #>>46005533 #
1. Mtinie ◴[] No.46004943[source]
In my experience that “blink of an eye” has turned out to be a single moment when the LLM misses a key point or begins to fixate on the wrong thing. After that, it’s nearly impossible to recover, and the model's behavior diverges noticeably from what came before.

That single point is where the model commits fully to the previous misunderstanding. Once it crosses that line, subsequent responses compound the error.

replies(1): >>46008319 #
2. troupo ◴[] No.46008319[source]
For me it's also sometimes consecutive sessions, or sessions on different days.