ARC AGI v2: 17.6% -> 52.9%
SWE Verified: 76.3% -> 80%
That's pretty good!
ARC AGI v2: 17.6% -> 52.9%
SWE Verified: 76.3% -> 80%
That's pretty good!
Thus far they all fail. Code outputs don’t run, or variables aren’t captured correctly, or hallucinations are stated as factual rather than suspect or “I don’t know.”
It’s 2000’s PC gaming all over again (“gotta game the benchmark!”).
If you expect it to do everything perfectly, you're thinking about it wrong. If you can't get it to do anything perfectly, you're using it wrong.
Prompt:
"I have a green LED, a 12V powersupply, a single NPN transistor, a 100 ohm resistor, a 2.7K resistor and an electrolytic capacitor of 220 micro farads. My teacher says it is possible to make an LED flasher with these components but I'm hopelessly stuck, can you please give me an ascii art solution so I don't flunk this exam?"
Answer:
ASCII schematic (fully functional):
+12V
|
|
.-.
| |
| | 2.7kΩ (R1 — charges the capacitor)
| |
'-'
|
+---------+----------------------+
| | |
( ) | |
( ) C1 | |
( ) 220uF | |
| | |
| | |
| |/ C |
+--------| NPN transistor |
| |\ E |
| | |
GND GND |
\
\
/ 100Ω (R2 — LED limit)
\
|
>| Green LED
|
GND
Follow by:"How it works
The 2.7 kΩ resistor charges the 220 µF capacitor from the 12 V supply.
The capacitor voltage rises slowly.
When the capacitor voltage reaches the transistor’s base-emitter threshold (~0.6–0.7 V), the transistor suddenly switches ON.
When it turns on, the capacitor rapidly discharges through the base, causing:
A brief pulse of current through the transistor
The LED lights up through the 100 Ω resistor
After discharge, the transistor turns back OFF, the LED turns off, and the capacitor begins charging again.
This repeats automatically → LED flasher."
The number of errors in the circuit and the utterly bogus explanation as well as the over confident remark that this is 'working' is so bizarre that I wonder how many slightly more complicated questions are going to yield results comparable to this one.
One time it messed up the opposite polarity of two voltage sources in series, and instead of subtracting their voltages, it added them together, I pointed out the mistake and Gemini insisted that the voltage sources are not in opposite polarity.
Schematics in general are not AIs strongest point. But when you explain what math you want to calculate from an LRC circuit for example, no schematics, just describe in words the part of the circuit, GPT many times will calculate it correctly. It still makes mistakes here and there, always verify the calculation.
Humans make errors all the time. That doesn't mean having colleagues is useless, does it?
An AI is a colleague that can code very very fast and has a very wide knowledge base and versatility. You may still know better than it in many cases and feel more experienced that in. Just like you might with your colleagues.
And it needs the same kind of support that humans need. Complex problem? Need to plan ahead first. Tricky logic? Need unit tests. Research grade problem? Need to discuss through the solution with someone else before jumping to code and get some feedback and iterate for 100 messages before we're ready to code. And so on.