555 points by maheshrijal | 1 comment

osigurdson:
I have a very basic / stupid "Turing test", which is just to have it write a base-62 converter in C#. I would think this exact thing would be on GitHub somewhere (and thus in the weights), but every model I tried has failed it in the past (non-scientific; I didn't try every single model).

Using o4-mini-high, it actually did produce a working implementation after a bit of prompting. So yeah, today this test passed, which is cool.
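For reference, here is a minimal sketch of what the test asks for. The original test is in C#; this is the same logic in Python, assuming the common 0-9A-Za-z digit order (a convention, not a standard):

    import string

    # 62 digits: 0-9, then A-Z, then a-z (the ordering is a convention; pick one and keep it)
    ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase

    def to_base62(n: int) -> str:
        """Encode a non-negative integer as a base-62 string."""
        if n < 0:
            raise ValueError("negative numbers not supported")
        if n == 0:
            return ALPHABET[0]
        out = []
        while n:
            n, rem = divmod(n, 62)
            out.append(ALPHABET[rem])
        return "".join(reversed(out))

    def from_base62(s: str) -> int:
        """Decode a base-62 string back into an integer."""
        n = 0
        for ch in s:
            n = n * 62 + ALPHABET.index(ch)
        return n

    assert from_base62(to_base62(1234567890)) == 1234567890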

croemer:
I asked o3 to build and test a maximum parsimony phylogenetic tree builder in Python (my standard test for new models), and it's been thinking for 10 minutes. It's still not clear whether anything is happening; I've barely seen any code since I asked it to test what it produced in its first answer. The thought summary is totally useless compared to Gemini's. Underwhelming so far.
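For anyone unfamiliar with the task: the scoring core of maximum parsimony is Fitch's small-parsimony algorithm; a full builder also has to search over tree topologies. A minimal sketch of the scorer, assuming a rooted binary tree given as nested tuples (the names and representation here are mine):

    def fitch(tree, states):
        """Fitch small-parsimony score of one character on a rooted binary tree.

        tree:   nested 2-tuples with leaf names at the tips, e.g. (("A", "B"), ("C", "D"))
        states: dict mapping leaf name -> character state, e.g. {"A": "G", ...}
        Returns (candidate state set at this node, mutation count in this subtree).
        """
        if isinstance(tree, str):  # leaf: its state set is just its observed state
            return {states[tree]}, 0
        left_set, left_cost = fitch(tree[0], states)
        right_set, right_cost = fitch(tree[1], states)
        common = left_set & right_set
        if common:  # non-empty intersection: no mutation needed at this node
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1  # union: count one mutation

    # One site where A, B, D share "G" and C has "T": exactly one change is needed.
    _, score = fitch((("A", "B"), ("C", "D")), {"A": "G", "B": "G", "C": "T", "D": "G"})
    print(score)  # 1

Summing this over all alignment columns scores one topology; the hard part for the model is the tree search on top of it.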

The CoT summary is full of references to Jupyter notebook cells. The variable names are too abbreviated (nbr for neighbor), so the code ends up fairly cryptic and not nice to read. Maybe it's optimized too much for speed.

Also, I've noticed ChatGPT seems to abort thinking when I switch away from the app. That's stupid; I don't want to stare at a spinner for 5 minutes.

And the CoT summary keeps mentioning my name, which is irritating.

beefnugs:
Have you tried cutting the job up into a series of smaller, verifiable intermediate steps?