(openai.com)

555 points maheshrijal | 1 comments | 16 Apr 25 17:01 UTC

osigurdson ◴[16 Apr 25 18:22 UTC] No.43708704[source]▶

I have a very basic / stupid "Turing test", which is just to write a base 62 converter in C#. I would think this exact thing would be on GitHub somewhere (and thus in the weights), but it has always failed for me in the past (non-scientific; I didn't try every single model).

Using o4-mini-high, it actually did produce a working implementation after a bit of prompting. So yeah, today this test passed, which is cool.
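
For reference, the task itself is small. Here is a minimal sketch of a base 62 converter, written in Python rather than the C# the test asks for, and not taken from any model's output. The alphabet order (0-9, A-Z, a-z) and the function names `encode_base62` / `decode_base62` are assumptions; the comment doesn't specify either.

```python
import string

# Assumed alphabet order: digits, then uppercase, then lowercase.
# The original comment does not say which ordering it expects.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)  # 62
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode_base62(n: int) -> str:
    """Convert a non-negative integer to its base 62 string."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        # Peel off the least significant base 62 digit each iteration.
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    # Digits were collected least significant first, so reverse them.
    return "".join(reversed(digits))


def decode_base62(s: str) -> int:
    """Convert a base 62 string back to an integer."""
    if not s:
        raise ValueError("cannot decode an empty string")
    n = 0
    for ch in s:
        if ch not in INDEX:
            raise ValueError(f"invalid base 62 character: {ch!r}")
        # Shift the accumulated value one base 62 place and add the digit.
        n = n * BASE + INDEX[ch]
    return n
```

The places a quick implementation tends to slip are zero (the loop never runs), negative input, and the digit order on output, so those are the cases worth checking first.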

replies(3): >>43708784 #>>43709386 #>>43713122 #

sebzim4500 ◴[16 Apr 25 18:31 UTC] No.43708784[source]▶

>>43708704 #

Unless I'm misunderstanding what you are asking the model to do, Gemini 2.5 Pro just passed this easily. https://g.co/gemini/share/e2876d310914

replies(2): >>43708929 #>>43711326 #

1. AaronAPU ◴[16 Apr 25 23:13 UTC] No.43711326[source]▶

>>43708784 #

I’ve been using Gemini 2.5 pro side by side with o1-pro and Grok lately. My experience is they each randomly offer significant insight the other two didn’t.

But generally, o1-pro listens to my profile instructions WAY better, and it seems to be better at actually solving problems the first time. More reliable.

But they are all quite similar, and so far these new models seem similar but faster, IMO.

OpenAI o3 and o4-mini