
555 points maheshrijal | 1 comment
osigurdson ◴[] No.43708704[source]
I have a very basic / stupid "Turing test", which is just to write a base 62 converter in C#. I would think this exact thing would be on GitHub somewhere (and thus in the weights), but it has always failed for me in the past (non-scientific; I didn't try every single model).
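For reference, a minimal sketch of the kind of converter the test asks for, in its simplest single-integer form (the alphabet ordering, class name, and method names here are my own, not from the thread):

```csharp
using System;
using System.Text;

static class Base62
{
    // Hypothetical digit ordering; any fixed 62-character alphabet works
    // as long as Encode and Decode agree on it.
    const string Alphabet =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    public static string Encode(long value)
    {
        if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));
        if (value == 0) return "0";
        var sb = new StringBuilder();
        while (value > 0)
        {
            // Peel off the least significant base-62 digit each iteration.
            sb.Insert(0, Alphabet[(int)(value % 62)]);
            value /= 62;
        }
        return sb.ToString();
    }

    public static long Decode(string s)
    {
        long result = 0;
        foreach (char c in s)
            result = result * 62 + Alphabet.IndexOf(c);
        return result;
    }
}
```

A sketch like this only handles non-negative `long` values; as the discussion below notes, that interface is itself part of the problem.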

Using o4-mini-high, it actually did produce a working implementation after a bit of prompting. So yeah, today this test passed, which is cool.

replies(3): >>43708784 #>>43709386 #>>43713122 #
sebzim4500 ◴[] No.43708784[source]
Unless I'm misunderstanding what you are asking the model to do, Gemini 2.5 pro just passed this easily. https://g.co/gemini/share/e2876d310914
replies(2): >>43708929 #>>43711326 #
osigurdson ◴[] No.43708929[source]
As I mentioned, this is not a scientific test but rather just something that I have tried from time to time and has always (shockingly, in my opinion) failed, but today worked. It takes a minute or two of prompting, is boring to verify, and I don't remember exactly which models I have used. It is purely a personal anecdote, nothing more.

However, looking at the code that Gemini wrote in the link, it does the same thing that other LLMs often do, which is to assume that we are encoding individual long values. I assume there must be a GitHub repo or Stack Overflow question in the weights somewhere that is pushing it in this direction, but it is a little odd. Naturally, this isn't the kind of encoder that someone would normally want. Typically it should encode a byte array and return a string (or maybe encode / decode UTF-8 strings directly). Having the interface use a long is very weird and not very useful.
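A byte-array interface of the kind described above could be sketched like this (a rough illustration, assuming .NET's `System.Numerics.BigInteger` and the same hypothetical 0-9A-Za-z alphabet; leading zero bytes are preserved by prepending one zero digit per zero byte, the same trick Base58 encodings use):

```csharp
using System;
using System.Numerics;
using System.Text;

static class Base62Bytes
{
    const string Alphabet =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    public static string Encode(byte[] data)
    {
        // Treat the whole byte array as one big unsigned big-endian integer.
        var value = new BigInteger(data, isUnsigned: true, isBigEndian: true);
        var sb = new StringBuilder();
        while (value > 0)
        {
            sb.Insert(0, Alphabet[(int)(value % 62)]);
            value /= 62;
        }
        // Leading zero bytes don't change the integer, so encode each one
        // explicitly as a leading '0' digit to make the mapping reversible.
        foreach (byte b in data)
        {
            if (b != 0) break;
            sb.Insert(0, Alphabet[0]);
        }
        return sb.ToString();
    }

    public static byte[] Decode(string s)
    {
        // Count the leading '0' digits that stand for leading zero bytes.
        int zeros = 0;
        while (zeros < s.Length && s[zeros] == Alphabet[0]) zeros++;

        BigInteger value = 0;
        foreach (char c in s)
            value = value * 62 + Alphabet.IndexOf(c);

        byte[] num = value.IsZero
            ? Array.Empty<byte>()
            : value.ToByteArray(isUnsigned: true, isBigEndian: true);
        var result = new byte[zeros + num.Length];
        num.CopyTo(result, zeros);
        return result;
    }
}
```

With this shape you can round-trip arbitrary binary data, e.g. `Base62Bytes.Decode(Base62Bytes.Encode(bytes))` returns the original array, which is what you'd actually want from an encoder like this.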

In any case, I suspect with a bit more prompting you might be able to get Gemini to do the right thing.

replies(2): >>43711098 #>>43711934 #
jiggawatts ◴[] No.43711098[source]
Similarly, many of my informal tests have started passing with Gemini 2.5 that never worked before, which makes the 2025 era of AI models feel like a step change to me.