(openai.com)

555 points maheshrijal | 1 comments | 16 Apr 25 17:01 UTC

georgewsinger ◴[16 Apr 25 17:20 UTC] No.43707951[source]▶

Very impressive! But on arguably the most important benchmark -- SWE-bench Verified, for real-world coding tasks -- Claude 3.7 remains the champion.[1]

Incredible how resilient Claude models have been as best-in-class for coding.

[1] But by only about 1%, and inclusive of Claude's "custom scaffold" augmentation (which in practice I assume almost no one uses?). The new OpenAI models might still be effectively best in class now (or likely beating Claude with similar augmentation?).

replies(7): >>43708008 #>>43708068 #>>43708249 #>>43708545 #>>43709203 #>>43713202 #>>43716307 #

pizzathyme ◴[16 Apr 25 19:10 UTC] No.43709203[source]▶

>>43707951 #

The image generation improvement with o4-mini is incredible. Testing it out today, this is a step change in editing specificity even from the ChatGPT 4o LLM image integration just a few weeks ago (which was already a step change). I'm able to ask for surgical edits, and they are done correctly.

There isn't a numerical benchmark for this that people seem to be tracking but this opens up production-ready image use cases. This was worth a new release.

replies(3): >>43710367 #>>43710556 #>>43711280 #

ilaksh ◴[16 Apr 25 21:23 UTC] No.43710556{3}[source]▶

>>43709203 #

Wait, o4-mini outputs images? What I thought I saw was the ability to do a tool call to zoom in on an image.

Are you sure that's not 4o?

replies(1): >>43711306 #

AaronAPU ◴[16 Apr 25 23:11 UTC] No.43711306{4}[source]▶

>>43710556 #

I’m generating logo designs for merch via o4-mini-high and they are pretty good. The text renders well and it comprehends my instructions.

replies(2): >>43713173 #>>43713182 #

1. ilaksh ◴[17 Apr 25 04:39 UTC] No.43713182{5}[source]▶

>>43711306 #

It's using the new gpt-4o, a version that's not in the API.

OpenAI o3 and o4-mini