
Google is winning on every AI front

(www.thealgorithmicbridge.com)
993 points | vinhnx
codelord
As an ex-OpenAI employee, I agree with this. Most of the top ML talent at OpenAI has already left, either to do their own thing or to join other startups. A few are still there, but I doubt they'll be around in a year.

The main successful product from OpenAI is the ChatGPT app, but there's a limit to how much you can charge people in subscription fees. I think people will soon expect this service to be free, and ads will become the main way to make money from chatbots.

The whole time I was at OpenAI, and ever since, GOOG has been the only individual stock I've held. Despite the threat to their search business, I think they'll bounce back because they have a lot of cards to play. OpenAI is an annoyance for Google because they are willing to burn money to get users. Google can't burn money as easily: they already have billions of users, and as a public company they have to answer to investors. But I doubt OpenAI's investors will sign up to hand over more money to be burned a year from now. Google just needs to ease off on the red tape and get their innovations to users as fast as they can. (And don't get me started on Sam Altman.)
ramraj07
I don't know what you did there, but clearly being ex-OpenAI isn't the intellectual or product flex you think it is: I and every other smart person I know still use ChatGPT (paid), because even now it's the best at what it does, and we keep trying Google and Claude and keep coming back.

They got, and as of now continue to get, things right for the most part. If you still aren't seeing it, maybe you should introspect on what you're missing.

epolanski
I don't know; your experience doesn't match mine.

NotebookLM by Google is in a class of its own for the use case of "provide documents and ask questions about them" for personal use. ChatGPT and Claude are nowhere near. ChatGPT uses RAG, so it retrieves only chunks of the documents rather than reading them whole; it "understands" less about the topic and sometimes hallucinates.
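
(For anyone unfamiliar with the distinction, here's a minimal sketch of the shape of a RAG pipeline, in pure Python with made-up chunking and scoring — not how ChatGPT actually implements it. The point is just that the model only ever sees the retrieved chunks.)

    # Minimal RAG sketch: pick the top-k chunks by naive word overlap
    # and stuff only those into the prompt. Real systems use embeddings
    # and a vector store; the chunking and scoring here are toy stand-ins.

    def chunk(text: str, size: int = 200) -> list[str]:
        words = text.split()
        return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

    def score(query: str, passage: str) -> int:
        return len(set(query.lower().split()) & set(passage.lower().split()))

    def build_prompt(query: str, document: str, k: int = 3) -> str:
        top = sorted(chunk(document), key=lambda c: score(query, c), reverse=True)[:k]
        # The model never sees anything outside these k chunks, so questions
        # whose answers span many chunks are where hallucination creeps in.
        return "Answer using only this context:\n" + "\n---\n".join(top) + f"\n\nQuestion: {query}"

Anything outside those top-k chunks is invisible to the model, which is one plausible reason a chunk-retrieval approach "understands" less than one that ingests the documents more fully.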

When it comes to coding, Claude 3.5/3.7, whether embedded in Cursor or standalone, kept giving better results on real-world code, and even there Gemini 2.5 blew it away in my experience.

Antirez (creator of hping and Redis, among many other things) releases a video on AI pretty much every day (albeit in Italian), and in his tests where Gemini reviews his PRs for Redis, it is by far the best of all the models available.

mike_hearn
Gemini with coding seems to be a bit of a mixed bag.

The article claims Gemini is acing the Aider Polyglot benchmark. At the moment this is the only benchmark that really matters to me, because Aider is actually a useful tool and performance on it translates directly to real-world impact (although Claude Code is even better). If you look closely, Gemini is in fact at the top only in the "percent correct" category, not in "percent correct using the right edit format". Cost is marked as ? because the model isn't fully available yet (I think?). Not emitting the correct edit format is pretty useless, because it means the changes won't apply and the tool has to try again.

Claude, in contrast, almost never makes a mistake with emitting the right format. It's at 97%+ in the benchmark; in practice it's ~100% in my experience. This tracks: Claude is really good at following instructions. Gemini is at about 90%, and that gap makes a big difference to how frustrating a tool is to use in practice.
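
(To make the failure mode concrete, here's a simplified sketch of applying a search/replace-style edit, loosely modeled on Aider's SEARCH/REPLACE blocks — Aider's actual parsing is more involved, so treat this as illustrative.)

    # Toy applier for a SEARCH/REPLACE edit block. Any formatting slip
    # by the model -- wrong markers, or SEARCH text that doesn't match
    # the file verbatim -- raises, forcing a retry round-trip.
    import re

    EDIT_RE = re.compile(r"<{7} SEARCH\n(.*?)\n={7}\n(.*?)\n>{7} REPLACE", re.DOTALL)

    def apply_edit(file_text: str, model_output: str) -> str:
        m = EDIT_RE.search(model_output)
        if m is None:
            raise ValueError("malformed edit block: retry needed")
        search, replace = m.group(1), m.group(2)
        if search not in file_text:
            raise ValueError("SEARCH text not found verbatim: retry needed")
        return file_text.replace(search, replace, 1)

At ~90% format compliance, roughly one edit in ten hits that retry path; at 97%+ it's about one in thirty, which is the difference you feel in practice.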

They might get that fixed, but my experience has been that Google's models are consistently much more likely to refuse instructions for dumb reasons. Google is the company with by far the biggest purity-spiral problem, and it shows up in their output even when doing apparently ordinary tasks.

I'm also concerned by this event: https://news.sky.com/story/googles-ai-chatbot-gemini-tells-u...

Given how obsessed Google claimed to be with AI safety, I expected an SRE-style postmortem after that, and there was bupkis. An AI that can suffer a psychotic break out of nowhere like that is one I wouldn't trust unless it's behind a very strong sandbox and supervised very closely, but none of today's AI tools offer much in the way of sandboxing.