Seems like they're approaching parity (finally) months and months later (alarms/tv control work at least now), but losing basic oft-used functionality is a serious fumble.
> Today, we’re releasing an experimental version of Gemini 2.0 Pro that responds to that feedback. It has the strongest coding performance and ability to handle complex prompts, with better understanding and reasoning of world knowledge, than any model we’ve released so far. It comes with our largest context window at 2 million tokens, which enables it to comprehensively analyze and understand vast amounts of information, as well as the ability to call tools like Google Search and code execution.
> Gemini 2.0, 2.0 Pro and 2.0 Pro Experimental, Gemini 2.0 Flash, Gemini 2.0 Flash Lite
3 different ways of accessing the API, more than 5 different but extremely similarly named models. Benchmarks only comparing to their own models.
Can't be more "Googley"!
I sometimes forget - it is still very early days relatively speaking.
As a user of Gemini 2.0, so far I have been very impressed for the most part.
It's a weird choice. I suppose the endless handcrafted rules and tools don't scale across languages and use cases, but then LLMs are not good at reliability. And what's the point of using an assistant that won't do the task reliably? If you have to double-check everything, you're better off not using it...
Also just had to explain to the better half why I suddenly shuddered and pulled such a face of despair.
On a serious note - LLMs have actually brought me a lot of joy lately and elevated my productivity substantially within the domains in which I choose to use them. When witnessing the less experienced more readily accept outputs without understanding the nuances, there's definitely additional value in being... experienced.
My experience with the Gemini 1.5 models has been positive. I think Google has caught up.
Anthropic:
Claude 1, Claude Instant 1, Claude 2, Claude Haiku 3, Claude Sonnet 3, Claude Opus 3, Claude Haiku 3.5, Claude Sonnet 3.5, Claude Sonnet 3.5v2
OpenAI:
GPT-3.5, GPT-4, GPT-4o-2024-08-06, GPT-4o, GPT-4o-mini, o1, o3-mini, o1-mini
Fun times when you try to set up throughput provisioning.
I find the lack of clarity very frustrating. If I want to try Google's "best" model, should I be purchasing something? AI Studio seems focused around building an LLM wrapper app, but I just want something to answer my questions.
Edit: what I've learned through Googling: (1) if you search "is gemini advanced included with workspace" you get an AI Overview answer that seems to be incorrect, since they now include Gemini Advanced (?) with every Workspace subscription. (2) A page exists telling you to buy the add-on (Gemini for Google Workspace), but clicking on it says it is no longer available because of the above. (3) gemini.google.com says "Gemini Advanced" (no idea which model) at the top, but gemini.google.com/advanced redirects me to what I have deduced is the consumer site (?), which tells me that Gemini Advanced is another $20/month.
The problem, Google PMs if you're reading this, is that the gemini.google.com page does not have ANY information about what is going on. What model is this? What are the limits? Do I get access to "Deep Research"? Does this subscription give me something in aistudio? What about code artifacts? The settings option tells me I can change to dark mode (thanks!).
Edit 2: I decided to use aistudio.google.com since it has a dropdown for me on my workspace plan.
Imagine if it went like this:
Mnemonics: m(ini), r(easoning), t(echnical)
Claude 3m
Claude 3mr
Claude 3mt
Claude 3mtr
Claude 3r
Claude 3t
Claude 3tr
We started talking about my plans for the day, and I said I was making chili. G asked if I had a recipe or needed one. I said I started with Obama's recipe many years ago and have worked on it from there.
G gave me a form response that it can't talk politics.
Oh, I'm not talking politics, I'm talking chili.
G then repeated the form response and tried to change the conversation, and as long as I didn't use the O word, we were allowed to proceed. Phew
I'm not sure why you would want an app for each anyways.
You can also use https://aistudio.google.com to use base models directly.
Sonnet 3.5 v2
o3-mini-high
Gemini Flash-Lite
It's like a competition to see who can make the goofiest naming conventions.
Regarding model quality, we experiment with Google models constantly at Rev and they are consistently the worst of all the major players. They always benchmark well and consistently fail in real tasks. If this is just a small update to the gemini-exp-1206 model, then I think they will still be in last place.
"Go back to bed America." "You are free, to do as we tell you"
screenshot: https://beeimg.com/images/g25051981724.png
> I am currently running on the Gemini model.
Gemini 1.5 Flash responds with
> I'm using Gemini 2.0 Flash.
I don't even need to go out on a limb here to say that asking a model that question isn't going to give you an accurate response.
There has to be a better way to go about it. As I see it, to be productive, AI agents have to be able to talk about politics, because at the end of the day politics is everywhere. So, building on what they already do, they'll have to define a model's political stance (whatever it is) and have it hold its ground - voicing an opinion or abstaining from voicing one, but continuing the conversation, as a person would (at least as those of us who don't rage-quit a conversation when we hear something slightly controversial would).
But anyway, when one is rarely or never challenged on their beliefs, they become rusty. Do you trust them to do a good job training their own views into the model, let alone training in the views of someone on the opposite side of the spectrum?
It's a great way to experiment with all the Gemini models that are also available via the API.
If you haven't yet, try also Live mode at https://aistudio.google.com/live.
You can have a live conversation with Gemini and have the model see the world via your phone camera (or see your desktop via screenshare on the web), and talk about it. It's quite a cool experience! It made me feel the joy of programming and using computers that I had had so many times before.
It's a fine line, but it is something the BBC managed to do for a very long time. The BBC does not itself present an opinion on Politics yet facilitates political discussion through shows like Newsnight and The Daily Politics (rip).
Edit: it does not. It continues to miss the fact that I'm (incorrectly) passing in a scaled query tensor to scaled_dot_product_attention. o3-mini-high gets this right.
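For anyone curious, here's a minimal sketch of the bug class being described (PyTorch; the shapes are hypothetical):

```python
import math
import torch
import torch.nn.functional as F

# Hypothetical shapes: (batch, heads, seq_len, head_dim)
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)

# Buggy: scaled_dot_product_attention already divides the attention
# scores by sqrt(head_dim) internally, so pre-scaling q means the
# scores end up divided by head_dim instead of sqrt(head_dim).
q_prescaled = q / math.sqrt(q.size(-1))
out_buggy = F.scaled_dot_product_attention(q_prescaled, k, v)

# Correct: pass the raw q and let the function apply the scaling once.
out_ok = F.scaled_dot_product_attention(q, k, v)

print(torch.allclose(out_buggy, out_ok))  # False: the outputs diverge
```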
In my experience, I'd reach for Gemini 2.0 Flash over 4o in a lot of multimodal/document use cases. Especially given the differences in price ($0.10/million input and $0.40/million output versus $2.50/million input and $10.00/million output).
That being said, Qwen2.5 VL 72B and 7B seem even better at document image tasks and localization.
[1] https://notes.penpusher.app/Misc/Google+Gemini+101+-+Object+...
https://ai.google.dev/gemini-api/docs/audio?lang=rest#techni...
How many tokens can gemini.google.com handle as input? How large is the context window before it forgets? A quick search said it's a 128k-token window, but that applies to Gemini 1.5 Pro; what is it now?
My assumption is that "Gemini 2.0 Flash Thinking Experimental" is just "Gemini 2.0 Flash" with reasoning, and "Gemini 2.0 Flash Thinking Experimental with apps" is just "Gemini 2.0 Flash Thinking Experimental" with access to the web and Google's other services, right? So sticking to "Gemini 2.0 Flash Thinking Experimental with apps" should be the optimal choice.
Is there any reason why Gemini 1.5 Flash is still an option? It feels like it should be removed unless it does something better than the others.
I have difficulty understanding where each variant of the Gemini model is best suited. Looking at aistudio.google.com, they have already updated the available models.
Is "Gemini 2.0 Flash Thinking Experimental" on gemini.google.com just "Gemini experiment 1206" or was it "Gemini Flash Thinking Experimental" aistudio.google.com?
I have a note in my notes app where I rank every LLM on instruction following and math, and to this day I've had difficulty knowing where to place each Gemini model. I know there is a little popup when you hover over each model that tries to explain what it does and which tasks it is best suited for, but these explanations have been very vague to me. And I haven't even started on the Gemini Advanced series, or whatever I should call it.
The available models on aistudio are now:
- Gemini 2.0 Flash (gemini-2.0-flash)
- Gemini 2.0 Flash Lite Preview (gemini-2.0-flash-lite-preview-02-05)
- Gemini 2.0 Pro Experimental (gemini-2.0-pro-exp-02-05)
- Gemini 2.0 Flash Thinking Experimental (gemini-2.0-flash-thinking-exp-01-21)
If I had to sort these from most likely to fulfill my need to least likely, then it would probably be:
gemini-2.0-flash-thinking-exp-01-21 > gemini-2.0-pro-exp-02-05 > gemini-2.0-flash-lite-preview-02-05 > gemini-2.0-flash
Why? Because aistudio describes gemini-2.0-flash-thinking-exp-01-21 as being able to tackle the most complex and difficult tasks, while gemini-2.0-pro-exp-02-05 and gemini-2.0-flash-lite-preview-02-05 differ only in how much context they can handle.
So with that out of the way, how does Gemini-2.0-flash-thinking-exp-01-21 compare against o3-mini, Qwen 2.5 Max, Kimi k1.5, DeepSeek R1, DeepSeek V3 and Sonnet 3.5?
My current list of benchmarks I go through is artificialanalysis.ai, lmarena.ai, livebench.ai, and aider.chat's polyglot benchmark, but still, the whole Gemini suite is difficult to reason about and sort out.
I feel like this trend of having many different models with the same name but different suffixes is starting to become an obstacle to my mental model.
Google AI Studio and Google Cloud Vertex AI Studio
And both have their own documentation, different ways of "tuning" the model.
Talk about shipping the org chart.
I also found this [1]: “Important: A chat can only use one model. If you switch between models in an existing chat, it automatically starts a new chat. If you’re using Gemini Apps with a work or school Google Account, you can’t switch between models. Learn more about using Gemini Apps with a work or school account.”
I have no idea why the workspace accounts are so restricted.
[1] https://support.google.com/gemini/answer/14517446?hl=en&co=G...
The pricing is interesting: Gemini 2.0 Flash-Lite is 7.5c/million input tokens and 30c/million output tokens - half the price of OpenAI's GPT-4o mini (15c/60c).
Gemini 2.0 Flash isn't much more: 10c/million for text/image input, 70c/million for audio input, 40c/million for output. Again, cheaper than GPT-4o mini.
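A quick back-of-the-envelope sketch using those list prices (the workload numbers are invented for illustration):

```python
# USD per million tokens (text input, output), from the prices above.
PRICES = {
    "gemini-2.0-flash-lite": (0.075, 0.30),
    "gemini-2.0-flash":      (0.10,  0.40),
    "gpt-4o-mini":           (0.15,  0.60),
}

def monthly_cost(model, requests, in_tokens, out_tokens):
    p_in, p_out = PRICES[model]
    return requests * (in_tokens * p_in + out_tokens * p_out) / 1_000_000

# Hypothetical workload: 1M requests/month, 2k tokens in, 500 tokens out.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 1_000_000, 2_000, 500):,.0f}")
# gemini-2.0-flash-lite: $300
# gemini-2.0-flash: $400
# gpt-4o-mini: $600
```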
OpenAI is crazy. There may be a day when we have an o5 that is reasoning and a 5o that is not, where they belong to different generations too, and where "o" meant "Omni" despite o1-o3 no longer being audiovisual like 4o.
Anthropic is crazy too. Sonnets and Haikus, just why... and a 3.5 Sonnet released in October that was better than 3.5 Sonnet. (Not a typo.) And no one knows why there never was a 3.5 Opus.
Having used the previews for the last few weeks on different tasks and personally designed challenges, what I found is that these models are not only capable of processing larger context windows on paper, but are also far better at actually handling long, dense, complex documents in full: referencing back to something upon specific request, doing extensive rewrites in full whilst retaining previous context, etc. These models have also handled my private needle-in-haystack-type challenges without issues so far, though in fairness those have been limited to roughly 200k. Neither Anthropic's, OpenAI's, DeepSeek's, nor previous Google models handled even 75k+ in any comparable manner.
Cost will of course remain a factor and will keep RAG a viable choice for a while, but for the first time I am tempted to agree that someone has delivered a solution showing that a larger context window can, in many cases, work reliably and far more seamlessly.
It's also the first time a Google model has actually surprised me (positively); neither Bard, nor AI answers, nor any previous Gemini model had any appeal to me, even when testing specifically for claimed strengths (such as Gemini 1.5's alleged Flutter expertise, which was beaten by both OpenAI's and Anthropic's equivalents at the time).
- 4o can't really do localization, and ime is worse than Gemini 2.0 and Qwen2.5 at document tasks
- 4o mini isn't cheaper than 4o for images because it uses a lot of tokens per image compared to 4o (~5600/tile vs 170/tile, where each tile is 512x512; see the arithmetic sketch after this list)
- o1 has support for vision but is wildly expensive and slow
- o3-mini doesn't yet have support for vision, and o1-mini never did
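Working out that tile arithmetic with the input prices quoted elsewhere in the thread ($2.50/M for 4o, 15c/M for 4o mini) shows why the "mini" ends up pricier per image:

```python
# Cost per 512x512 image tile = tokens consumed per tile * input price.
tokens_per_tile = {"gpt-4o": 170, "gpt-4o-mini": 5600}
usd_per_token = {"gpt-4o": 2.50 / 1e6, "gpt-4o-mini": 0.15 / 1e6}

for model in tokens_per_tile:
    per_tile = tokens_per_tile[model] * usd_per_token[model]
    print(f"{model}: ${per_tile:.6f} per tile")
# gpt-4o:      $0.000425 per tile
# gpt-4o-mini: $0.000840 per tile, roughly 2x the cost per image
```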
I was a part of a nice small forum online. Most posts were everyday life posts / personal. The person who ran it seemed well meaning. Then a "no politics" rule appeared. It was fine for a while. I understood what they meant and even I only want so much outrage in my small forums.
Yet one person posted about how their plans to adopt were in jeopardy over their state's new rules about who could adopt what child. This was a deeply important and personal topic for that individual.
As you can guess, the "no politics" rule put a stop to that. The folks who supported laws like the ones being proposed of course thought they shouldn't be discussed because it was "politics"; others felt this was that individual talking about their rights and life, not "just politics". The whole forum fell apart after that debacle.
Gemini's response here sadly fits internet discourse... in a bad way.
Today, it works ~perfectly for TV control/alarm setting - I can't think of a time it didn't work first try in the last month or so for me. Maybe more consistent than before?
The rollout was simply borked from the PM/Decision making side.
I could not. I have Business Standard for Workspace, which includes Gemini Advanced; I'm not sure whether I need a VPN, to pay for a separate AI product, to pay for a higher Workspace tier, or what the heck is going on at all.
There are so many confusing, interrelated products and such a lack of focus everywhere that I no longer know whether it's worth it as an AI provider.
1.5 pro and the old 2.0 flash experimental generated responses in AI studio but the new 2.0 models respond with blank answers.
I wonder if it's timing out, or if some sort of newer censorship model is preventing 2.0 from answering my query. The novel is PG-13 at most, but references to "bronze skinned southern barbarians", "courtesans", "drugs", "demonic sects", and murder could, I guess, set it off.
It's basically what I've come to expect from most Google products at this point: half-baked, buggy, confusing, not intuitive.
I'm curious about the OpenAI alternative, but am not willing to pay $200/month.
Their search costs 7x Perplexity Sonar's, but I imagine a lot of people will start with Google, given they can get a pretty decent amount of search for free now.
If you read between the lines it's been pretty clear. The top labs are keeping the top models in house and use them to train the next generation (either SotA or faster/cheaper etc).
I sometimes wish magically there could be a social network of:
1. Real people / real validated names and faces.
2. Paid for by the users...
3. Competent professional moderation.
Don't get me wrong, I like my slices of anonymity and free services, but my positive impressions of such products are waning fast. Over time I want more real...
OP can go talk politics until they're blue in the face with someone willing to talk politics with them.
I'd say it makes sense to do RAG even if your stuff fits into context comfortably.
- https://aider.chat/docs/leaderboards/
- https://www.prollm.ai/leaderboard
Obviously in this specific case the user isn't trying to talk politics, but the rule isn't dystopian in and of itself. It's simply a reflection of human nature, and that someone at Google knows it's going to be a lot of trouble for no gain if the bot starts to get into politics with users.
To be fair, Microsoft has shipped like five AI portals in the last two years. Maybe four — I don’t even know any more. I’ve lost track of the renames and product (re)launches.
With Copilot Pro and DeepSeek's website, I ran "find logic bugs" on a 1200 LOC file I actually needed code review for:
- DeepSeek R1 found like 7 real bugs out of 10 suggested with the remaining 3 being acceptable false positives due to missing context
- Claude was about the same with fewer remaining bugs; no hallucinations either
- Meanwhile, Gemini had 100% false positive rate, with many hallucinations and unhelpful answers to the prompt
I understand Gemini 2.0 is not a reasoning model, but DeepClaude remains the most effective LLM combo so far.
It's not unusual for AIs to think they're OpenAI/ChatGPT, because ChatGPT has become so popular that it's leaked into the buzz they're trained on.
Also, as long as it's not training the whole model on the fly as with the Tay fiasco, I'd actually be quite interested in an LLM that would debate you and possibly be convinced and change its stance for the rest of that conversation with you. "Strong opinions weakly held" and all.
Happy to give you a demo. If you want to send me a prompt, I can share a link to the resulting output.
It's not like things can't get heated when people in much of the rest of the world discuss politics.
But if the subject isn't entirely verboten, adults will have some practice in agreeing to disagree, and moving on.
With AI this particular cultural export has gone from a quaint oddity, to something that, as a practical matter, can be really annoying sometimes.
As in: I have a video file, I want to send it to the model and get a response about it. Not their 'live stream' or whatever functionality.
Playstation
Playstation 2
Playstation 3
Playstation 4
Playstation 5
https://en.wikipedia.org/wiki/History_of_Gmail#Extended_beta...
Very disappointing to see the claim Gemini 2.0 is available for everyone when it's simply not. Seems like Google is following the OpenAI playbook on this.
I also think another aspect of the "no politics" rule which is important is that it attempts to preserve spaces where people can just enjoy things. People need to escape from politics and just enjoy the good things in life together. This is important for personal mental health but also social cohesion, as it's extremely difficult to have positive relationships with those you only ever argue politics with. If we don't have spaces which enforce a no politics rule, you can't ever unplug from the madness and that isn't good.
The latest one is Azure AI Foundry: https://techcommunity.microsoft.com/discussions/marketplace-...
It's a question of right or wrong.
"I can't talk politics."
It's a question of health care.
"I can't talk politics."
It's a question of fact vs fiction, knowledge vs ignorance.
"I can't talk politics."
You are a slave to a master that does not believe in integrity, ethics, community, and social values.
"I can't talk politics."
If you don't have it, you might be in a Google feature flag jail-- this happens frustratingly often, where 99.9% of users have a feature flag enabled but your account just gets stuck with the flag off with no way to resolve it. It's the absolute worst part about Google.
How did they know you were using Gemini to train another model?
2M context window on Gemini 2.0 Pro: https://deepmind.google/technologies/gemini/pro/
- In Experimental? Coming soon??
Make it make sense.
* locals everywhere discuss politics between themselves, many are able to discuss politics 'reasonably' but things can and do get heated, AND
* it's good advice as a traveller to not get drawn into political discussions with locals. Listen by all means, going further can be a bad move.
I recall a radiometric survey in Nor'Western India when an underground mini nuke was detonated near our aircraft .. that got rather tense, particularly when the others were detonated and Pakistan responded.
Not a good time to discuss where the border ran.
I highly recommend using it via https://aistudio.google.com/. Gemini app has some additional bells and whistles, but for some reason quality isn't always on par with aistudio. Also Gemini app seems to have more filters -- it seems more shy answering controversial topics. Just some general impressions.
Saved me the headache of manually going through pages of docs.
If the model says “sorry, no politics, let’s talk about something else” - there’s a tiny fraction of a minority will make a comment like you did and be done with it. We can all move on.
If the model responds as neutrally as possible, maybe “Obama’s chilli is a great recipe, let me know when you want to begin”, we end up with ENDLESS clutching of pearls, WOKE MODEL SUGGESTS LEFT WING CHILLI BETTER THAN RIGHT WING CHILLI!!! CANCEL LEFTIST GOOGLE!!!
And then the bit that actually bugs me, just to stir up some drama you’ll get the occasional person who absolutely knows better and knows exactly what they’re doing: “I’m Professor MegaQualifications, and actually I will show that those who have criticised the models as leftist are being shut down[1] and ignored but the evidence shows they have a point…”
[1] always said unironically at a time when it’s established as a daily recurring news story being rammed down our throats because it’s one of those easy opinion generators that sells engagement like few other stories outside of mass bloodshed events
Still can't manage Google Workspace Calendar with Google Home, for instance. A feature that's been available for personal accounts for years.
As almost everything that is personal is in some way political (taking the meaning "what strategy to use for ruling over a city"), even the discussion of what counts as politics can kill discussions. (As seems to have happened in your example.)
So my conclusion is you cannot separate "personal" and "political" into completely disjoint categories.
The rule seems to be in place to keep discussions from veering off in the direction of which policies to apply or which particular politicians to favor (which is nowadays the biggest taboo for a corporate LLM).
https://openai.com/index/introducing-deep-research/
It's Pro only for now.
I'm usually in Croatia but am right now in Greece. My account ($200 Pro account) works the same wherever I am, even when I'm outside the EU, e.g. in Serbia.
Here is a realistic case I would have used
Short prompt: "research the market of the SysML products and industry cap, key actors, opportunities and advantages"
Expanded prompt:
“Conduct a detailed market analysis of SysML (Systems Modeling Language) products and services, focusing on the following aspects:

1. Industry Overview:
   - Current market size and growth trends for SysML-related products and services.
   - Global market cap or valuation of the SysML industry or similar system engineering software.
2. Key Players:
   - Identify major companies and organizations offering SysML tools, including established leaders and emerging competitors.
   - Provide a brief description of each key player’s products, innovations, and market share.
3. Product Offerings:
   - Highlight the range of SysML-based products available (e.g., modeling software, integration tools, training services).
   - Compare features, target industries, and pricing strategies.
4. Market Opportunities:
   - Explore new or underserved industries where SysML adoption could grow (e.g., aerospace, automotive, healthcare).
   - Identify gaps in current product offerings that represent potential areas for innovation or competitive advantage.
5. Competitive Advantages:
   - Analyze what makes SysML valuable to companies (e.g., improving system complexity management, ensuring design consistency).
   - Evaluate how SysML tools offer a competitive advantage compared to alternative solutions like UML or other system modeling methods.
6. Challenges and Risks:
   - Discuss potential challenges such as market saturation, training requirements, or competing technologies.
   - Highlight external factors that could affect market growth, such as regulations or advances in adjacent industries.
Provide sources where relevant, including reports, studies, or insights from industry experts.”
Hvala (thank you).
I might try this as a filler remover for the novels I find drag on and on.
Let me know what you think, will ya? I'm curious as to how you'd evaluate the quality of that report.
Also note the follow-up prompt I gave it. This thing needs as much detail as you can give it, and small changes to your prompt can hugely influence the end result.
Superficially:
- Awesome how fast it's available, with all sources linked to explain each claim
- Exporting it to a proper doc with footnotes and proper formatting is something OpenAI still has to work on
- It looks like the perfect way to build a gut feeling and a sense of what is going on
Content:
- Wrong studies: it mixed SysML (the particular visual language the prompt specifically asked about) with MBSE (the broader family of tools), which is exactly not the same thing; the desired study was specifically about SysML.
- Quality of data: most of the data comes from public articles and studies made by others, all of them about MBSE rather than SysML, and it just quotes their numbers. It does not make its own estimates by looking at the benefits of such products at each company and projecting from there (that would be actual research, and an AI should be able to do that tirelessly, or even deliberately hunt for the right pieces of information). By analogy, if this were a report on diet, it should avoid debunked articles, bro blogs, etc.
- Inconsistent scales: in one comparison table, a footnote said pricing would be displayed on the schema (Pricing: $ = low, $$$$ = high), yet only a single row used it. Why? The source for that field used the same schema, but none of the other fields adapted their values to this system.
- Only googleable data: company reports or private databases are key for a high-quality report. This is not always possible for an AI or a crawler, but here I am evaluating the outcome (the use case): a market analysis for strategic purposes.
- Quality of the report: many things mentioned, like services around the products, are also highly valuable... it would be useful to note, case by case, each company's business model and how much is in the product versus the related services (using pie charts or whatever), and to show particular case studies illustrating the market trend: which model it is coming from (product) and where it is going (services and SaaS).
I could go on much longer with many other such errors, but this is already quite long.
Conclusion:
It is very useful, particularly for grasping a general yet detailed idea of what is going on in a market. However, it is only as valid as a remix of previous work, not actual market research for an actual strategy. The many sources, elements, and the landscape of related companies and products are totally useful, perhaps 30%-40% of the total work, and it gives a clear structure for where to go from here.
It may improve the more interactive the tool becomes, for example by asking it to correct some sections or improve them in ways specifically suggested by the user. Basically the user needs to bring expertise, reasoning from the field, and critical thinking (things machines will lack for any foreseeable future).
Why am I so critical? Remember, the map is not the territory, particularly in strategic terms. And it is also a problem that many professionals make exactly these kinds of failures: uncritically copying data from others' reports without verifying any of it, which leads to very specific kinds of strategic errors.
It will become more useful as it specializes in particular kinds of reports and judges sources critically (which it currently does not), and if it can be adapted to work on a private repo, a preselected set of sources, or even pre-scripted agent behaviors for each sort of report.
Verdict:
I would purchase it, not to solve the problem or resell it, but as a way to get started and accelerate the process. It already does what an intern, or a mediocre professional, would do: revamp preexisting mashups to get a general but detailed feeling, with no more insight or research than what is already well known (it is googling, after all).
It has a great future, as it would be great if this level of non-creative work were automated away; its value is often marginal, and it uncritically propagates previous beliefs and biases (and a centralized tool can be tuned to avoid well-known issues).
The only thing that's slightly non-intuitive is how standard/digital determines whether a disc can be used to play a game. But every other aspect of the naming could be understood by a person new to the category.
I don't think that's typically what is meant when discussing 'leaving politics out of work' though.
stumped it
TBH I'd love to find a way to disband everybody from the workspace but somehow keep their identity, history, photos, etc. Even if it meant getting new @gmail addresses.
It's too common that people say "politics" when they mean "party politics", and I know that's not a battle I'm going to win. But it's still necessary to remember that a strict rule of "no politics" is an oxymoron, being itself inherently political.
> It may improve the more interactive the tool becomes, for example by asking it to correct some sections or improve them in ways specifically suggested by the user. Basically the user needs to bring expertise, reasoning from the field, and critical thinking (things machines will lack for any foreseeable future).
Yeah, that's just the thing. With what you know, you can iterate on the results it gives you. It's very sensitive to how your prompt is written and structured, so with some fine-tuning, user-provided context, and user expertise, it'll dial in on any subject very well. It's not top-expert-level yet -- at least not on its own -- but it's close, and it's miles better than asking o1-Pro (or DeepSeek R1) for a detailed report.
Just today I wanted to continue a conversation from two days ago, and after writing to the chat, I just get back an error “This chat was created with Gemini Advanced. Get it now to continue this chat.” And I don’t even know if that’s a bug, or some expected sales funnel where they gave me a nibble of it for free and now want me to pay up.
if the model name contains '-exp' or '-preview', then API version is 'v1alpha'
otherwise, use 'v1beta'
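As code, that rule of thumb is just the following (a sketch of the commenter's heuristic, not official documentation):

```python
def api_version(model_name: str) -> str:
    # Heuristic from above: experimental/preview models go through
    # v1alpha; everything else uses v1beta.
    if "-exp" in model_name or "-preview" in model_name:
        return "v1alpha"
    return "v1beta"

print(api_version("gemini-2.0-pro-exp-02-05"))            # v1alpha
print(api_version("gemini-2.0-flash-lite-preview-02-05"))  # v1alpha
print(api_version("gemini-2.0-flash"))                     # v1beta
```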
I have a high level understanding of LLMs and am a generalist software engineer.
Can you elaborate on how exactly these insanely large (and now cheap) context windows will kill a lot of RAG use cases?
With a million tokens you can shove several short books into the prompt and just skip all that. That’s an entire small-ish codebase.
A colleague used an HTML dump of every config and config policy from a Windows network, pasted it into Gemini, and started asking questions. It's just that easy now!
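A minimal sketch of that workflow with the google-genai Python SDK (the file name and question are hypothetical, and it assumes GEMINI_API_KEY is set in the environment):

```python
import os
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

# No chunking, embedding, or retrieval step: paste the entire dump
# into the (million-token) context window and just ask.
corpus = open("network_config_dump.html", encoding="utf-8").read()

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=f"{corpus}\n\nWhich of these policies disable USB storage?",
)
print(response.text)
```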
Gemini 2.0 Flash is just Google's production-ready model.
Gemini 2.0 Flash-Lite Preview is their smallest model, for high-volume tasks.
Gemini 2.0 Pro Experimental is the strongest Gemini model.
Gemini 2.0 Flash Thinking Experimental (gemini-2.0-flash-thinking-exp-1219) explicitly shows its thoughts.
https://aistudio.google.com/app/changelog#february-5th-2025