    229 points by modinfo | 15 comments
    1. simonw ◴[] No.40835549[source]
    If this is the API that Google are going with here:

        const model = await window.ai.createTextSession();
        const result = await model.prompt("3 names for a pet pelican");
    
    There's a VERY obvious flaw: is there really no way to specify the model to use?

    Are we expecting that Gemini Nano will be the one true model, forever supported by this API baked into the world's most popular browser?

    Given the rate at which models are improving that would be ludicrous. But... if the browser model is being invisibly upgraded, how are we supposed to test out prompts and expect them to continue working without modifications against whatever future versions of the bundled model show up?

    Something like this would at least give us a fighting chance:

        const supportedModels = await window.ai.getSupportedModels();
        if (supportedModels.includes("gemini-nano:0.4")) {
            const model = await window.ai.createTextSession("gemini-nano:0.4");
            // ...
        }
    replies(7): >>40835678 #>>40835703 #>>40835717 #>>40835757 #>>40836197 #>>40836971 #>>40843533 #
    2. j10u ◴[] No.40835678[source]
    I'm pretty sure that, with time, they will be forced to let users choose the model. Just as happened with the search engine...
    replies(2): >>40835820 #>>40835903 #
    3. Kwpolska ◴[] No.40835703[source]
    Since when can you expect stability with random bullshit generators? They are constantly changed, and they involve a lot of randomness.
    4. luke-stanley ◴[] No.40835717[source]
    Presumably something like model.includes("gemini-nano:0.4") could work?
    replies(1): >>40836178 #
    5. pmg0 ◴[] No.40835757[source]
    > But... if the browser model is being invisibly upgraded, how are we supposed to test out prompts and expect them to continue working without modifications against whatever future versions of the bundled model show up?

    Pinning a language-model task against a checkpoint with known behavior is critical to really support building cool and consistent features on top of it.

    However, the alternative to an invisibly evolving model is deploying an innumerable set of base models and versions, which web pages would be free to select from. This would rapidly explode the long tail of models users would need to fetch and store locally just to use those pages, e.g. Hugging Face's long tail of LoRA fine-tunes across every combination of dataset and foundation model. How many foundation models plus LoRAs can people realistically store and run locally?

    So it makes some sense for Google to deploy a single model that they believe strikes a balance between size/latency and quality. They are likely hoping developers build on their platform first, bringing features to their browser and directing usage toward their models. The most useful fuel for steering the training of these models is knowing what clients use them for.

    6. LunaSea ◴[] No.40835820[source]
    Wouldn't this require Chrome to download models on the fly or pre-package multiple models?

    That doesn't really seem possible (mobile data connection) or convenient (Chrome binary size, disk space) for the user.

    replies(1): >>40837782 #
    7. zelphirkalt ◴[] No.40835903[source]
    Only that it will take 5-10 years to be regulated, they will have to pay a measly fine, and then let users choose. And then we will have the same game as with GDPR compliance now: companies left and right acting as if they just misunderstood the "new" rules and are still learning how this big mystery is to be understood, until a judge tells them to cut the crap. Meanwhile the big masses will not care and will feed all kinds of data into this AI thing, without even asking the people it concerns for consent. And then of course Google will claim that it is all the bad users doing it, and that it is so difficult to monitor and prevent.
    8. damacaner ◴[] No.40836178[source]
    can we make everything constant like C# does please

    Models.GeminiNano04

    boom

    replies(2): >>40837121 #>>40843547 #
    9. sensanaty ◴[] No.40836197[source]
    Testing LLM/AI output sounds like an oxymoron to me.
    replies(1): >>40840327 #
    10. lolinder ◴[] No.40836971[source]
    See this reply from someone on the Chrome team [0]. It's not a final API by any stretch, which is why you can't find any official docs for it anywhere.

    [0] https://news.ycombinator.com/item?id=40835578

    11. jitl ◴[] No.40837121{3}[source]
    What’s the point in JavaScript? At the end of the day that’s still equivalent to `Models["GeminiNano04"]`.

    In C# you can’t compile a reference to Models.Potato04 unless Potato04 exists. In JS it’s perfectly legal to have code that references non-existent properties, so there’s no real developer-ergonomics benefit here.

    On the contrary, code like `ai.createTextSession("Potato:4")` can throw an error like “Model Potato:4 doesn’t exist, try Potato:1”, whereas `ai.createTextSession(ai.Models.Potato04)` can only throw an error like “undefined is not a Model. Pass a string here”.

    Or you can make ai.Models a special object that throws when undefined properties are accessed, but then it’s annoying to write code that sniffs out which models are available.
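    That "special object" can be sketched with a Proxy. To be clear, `window.ai` has no such object; the model names and the `Models` namespace below are hypothetical, just illustrating the trade-off:

```javascript
// Hypothetical Models namespace that throws on unknown model names
// instead of silently yielding undefined. The model names are made up.
const knownModels = ["gemini-nano:0.4", "gemini-nano:0.5"];

const Models = new Proxy({}, {
  // Property access: return the model id, or throw a descriptive error.
  get(_target, prop) {
    const name = String(prop);
    if (knownModels.includes(name)) return name;
    throw new Error(`Unknown model "${name}"; known: ${knownModels.join(", ")}`);
  },
  // The `in` operator lets callers sniff availability without triggering
  // the throwing getter.
  has(_target, prop) {
    return knownModels.includes(String(prop));
  },
});

console.log(Models["gemini-nano:0.4"]); // "gemini-nano:0.4"
console.log("gemini-nano:0.9" in Models); // false
```

    The `has` trap is exactly the escape hatch for the "annoying to sniff out which models are available" problem: `in` checks stay cheap while direct property access stays loud.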

    12. hysan ◴[] No.40837782{3}[source]
    I don’t think you’d need to download on the fly. You can imagine models being installed like extensions, with Chrome shipping Gemini installed by default. Then have the API allow for falling back to the default (Gemini) or throwing an error when no model is available. I’d contend that this would be a better API design, because the user can choose to remove all models to save space on devices where AI is not needed (e.g. a kiosk).
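    That fallback shape could look roughly like this. Note this is a sketch built on the `getSupportedModels()` API proposed upthread; none of these names are a real Chrome API:

```javascript
// Hypothetical model selection with a default fallback, assuming the
// getSupportedModels() / createTextSession(name) surface proposed upthread.
async function createSession(ai, preferred) {
  const available = await ai.getSupportedModels();
  if (available.length === 0) {
    // e.g. a kiosk build where the user removed all models
    throw new Error("No on-device model installed");
  }
  // Use the preferred model if present, otherwise fall back to the default
  // (here, whatever the browser lists first).
  const choice = available.includes(preferred) ? preferred : available[0];
  return ai.createTextSession(choice);
}
```

    A device with all models removed hits the error branch explicitly, rather than triggering a silent multi-gigabyte download.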
    13. simonw ◴[] No.40840327[source]
    In the LLM engineering community we call those "evals", and they're critical to deploying useful solutions. More on that here: https://hamel.dev/blog/posts/evals/
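    A minimal eval is just a table of fixed prompts plus programmatic checks, re-run whenever the underlying model changes. Sketch below; since `window.ai` isn't assumed available, the model is anything exposing an async `prompt()` like the snippet at the top of the thread:

```javascript
// Minimal eval harness sketch: prompts paired with pass/fail checks.
// Re-running the suite after a model upgrade shows which prompts regressed.
const evalCases = [
  { prompt: "3 names for a pet pelican", check: (out) => out.split("\n").length === 3 },
  { prompt: "What is 2 + 2?", check: (out) => out.includes("4") },
];

async function runEvals(model, cases) {
  const results = [];
  for (const { prompt, check } of cases) {
    const output = await model.prompt(prompt);
    results.push({ prompt, pass: check(output) });
  }
  return results;
}
```

    The point isn't deterministic output, it's that the checks encode the properties you actually rely on, so an invisible model swap becomes a visible diff in the results.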
    14. langcss ◴[] No.40843533[source]
    Let alone temperature, max tokens, system, assistant, user, functions etc.
    15. langcss ◴[] No.40843547{3}[source]
    That would be like making a constant for every nuget package/version tuple: unworkable because new versions and packages come out all the time.

    Or making constants for every device manufacturer you can connect to via web Bluetooth.