It seems like it could be nice for something like a bookmarklet or a one-off script, but I don't think it'll really reduce friction in engaging with Gemini for serious web apps.
And the right way to think about it isn't other browsers. It's Google seeing what Apple is doing in iOS 18 and imitating that.
That's what people said about Internet Explorer
At this point I'm going to create an image generator that's just an API returning random images from Pixabay. pix.ai (open source, of course)
It should be simple enough to do that I believe at least 3-5 people are going to be doing this if it's not done already
Hell, if nobody does it I will do it
Chrome may have been a darling thing when it was young, but it is now just a fresh take on Microsoft's Internet Explorer strategy. MS lost its hold on the web because of regulatory action, and Google has just been trying to find a permissible road to that same opportunity.
I had been thinking and speaking in public about how to make a "Metamask but for AI instead of crypto" but I thought it would be impossible for websites to adopt it
Now, thanks to Google, it's possible to piggyback onto the API
I'm very happy about this
But lots of both stakeholders and users currently value the "magic" itself over anything practical.
Sort of like using ChatGPT to help figure out how to use FFmpeg to accomplish a task from the command prompt, but used to create the equivalent of greasemonkey scripts.
If Mozilla jumps on board and makes a compatible implementation that backs onto, e.g., local Llama, then you would have the preconditions necessary for it to become standardised. As long as Google hasn't booby-trapped it by making it somehow highly specific to Chrome / Google / Gemini etc.
I used to hold Google Chrome in high esteem due to its security posture. Shoehorning AI into it has deleted any respect I held for Chrome or the team that develops it.
Trust arrives on foot and leaves on horseback.
* The sign: https://ludic.mataroa.blog/blog/i-will-fucking-piledrive-you...
[0]: https://blog.nightly.mozilla.org/2024/06/24/experimenting-wi...
https://developer.chrome.com/docs/ai/built-in
https://github.com/jeasonstudio/chrome-ai
I can’t seem to find public documentation for the API with a cursory search, so https://github.com/jeasonstudio/chrome-ai/blob/ec9e334253713... might be the best documentation (other than directly inspecting the window.ai object in console) at the moment.
It’s not really clear if the Gemini Nano here is Nano-1 (1.8B) or Nano-2 (3.25B) or selected based on device.
I don't think this is a terrible idea. LLM-powered apps are here to stay, so browsers making them better is a good thing. Using a local model so queries aren't flying around to random third parties is better for privacy and security. If Google can make this work well it could be really interesting.
https://news.ycombinator.com/item?id=39920803
> So while I am usually the person who would much rather the browser do almost nothing that isn't a hardware interface, requiring all software (including rendering) to be distributed as code by the website via the end-to-end principle--making the browser easy to implement and easy to secure / sandbox, as it is simply too important of an attack surface to have a billion file format parsing algorithms embedded within it--I actually would love (and I realize this isn't what Opera is doing, at least yet) to have the browser provide a way to get access to a user-selected LLM: the API surface for them--opaque text streaming in both directions--is sufficiently universal that I don't feel bad about the semantic lock-in, and I just don't see any reasonable way to do this via the end-to-end principle that preserves user control over tradeoffs in privacy, functionality, and cost...
> If I go to a website that uses an LLM, I should be the one choosing which LLM it is using, NOT the website!!, and if I want it to use some local model or the world's most powerful cloud model, I 1) should be in control of that selection and 2) pretty much have to be for local models to be feasible at all, as I can't sit around downloading and caching gigabytes of data, separately, from every service that might make use of an LLM.
> (edit: Ok, in thinking about it a lot more, maybe it makes more sense for this to be a separate daemon run next to the web browser--even if it comes with the web browser--which merely provides a localhost HTTP interface to the LLM, so it can also be shared by native apps... though, I am then unsure how web applications would be able to access them securely due to all of the security restrictions on cross-origin insecure port access.)
So no, I don't have much technical objections.
This is a major leap forward in human innovation and engineering. IMO, this could be as influential as the adoption of electricity/setting up of the power grid.
At the same time, it is a major risk for browser compatibility. Despite many articles claiming otherwise, I think we mostly avoided repeating the "works only on IE6" situation with chrome. Google did kinda try at times, but most things didn't catch on. This I think has the potential to do some damage on that front.
If Copilot is so great, why does your employer even need you? Replacing you with Copilot would be more capital-efficient.
Because there is a ton of hyper-fixation and rash decision-making over something that puts words together. It seems very unwise to add a new browser API for something still in its infancy and under active development.
For instance I don't need my browser to pass the Turing test. I might need better filtering and better search, but it also doesn't need to be baked in the browser.
Your analogy to electricity is interesting: do you feel the need to add electricity to your bed, dining table, chairs, shelves, bathroom shower, nose clip etc.
We kept electric and non electric things somewhat separate, even as each tool and appliance can work together (e.g. my table has a power strip clipped to it, but both are completely separate things)
triggers the part of you that says "this tastes good" but will rot your teeth
It's handy if I want a snippet of example code that I could've just found on Stackoverflow, but not useful for anything I actually have to think about.
> - Src: https://github.com/webmachinelearning/webnn
W3C Candidate Recommendation Draft:
> - Spec: https://www.w3.org/TR/webnn/
> WebNN API: https://www.w3.org/TR/webnn/#api :
>> 7.1. The `navigator.ml` interface
>> webnn-polyfill
E.g. Promptfoo, ChainForge, and LocalAI all have abstractions over many models; also re: Google Desktop and GNU Tracker and NVIDIA's pdfgpt: https://news.ycombinator.com/item?id=39363115
promptfoo: https://github.com/promptfoo/promptfoo
ChainForge: https://github.com/ianarawjo/ChainForge
LocalAI: https://github.com/go-skynet/LocalAI
Leave it to Vercel to announce `window.ai` on Google's behalf by showing off their own abstraction but not the actual Chrome API.
Here's a blog post from a few days ago that shows how the actual `window.ai` API works [0]. The code is extremely simple and really shouldn't need a wrapper:
const model = await window.ai.createTextSession();
const result = await model.prompt("What do you think is the meaning of life?");
[0] https://afficone.com/blog/window-ai-new-chrome-feature-api/

By the way, I haven't touched the latest JS code in a while. What does this new syntax mean: `import { chromeai }`?
I also don't get the textStream code: `for await (const textPart of textStream) { result = textPart; }`. Does `result` get overwritten on each loop step?
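To answer that with a runnable sketch: yes, `result` is reassigned on every iteration, so that loop keeps only the last chunk; accumulating the full text needs `+=`. The stream below is a hand-rolled async generator standing in for the SDK's textStream (an assumption: I believe the real stream yields incremental deltas rather than the full text so far).

```javascript
// Simulated text stream: an async generator yielding chunks,
// standing in for the SDK's textStream.
async function* fakeTextStream() {
  yield "Hello";
  yield ", ";
  yield "world";
}

async function collect() {
  let overwritten = "";
  let accumulated = "";
  for await (const textPart of fakeTextStream()) {
    overwritten = textPart;  // reassigned each step: keeps only the last chunk
    accumulated += textPart; // concatenation keeps the full text
  }
  return { overwritten, accumulated };
}

collect().then(({ overwritten, accumulated }) => {
  console.log(overwritten);  // "world"
  console.log(accumulated);  // "Hello, world"
});
```

If the real textStream yields cumulative text instead of deltas, plain assignment would actually be correct; worth checking against the SDK docs.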
Look at WebNN [1]. It's from Microsoft and is basically DirectML, but they at least pretend to make it a Web thing.
The posture matters. Apple tried to expose Metal through WebGPU [2], then silently abandoned it. But they had the posture, and other vendors picked it up and made it real.
That won't happen to window.ai until they stop sleepwalking.
const session = await window.ai.createTextSession()
const outputText = await session.prompt(inputText)
That's all there is for now (createGenericSession does the same at this time, and there are canCreateTextSession/canCreateGenericSession).

Now not only can Google front the web pages who feed them content they make summaries from, but the browser can front Google.
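Back to the API surface itself: a sketch of a defensive caller using the capability check mentioned above. Everything here is an assumption about an experimental API (including the readiness return values and the `window.ai` shape), so the object is stubbed to keep the snippet self-contained:

```javascript
// Stub of the experimental window.ai surface, for illustration only.
const aiStub = {
  canCreateTextSession: async () => "readily",
  createTextSession: async () => ({
    prompt: async (text) => `echo: ${text}`,
  }),
};

// Check availability before creating a session, and fall back
// gracefully when the model is not present on this device.
async function ask(ai, inputText) {
  const availability = await ai.canCreateTextSession();
  if (availability === "no") return null; // hypothetical readiness value
  const session = await ai.createTextSession();
  return session.prompt(inputText);
}

ask(aiStub, "ping").then((reply) => console.log(reply)); // "echo: ping"
```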
“Your honour, this is just what Google has been saying is a good thing. We just moved it to the edge. The users win, no?”
I believe we can start compressing down the amount of data going over the wire 100x this way...
This is one of those places where Apple's vertical integration has a clear benefit, but even as a bit of a skeptic regarding "AI" technology, it does seem there's a good chance that accelerated ML inference is going to be one of the next battlegrounds for mobile processor performance and capability, if it hasn't started already.
In fact any kind of decoder model, including text models can use the same principle to lossily compress data. Of course, hallucination will be a thing...
Diffusion models, depending on the full architecture might not have smaller dimension layers that could be used for compression.
The internet has already been like genAI for decades. Need a picture? Prompt Google Image search with a few keywords. There are billions of human-made images to choose from. Need to find information about something? Again, prompt the search engine, or use Wikipedia directly; it's more up to date than LLMs.
Need personalized response? Post on a forum, real humans will respond, better than GPT. Need help with coding? Stack overflow and Github Issues.
We already had a kind of manual-AI for 25 years. That is why I don't think the impact shock of AI will be as great as it is rumored to be. Various efficiencies of having access to an internet-brain have already been used by society. Even in art, the situation is that a new work competes with decades of history, millions of free works one click away, better than AI outputs, no weird artifacts and giveaways.
- Massively broaden the input for forms because the AI can accept or validate inputs better
- Prefill forms from other known data, at the application level
- Understand files/docs/images before they even go up, if they go up at all
- Provide free text instructions to interact with complex screens/domain models
Using the word AI everywhere is marketing, not dev
Give my personal local data to a model running in the browser? Just feels a bit more risky.
Which, in a way, is similar to building a browser leveraging all of the local GPUs to do render and HW-accelerated video decoding.
Is Safari on Apple Silicon better than Chrome on random Windows laptop for playing YouTube in the last 5 years? Hardly.
const model = await window.ai.createTextSession();
const result = await model.prompt("3 names for a pet pelican");
There's a VERY obvious flaw: is there really no way to specify the model to use?

Are we expecting that Gemini Nano will be the one true model, forever supported by this API baked into the world's most popular browser?
Given the rate at which models are improving that would be ludicrous. But... if the browser model is being invisibly upgraded, how are we supposed to test out prompts and expect them to continue working without modifications against whatever future versions of the bundled model show up?
Something like this would at least give us a fighting chance:
const supportedModels = await window.ai.getSupportedModels();
if (supportedModels.includes("gemini-nano:0.4")) {
  const model = await window.ai.createTextSession("gemini-nano:0.4");
  // ...
}
What about running LoRAs, adjusting temperature, configuring prompt templates, etc? It seems pretty early to build something like this into the browser. The technology is still changing so rapidly, it might look completely different in 5 years.
I'm a huge fan of local AI, and of empowering web browsers as a platform, but I'm feeling pretty stumped by this one. Is this a good inclusion at this time? Or is the Chrome team following the Google-wide directive to integrate AI _everywhere_, and we're getting a weird JS API as a result?
At the very least, I hope to see the model decoupled from the interface. In the same way that font-family loads locally installed fonts, it should be pluggable for other local models.
Overview: https://developer.chrome.com/docs/ai/built-in
Sign-up: https://docs.google.com/forms/d/e/1FAIpQLSfZXeiwj9KO9jMctffH...
As for temperature and topK, you can set them in the AITextSessionOptions object as an argument to `window.ai.createTextSession(options)` (source: https://source.chromium.org/chromium/chromium/src/+/main:thi...)
You should also be able to set it by adding the switches: `chrome --args --enable-features=OptimizationGuideOnDeviceModel:on_device_model_temperature/0.5/on_device_model_topk/8` (source: https://issues.chromium.org/issues/339471377#comment12)
The default temperature is 0.8 and default topK is 3 (source: https://source.chromium.org/chromium/chromium/src/+/main:com...)
As for LoRA, Google will provide a Fine-Tuning (LoRA) API in Chrome: https://developer.chrome.com/docs/ai/built-in#browser_archit...
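Pulling those together, session creation with options would look something like this. A sketch only: the option names follow the AITextSessionOptions fields referenced above, but the API is experimental, so `window.ai` is stubbed here to keep the snippet self-contained.

```javascript
// Stub mirroring the createTextSession(options) shape, with the
// Chromium defaults (temperature 0.8, topK 3) applied when unset.
const aiStub = {
  createTextSession: async (options = {}) => ({
    temperature: options.temperature ?? 0.8,
    topK: options.topK ?? 3,
    prompt: async (text) => `response to: ${text}`,
  }),
};

async function makeTunedSession() {
  // Lower temperature and wider topK than the defaults.
  return aiStub.createTextSession({ temperature: 0.5, topK: 8 });
}
```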
In particular, I get to choose the best option for each of them (search, filtering and security being independent from each other seems like a core requirement to me). The most telling part to me is how extensions come and go, and we move on from one to the other. The same kind of rollover won't be an option with everything in Apple's AI, for instance.
This comes down to the divide between the Unix philosophy of a constellation of specialized tools working together and a huge monolith responsible for everything.
I don't see the latter as a viable approach at scale.
Pinning a language model task against a checkpoint with known behavior is critical to building cool and consistent features on top of it.

However, the alternative to an invisibly evolving model is deploying an unbounded number of base models and versions, which web pages would be free to select from. This would rapidly explode the long tail of models users would need to fetch and store locally to use their web pages, e.g. HF's long tail of LoRA fine-tunes across all combinations of datasets and foundation models. How many foundation-model-plus-LoRA combinations can people store and run locally?

So it makes some sense for Google to deploy a single model which they believe strikes a balance in the size/latency and quality space. They are likely looking for developers to build out on their platform first, bringing features to their browser first and directing usage towards their models. The most useful fuel to steer the training of these models is knowing what clients use them for.
[0]: https://connect.mozilla.org/t5/discussions/share-your-feedba...
I'm looking forward to seeing a cross-browser polyfill, possibly as a web extension.
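A polyfill could be surprisingly small. Here is a sketch that installs a `window.ai`-shaped object backed by whatever completion function the extension wires up (a local model, a localhost HTTP endpoint, etc.); the method names mirror the experimental Chrome API and may change:

```javascript
// Install a window.ai-compatible shim on a global object, backed by
// an arbitrary `complete(text) -> string | Promise<string>` function.
function installWindowAiPolyfill(globalObj, complete) {
  if (globalObj.ai) return; // a native implementation wins
  globalObj.ai = {
    canCreateTextSession: async () => "readily",
    createTextSession: async () => ({
      prompt: (text) => complete(text),
    }),
  };
}

// Example: back the shim with a trivial echo "model".
const fakeWindow = {};
installWindowAiPolyfill(fakeWindow, (text) => `echo: ${text}`);
```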
That’s why people chose chrome? Citation needed. I’ve very rarely seen websites rely on new browser specific capabilities, except for demos/showcases.
Didn’t Chrome slowly become popular using Google's own marketing channel, search? That’s what I thought.
> MS lost its hold on the web because of regulatory action
Well, not only. They objectively made a worse product for decades and used their platform to push it, much more effectively than Google too. They are still pushing Edge hard, with darker patterns than Google imo.
In either case, the decision to adopt Chromium wasn’t forced. Microsoft clearly must have been aligned enough on the capability model to not deem it a large risk, and continued to push for Edge just as they did with IE.
Yesterday upon restarting my PC a Skype dialog popped up inviting me to see how CoPilot could help me. So naturally I went into the task manager and shut down the Skype processes.
There's already so many ways to fingerprint users which are far more reliable though.
Yes, wrapping stuff to give a different developer experience contributes to new ideas, and can evolve into something more.
In C# you can’t compile a reference to Models.Potato04 unless Potato04 exists. In JS it’s perfectly legal to have code that references non-existent properties, so there’s no real developer ergonomics benefit here.
On the contrary, code like `ai.createTextSession("Potato:4")` can throw an error like “Model Potato:4 doesn’t exist, try Potato:1”, whereas `ai.createTextSession(ai.Models.Potato04)` can only throw an error like “undefined is not a Model. Pass a string here”.
Or you can make ai.Models a special object that throws when undefined properties are accessed, but then it’s annoying to write code that sniffs out which models are available.
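For what it's worth, that "special object" is only a few lines with a Proxy, and sniffing still works because Object.keys goes through a different trap than property reads (the model names here are hypothetical):

```javascript
// A models namespace that throws on unknown property reads but still
// supports discovering the available models via Object.keys().
function makeModels(names) {
  const base = Object.fromEntries(names.map((n) => [n, n]));
  return new Proxy(base, {
    get(target, prop) {
      if (typeof prop === "string" && !(prop in target)) {
        throw new Error(`Unknown model "${prop}". Known: ${names.join(", ")}`);
      }
      return target[prop];
    },
  });
}

const Models = makeModels(["Potato01", "Potato04"]);
console.log(Models.Potato04);     // "Potato04"
console.log(Object.keys(Models)); // [ 'Potato01', 'Potato04' ]
// Models.Potato99 would throw: Unknown model "Potato99". Known: ...
```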
More recently, and on topic, I am dubious about LangChain and the notion of doing away with composing your own natural language prompts from the start. I know of at least some devs whose interactions with LLMs are restricted solely to LangChain, and who have never realized how easy it is to, say, prompt the LLM for JSON adhering to a schema by just, you know, asking it. I suppose eventually frameworks/wrappers will arise around in-browser AI models. But I see a danger in people being so eager to incuriously adopt the popular, even as it bloats their project size unnecessarily. Forecasting ahead, if LLMs keep getting better, the need for wrappers should diminish, I would think. It would suck if language models got ever better but we were still saddled with the same bloat, cognitive and in code size, just because of human nature.
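To make the "just ask" point concrete, here is a sketch of schema-in-the-prompt with no framework at all. The schema and wording are illustrative; real code would pass the string to the model, JSON.parse the reply, and retry on a parse failure.

```javascript
// Build a prompt that asks the model for JSON matching a schema.
function jsonPrompt(task, schema) {
  return [
    task,
    "Respond with only valid JSON matching this schema, and no prose:",
    JSON.stringify(schema, null, 2),
  ].join("\n");
}

const prompt = jsonPrompt(
  "Extract the sender and date from the email below.",
  {
    type: "object",
    properties: {
      sender: { type: "string" },
      date: { type: "string" },
    },
  }
);
console.log(prompt.split("\n")[0]); // "Extract the sender and date from the email below."
```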
[0] https://github.com/mdn/yari/issues/9208 [1] https://github.com/mdn/yari/issues/9230
With this, Goog gets to offload AI stuff to clients, but can (and will, I guarantee) sample the interactions, calling it "telemetry" and perhaps saying it's for "safety" as opposed to being blatant Orwellian spying.
Or making constants for every device manufacturer you can connect to via web Bluetooth.
So there is a tradeoff that makes generative AI more useful in many circumstances.
AI is getting more accurate with time not less. It is using less energy per byte with time too for a given quality.
Guess where things will be in 2030?
Another few years and most of these won’t even make it to the front page.