172 points by marban | 24 comments
1. bearjaws ◴[] No.40052158[source]
The focus on TOPS seems a bit out of line with reality for LLMs. TOPS don't matter for LLMs if your memory bandwidth can't keep up. Since quad-channel memory isn't mentioned, I guess it's still dual channel?

Even top-of-the-line DDR5 is around 128GB/s vs. an M1 at 400GB/s.

At the end of the day, it still seems like AI in consumer chips is chasing a buzzword. What is the killer feature?

On mobile there are image processing benefits, voice-to-text, translation... but on desktop those are nowhere near common use cases.
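To put rough, illustrative numbers on that: batch-1 decoding has to stream essentially every weight from memory for each generated token, so bandwidth caps throughput no matter how many TOPS you have.

    // Back-of-the-envelope ceiling on batch-1 decode speed:
    // each token reads the whole model from memory once, so
    // tokens/s <= bandwidth / model size. All numbers illustrative.
    function maxTokensPerSec(bandwidthGBps: number, modelGB: number): number {
      return bandwidthGBps / modelGB;
    }

    console.log(maxTokensPerSec(128, 7)); // 7B model @ 8-bit, dual-channel DDR5: ~18 t/s
    console.log(maxTokensPerSec(400, 7)); // same model at 400GB/s: ~57 t/s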

replies(3): >>40052204 #>>40052260 #>>40052353 #
2. postalrat ◴[] No.40052204[source]
https://www.neatvideo.com/blog/post/m3

That says the M1 is 68.25 GB/s.

replies(3): >>40052258 #>>40052307 #>>40052631 #
3. givinguflac ◴[] No.40052258[source]
The OP was obviously talking about the M1 Max.
replies(1): >>40052494 #
4. VHRanger ◴[] No.40052260[source]
The killer feature is presumably inference at the edge, but I don't see that being used on desktop much at all right now.

Especially since most desktop applications people use are web apps. Of the native apps people use that leverage this sort of thing, almost all are GPU-accelerated already (e.g. image and video editing AI tools).

replies(1): >>40052360 #
5. bearjaws ◴[] No.40052307[source]
M1 Max, sorry. I don't mean to compare a 4-year-old tablet processor to the latest generation of laptop CPUs.
6. futureshock ◴[] No.40052353[source]
Upscaling for gaming or video.

Local context aware search

Offline Speech to text and TTS

Offline generation of clip art or stock images for document editing

Offline LLM that can work with your documents as context and access application and OS APIs

Improved enemy AI in gaming

Webcam effects like background removal or filters.

Audio upscaling and interpolation like for bad video call connections.

replies(2): >>40052491 #>>40052497 #
7. jzig ◴[] No.40052360[source]
What does “at the edge” mean here?
replies(4): >>40052515 #>>40052529 #>>40052531 #>>40052991 #
8. bearjaws ◴[] No.40052491[source]
> Upscaling for gaming or video.

Already exists on all three major GPU manufacturers, and it definitely makes sense as a GPU workload.

> Local context aware search

You don't need an AI processor to do this. Windows search used to work better and had even fewer compute resources to work with.

> Offline Speech to text and TTS

See my point about not a very common use case for desktops & laptops vs cell phones.

> Offline LLM that can work with your documents as context and access application and OS APIs

Maybe for some sort of background task, or only with really small models (<13B parameters). Anything real-time is going to run at 1-2 t/s with a large model (rough numbers below).

Small models are pretty terrible, though; I doubt people want even more incorrect information and hallucinations.

> Improved enemy AI in gaming

See Ageia PhysX

> Webcam effects like background removal or filters.

We already have this without NPUs.

> Audio upscaling and interpolation like for bad video call connections.

I could see this, or noise cancellation.
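
To put rough, illustrative numbers on the large-model case (same back-of-the-envelope logic: batch-1 decode streams the whole model once per token):

    // Illustrative only: a 70B model streamed from dual-channel DDR5.
    // fp16 weights ~140 GB; 4-bit quantized ~35 GB.
    const bandwidthGBps = 128;
    console.log(bandwidthGBps / 140); // ~0.9 t/s at fp16
    console.log(bandwidthGBps / 35);  // ~3.7 t/s at 4-bit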

replies(3): >>40052653 #>>40052700 #>>40052949 #
9. postalrat ◴[] No.40052494{3}[source]
How is it obvious? Anyone reading that could assume that any M1 gets that bandwidth.
10. kanbankaren ◴[] No.40052497[source]
All of this (except upscaling) is possible with the iGPU/CPU without breaking a sweat?
replies(1): >>40052666 #
11. georgeecollins ◴[] No.40052515{3}[source]
Not using AI in the cloud. So if your connection is uncertain, or you want to use your bandwidth for something else, like video conferencing or gaming. The killer app is probably something that wants to use AI but doesn't involve paying a cloud provider. I was talking to a vendor about their chat bot built to put into MMOs or mobile games. It would be killer to have a character hold a lifelike conversation in those kinds of experiences, but the last thing you want to do is increase your server costs the way this AI would. Edge computing could solve that.
12. PeterSmit ◴[] No.40052529{3}[source]
Not in the cloud.
13. Zach_the_Lizard ◴[] No.40052531{3}[source]
I'm guessing "the edge" is doing inference work in the browser, etc. as opposed to somewhere in the backend of the web app.

Maybe your local machine can run, I don't know, a model to make suggestions as you're editing a Google Doc, which frees up the Big Machine in the Sky to do other things.

As this becomes more technically feasible, it reduces the effective cost of inference for a new service provider, since you, the client, are now running their code.

The Jevons paradox might kick in, causing more and more uses of LLMs for use cases that were too expensive before.
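
As a hypothetical sketch (the library and model choice are just examples, not a claim about how any particular service does it), client-side generation with transformers.js could look like:

    // Hypothetical sketch: text generation running entirely on the client.
    // After the one-time model download, no inference request hits a server.
    import { pipeline } from '@xenova/transformers';

    const generate = await pipeline('text-generation', 'Xenova/gpt2');
    const out = await generate('The next sentence of this document is', {
      max_new_tokens: 20, // keep it small; this runs on the user's hardware
    });
    console.log(out);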

14. ◴[] No.40052631[source]
15. bayindirh ◴[] No.40052653{3}[source]
It's about power management: doing more things with less power. These specialized IP blocks on CPUs allow those things to be done with less power and lower latency.

Intel's bottom-of-the-barrel N95 & N100 CPUs have Gaussian & Neural Accelerators for simple image processing and object detection tasks, plus a voice processor for low-power, voice-based activation and command capture and processing.

You can always add more power-hungry, general-purpose components to add capabilities. Heck, video post-processing entered the hardware era with the ATI Radeon 8500. But doing these things at negligible power cost is the new front.

Apple is not adding coprocessors to its iPhones because it looks nice. All of these coprocessors reduce CPU wake-up cycles tremendously and allow the device to monitor tons of things out of band at negligible power cost.

16. bayindirh ◴[] No.40052666{3}[source]
The things that don't make the GPU break a sweat have their own specialized (or semi-specialized) processing blocks on the GPU, too.
replies(1): >>40052781 #
17. pdpi ◴[] No.40052700{3}[source]
>> Upscaling for gaming or video.

> Already exists on all three major GPU manufacturers, and it definitely makes sense as a GPU workload.

"makes sense as a GPU workload" is underselling it a bit. Doing it on the CPU is basically insane. Games typically upscale only the world view (the expensive part to render) while rendering the UI at full res. So to do CPU-side upscaling we're talking about a game rendering a surface on the GPU, sending it to the CPU, upscaling it there, sending it back to the GPU, then compositing with the UI. It's just needlessly complicated.

18. kanbankaren ◴[] No.40052781{4}[source]
I meant the current generation of GPUs that don't have any AI acceleration blocks.
replies(1): >>40052810 #
19. bayindirh ◴[] No.40052810{5}[source]
They are matmul machines by design already. They don't need dedicated blocks to "accelerate" AI to begin with.

Their cores/shaders can be programmed to do that.

Also, name a current-gen GPU that doesn't have video encoding/decoding facilities in silicon, even one that doesn't allow shaders to be used for post-processing in that pipeline. Not having these capabilities is impossible at this point in time.

replies(1): >>40053026 #
20. futureshock ◴[] No.40052949{3}[source]
>> Upscaling for gaming or video.

> Already exists on all three major GPU manufacturers, and it definitely makes sense as a GPU workload.

These AMD chips are APUs that are often the only GPU; not every user will have a dedicated GPU.

>> Local context aware search

> You don't need an AI processor to do this. Windows search used to work better and had even fewer compute resources to work with.

You could still improve it with natural language understanding instead of simple keywords: "give me all documents about dogs" instead of searching for each breed as a keyword.

>> Offline Speech to text and TTS

> See my point about not a very common use case for desktops & laptops vs cell phones.

Maybe not for you, but accessibility is a key feature for many users. Do you think blind users should suffer through bad TTS?

>> Offline LLM that can work with your documents as context and access application and OS APIs

> Maybe for some sort of background task, or only with really small models (<13B parameters). Anything real-time is going to run at 1-2 t/s with a large model. Small models are pretty terrible, though; I doubt people want even more incorrect information and hallucinations.

Small models have been improving, and better capabilities in consumer chips will let larger models run faster.

>> Improved enemy AI in gaming

> See Ageia PhysX

Surely you're not suggesting that enemy AI is a solved problem in gaming?

>> Webcam effects like background removal or filters.

> We already have this without NPUs.

Sure, but it could go from obvious and distracting to seamless and convincing.

>> Audio upscaling and interpolation like for bad video call connections.

> I could see this, or noise cancellation.

21. VHRanger ◴[] No.40052991{3}[source]
Edge is doing the computing on the client (e.g. browser, phone, laptop) instead of on the server.
replies(1): >>40055563 #
22. kanbankaren ◴[] No.40053026{6}[source]
I was talking about AI blocks and you moved the goalposts to video codec blocks.
replies(1): >>40053078 #
23. bayindirh ◴[] No.40053078{7}[source]
No. I didn't move anything.

I said that the core (the 3D rendering hardware) of a GPU with shaders already is the AI block, and that other tasks, like video encoding, have their own blocks but still pull capabilities from the "core" to improve things.

24. Dylan16807 ◴[] No.40055563{4}[source]
Half the definitions I see of "edge" include client devices, and half don't.

I like the latter. Why even use a new word if it's just going to be the same as "client"?