When I saw the title, I thought this was about running models in the browser. IMO that's way more interesting, and you can do it with Transformers.js and ONNX Runtime. You don't even need a GPU.
https://huggingface.co/spaces/webml-community/llama-3.2-webg...
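For anyone curious, a minimal sketch of what that looks like with Transformers.js (the model id is just an example, and this runs in the browser, so it's not something you can paste into Node as-is):

```js
import { pipeline } from '@huggingface/transformers';

// Runs fully client-side: WebGPU if the browser supports it,
// otherwise it falls back to WASM on the CPU.
const generator = await pipeline(
  'text-generation',
  'onnx-community/Llama-3.2-1B-Instruct', // example model id; any ONNX-converted model on the Hub should work
  { device: 'webgpu' } // omit to use the WASM/CPU backend
);

const out = await generator('Write a haiku about the browser:', {
  max_new_tokens: 50,
});
console.log(out[0].generated_text);
```

First load downloads the weights (they get cached), so the 1B models are about the practical ceiling for most machines.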