
113 points by sethkim | 1 comment
ramesh31 (No.44458079):
>By embracing batch processing and leveraging the power of cost-effective open-source models, you can sidestep the price floor and continue to scale your AI initiatives in ways that are no longer feasible with traditional APIs.

Context size is the real killer when you look at running open source alternatives on your own hardware. Has anything even come close to the 100k+ range yet?
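One reason context size hurts on your own hardware is that the KV cache grows linearly with sequence length. A back-of-the-envelope sketch, assuming a model shaped like Llama 3.1 8B (32 layers, 8 KV heads, head dim 128, fp16 cache); these numbers are illustrative, not a benchmark:

```python
# Rough KV-cache size for a long-context decoder model.
# Shape parameters below assume a Llama-3.1-8B-like model
# (32 layers, 8 KV heads, head_dim 128); adjust for your model.

def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8,
                   head_dim=128, dtype_bytes=2):
    # 2x for the separate K and V tensors held per layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes

for tokens in (8_192, 131_072):
    gib = kv_cache_bytes(tokens) / 2**30
    print(f"{tokens:>7} tokens -> {gib:.1f} GiB KV cache")
# With these assumptions: ~1 GiB at 8k tokens, ~16 GiB at 128k,
# on top of the model weights themselves.
```

So a 128k-token context can need more VRAM for the cache than for an 8B model's quantized weights, which is why long context, not raw model size, is often the limiting factor for self-hosting.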

replies(2): >>44458100, >>44458707
sethkim (No.44458100):
Yes! Both Llama 3 and Gemma 3 have 128k context windows.
replies(1): >>44458695
ryao (No.44458695):
Llama 3 had an 8192-token context window. Llama 3.1 increased it to 131072.