Show HN: AI toy I worked on is in stores

(www.walmart.com)

Alt link: https://mrchristmas.com/products/santas-magical-telephone

Video demo: https://www.youtube.com/watch?v=0z7QJxZWFQg

The first time I talked with AI Santa and it responded with a joke, I was HOOKED. The fun/nonsense doesn't click until you try it yourself. What's even more exciting is that you can build it yourself:

libpeer: https://github.com/sepfy/libpeer

pion: https://github.com/pion/webrtc

Then go do all your fun logic in your Pion server. Connect to any Voice AI provider, or roll your own via Open Source. Anything is possible.

If you have questions or hit any roadblocks I would love to help you. I have lots of hardware snippets on my GitHub: https://github.com/sean-der.

architectonic ◴[13 Oct 25 22:30 UTC] No.45574021[source]▶

>>45558375 (OP) #

How much computing power would one need to get this working completely locally, running a half-decent LLM fine-tuned to sound like Santa, with all the TTS, STT, and Pipecat in between?

replies(4): >>45574195 #>>45575432 #>>45576228 #>>45583058 #

1. oofbey ◴[14 Oct 25 01:56 UTC] No.45575432[source]▶

>>45574021 #

More than you can physically fit in a phone like that. Many hundreds if not thousands of watts of GPU.

replies(2): >>45575717 #>>45575785 #

2. margalabargala ◴[14 Oct 25 02:43 UTC] No.45575717[source]▶

>>45575432 (TP) #

That's not true. You could run such an LLM on a lower-end laptop GPU, or a phone GPU. Very low power and low space. This isn't 2023 anymore; a Santa-specific LLM would not be so intensive.

replies(2): >>45575967 #>>45586091 #

3. trenchpilgrim ◴[14 Oct 25 02:57 UTC] No.45575785[source]▶

>>45575432 (TP) #

I've been running LLMs and TTS capable of this on my laptop since last year.

4. oofbey ◴[14 Oct 25 03:31 UTC] No.45575967[source]▶

>>45575717 #

But on that compute budget it’s gonna sound so stupid. Oh right. Santa.

replies(1): >>45576475 #

5. margalabargala ◴[14 Oct 25 05:10 UTC] No.45576475{3}[source]▶

>>45575967 #

It's a children's toy, how nuanced do its responses need to be?

replies(1): >>45597473 #

6. kwindla ◴[14 Oct 25 23:09 UTC] No.45586091[source]▶

>>45575717 #

I've done a fair amount of fine-tuning for conversational voice use cases. Smaller models can do a really good job on a few things: routing to bigger models, constrained scenarios (think ordering food items from a specific and known menu), and focused tool use.

But medium-sized and small models never hit that sweet spot between open-ended conversation and reasonably on-the-rails responsiveness to what the user has just said. We don't yet know how to build models <100B parameters that do that. Seems pretty clear that we'll get there, given the pace of improvement. But we're not there yet.

Now maybe you could argue that a kid is going to be happy with a model that you train to be relatively limited and predictable. And given that kids will talk for hours to a stuffie that doesn't talk back at all, on some level this is a fair point! But you can also argue the other side: kids are the very best open-ended conversationalists in the world. They'll take a conversation anywhere! So giving them an 8B parameter, 4-bit quantized Santa would be a shame.
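
For a sense of scale on the model sizes mentioned in this thread, here is a rough back-of-the-envelope sketch of weight memory only. It assumes weights dominate the footprint and ignores KV cache, activations, and runtime overhead, so real usage will be higher. The function name is made up for illustration, and the 16-bit precision for the 100B case is my assumption, not something stated above.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes).

    Ignores KV cache, activations, and runtime overhead (assumption).
    """
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


if __name__ == "__main__":
    # Sizes taken from the thread; the 16-bit precision is an assumed baseline.
    for label, params, bits in [
        ("8B @ 4-bit", 8, 4),
        ("3B @ 4-bit", 3, 4),
        ("100B @ 16-bit (assumed precision)", 100, 16),
    ]:
        print(f"{label}: ~{weight_memory_gb(params, bits):.1f} GB of weights")
```

By this estimate, an 8B model at 4 bits needs about 4 GB for weights and a 3B model about 1.5 GB, which is laptop or phone territory, while a 100B model at 16 bits needs about 200 GB.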

7. oofbey ◴[15 Oct 25 19:46 UTC] No.45597473{4}[source]▶

>>45576475 #

I agree. It just took me a while to figure it out. A 3B param LLM would do perfectly well.
