If I remember correctly, "ichigo" means strawberry in Japanese. You're welcome.
There were several links:
- Blog for details: https://homebrew.ltd/blog/llama-learns-to-talk
- Code: https://github.com/homebrewltd/ichigo
- Run locally: https://github.com/homebrewltd/ichigo-demo/tree/docker
- Demo on a single 3090: https://ichigo.homebrew.ltd/
A quick intro: We're a Local AI company building local AI tools and training open-source models.
Ichigo is our training method that enables LLMs to understand human speech and talk back with low latency, thanks to FishSpeech integration. It is open data and open weights, and it is initialized from Llama 3.1 weights, extending that model's reasoning ability to speech.
Plus, we are the creators and lead maintainers of:
- https://jan.ai/, a local AI assistant and alternative to ChatGPT
- https://cortex.so/, a local AI toolkit (soft launch coming soon)
Everything we build and train is done out in the open - we share our progress on:
https://x.com/homebrewltd https://discord.gg/hTmEwgyrEg
You can check out all our products on our simple website: https://homebrew.ltd/
The documentation isn't very detailed yet, but we're planning to improve it and add support for various hardware.
I'm trying to use ChatGPT for AI translation, but the other big problem I run into is TTS and STT for languages outside the top 40 (e.g. Lao). Facebook has a TTS library, but unfortunately it isn't open for commercial use.
GPT 4o: The word "ichigo," which is the Romanized spelling (romaji) of いちご, contains one "r." It appears in the letter "r" in "chi," as the "ch" sound in romaji represents a combination of the "r" sound from "r" and "t" sound from "i."
Thank you, ChatGPT. I'm glad we've burned down a bunch of forests for this.
You can consistently get the right answer with a prompt of:
> Write python code, and run it, to count the number of 'r' characters in いちご.
though. For numeric stuff, telling the model to just write Python code makes it significantly better at getting the right answer.
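For reference, here is roughly the kind of code such a prompt tends to produce. This is only a sketch: the helper name `count_char` is made up for illustration, and the romaji spelling and "strawberry" are added just for comparison.

```python
def count_char(text: str, target: str) -> int:
    """Count case-insensitive occurrences of a single character in text."""
    return text.lower().count(target.lower())


# The hiragana word, its romaji spelling, and the English word for comparison.
for word in ("いちご", "ichigo", "strawberry"):
    print(word, count_char(word, "r"))
# いちご 0
# ichigo 0
# strawberry 3
```

The counting is done by ordinary string code instead of the model's token-level guess, which is why the answer is stable.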
To clarify, while you can enable transcription to see what Ichigo says, Ichigo's design skips directly from audio to speech representations without creating a text transcription of the user’s input. This makes interactions faster but does mean that the user's spoken input isn't transcribed to text.
The flow we use is Speech → Encoder → Speech Representations → LLM → Text → TTS. By skipping the transcription step on the input side, we're able to speed things up and focus on the verbal experience.
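As a rough illustration of that flow, here is a toy sketch in Python. Every function in it (`encode_speech`, `llm_generate`, `synthesize`, `respond`) is a hypothetical placeholder and not Ichigo's actual code or API. It only shows the order of the stages and that no transcript of the user's audio is ever produced.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Audio:
    samples: List[float]


def encode_speech(audio: Audio) -> List[int]:
    """Hypothetical encoder: maps audio to discrete speech representations.
    Placeholder logic only (coarse quantization of the samples)."""
    return [int(round(s * 10)) for s in audio.samples]


def llm_generate(speech_tokens: List[int]) -> str:
    """Hypothetical LLM step: consumes speech representations directly
    and returns reply text. Placeholder logic only."""
    return f"(reply conditioned on {len(speech_tokens)} speech tokens)"


def synthesize(text: str) -> Audio:
    """Hypothetical TTS step: turns reply text into audio.
    Placeholder logic only (one silent sample per character)."""
    return Audio(samples=[0.0] * len(text))


def respond(audio: Audio) -> Audio:
    # Speech -> Encoder -> Speech Representations
    tokens = encode_speech(audio)
    # Speech Representations -> LLM -> Text (the reply, not a transcript)
    reply_text = llm_generate(tokens)
    # Text -> TTS
    return synthesize(reply_text)


reply = respond(Audio(samples=[0.1, 0.2, 0.3]))
```

Note that the only text in the pipeline is the model's reply; the user's speech goes from audio to representations without passing through an ASR transcript.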
Hope this clears things up!
Bringing AI into this space enhances user experience while respecting their autonomy over data. It feels like a promising step toward a future where we can leverage the power of AI without compromising on privacy or control. Really looking forward to seeing how this evolves!
Can you help me wrap my brain around this? Does it mean six? I'm struggling to understand how a word can mean two numbers and how this would actually be used in a conversation.
Thanks. I'm curious, but searching for this to understand it just returns anime results.
Ichi is the word for 1. Go is the word for 5.
There are no “r”s in the word “ichigo.”
Maybe your instructions are bad.
Can't believe I fell for that.
I think Matrix is not publicly indexable unless the channel is unencrypted and set to public.