
2 points by benjah | 1 comment

Could I get some opinions on my experiential learning language app?

When I was learning Japanese with new friends, I didn't know how to ask where we were going or where the bathroom was. Needing the language in the moment was really motivating. Since in-country immersion isn't an option in most cases, language exchange is the next best thing. However, in-person language exchange can be expensive or difficult to schedule. So I built ConvoLive to make language practice more engaging for myself.

Open Beta on Android and iOS https://convolive.com

Technically interesting bits:

  - Lip-synced avatars using WebGL (rough sketch of the viseme approach after this list).
  - Continuous speech recognition on recent phone models, so you don't have to press to speak (sketch after this list).
  - Using on-device speech recognition and caching assets means most external LLM calls are limited to the free-form chat mode.
  - Unfortunately, an on-device local LLM ended up being too demanding and slow.
  - GPT-4o provided better speed and results than GPT-5.
  - Multimodal quiz system with drag-and-drop, fill-in-the-blank, and multiple-choice exercises.
  - Freeform conversation gives you suggestions to keep the chat going, but you can also ask how to say something.
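Roughly how the lip sync works, sketched in three.js: the TTS engine emits timed viseme events, and each viseme eases a morph target (blend shape) on the avatar's head mesh toward 1 while the others relax to 0. The viseme ids, event shape, and morph-target names below are simplified placeholders, not the exact ones in the app.

```typescript
import * as THREE from 'three';

// A viseme event as many TTS engines report it: which mouth shape, and when.
// (Field names are illustrative placeholders.)
interface VisemeEvent {
  viseme: string;      // e.g. "aa", "E", "O", "PP", "sil"
  offsetMs: number;    // time offset from the start of the audio clip
}

// Map TTS viseme ids to the avatar's morph targets (blend shapes).
const VISEME_TO_MORPH: Record<string, string> = {
  aa: 'mouthOpen',
  E: 'mouthSmile',
  O: 'mouthRound',
  PP: 'mouthClosed',
  sil: 'mouthNeutral',
};

export function animateLipSync(
  head: THREE.Mesh,        // mesh with morph targets exported from the avatar rig
  visemes: VisemeEvent[],
  audioStartTime: number,  // performance.now() when audio playback began
) {
  if (!visemes.length) return;
  const dict = head.morphTargetDictionary!;
  const influences = head.morphTargetInfluences!;

  function tick() {
    const elapsed = performance.now() - audioStartTime;

    // Find the most recent viseme at this point in the audio.
    let current = visemes[0];
    for (const v of visemes) {
      if (v.offsetMs <= elapsed) current = v;
      else break;
    }
    const activeName = VISEME_TO_MORPH[current.viseme] ?? 'mouthNeutral';

    // Ease the mouth morph targets: the active shape toward 1, the rest toward 0,
    // so the mouth blends between shapes instead of snapping.
    for (const name of Object.values(VISEME_TO_MORPH)) {
      const index = dict[name];
      if (index === undefined) continue;
      const target = name === activeName ? 1 : 0;
      influences[index] += (target - influences[index]) * 0.3;
    }

    if (elapsed < visemes[visemes.length - 1].offsetMs + 200) {
      requestAnimationFrame(tick);
    }
  }
  requestAnimationFrame(tick);
}
```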
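And a simplified sketch of the hands-free listening loop, using @react-native-voice/voice here as an illustrative wrapper over the platform recognizers (SFSpeechRecognizer / android.speech). The key trick is restarting the recognizer whenever it stops, so you never press a button to talk again.

```typescript
import Voice, { SpeechResultsEvent } from '@react-native-voice/voice';

export function startContinuousListening(
  locale: string,                          // e.g. 'es-ES', 'ja-JP'
  onTranscript: (text: string) => void,
) {
  // Forward recognized text to the conversation engine.
  Voice.onSpeechResults = (e: SpeechResultsEvent) => {
    if (e.value?.length) onTranscript(e.value[0]);
  };

  // When the recognizer stops (pause in speech, timeout), immediately restart it.
  Voice.onSpeechEnd = () => {
    Voice.start(locale).catch(() => { /* mic lost or permission revoked */ });
  };

  return Voice.start(locale);
}

export async function stopListening() {
  await Voice.stop();
  await Voice.destroy();
}
```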
What works:

  - People seem to love or hate the avatars.
  - Testers are saying they feel more comfortable speaking.
  - The avatars aren't groundbreaking but they do make it feel less like talking to a chatbot.
  - Being able to speak freely without clicking seems more natural.
What doesn't:

  - People seem to love or hate the avatars.
  - Since the app promotes speaking out loud, beginners may be wary of using it anywhere but at home.
  - The app might be better suited to people who already understand fundamentals like tenses or different alphabets, which don't fit into a per-conversation approach.
  - Newer, higher-quality TTS models don't provide the viseme timing data I need for lip-sync animation.
Currently supports Spanish, Japanese, Italian, German, Portuguese, and French.

Curious what other language learners here think – is this approach useful or do we not want to talk to our phones in broken Spanish? Should I keep working on this?

pickettd No.45813480
Was the on-device local LLM stack that you tried llama.cpp or something like MLC? I've seen better performance with MLC than llama.cpp in the past, but it has probably been at least a year since I tested iPhones and Android phones for local inference.
benjah No.45814310
I looked into using https://github.com/mybigday/llama.rn. Ultimately, it was too slow to be conversational, and the demands of rendering the WebGL avatars likely didn't help.
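
For reference, the integration is roughly this shape (model path, prompt, and sampling parameters here are placeholders, not my exact setup):

```typescript
import { initLlama } from 'llama.rn';

async function localReply(userMessage: string): Promise<string> {
  const context = await initLlama({
    model: '/path/to/model.gguf',  // a small quantized model bundled with the app
    n_ctx: 2048,
    n_gpu_layers: 99,              // offload as much as the phone's GPU allows
  });

  const { text } = await context.completion(
    {
      prompt: `You are a friendly Spanish tutor.\nUser: ${userMessage}\nTutor:`,
      n_predict: 128,
      temperature: 0.7,
      stop: ['User:'],
    },
    (data) => {
      // Streaming callback: tokens arrive one at a time here, but on the
      // phones I tested the rate was too slow to feel conversational.
    },
  );

  await context.release();
  return text;
}
```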

It was a while ago. If I were to do it over again, I might try https://github.com/tirthajyoti-ghosh/expo-llm-mediapipe. Maybe newer models will help.