
85 points | edweis | 1 comment

Hi all,

I want to learn Dutch, and from experience I know I learn better when talking with native speakers.

From your experience, is there a good AI I can converse with in Dutch? It would be even better if I could see the transcription in Dutch.

Gliglish (https://gliglish.com) looks good, but it's more for speaking than for learning. I'd like to be able to set up a situation (negotiating a job offer, calling a supplier, ...).

jc4p | No.44047779
Hi! I have a WIP of this over at https://talktrainer.app/ -- I just added Dutch to it.

It uses OpenAI's realtime API to simulate either a tutoring session (the speaker will switch back to English to help you) or a first date or business meeting (the speaker will always speak the target language).

You can see the AI's transcriptions but not your own. That's a limitation of the current OpenAI API, but it's definitely something I can fix.

The prompts look like this: https://gist.github.com/jc4p/d8b9d121425ec191d62602d8720eeed... and the rest of it is a Next.js app wrapped around the WebRTC connection.
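
The modes described above (a tutor who may fall back to English vs. a first date or business meeting that stays in the target language), plus the "set a situation" request from the original post, can be sketched as a small instructions builder. This is a hypothetical illustration: the `Mode` names, `ScenarioOptions` fields and prompt wording are assumptions, not the prompts from the gist.

```typescript
// Hypothetical scenario modes mirroring the ones described in the comment.
type Mode = "tutor" | "first_date" | "business_meeting";

interface ScenarioOptions {
  mode: Mode;
  targetLanguage: string; // e.g. "Dutch"
  situation?: string; // free-form, e.g. "negotiating a job offer"
}

// Builds an instructions string for a realtime session.
// The wording is illustrative only; it is not taken from the gist.
function buildInstructions(opts: ScenarioOptions): string {
  const lines: string[] = [];
  switch (opts.mode) {
    case "tutor":
      // Tutor mode: mostly target language, English allowed for explanations.
      lines.push(
        `You are a patient ${opts.targetLanguage} tutor.`,
        `Speak mostly ${opts.targetLanguage}, but switch to English to explain mistakes or new words.`
      );
      break;
    case "first_date":
      // Immersion mode: never leave the target language.
      lines.push(
        "You are on a first date with the user.",
        `Always speak ${opts.targetLanguage}, even if the user switches to English.`
      );
      break;
    case "business_meeting":
      // Immersion mode: never leave the target language.
      lines.push(
        "You are a colleague in a business meeting with the user.",
        `Always speak ${opts.targetLanguage}, even if the user switches to English.`
      );
      break;
  }
  // Optional user-defined situation, appended last.
  if (opts.situation) {
    lines.push(`Scenario: ${opts.situation}.`);
  }
  return lines.join("\n");
}
```

For example, `buildInstructions({ mode: "business_meeting", targetLanguage: "Dutch", situation: "calling a supplier" })` yields a prompt that keeps the conversation in Dutch and ends with the scenario line. Keeping the situation as free text is one simple way to let users describe arbitrary role-plays without adding a new mode for each.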

I'm not fully in love with the app, so I'd love any feedback, or to hear whether it works well for you. It doesn't have a lot of features yet (including saving context), and if you bump into the time limit, just open it in an incognito window to keep going.

valleyer | No.44048200
This is great! Well done.

I've used the realtime API for something similar (also for practicing speaking, though not in foreign languages). I just wanted to mention that the realtime API will definitely give you the user's transcriptions: they come back as a `server.conversation.item.input_audio_transcription.completed` event. I use it in my app for exactly that purpose.
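
Picking the user's transcript out of the server event stream might look roughly like this. It is a sketch that assumes the `type` field on the wire is `conversation.item.input_audio_transcription.completed` (the reference docs list it under server events) with `item_id`, `content_index` and `transcript` fields; event names have changed between API versions, so check the current reference.

```typescript
// Assumed shape of the user-transcription event (verify against current docs).
interface InputTranscriptionCompleted {
  type: "conversation.item.input_audio_transcription.completed";
  item_id: string;
  content_index: number;
  transcript: string;
}

// Any other server event; only the type field is inspected.
interface OtherServerEvent {
  type: string;
  [key: string]: unknown;
}

type UserTranscriptHandler = (itemId: string, text: string) => void;

// Parses one data-channel message and forwards user transcripts.
// Returns true if the message was a user-transcription event.
function handleServerMessage(raw: string, onUserTranscript: UserTranscriptHandler): boolean {
  const event = JSON.parse(raw) as InputTranscriptionCompleted | OtherServerEvent;
  if (event.type === "conversation.item.input_audio_transcription.completed") {
    const e = event as InputTranscriptionCompleted;
    onUserTranscript(e.item_id, e.transcript);
    return true;
  }
  return false;
}
```

In the browser this would typically be attached to the data channel, e.g. `dc.addEventListener("message", (m) => handleServerMessage(m.data, render))`, where `dc` and `render` are hypothetical names. As far as I know, these events only appear if input transcription is enabled in the session configuration.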

jc4p | No.44048341
Thank you so much!! While the transcription is technically in the API, it's not a native part of the model; it runs through Whisper separately. In my testing, I often end up with a transcription in a different language than the one the user is speaking, and the current API has no way to force a language on the internal Whisper call.

Even when the language is correct, the exact text often isn't 100% accurate, and even when it is accurate, it arrives later than the audio output rather than in real time. All in all, it's not what I would consider ready to release in my app.
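
One way to live with transcripts that arrive after the assistant's audio is to reserve a slot for each turn when the item is first seen and fill in the text whenever it shows up. This is a hypothetical sketch (the `TranscriptLog` class and its methods are illustrative names); it assumes a per-item id and role are available when each item is created, for example from a `conversation.item.created` event. It only keeps the display order stable; it does nothing about accuracy or latency.

```typescript
type Role = "user" | "assistant";

interface Turn {
  itemId: string;
  role: Role;
  text: string | null; // null until the transcript arrives
}

// Keeps turns in conversation order even if transcripts arrive late or early.
class TranscriptLog {
  private turns: Turn[] = [];
  private byId = new Map<string, Turn>();
  private early = new Map<string, string>(); // text that arrived before its item

  // Call when an item is first seen, in conversation order.
  reserve(itemId: string, role: Role): void {
    if (this.byId.has(itemId)) return;
    const turn: Turn = { itemId, role, text: this.early.get(itemId) ?? null };
    this.early.delete(itemId);
    this.turns.push(turn);
    this.byId.set(itemId, turn);
  }

  // Call whenever a transcript for an item arrives, in any order.
  fill(itemId: string, text: string): void {
    const turn = this.byId.get(itemId);
    if (turn) {
      turn.text = text;
    } else {
      this.early.set(itemId, text);
    }
  }

  // Renders the conversation; turns still waiting for text get a placeholder.
  render(): string[] {
    return this.turns.map((t) => `${t.role}: ${t.text ?? "(transcribing...)"}`);
  }
}
```

With this, a user turn can appear as "(transcribing...)" above an assistant reply that has already been transcribed, and it is filled in place once the slower user transcript lands.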

What I've been thinking about is switching to a full audio in --> transcribe --> send to LLM --> TTS pipeline, in which case I would be able to show the exact input to the model, but that's much more work than a single OpenAI API call.
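
The pipeline described above can be sketched as one turn function with the three stages injected. The `PipelineStages` interface and its method names are hypothetical; each stage would wrap whatever speech-to-text, chat and text-to-speech services you pick. The point of the structure is that `userText` is exactly what the LLM saw.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical stage signatures; each would wrap a real service call.
interface PipelineStages {
  transcribe(audio: Uint8Array, language: string): Promise<string>;
  reply(history: ChatMessage[]): Promise<string>;
  speak(text: string): Promise<Uint8Array>;
}

interface TurnResult {
  userText: string; // exactly the text the model received
  replyText: string;
  replyAudio: Uint8Array;
}

// Runs one conversational turn: audio in -> transcribe -> LLM -> TTS.
// The history array is mutated so later turns keep their context.
async function runTurn(
  stages: PipelineStages,
  history: ChatMessage[],
  audio: Uint8Array,
  language: string
): Promise<TurnResult> {
  const userText = await stages.transcribe(audio, language);
  history.push({ role: "user", content: userText });
  const replyText = await stages.reply(history);
  history.push({ role: "assistant", content: replyText });
  const replyAudio = await stages.speak(replyText);
  return { userText, replyText, replyAudio };
}
```

The likely cost is latency: three sequential calls per turn, versus the single streaming connection the realtime API provides. Streaming the LLM output into TTS sentence by sentence would be one way to claw some of that back.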

pbbakkum | No.44068597
Heyo, I work on the realtime api, this is a very cool app!

For transcription, I would recommend trying the "gpt-4o-transcribe" or "gpt-4o-mini-transcribe" models, which will be more accurate than "whisper-1". On any of these models you can set the language parameter; see the docs here: https://platform.openai.com/docs/api-reference/realtime-clie.... Transcription still isn't guaranteed to be ordered relative to the rest of the response, but the idea is to optimize for conversational-feeling latency. Hope this is helpful.
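
Setting the transcription model and language could look something like the following `session.update` message sent over the data channel. The payload shape (`session.input_audio_transcription` with `model` and `language`) is assumed from the beta-era realtime API reference; newer API versions may nest this differently, so treat it as a sketch and check the linked docs. `EventSender` and the helper names are hypothetical.

```typescript
// Anything with send(), e.g. an RTCDataChannel.
interface EventSender {
  send(data: string): void;
}

type TranscriptionModel = "gpt-4o-transcribe" | "gpt-4o-mini-transcribe" | "whisper-1";

// Builds a session.update event that enables input transcription with a fixed language.
// Assumed beta-era shape; verify field names against the current reference.
function transcriptionUpdate(model: TranscriptionModel, language: string) {
  return {
    type: "session.update",
    session: {
      input_audio_transcription: {
        model,
        language, // ISO-639-1 code, e.g. "nl" for Dutch
      },
    },
  };
}

// Serializes the event and sends it over the channel.
function configureTranscription(channel: EventSender, model: TranscriptionModel, language: string): void {
  channel.send(JSON.stringify(transcriptionUpdate(model, language)));
}
```

For a Dutch session this would be called once the channel is open, e.g. `configureTranscription(dc, "gpt-4o-transcribe", "nl")`, which should address the wrong-language transcripts mentioned earlier in the thread.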