←back to thread

139 points the_king | 9 comments | | HN request time: 0.394s | source | bottom

Hey HN - It’s Finn and Jack from Aqua Voice (https://withaqua.com). Aqua is fast AI dictation for your desktop and our attempt to make voice a first-class input method.

Video: https://withaqua.com/watch

Try it here: https://withaqua.com/sandbox

Finn is uber dyslexic and has been using dictation software since sixth grade. For over a decade, he’s been chasing a dream that never quite worked — using your voice instead of a keyboard.

Our last post (https://news.ycombinator.com/item?id=39828686) about this seemed to resonate with the community - though it turned out that version of Aqua was a better demo than product. But it gave us (and others) a lot of good ideas about what should come next.

Since then, we’ve remade Aqua from scratch for speed and usability. It now lives on your desktop, and it lets you talk into any text field -- Cursor, Gmail, Slack, even your terminal.

It starts up in under 50ms, inserts text in about a second (sometimes as fast as 450ms), and has state-of-the-art accuracy. It does a lot more, but that’s the core. We’d love your feedback — and if you’ve got ideas for what voice should do next, let’s hear them!

1. adamesque ◴[] No.43640539[source]
I was very delighted by Aqua v1, which felt like magic at first.

But I’ve noticed/learned that I can’t dictate written content. My brain just does not work that way at all — as I write I am constantly pausing to think, to revise, etc and it feels like a completely different part of my brain is engaged. Everything I dictated with Aqua I had to throw away and rewrite.

Has anyone had similar problems, and if so, had any success retraining themselves toward dictation? There are fleeting moments where it truly feels like it would be much faster.

replies(6): >>43640621 #>>43640635 #>>43640654 #>>43641249 #>>43642013 #>>43643285 #
2. the_king ◴[] No.43640621[source]
I think Aqua v1 had two problems:

1. The models weren't ready.

2. The interactions were often strained. Not every edit/change is easy to articulate with your voice.

If 1 had been our only problem, we might have had a hit. In reality, I think optimizing model errors allowed us to ignore some fundamental awkwardness in the experience. We've tried to rectify this with v2 by putting less emphasis on streaming for every interaction and less emphasis on commands, replacing it with context.

Hopefully it can become a tool in the toolbox.

replies(1): >>43643305 #
3. jmcintire1 ◴[] No.43640635[source]
Imo it is a question of right tool for the right job, adjusted for differences between people. For me, the use case that made our product click was prompting Cursor while coding. Then I wanted to use it whenever I talked to chatgpt -- it's much faster to talk and then read, and repeat.

Voice is great for whenever the limiting factor to thought is speed of typing.

4. noahjk ◴[] No.43640654[source]
Same here. My two biggest hurdles are:

1. like you mentioned, the second I start talking about something, I totally forget where I'm going, have to pause, it's like my thoughts aren't coming to me. Probably some sort of mental feedback loop plus, like you mentioned, different method of thinking.

2. in the back of my mind, I'm always self-conscious that someone is listening, so it's a privacy / being judged / being overheard feeling which adds a layer of mental feedback.

There's also not great audio clues for handling on-the-fly editing. I've tried to say "parentheses word parentheses" and it just gets written out. I've tried to say "strike that" and it gets written out. These interfaces are very 'happy path' and don't do a lot of processing (on iOS, I can say "period" and get a '.' (or ?,!) but that's about the extent).

I have had some success with long-form recording sessions which are transcribed afterwards. After getting over the short initial hump, I can brain-dump to the recording, and then trust an app like Voice Notes or Superwhisper to transcribe, and then clean up after.

The main issue I run into there, though, is that I either forget to record something (ex. a conversation that I want to review later) or there is too much friction / I don't record often enough to launch it quickly or even remember to use that workflow.

I get the same feeling with smart home stuff - it was awesome for a while to turn lights on and off with voice, but lately there's the added overhead of "did it hear me? do I need to repeat myself? What's the least amount of words I can say? Why can't I just think something into existence instead? Or have a perfect contextual interface on a physical device?"

5. SCdF ◴[] No.43641249[source]
I use my (work) computer entirely with my voice, and it takes a lot of effort to work out what to actually write and to not ramble. Like you I've found that it's better to throw out words in sort of half sentence chunks, to give your brain time to work out what the next chunk is.

It's very hard, and I wouldn't do it if I didn't have to.

(which is why I'm always perplexed by these apps which allow voice dictation or voice control, but not as a complete accessibility package. I wouldn't be using my voice if my hands worked!)

It's also critically important (and after 3-4 years of this I still regularly fail at this) to actually read what you've written, and edit it before send, because those chunks don't always line up into something that I'd consider acceptably coherent. Even for a one sentence slack message.

(also, I have a kiwi accent, and the dictation software I use is not always perfect at getting what I wanted to say on the page)

replies(1): >>43642620 #
6. cloogshicer ◴[] No.43642013[source]
I'm exactly the same. Aqua is so incredible and I really tried to like it, but I just can't get my brain to think of what I want to say first, I have to pause to think constantly.
7. e12e ◴[] No.43642620[source]
Curious about your current setup, and if maybe adding a macro/functionality to clean up input via an LLM would help?

In my experience LLM can be quite forgiving when given some unfinished input and asked to expand/clean up?

8. ◴[] No.43643285[source]
9. adamesque ◴[] No.43643305[source]
Looking forward to giving it another try!