Show HN: Aqua Voice 2 – Fast Voice Input for Mac and Windows

1. adamesque ◴[10 Apr 25 04:09 UTC] No.43640539[source]▶

I was very delighted by Aqua v1, which felt like magic at first.

But I’ve learned that I can’t dictate written content. My brain just does not work that way at all. As I write, I am constantly pausing to think, to revise, etc., and it feels like a completely different part of my brain is engaged. Everything I dictated with Aqua I had to throw away and rewrite.

Has anyone had similar problems, and if so, had any success retraining themselves toward dictation? There are fleeting moments where it truly feels like it would be much faster.

replies(6): >>43640621 #>>43640635 #>>43640654 #>>43641249 #>>43642013 #>>43643285 #

2. the_king ◴[10 Apr 25 04:29 UTC] No.43640621[source]▶

>>43640539 (TP) #

I think Aqua v1 had two problems:

1. The models weren't ready.

2. The interactions were often strained. Not every edit/change is easy to articulate with your voice.

If 1 had been our only problem, we might have had a hit. In reality, I think focusing on reducing model errors allowed us to ignore some fundamental awkwardness in the experience. We've tried to rectify this with v2 by putting less emphasis on streaming for every interaction and less emphasis on commands, relying on context instead.

Hopefully it can become a tool in the toolbox.

replies(1): >>43643305 #

3. jmcintire1 ◴[10 Apr 25 04:33 UTC] No.43640635[source]▶

>>43640539 (TP) #

IMO it is a question of the right tool for the right job, adjusted for differences between people. For me, the use case that made our product click was prompting Cursor while coding. Then I wanted to use it whenever I talked to ChatGPT: it's much faster to talk, then read, and repeat.

Voice is great for whenever the limiting factor to thought is speed of typing.

4. noahjk ◴[10 Apr 25 04:37 UTC] No.43640654[source]▶

>>43640539 (TP) #

Same here. My two biggest hurdles are:

1. Like you mentioned, the second I start talking about something, I totally forget where I'm going and have to pause; it's like my thoughts aren't coming to me. Probably some sort of mental feedback loop plus, as you said, a different method of thinking.

2. In the back of my mind, I'm always self-conscious that someone is listening, so there's a privacy / being judged / being overheard feeling, which adds a layer of mental feedback.

There also aren't great audio cues for handling on-the-fly editing. I've tried to say "parentheses word parentheses" and it just gets written out. I've tried to say "strike that" and it gets written out. These interfaces are very 'happy path' and don't do a lot of processing (on iOS, I can say "period" and get a '.', or likewise '?' or '!', but that's about the extent of it).

I have had some success with long-form recording sessions which are transcribed afterwards. After getting over the short initial hump, I can brain-dump to the recording, and then trust an app like Voice Notes or Superwhisper to transcribe, and then clean up after.

The main issue I run into there, though, is that I either forget to record something (e.g., a conversation that I want to review later) or there is too much friction: I don't record often enough to launch it quickly or even remember to use that workflow.

I get the same feeling with smart home stuff - it was awesome for a while to turn lights on and off with voice, but lately there's the added overhead of "did it hear me? do I need to repeat myself? What's the least amount of words I can say? Why can't I just think something into existence instead? Or have a perfect contextual interface on a physical device?"

5. SCdF ◴[10 Apr 25 06:32 UTC] No.43641249[source]▶

>>43640539 (TP) #

I use my (work) computer entirely with my voice, and it takes a lot of effort to work out what to actually write and to not ramble. Like you, I've found that it's better to throw out words in sort of half-sentence chunks, to give your brain time to work out what the next chunk is.

It's very hard, and I wouldn't do it if I didn't have to.

(which is why I'm always perplexed by these apps which allow voice dictation or voice control, but not as a complete accessibility package. I wouldn't be using my voice if my hands worked!)

It's also critically important (and after 3-4 years of this I still regularly fail at this) to actually read what you've written, and edit it before sending, because those chunks don't always line up into something that I'd consider acceptably coherent. Even for a one-sentence Slack message.

(also, I have a Kiwi accent, and the dictation software I use is not always perfect at getting what I wanted to say on the page)

replies(1): >>43642620 #

6. cloogshicer ◴[10 Apr 25 08:47 UTC] No.43642013[source]▶

>>43640539 (TP) #

I'm exactly the same. Aqua is so incredible and I really tried to like it, but I just can't get my brain to think of what I want to say first; I have to pause to think constantly.

7. e12e ◴[10 Apr 25 10:46 UTC] No.43642620[source]▶

>>43641249 #

Curious about your current setup, and whether adding a macro/functionality to clean up input via an LLM would help?

In my experience, LLMs can be quite forgiving when given some unfinished input and asked to expand it or clean it up.

8. ◴[10 Apr 25 12:53 UTC] No.43643285[source]▶

>>43640539 (TP) #

9. adamesque ◴[10 Apr 25 12:55 UTC] No.43643305[source]▶

>>43640621 #

Looking forward to giving it another try!