
139 points the_king | 2 comments

Hey HN - It’s Finn and Jack from Aqua Voice (https://withaqua.com). Aqua is fast AI dictation for your desktop and our attempt to make voice a first-class input method.

Video: https://withaqua.com/watch

Try it here: https://withaqua.com/sandbox

Finn is uber dyslexic and has been using dictation software since sixth grade. For over a decade, he’s been chasing a dream that never quite worked — using your voice instead of a keyboard.

Our last post (https://news.ycombinator.com/item?id=39828686) about this seemed to resonate with the community - though it turned out that version of Aqua was a better demo than product. But it gave us (and others) a lot of good ideas about what should come next.

Since then, we’ve remade Aqua from scratch for speed and usability. It now lives on your desktop, and it lets you talk into any text field -- Cursor, Gmail, Slack, even your terminal.

It starts up in under 50ms, inserts text in about a second (sometimes as fast as 450ms), and has state-of-the-art accuracy. It does a lot more, but that’s the core. We’d love your feedback — and if you’ve got ideas for what voice should do next, let’s hear them!

hu3 ◴[] No.43637698[source]
How does it compare to https://wisprflow.ai ?

btw, grats!

replies(1): >>43637878 #
the_king ◴[] No.43637878[source]
Thanks!

We're faster, more accurate, and have a streaming option. Aqua can go from key-up to paste in as little as 450ms. Flow was closer to 1000ms in our tests.

Overall, you'll notice we make a few more tweaks to the output than Wisprflow.

For example, Aqua + Cursor is very powerful - we syntax highlight your transcript. The easiest way to see this is to use streaming mode (double press Fn) + deep context + Cursor and try asking it to change something.

This also works in other "context rich" environments.

replies(2): >>43638058 #>>43638425 #
1. redcanvas ◴[] No.43638425[source]
Hey, love what you are building in this category. I've been using a competing product which you know very well about. They advertised about how you can improve your words per minute by dictation, which was the main draw for me because I'm a founder. There's a lot of managerial work that I'm doing.

It has been a godsend in terms of increasing my productivity because I no longer have to type. I think your product's accuracy and latency shortening just make this even better. I often use it and then find out, "Hey, I need to make some changes," and I need to re-edit some of the stuff, which reduces the WPM productivity amount. So I think accuracy is definitely key here. Key metric to differentiate a product.

I am pushing this to other colleagues to get them to adopt. One challenge people are saying is that. One is that some people may not be as organized (you know, they might be a lot more organizationally structured in their mind). So for them, they're having trouble - they'd like to write things out, and by the time if things go out of their mouth, you know it's already formulated logical thought. Whereas you know people like me are a lot more verbal vomit type of person. For me it's huge because I say a lot of um like in all the other things I just dump stuff out and then organize it later.

Whereas other people organize stuff in their brain and then dump the information out. So people who do a lot of coordination and just you know so I feel like this could be two different segments to take into account.

Another one that's been fantastic is that we have multilingual colleagues who are speaking in Mandarin or something else and then they speak it and then ask Flow to be sent to translate it to a different language. That part I think has been fantastic.

I think the ability to edit what you wrote with AI is going to be the next key feature. Providing the context in the window is all within the conversation right? For example, you just ask after because what you write out is not the final and you need to do a lot of editing and formatting. Sometimes when you say too much stuff, it's just like a huge jumble paragraph with a lot of fluff words. Make it clear, concise, trim non-effective words. I think those are a key feature because it's not about your productivity, it's about other people being able to ingest your information efficiently. At least that's what I look at from a managerial perspective.

To give you an example, everything I laid out above came from dictation. You can see how this is inefficient. There's a lot of inefficiencies here.

replies(1): >>43638591 #
2. redcanvas ◴[] No.43638591[source]
A feature that would be great is similar to how you can write snippets in all the other tools where you can say "calendar" or "cal" and then it gives you the link. If this is something possible, I think that would make this fantastic.
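To make the snippet idea concrete, here is a rough sketch of what that could look like as a post-processing step on the transcript. This is only an illustration, not how Aqua or any other tool does it: the `SNIPPETS` table, the `expand_snippets` function, and the example.com URL are all made up for the example.

```python
import re

# Hypothetical trigger -> expansion table; the URL is a placeholder.
SNIPPETS = {
    "calendar": "https://example.com/my-calendar",
    "cal": "https://example.com/my-calendar",
}


def expand_snippets(transcript: str, snippets: dict[str, str] = SNIPPETS) -> str:
    """Replace whole-word snippet triggers in a transcript with their expansions."""
    if not snippets:
        return transcript
    # Longest triggers first, so "calendar" is tried before "cal".
    triggers = sorted(snippets, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in triggers) + r")\b",
        re.IGNORECASE,
    )
    # Case-insensitive lookup for whatever the regex matched.
    lookup = {t.lower(): v for t, v in snippets.items()}
    # A function replacement avoids backslash escapes being interpreted in the expansion.
    return pattern.sub(lambda m: lookup[m.group(1).lower()], transcript)


print(expand_snippets("Grab a slot here: cal"))
```

In practice you'd probably want an explicit spoken prefix (something like "snippet cal") so that saying the word "calendar" in a normal sentence doesn't get replaced with a link.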

Another feature that would be great is actually being able to have a conversation with an AI model first and then refine the output iteratively until you're ready and then pipe all that over. The ability to have a chat is very good or do this all through voice.