    139 points the_king | 14 comments

    Hey HN - It’s Finn and Jack from Aqua Voice (https://withaqua.com). Aqua is fast AI dictation for your desktop and our attempt to make voice a first-class input method.

    Video: https://withaqua.com/watch

    Try it here: https://withaqua.com/sandbox

    Finn is uber dyslexic and has been using dictation software since sixth grade. For over a decade, he’s been chasing a dream that never quite worked — using your voice instead of a keyboard.

    Our last post (https://news.ycombinator.com/item?id=39828686) about this seemed to resonate with the community - though it turned out that version of Aqua was a better demo than product. But it gave us (and others) a lot of good ideas about what should come next.

    Since then, we’ve remade Aqua from scratch for speed and usability. It now lives on your desktop, and it lets you talk into any text field -- Cursor, Gmail, Slack, even your terminal.

    It starts up in under 50ms, inserts text in about a second (sometimes as fast as 450ms), and has state-of-the-art accuracy. It does a lot more, but that’s the core. We’d love your feedback — and if you’ve got ideas for what voice should do next, let’s hear them!

    1. fxtentacle ◴[] No.43637679[source]
    This looks like it'll slurp up all your data and upload it into a cloud. Thanks, no. I want privacy, offline mode and source code for something as crucial to system security as an input method.

    "we also collect and process your voice inputs [..] We leverage this data for improvements and development [..] Sharing of your information [..] service providers [..] OpenAI" https://withaqua.com/privacy

    replies(7): >>43637923 #>>43638662 #>>43638673 #>>43638808 #>>43639318 #>>43639535 #>>43640415 #
    2. FloatArtifact ◴[] No.43637923[source]
    Local inference only is an absolute requirement. It's not even really all that accessible if it's online only. I can say this as someone that's used over 20000 hours worth of voice dictation and computer control.
    3. pokstad ◴[] No.43638662[source]
    This should be on the FAQ. I was trying to find out if it was 100% processed locally.
    4. jackthetab ◴[] No.43638673[source]
    Agreed.

    This is where I bounce (out of this discussion).

    5. thmsmlr ◴[] No.43638808[source]
    I totally agree, I created BetterDictation (.com) exactly because of that. Offline was a super important requirement for me.
    6. canada_dry ◴[] No.43639318[source]
    First thing I looked for and read: the FAQ.

    No mention of privacy (or on prem) - so assumed it's 100% cloud.

    Non-starter for me. Accuracy is important, but privacy is more so.

    Hopefully a service with these capabilities will be available where the first step has the user complete a brief training session, sends that to the cloud to tailor the recognition parameters for their voice and mannerisms... then loads that locally.

    replies(1): >>43650975 #
    7. toddmorey ◴[] No.43639535[source]
    And man, it's another monthly subscription. I'm not mad at them for finding a gap in the market and putting a business around it. I'm mad at Apple for leaving that gap... hopefully built-in voice dictation improves quickly.
    replies(2): >>43639713 #>>43650211 #
    8. FireBeyond ◴[] No.43639713[source]
    Is there a gap in the market? It's being rapidly filled with the likes of MacWhisper, etc., which offer local-only, one-off pricing.
    9. jmcintire1 ◴[] No.43640415[source]
    fair point. offline+local would be ideal, but as it stands we can't run asr and an llm locally at the speed that is required to provide the level of service we want to.

    given that we need the cloud, we offer zero data retention -- you can see this in the app. your concern is as much about ux and communications as it is privacy

    replies(2): >>43641065 #>>43642213 #
    10. mrtesthah ◴[] No.43641065[source]
    MacWhisper does realtime system-wide dictation on your local machine (among other things). Just a one-time fee for an app you download -- the way shareware is supposed to be. Of course it doesn't use MoE transcription with 6 models like Aqua Voice, but if you guys expect to be acquired by Apple (that is your exit strategy, right?), you're going to need better guarantees of privacy than "we don't log".
    replies(1): >>43642111 #
    11. shinycode ◴[] No.43642111{3}[source]
    I downloaded the turbo Whisper model optimized for Mac and created a Python script that gets the mic input and pastes the result. The script is LLM-generated and is triggered by pressing a key. That gets me 80% of the functionality, for free and entirely locally.
    12. fxtentacle ◴[] No.43642213[source]
    The problem, if you actually need the cloud, is that it kind of completely destroys your business model. OpenAI is bleeding money every month because they massively subsidize the hosting cost of their models. But eventually they will have to post a profit. And then, if they know that your product is completely dependent on their API, they can milk you until there are no profits left for you.

    And self-hosting real-time streaming LLMs will probably also come out at 50 cents per hour. Arguing a $120/month price for power users is probably going to be very difficult, especially if there are free open-source alternatives.

    13. pablopeniche ◴[] No.43650211[source]
    "hopefully built in voice dictation improves quickly." I would not hold my breath on that one lol
    14. oulipo ◴[] No.43650975[source]
    A similar but offline tool is VoiceInk. It's also open source, so you can extend it.