279 points nnx | 7 comments
1. pugio ◴[] No.43543086[source]
> The second thing we need to figure out is how we can compress voice input to make it faster to transmit. What’s the voice equivalent of a thumbs-up or a keyboard shortcut? Can I prompt Claude faster with simple sounds and whistles?

This reminds me of the amazing 2013 video of Tavis Rudd coding Python by voice: https://youtu.be/8SkdfdXWYaI?si=AwBE_fk6Y88tLcos

The number of times in the last few years I've wanted that level of "verbal hotkeys"... The latencies of many coding LLMs are still a little bit too high to allow for my ideal level of flow (though admittedly I haven't tried ones hosted on services like Groq), but I can clearly envision a time when I'm issuing tight commands to a coder model that's chatting with me and watching my program evolve on screen in real time.

On a somewhat related note to conversational interfaces, the other day I wanted to study some first aid stuff - used Gemini to read the whole textbook and generate Anki flash cards, then copied and pasted the flashcards directly into ChatGPT voice mode and had it quiz me. That was probably the most miraculous experience of voice interface I've had in a long time - I could do chores while being constantly quizzed on what I wanted to learn, and anytime I had a question or comment I could just ask it to explain or expound on a term or tangent.

replies(2): >>43543436 #>>43543497 #
2. WhyIsItAlwaysHN ◴[] No.43543436[source]
I worked like that for a year in uni because of RSI, and it's very easy to get voice strain if you use your voice for coding like that. Issuing many short commands is very tiring for the voice.

It's also hard to dictate code without a lot of these commands because it's very dense in information.

I hope something else will be the solution. Maybe LLMs being smart enough to guess the code out of a very short description and then a set of corrections.

replies(1): >>43557189 #
3. szszrk ◴[] No.43543497[source]
Oh wow. That video is 12 years old. Early in the presentation Tavis reveals he used Dragon back then.

Do you recall Swype keyboard for Android? The one that popularized swyping to write on touch screens? It had Dragon at some point.

IT WAS AMAZING.

Around 12-14 years ago (Android 2.3? Maybe 3?) I was able to easily dictate full long text messages and emails, in my native tongue, including punctuation and occasional slang or even word formation. I could dictate a decent long paragraph of text on the first try and not have to fix a single character.

It's 2025 and the closest I can find is a dictation app on my newest phone that uses an online AI service, yet it's still not that great when it comes to punctuation and requires me to spit out the whole paragraph at once, without taking a breath.

Is there anything equally effective for any of you nowadays? That actually works across the whole device?

replies(2): >>43544061 #>>43554035 #
4. Cthulhu_ ◴[] No.43544061[source]
It sounds like Dragon was never ambitious enough, and / or the phone manufacturers were too closed off to allow them entry into that market.

But Microsoft did buy them a few years ago. Weird that it took so long though.

5. davvid ◴[] No.43554035[source]
> It's 2025 and the closest I can find is a dictation app on my newest phone that uses online AI service, yet it's still not that great [...]

> Is there anything equally effective for any of you nowadays?

I'm not affiliated in any way. You might be interested in the "Futo Keyboard" and voice input apps - they run completely offline and respect your privacy.

The source code is open and it does a good job at punctuation without you needing to prompt it by saying "comma" or "question mark", unlike other voice input apps such as Google's Gboard.

https://keyboard.futo.org/

replies(1): >>43554162 #
6. szszrk ◴[] No.43554162{3}[source]
Thanks for that suggestion.

I know and like Futo, very interesting project. Unfortunately the multilang models are not great in my case. Still not bad for an offline tool, but far from the "forget it's there, just use it" vibe I had with Dragon.

Funny thing is that I may have misconfigured something in Futo, because my typing corrections are phonetic :) so I type something in Polish and get an English autocorrect made of different letters, but a kind of similar-sounding word.

7. mplanchard ◴[] No.43557189[source]
Would be nice to be able to do something like write a function signature and then just say “fill out this function,” with it having the implicit needed context, as though it had been pairing with you all along and is just taking the wheel for a second. Or when you’ve finished writing a function, “test this function with some happy path inputs.” I feel like I’d appreciate that kind of use, which could integrate decently into the flow state I get into when programming. The current suite of tools for me often feels too clunky, with the need to explicitly manage context and queries: it takes me out of my flow state and feels slower than just doing it myself.