Most active commenters

littlestymaar(7)
TeMPOraL(5)
fragmede(5)
furyofantares(5)
namaria(4)
shakna(3)

Popular/hot comments

>>43543791 #
>>43544403 #
>>43543986 #
>>43544764 #

←back to thread

The case against conversational interfaces

(julian.digital)

1. ChuckMcM ◴[01 Apr 25 06:35 UTC] No.43543501[source]▶

>>43542131 (OP) #

This clearly elucidated a number of things I've tried to explain to people who are so excited about "conversations" with computers. The example I've used (with varying levels of effectiveness) was to get someone to think about driving their car by only talking to it. Not a self driving car that does the driving for you, but telling it things like: turn, accelerate, stop, slow down, speed up, put on the blinker, turn off the blinker, etc. It would be annoying and painful and you couldn't talk to your passenger while you were "driving" because that might make the car do something weird. My point, and I think it was the author's as well, is that you aren't "conversing" with your computer, you are making it do what you want. There are simpler, faster, and more effective ways to do that then to talk at it with natural language.

replies(11): >>43543657 #>>43543721 #>>43543740 #>>43543791 #>>43543890 #>>43544393 #>>43544444 #>>43545239 #>>43546342 #>>43547161 #>>43551139 #

2. moffkalast ◴[01 Apr 25 06:57 UTC] No.43543657[source]▶

>>43543501 (TP) #

Honestly that just says that the interface is too low level. Telling a car to drive you to some place and make it fast is how we interact with taxi drivers. It works fine as a concept, it just needs a higher level of abstraction that isn't there yet.

replies(2): >>43543752 #>>43543765 #

3. guestbest ◴[01 Apr 25 07:08 UTC] No.43543721[source]▶

>>43543501 (TP) #

If the driver could queue actions it would make chat interfaced driving easier since the desired actions could be prepared for implementation by button press rather than needed a dedicated button built at a factory built by an engineer.

4. phyzix5761 ◴[01 Apr 25 07:10 UTC] No.43543740[source]▶

>>43543501 (TP) #

You're onto something. We've learned to make computers and electronic devices feel like extensions of ourselves. We move our bodies and they do what we expect. Having to switch now to using our voice breaks that connection. Its no longer an extension of ourselves but a thing we interact with.

replies(1): >>43543986 #

5. MatekCopatek ◴[01 Apr 25 07:12 UTC] No.43543752[source]▶

>>43543657 #

This only works for tasks where the details of execution are not important. Driving fits that category well, but many other tasks we're throwing at AI don't.

6. citrin_ru ◴[01 Apr 25 07:15 UTC] No.43543765[source]▶

>>43543657 #

Selecting pick-up/drop-off points on a map for me easier than explaining in words and that’s on of appeals of uber like services.

replies(1): >>43544519 #

7. shubhamjain ◴[01 Apr 25 07:20 UTC] No.43543791[source]▶

>>43543501 (TP) #

I had the same thoughts on conversational interfaces [1]. Humane AI failed not only because of terrible execution, the whole assumption of voice being a superior interface (and trying to invent something beyond smartphones) was flawed.

> Theoretically, saying, “order an Uber to airport” seems like the easiest way to accomplish the task. But is it? What kind of Uber? UberXL, UberGo? There’s a 1.5x surge pricing. Acceptable? Is the pickup point correct? What would be easier, resolving each of those queries through a computer asking questions, or taking a quick look yourself on the app?

> Another example is food ordering. What would you prefer, going through the menu from tens of restaurants yourself or constantly nudging the AI for the desired option? Technological improvement can only help so much here since users themselves don’t clearly know what they want.

[1]: https://shubhamjain.co/2024/04/16/voice-is-bad-ui/

replies(5): >>43543915 #>>43544743 #>>43544877 #>>43544978 #>>43545602 #

8. d3vmax ◴[01 Apr 25 07:35 UTC] No.43543890[source]▶

>>43543501 (TP) #

Agree. Not all systems require convo mode. I personally find Chat/Convo/IVR type interface slow/tedious. Keyboard/Mouse ftw.

However, A CEO using Power BI with Convo to can get more insights/graphs rather than slice/dicing his data. They do have fixed metrics but incase they want something not displayed.

9. JSR_FDED ◴[01 Apr 25 07:38 UTC] No.43543915[source]▶

>>43543791 #

And 10x worse than that is booking a flight: I found one that fits your budget, but it leaves at midnight, or requires an extra stop, or is on an airline for which you don't collect frequent flyer miles, or arrives it at a secondary airport in the same city, or it only has a middle seat available.

How many of these inconveniences will you put up with? Any of them, all of them? What price difference makes it worthwhile? What if by traveling a day earlier you save enough money to even pay for a hotel...?

All of that is for just 1 flight, what if there are several alternatives? I can't imagine have a dialogue about this with a computer.

replies(1): >>43544403 #

10. namaria ◴[01 Apr 25 07:53 UTC] No.43543986[source]▶

>>43543740 #

Two key things that make computers useful, specificity and exactitude, are thrown out of the window by interposing NLP between the person and the computer.

I don't get it at all.

replies(3): >>43544143 #>>43546069 #>>43546495 #

11. TeMPOraL ◴[01 Apr 25 08:18 UTC] No.43544143{3}[source]▶

>>43543986 #

   [imprecise thinking]
         v <--- LLMs do this for you
   [specific and exact commands]
         v
   [computers]
         v
   [specific and exact output]
         v <--- LLMs do this for you
   [contextualized output]

In many cases, you don't want or need that. In some, you do. Use right tool for the job, etc.

replies(2): >>43544577 #>>43551590 #

12. scott_w ◴[01 Apr 25 08:55 UTC] No.43544393[source]▶

>>43543501 (TP) #

> you couldn't talk to your passenger while you were "driving" because that might make the car do something weird.

This even happens while walking my dog. If my wife messages me, my iPhone reads it out and, at the same time, I'm trying to cross a road, she'll get a garbled reply which is just me shouting random words at my dog to keep her under control.

13. fragmede ◴[01 Apr 25 08:56 UTC] No.43544403{3}[source]▶

>>43543915 #

But that is how we used to buy a plane ticket. Long before flights.google.com's price table, you'd call a human up and tell them you'd like to go on holiday. They'd ask you where and when and how much you could afford, and then after a while with the old system (SABRE) clicking and clacking they'd find you a good deal. After a few flights with that travel agent, they'd hey to know you and wouldn't have to ask so many questions.

Similarly, long before Waymo, you'd get into a taxi, and tell the human driver you're going to the airport, and they'd take you there. In fact, they'd get annoyed at you if you backseat drove, telling them how to use the blinker and how hard to brake and accelerate.

The thing about conversational interfaces is that we're used to them, because we (well, some of us) interface with other humans fairly regularly, and so it's a fairly baseline level skill to have to exist in the world today. There's a case to be made against them, but since everyone can be assumed to be conversational (though perhaps not in a given language), it's here to stay. Restaurants have menus that customers look at before using the conversation interface to get food, in order to guide the discussion, and that's had thousands of years to evolve, so it might be a local maxima, but it's a pretty good one.

replies(4): >>43544764 #>>43545138 #>>43546443 #>>43547779 #

14. pydry ◴[01 Apr 25 09:01 UTC] No.43544444[source]▶

>>43543501 (TP) #

This rules out conversational UI for some tasks and applications but there are many where it will be useful and many where a hybrid would be best.

Even in a car, being able to control the windscreen wipers, radio, ask how much fuel is left are all tasks it would be useful to do conversationally.

There are some apps (im thinking of jira as an example) where i'd like to do 90% of the usage conversationally.

replies(1): >>43545066 #

15. TeMPOraL ◴[01 Apr 25 09:13 UTC] No.43544519{3}[source]▶

>>43543765 #

It's easier up until it's time to drop you off, and the selected dropoff point is suboptimal or plain impossible to stop at, and you want to give the car last-minute directions. Then the traditional, "human driver way" of looking out the window and telling them where to go based on what you see is far superior than trying to perspective-switch between the 3D situated view and imprecise, finicky 2D map interface.

replies(1): >>43544951 #

16. shakna ◴[01 Apr 25 09:19 UTC] No.43544577{4}[source]▶

>>43544143 #

I don't think they give a specific and exact output, considering how nondeterminism plays a role in most models.

replies(2): >>43545096 #>>43545818 #

17. littlestymaar ◴[01 Apr 25 09:40 UTC] No.43544743[source]▶

>>43543791 #

Why couldn't the interface ask you about your preferences? Because instead, what we have right now are clunky web interface that just cram every choice in the small screen in front of you and letting you understand how they are in fact different and sort out yourself how to make things work.

Of course a conversational interface is useless if it tries to just do the same thing as a web UI, which is why it failed a decade ago when it was trendy, because the tech was nowhere clever enough to make that useful. But today, I'd bet the other way round.

replies(2): >>43544812 #>>43546328 #

18. nerdponx ◴[01 Apr 25 09:42 UTC] No.43544764{4}[source]▶

>>43544403 #

And how many people book flights that way today?

replies(3): >>43545202 #>>43545915 #>>43551891 #

19. earnestinger ◴[01 Apr 25 09:48 UTC] No.43544812{3}[source]▶

>>43544743 #

It can ask, but how much time do you want to spend answering stuff?

Such dialog is probably nice for first time user, it is a nightmare for repeated user.

replies(1): >>43544989 #

20. ramblejam ◴[01 Apr 25 09:57 UTC] No.43544877[source]▶

>>43543791 #

> since users themselves don’t clearly know what they want.

Knowing what you want is, sadly, computationally irreducible.

21. citrin_ru ◴[01 Apr 25 10:06 UTC] No.43544951{4}[source]▶

>>43544519 #

A perfect interface would be a combination of both ways. Also it depends on personal preferences. I learned to use a paper map as a teenager and it’s convenient for me but I know some people struggle to find a way even using a map on a smartphone.

22. Propelloni ◴[01 Apr 25 10:11 UTC] No.43544978[source]▶

>>43543791 #

> I had the same thoughts on conversational interfaces [1]. Humane AI failed not only because of terrible execution, the whole assumption of voice being a superior interface (and trying to invent something beyond smartphones) was flawed.

Amen to that. I guess, it would help to get of the IT high horse and have a talk with linguists and philosophers of language. They are dealing with this shit for centuries now.

23. littlestymaar ◴[01 Apr 25 10:13 UTC] No.43544989{4}[source]▶

>>43544812 #

What prevents the system to remember your previous choices?

Then it can assume you choice haven't changed, and propose you a solution that matches your previous choices. And to give the user control it just needs to explicitly tell the user about the assumption it made.

In fact, a smart enough system could even see when violating the assumptions could lead to a substantial gain and try convincing the user that it may be a good option this time.

replies(2): >>43545637 #>>43546360 #

24. la_oveja ◴[01 Apr 25 10:25 UTC] No.43545066[source]▶

>>43544444 #

> Even in a car, being able to control the windscreen wipers, radio, ask how much fuel is left are all tasks it would be useful to do conversationally.

are you REALLY sure you want that?

how much fuel there is is a quick glance into the dash, and you can control precisely the radio volume without even looking.

'turn up the volume', 'turn down the volume a little bit', 'a bit more',...

and then a radio ad going 'get yourself a 3 pack of the new magic wipers...' and car wipers going off.

id hate conversational ui on my car.

replies(2): >>43546133 #>>43546878 #

25. furyofantares ◴[01 Apr 25 10:31 UTC] No.43545096{5}[source]▶

>>43544577 #

The diagram you're replying to agrees with this.

replies(1): >>43545198 #

26. everdrive ◴[01 Apr 25 10:37 UTC] No.43545138{4}[source]▶

>>43544403 #

But the booking agent used to understand what you were saying, and it'd be very easy to work out miscommunications. AI chatbots just send you in circles endlessly and if you get "stuck" there is no recourse.

27. walthamstow ◴[01 Apr 25 10:45 UTC] No.43545198{6}[source]▶

>>43545096 #

Does it? The way I'm reading it, the first step is LLM turning human imprecise thinking into specific and exact commands

replies(1): >>43545359 #

28. wetoastfood ◴[01 Apr 25 10:46 UTC] No.43545202{5}[source]▶

>>43544764 #

Many people today are booking flights for others, be it families, business leaders, or traditional travel agents. They’re communicating preferences and asking about preferred travel times, budget, seat selection, and more. When you book for and with someone else, these preferences get learned and you no longer have to ask if they prefer an aisle seat—you just pick it.

The booking experience today is granular to help you find a suitable flight to meet all the preferences you’re compiling into an optimal scenario. The experience of AI booking in the future will likely be similar: find that optimal scenario for you once you’re able to articulate your preferences and remember them over time.

replies(1): >>43547319 #

29. johnnyanmac ◴[01 Apr 25 10:50 UTC] No.43545239[source]▶

>>43543501 (TP) #

Yeah, it comes and goes in games for a reason. If it's not already some sort of social game, then the time to speak an answer is always slower than 3 button presses to select a pre-canned answer. Navigating a menu with Kinect voice commands will often be slower than a decent interface a user clicks through.

Voice interface only prevails in situations with hundreds of choices, and even then it's probably easier to use voice to filter down choices rather than select. But very few games have such scale to worry about (certainly no AAA game as of now).

30. furyofantares ◴[01 Apr 25 11:10 UTC] No.43545359{7}[source]▶

>>43545198 #

That's true, but that is the input section of the diagram, not the output section where [specific and exact output] is labeled, so I believe there was legitimate confusion I was responding to.

To your point, which I think is separate but related, that IS a case where LLMs are good at producing specific and exact commands. The models + the right prompt are pretty reliable at tool calling by themselves, because you give them a list of specific and exact things they can do. And they can be fully specific and exact at inference time with constrained output (although you may still wish it called a different tool.)

replies(1): >>43546984 #

31. indigoabstract ◴[01 Apr 25 11:46 UTC] No.43545602[source]▶

>>43543791 #

And then there is the fact that voice isn't the dominant mode of expression for all people. Some are predominantly visual thinkers, some are analytic and slow to decide, while some prefer to use their hands and so on.

I guess there's just no substitute for someone actually doing the work of figuring out the most appropriate HMI for a given task or situation, be it voice controls, touch screens, physical buttons or something else.

32. nosianu ◴[01 Apr 25 11:50 UTC] No.43545637{5}[source]▶

>>43544989 #

It still has to tell you. Visually in a form it's much faster. Similar reason why many people prefer a blog post over a video.

Talking is not very efficient, and it's serial in fixed time. With something visual you can look at whatever you want whenever you want, at your own (irregular) pace.

You will also be able to make changes much faster. You can go to the target form element right away, and you get immediate feedback from the GUI (or from a physical control that you moved - e.g. in cars). If it's talk, you need to wait to have it said back to you - same reason as why important communication in flight control or military is always read back. Even humans misunderstand. You can't just talk-and-forget unless you accept errors.

You would need some true intelligence for just some brief spoken requests to work well enough. A (human) butler worked fine for such cases, but even then only the best made it into such high-level service positions, because it required real intelligence to know what your lord needed and wanted, and lots of time with them to gain that experience.

replies(2): >>43546167 #>>43552834 #

33. TeMPOraL ◴[01 Apr 25 12:13 UTC] No.43545818{5}[source]▶

>>43544577 #

I'll need to work on the diagram to make it clearer next time.

What it's trying to communicate is, in general, a human operating a computer has to turn their imprecise thinking into "specific and exact commands", and subsequently, understand the "specific and exact output" in whatever terms they're thinking off, prioritizing and filtering out data based on situational context. LLMs enter the picture in two places:

1) In many situations, they can do the "imprecise thinking" -> "specific and exact commands" step for the user;

2) In many situations, they can do the "specific and exact output" -> contextualized output step for the user;

In such scenarios, LLMs are not replacing software, they're being slotted as intermediary between user and classical software, so the user can operate closer to what's natural for them, vs. translating between it and rigid computer language.

This is not applicable everywhere, but then, this is also not the only way LLMs are useful - it's just one broad class of scenarios in which they are.

34. mschuster91 ◴[01 Apr 25 12:24 UTC] No.43545915{5}[source]▶

>>43544764 #

More than enough. Corporate flights are almost always handled that way, alone for compliance reasons (the travel agency knows about budget and "appearance" limits aka only c-level gets business class, everyone else gets economy).

Anecdata: last year my wife and I went on a rail tour through Eastern Europe and god, I wish we had chosen to spend a few hundred euros on a travel agency in retrospect - I can't count just how much time we had to spend researching on what kind of rail, bus and public transit tickets you need on which leg, how to create accounts, set up payment and godknowswhat else. Easily took us two days worth of work and about two dozens individual payment transactions. A professional travel agency can do all the booking via Sabre, Amadeus or whatever...

35. brookst ◴[01 Apr 25 12:39 UTC] No.43546069{3}[source]▶

>>43543986 #

I also don’t like command like interfaces for all things, but there are cases where they excel, or where they are necessary due to technical constraints. But when the man page for a simple command runs to 10 screens of options I sometimes wonder.

36. brookst ◴[01 Apr 25 12:46 UTC] No.43546133{3}[source]▶

>>43545066 #

It’s less common now that car controls have somewhat standardized, but I’m old enough that I remember when rental cars were a pain because it would start raining and you couldn’t find the windshield wipers.

Conversational interfaces are great for rarely used features or when the user doesn’t know how to do something. For repetitive, common tasks they’re terrible.

But nobody is using ChatGPT for repetitive tasks. In fact the whole LLM revolution seems to be about letting users accomplish tasks without having to learn how to do them. Which I know some people look down on, but it’s the literal definition of management (which, to be fair, some people also look down on).

replies(1): >>43550036 #

37. littlestymaar ◴[01 Apr 25 12:49 UTC] No.43546167{6}[source]▶

>>43545637 #

> It still has to tell you. Visually in a form it's much faster.

Who said it cannot be visual? It's still a “conversational” UI if it's a chatbot that writes down its answer.

> Similar reason why many people prefer a blog post over a video.

Well I certainly do, but I also know that we are few and far between in that case. People in general prefer videos over blog post by a very large margin.

> Talking is not very efficient, and it's serial in fixed time. With something visual you can look at whatever you want whenever you want, at your own (irregular) pace. You will also be able to make changes much faster. You can go to the target form element right away, and you get immediate feedback from the GUI.

Saying “I want to travel to Berlin next monday” is much faster than fighting with the website's custom datepicker which will block you until you select your return date until you realize you need to go back and toggle the “one way trip” button before clicking the calendar otherwise it's not working…

There's a reason why nerds love their terminal: GUIs are just very slow and annoying. They are useful for whatever new thing you're doing, because it's much more discoverable than CLI, but it's much less efficient.

> If it's talk, you need to wait to have it said back to you - same reason as why important communication in flight control or military is always read back. Even humans misunderstand. You can't just talk-and-forget unless you accept errors.

This is true, but stays true with a GUI, that's why you have those pesky confirmation pop-ups, because as annoying as they are when you know what you're doing, they are necessary to catch errors.

> You would need some true intelligence for just some brief spoken requests to work well enough.

I don't think so. IMO you just need something that emulates intelligence enough on that particular purpose. And we've seen that LLMs are pretty decent at emulating apparent intelligence so I wouldn't bet against them on that.

replies(1): >>43554271 #

38. UncleMeat ◴[01 Apr 25 13:05 UTC] No.43546328{3}[source]▶

>>43544743 #

Scrolling through a list of a few options seems much less clunky than being asked via voice about which option I prefer. I can see multiple options at once and compare them easily. But via voice I need to keep all of the options in working memory to compare them. Harder.

replies(1): >>43548173 #

39. UncleMeat ◴[01 Apr 25 13:07 UTC] No.43546360{5}[source]▶

>>43544989 #

The previous choice might not what I want today.

Maybe I'm tired of layovers and I'm willing to pay more for a direct flight this time. Maybe I want a different selection at a restaurant because I'm in the mood for tacos rather than a burrito.

replies(1): >>43546976 #

40. grbsh ◴[01 Apr 25 13:16 UTC] No.43546443{4}[source]▶

>>43544403 #

It's a great point that this is how we primarily used to interact with businesses and services, but we've moved on. For Gen-Z, e.g., many will refuse to use the product or service if they have to speak to an actual human. Just like we're now not willing to take boat across the ocean for 3 months, but before airplanes this was not uncommon.

replies(1): >>43548181 #

41. grbsh ◴[01 Apr 25 13:22 UTC] No.43546495{3}[source]▶

>>43543986 #

Why would you ever hire a human to perform some task for you in a company? They're known for having problems with ambiguity and precision in communication.

Humans require a lot of back and forth effort for "alignment" with regular "syncs" and "iterations" and "I'll get that to you by EOD". If you approach the potential of natural interfaces with expectations that frame them the same way as 2000s era software, you'll fail to be creative about new ways humans interact with these systems in the future.

42. notnullorvoid ◴[01 Apr 25 13:59 UTC] No.43546878{3}[source]▶

>>43545066 #

If the choice for controls is touchscreen vs conversational, conversational wins by a mile. However if physical buttons and dials are an option there's really no competing with that.

I wish car manufacturers stopped with the touchscreen bullshit, but it seems more likely that they'll try to offset the terrible experience with voice controls.

43. littlestymaar ◴[01 Apr 25 14:09 UTC] No.43546976{6}[source]▶

>>43546360 #

Just tell it then.

replies(1): >>43555037 #

44. shakna ◴[01 Apr 25 14:09 UTC] No.43546984{8}[source]▶

>>43545359 #

The tool may not even exist. LLMs are really terrible at admitting where the limits of the training are. They will imagine a tool into being. They will also claim the knowledge is within their realm, when it isn't.

replies(1): >>43548118 #

45. steveBK123 ◴[01 Apr 25 14:26 UTC] No.43547161[source]▶

>>43543501 (TP) #

Yeah I mean - haven't we already been doing this a decade with home voice assistant speaker things and all found them to be underwhelming?

Theres 1-5 things any individual finds them useful for (timers/lights/music/etc) and then.. thats it.

99.9% of what I use a computer for its far faster to type/click/touch my phone/tablet/computer.

replies(1): >>43549999 #

46. uoaei ◴[01 Apr 25 14:38 UTC] No.43547319{6}[source]▶

>>43545202 #

And how many bad experiences do you expect people to tolerate before AI eventually learns the person's "real" preferences?

47. Zamaamiro ◴[01 Apr 25 15:13 UTC] No.43547779{4}[source]▶

>>43544403 #

I don't see how "but that's the way we used to do things" is an argument in favor of conversational interfaces.

The whole point is that we currently have better, more efficient ways of doing those things, so why would we regress to inferior methods?

replies(2): >>43548075 #>>43552077 #

48. dcrimp ◴[01 Apr 25 15:37 UTC] No.43548075{5}[source]▶

>>43547779 #

the inferior methods were slower but more flexible - could handle any and all edge cases. Currently we have a UX that really efficiently realises 80% of cases.

To relate to the article - google flights is the Keyboard and Mouse - covering 80% of cases very quickly. Conversational is better for when you're juggling more contextual info than what can be represented in a price/departure time/flight duration table. For example, "i'm bringing a small child with me and have an appointment the day before and I really hate the rain".

Rushed comment because I'm working, but I hope you get the gist.

Current flight planning UX is overfit on the 80% and will never cater to the 20% because cost/benefit of the development work isn't good

49. furyofantares ◴[01 Apr 25 15:40 UTC] No.43548118{9}[source]▶

>>43546984 #

At inference time you can constrain output to a strict json schema that only includes valid tools.

replies(2): >>43548907 #>>43550784 #

50. littlestymaar ◴[01 Apr 25 15:44 UTC] No.43548173{4}[source]▶

>>43546328 #

The problem with scrolling is that you'll be presented tens of options you don't care about because the options have to be determined in advance and be the same for everyone.

That's why the “advanced search” is almost always hidden somewhere. And that's also why you can never find the filter you need on an e-shopping website.

51. taneq ◴[01 Apr 25 15:45 UTC] No.43548181{5}[source]▶

>>43546443 #

Taking a 3 month voyage was still an uncommon thing to do for a person, it’s just that it was the most common type of intercontinental journey due to lack of competition.

52. shakna ◴[01 Apr 25 16:46 UTC] No.43548907{10}[source]▶

>>43548118 #

That would only be possible, if you could prevent hallucinations from ever occurring. Which you can't. Even if you supply a strict schema, the model will sometimes act outside of it - and infer the existence of "something similar".

replies(1): >>43549754 #

53. furyofantares ◴[01 Apr 25 18:04 UTC] No.43549754{11}[source]▶

>>43548907 #

That's not true. You say the model will sometimes act outside of the schema, but models don't act at all, they don't hallucinate by themselves, they don't produce text at all, they do all of this in conjunction with your inference engine.

The model's output is a probability for every token. Constrained output is a feature of the inference engine. With a strict schema the inference engine can ignore every token that doesn't adhere to the schema and select the top token that does adhere to the schema.

54. ryandrake ◴[01 Apr 25 18:29 UTC] No.43549999[source]▶

>>43547161 #

I think a lot of these "voice assistant" systems are envisioned and pushed by senior leadership in companies like SVPs and VPs. They're the ones who make the decision to invest in products like this. Why do they think these products make sense? Because they themselves have personal assistants and nannies and chauffeurs and private chefs, and voice is their primary interface to these people. It makes sense that people who spend all their time vocally telling others to do work, think that voice is a good interface for regular people to tell their computers to do work.

replies(1): >>43550275 #

55. ryandrake ◴[01 Apr 25 18:33 UTC] No.43550036{4}[source]▶

>>43546133 #

> It’s less common now that car controls have somewhat standardized, but I’m old enough that I remember when rental cars were a pain because it would start raining and you couldn’t find the windshield wipers.

This is a problem of standardization across manufacturers, not something inherent in physical controls. I never have a problem using the steering wheel in a rental car because they're all the same.

You'd have the same problem with voice interfaces: For some rental cars, turning on the wipers would be "Turn on the wipers". For others, you'd have to say "Activate the wipers." For others, "Enable the windshield wipers." There is no way manufacturers will be capable of standardizing on a single phrase.

replies(1): >>43572210 #

56. steveBK123 ◴[01 Apr 25 19:03 UTC] No.43550275{3}[source]▶

>>43549999 #

That is actually a very interesting take I've not seen before and does make some sense.

If your work revolves about telling people what to do and asking questions, a voice assistant seems like a great idea (even if you yourself wouldn't have to stoop to using a robotic version since you have a real live human).

If your work actually involves doing things, then voice/conversational text interface quickly falls apart.

57. xigoi ◴[01 Apr 25 20:03 UTC] No.43550784{10}[source]▶

>>43548118 #

Just because the answer adheres to the schema does not mean that it’s correct.

replies(1): >>43551241 #

58. rurp ◴[01 Apr 25 20:44 UTC] No.43551139[source]▶

>>43543501 (TP) #

An empirical example would be Amazon's utter failure at making voice shopping a thing with the Echo. There were always a number of obvious flaws with the idea. There's no way to compare purchase options, check reviews, view images, or just scan a bunch of info at once with your eyeballs at 100x the information bandwidth of a computer generated voice talking to you.

Even for straightforward purchases, how many people trust Amazon to find and pick the best deal for them? Even if Amazon started out being diligent and honest it would never last if voice ordering became popular. There's no way that company would pass up a wildly profitable opportunity to rip people off in an opaque way by selecting higher margin options.

59. furyofantares ◴[01 Apr 25 20:56 UTC] No.43551241{11}[source]▶

>>43550784 #

Yes, we've been discussing "specific and exact" output. As I said, you might wish it called at different tool; nothing in this discussion is addressing that.

60. namaria ◴[01 Apr 25 21:37 UTC] No.43551590{4}[source]▶

>>43544143 #

Despite feeling like a "let me draw it for you" answer is a tad condescending, I want to address something here.

This would be great if LLMs did not tend to output nonsense. Truly it would be grand. But they do. So it isn't. It's wasting resources hoping for a good outcome and risking frustration, misapprehensions, prompt injection attacks... It's non-deterministic algorithms hoping P=NP, except instead of branching at every decision you're doing search by tweaking vectors whose values you don't even know and whose influence on the outcome is impossible to foresee.

Sure, a VC subsidized LLM is a great way to make CVs in LaTeX (I do it all the time), translating text, maybe even generating some code if you know what you need and can describe it well. I will give you that. I even created a few - very mediocre - songs. Am I contradicting myself? I don't think I am, because I would love to live in a hotel if I only had to pay a tiny fraction of the cost. But I would still think that building hotels would be a horrible way to address the housing crisis in modern metropolises.

replies(2): >>43552390 #>>43552872 #

61. fragmede ◴[01 Apr 25 22:16 UTC] No.43551891{5}[source]▶

>>43544764 #

not how many, but which ones? As a regular person, I buy it myself, but do you think rich people do that? No, they just ask their (human) assistant to get a flight to New York around 7pm this Friday, and then move onto the next problem in their lives.

62. fragmede ◴[01 Apr 25 22:47 UTC] No.43552077{5}[source]▶

>>43547779 #

You have to define which axis' you're using to define efficient. If I were an executive at some corporation, I'd tell my assistant to book me a flight to New York on Friday at 7pm and that takes me less than 10 seconds. It may take her a while longer, but that's her problem and that's what I pay her for.

How long is it going to take you to get to a device, load the app/webpage, tell it which airport you're flying from and going to and what date and then you start looking at options. You've blown way past the 10 seconds it took for that executive to get a plane flight.

Better is in the eye of the beholder. What's monetarily efficient isn't going to be temporaly efficient, and that's true along a lot of other dimensions too.

Point is, there are some people that like having conversations, you may not be one of them. you don't have to be. I'm not taking away your mouse and keyboard. I have those too and won't give them up either. But I also find talking out loud helps my thinking process though I know that's not everybody.

63. TeMPOraL ◴[01 Apr 25 23:33 UTC] No.43552390{5}[source]▶

>>43551590 #

> Despite feeling like a "let me draw it for you" answer is a tad condescending, I want to address something here.

I didn't mean it to be condescending - though I can see how it can come across as such. FWIW, I opted for a diagram after I typed half a page worth of "normal" text and realized I'm still not able to elucidate my point - so I deleted it and drew something matching my message more closely.

> This would be great if LLMs did not tend to output nonsense. Truly it would be grand. But they do. So it isn't.

I find this critique to be tiring at this point - it's just as wrong as assuming LLMs work perfectly and all is fine. Both views are too definite, too binary. In reality, LLMs are just non-deterministic - that is, they have an error rate. How big it is, and how small can it get in practice for a given tasks - those are the important questions.

Pretty much every aspect of computing is only probabilistically correct - either because the algorithm is explicitly so (UUIDs and primality testing, for starters), or just because it runs on real hardware, and physics happen. Most people get away with pretending that our systems are either correct or not, but that's only possible because the error rate is low enough. But it's never that low by accident - it got pushed there by careful design at every level, hardware and software. LLMs are just another probabilistically correct system that, over time, we'll learn how to use in ways that gets the error rate low enough to stop worrying about it.

How can we get there - now, that is an interesting challenge.

replies(1): >>43554183 #

64. fragmede ◴[02 Apr 25 01:10 UTC] No.43552834{6}[source]▶

>>43545637 #

> Similar reason why many people prefer a blog post over a video.

I used to be a reading blog over watching video person, but for some things I’ve come to appreciate the video version. The reason you want to get the video of the whatever is because in the blog post, what’s written down only what the author thought was important. But I’m not them. I don’t know everything they know and I don’t see everything they see. I can’t do everything they do but with the video I get everything. When you perform the whatever the video has every detail, not just the ones you think are important. That bit between step 1 and step 2 that’s obvious? It’s not obvious to everyone, or mine is broken in a slightly different way that I really need to see that bit between 1 and 2. of course, videos get edited and cut so they don’t always have that benefit, but I’ve grown to appreciate them.

65. fragmede ◴[02 Apr 25 01:17 UTC] No.43552872{5}[source]▶

>>43551590 #

Using the word hotel has a lot of baggage, but having a large quantity of rooms for rent, for cheap, with a bathroom but no dedicated kitchen would be amazing for the housing crisis. If they were high quality and sound isolated, with high speed elevators, and communal spaces for residents, it could work. I'm not an architect though.

replies(2): >>43553074 #>>43554142 #

66. saratogacx ◴[02 Apr 25 02:08 UTC] No.43553074{6}[source]▶

>>43552872 #

In the early 2000's there was a push for building apodments which were a room, bathroom, and shared kitchen area. Some people liked them but it isn't for everyone.

67. namaria ◴[02 Apr 25 06:25 UTC] No.43554142{6}[source]▶

>>43552872 #

You're describing student housing and if you ever lived in one you'd know how bad of an idea you're musing with.

replies(1): >>43554270 #

68. namaria ◴[02 Apr 25 06:35 UTC] No.43554183{6}[source]▶

>>43552390 #

Natural language has a high entropy floor. It's a very noisy channel. This isn't anything like bit flipping or component failure. This is a whole different league. And we've been pouring outrageous amounts of resources into diminishing returns. OpenAI keeps touting AGI and burning cash. It's being pushed everywhere as a silver bullet, helping spin lay offs as a good thing.

LLMs are cool technology sure. There's a lot of cool things in the ML space. I love it.

But don't pretend like the context of this conversation isn't the current hype and that it isn't reaching absurd levels.

So yeah we're all tired. Tired of the hype, of pushing LLMs, agents, whatever, as some sort of silver bullet. Tired of the corporate smoke screen around it. NLP is still a hard problem, we're nowhere near solving it, and bolting it on everything is not a better idea now than it was before transformers and scaling laws.

On the other hand my security research business is booming and hey the rational thing for me to say is: by all means keep putting NLP everywhere.

69. nosianu ◴[02 Apr 25 06:54 UTC] No.43554271{7}[source]▶

>>43546167 #

> Who said it cannot be visual? It's still a “conversational” UI if it's a chatbot that writes down its answer.

You can't be serious??

Oh it's 1st of April, my apologies! I almost took it seriously. I should ignore this website on this day.

replies(1): >>43554337 #

70. TeMPOraL ◴[02 Apr 25 06:54 UTC] No.43554270{7}[source]▶

>>43554142 #

He's also describing hotels, and aparthostels, and officers' quarters on a ship and bunch of other stuff. The devil is in the details - specifically, how much it costs to rent per sqm, and what stops the price from going up to the point it forces multiple people to share the room? What stops the landlords from subdividing the rooms further and renting them out apiece? What stops already shoddy construction from getting even worse?

Those are the big challenges of housing. Not just how many units there are, but what they are, and how much the "how many" is plain cheating.

71. littlestymaar ◴[02 Apr 25 07:08 UTC] No.43554337{8}[source]▶

>>43554271 #

I don't understand your complaint.

What's the difference between a blog post and a chatbot answer in terms of how “visual” things are?

72. soco ◴[02 Apr 25 09:40 UTC] No.43555037{7}[source]▶

>>43546976 #

And then we're back to point one: retelling the whole stack of choices every time because nobody on the other side of the conversation, person or AI; can tell whether all my previous options are still valid. Because even I, the caller, might not remember what "defaults" I set in the previous call. So yeah, this argument in favor of conversational interfaces sounds at this point more like ideology than logic.

replies(1): >>43555378 #

73. littlestymaar ◴[02 Apr 25 10:44 UTC] No.43555378{8}[source]▶

>>43555037 #

> every time because nobody on the other side of the conversation, person or AI; can tell whether all my previous options are still valid.

But you can, so as long as the interlocutor tells you what assumptions it made, you can correct it if it doesn't match your current mood.

> So yeah, this argument in favor of conversational interfaces sounds at this point more like ideology than logic.

There's no ideology behind the fact that every people rich enough to afford paying someone to deal with mundane stuff will have someone doing it for them, it's just about convenience. Nobody likes to fight with web UIs for fun, the only reason why it has become mainstream is because it's so much cheaper than having a real person working.

Same for Microsoft Word by the way, many people used to have secretaries typing stuff for them, and it's been a massive regression of social status for the upper middle class to have to type things by themselves, it only happened because it was cheaper (in appearance at least).

replies(1): >>43556181 #

74. soco ◴[02 Apr 25 13:00 UTC] No.43556181{9}[source]▶

>>43555378 #

Okay I think I finally get your point, and I even agree. The comparison with an executive assistant doesn't help much here, because the CEO interacts with only one person over all those delegatable activities, and the expectations are that person already knows all the defaults. That's what makes it smooth. This doesn't scale when you must deal with a different AI for each interaction. Will we get to a (scary maybe) point where Siri/Alexa/whoever can actually be that personal assistant? Maybe, but we're still far from it. So at least for today, the conversational interface is an extra burden. And tomorrow, we'll see.

75. pydry ◴[03 Apr 25 16:38 UTC] No.43572210{5}[source]▶

>>43550036 #

That's kinda the point. Previously they couldnt but with an LLM driven conversational interface they wouldnt have to standardize - all of those phrases would turn on the wipers.

↑