Most active commenters
  • fka(11)
  • aatd86(5)
  • ActionHank(3)

69 points fka | 39 comments
1. exe34 ◴[] No.44004118[source]
I was hoping to do this over IRC but never got around to implementing it. I hate the idea of implementing a whole website/chat system when they already exist. I'd like to use it for my (currently nonexistent) home automation communication.
replies(1): >>44004409 #
2. maxcan ◴[] No.44004181[source]
Video isn't loading.
replies(1): >>44004291 #
3. casey2 ◴[] No.44004277[source]
If it could have been done, it would have been by now.
replies(2): >>44004413 #>>44004733 #
4. fka ◴[] No.44004291[source]
I think it’s because of the video format.

https://x.com/fkadev/status/1923102445799927818?s=46

5. fka ◴[] No.44004409[source]
Perfect home automation never exists.
6. fka ◴[] No.44004413[source]
You can say this about all kinds of inventions and new ideas.
7. utku1337 ◴[] No.44004470[source]
looks very useful
8. joshstrange ◴[] No.44004489[source]
Related, it’s crazy to me that OpenAI hasn’t already done something like this for Deep Research.

After your initial question, it always follows up with some clarifying questions, but it's completely up to the user to format their responses, and I always wonder whether the LLM gets confused when people are sloppy. It would make much more sense for OpenAI to break out each question and give it a dedicated answer box. That way the user's responses are consistent and there's less of a chance they make a mistake or forget to answer a question.
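
A rough sketch of what that could look like (hypothetical shapes, not OpenAI's actual API): the model emits its clarifying questions as structured data, and the client renders a dedicated input per question.

    // Hypothetical shape -- not OpenAI's actual API -- for structured clarifying questions.
    interface ClarifyingQuestion {
      id: string;
      label: string;            // the question shown to the user
      kind: "text" | "select";  // dedicated input type
      options?: string[];       // only used when kind === "select"
    }

    // One dedicated input per question; answers come back keyed by id,
    // so the model never has to parse a free-form reply.
    const questions: ClarifyingQuestion[] = [
      { id: "scope", label: "Which markets should the research cover?", kind: "text" },
      {
        id: "depth",
        label: "How detailed should the report be?",
        kind: "select",
        options: ["summary", "standard", "exhaustive"],
      },
    ];

    type Answers = Record<string, string>; // e.g. { scope: "EU only", depth: "standard" }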

replies(2): >>44004520 #>>44006374 #
9. fka ◴[] No.44004520[source]
OpenAI would implement this within a minute or something, I guess.
10. revskill ◴[] No.44004733[source]
Startups don't put enough effort into improving UX; that is why we have Jira.
11. aatd86 ◴[] No.44005353[source]
That's not a very innovative idea, or even better UX. I think the future will have to do with voice commands, with MCPs as the backend, exposing capabilities.
replies(2): >>44005653 #>>44006450 #
12. ActionHank ◴[] No.44005653[source]
Because we are all going to be in our open-plan offices, shouting into the void, hoping it poops out the app we want?
replies(1): >>44006745 #
13. ActionHank ◴[] No.44005677[source]
I really believe this is the future.

Conversations are error prone and noisy.

UI distills down the mode of interaction into something defined and well understood by both parties.

Humans have been able to speak to each other for a long time, but we fill out forms for anything formal.

replies(3): >>44005809 #>>44006013 #>>44008426 #
14. banga ◴[] No.44005717[source]
Semantic clarity of written prose is hard, but this approach seems like making it easier for the machines rather than the other way around.
15. fka ◴[] No.44005809[source]
Exactly! LLMs can generate UIs according to user needs. E.g. they can generate simplified or translated ones, on demand. No need for preset forms or long ones. Just the required ones.
16. visarga ◴[] No.44006013[source]
> Conversations are error prone and noisy.

I thought you'd say not being able to reload the form at a later time from the same URL is bad. This would be a "quantum UI" slightly different every time you load it.

replies(1): >>44006271 #
17. jFriedensreich ◴[] No.44006098[source]
I was working on exactly this in the GPT-3 days and still believe ad hoc generation of super specific, contextually relevant UIs will solve a lot of the problems and friction that purely textual or speech-based conversational interfaces pose, especially if UI elements like sliders provide some form of live feedback of their effect and can be scrolled back to or pinned so changes can be made at any time.
replies(2): >>44006257 #>>44009045 #
18. WillAdams ◴[] No.44006257[source]
This always felt like something which the LCARS interface addressed, at least conceptually (though I've never seen an implementation which was more than just a skin).

I'd love to see folks finding the same sort of energy and innovation which was driving early projects such as Momenta and PenPoint and so forth.

replies(2): >>44008096 #>>44009639 #
19. ActionHank ◴[] No.44006271{3}[source]
I think that there will be ways to achieve this.

If you look at many of the current innovations around working with LLMs and agents, they are largely about constraining and tracking context in a structured way. There will likely be emergent patterns for these sorts of things over time; I am implementing my own approach for now, with hopefully good abstractions to allow future portability.

20. wddlz ◴[] No.44006374[source]
Sorry for the shameless plug, but we recently published this research on 'Dynamic Prompt Middleware' (https://www.iandrosos.me/images/chiwork25-27.pdf) as a potential approach for this. Basically, based on the user's prompt (and some other bits of context), we generate UX containing prompt refinements that users can quickly select answers from, and that then does the prompting for the user.
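
Roughly, and only as a sketch of the general idea rather than the paper's actual implementation: derive a small set of refinement questions from the prompt, render them as quick-select controls, and fold the chosen answers back into the prompt.

    // Sketch of the general "prompt refinement UI" idea; not the paper's implementation.
    interface Refinement {
      question: string;   // e.g. "Which programming language should the examples use?"
      choices: string[];  // quick-select answers rendered as buttons or a dropdown
    }

    // After the user picks answers in the generated UI, fold them back into the prompt.
    function applyRefinements(prompt: string, selected: Record<string, string>): string {
      const lines = Object.entries(selected).map(([q, a]) => `- ${q} ${a}`);
      return `${prompt}\n\nAdditional context:\n${lines.join("\n")}`;
    }
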
replies(2): >>44006433 #>>44006960 #
21. fka ◴[] No.44006433{3}[source]
Didn't read the paper, but it sounds like a similar idea.
22. fka ◴[] No.44006450[source]
We don't do most of our jobs with our voice. "Click" interaction is still an important one.
replies(1): >>44006690 #
23. aatd86 ◴[] No.44006690{3}[source]
There is no benefit in it being AI-generated, though. There is a closed set of interaction behaviors.

When you want to order a pizza, you won't have to click. Just browse and ask the AI assistant to place an order, as you would in a restaurant. Better UX.

replies(1): >>44006704 #
24. fka ◴[] No.44006704{4}[source]
Yep, that's why it's "on-demand". With LLMs, you won't need to fill out the form; it's an optional interaction that makes your UX process better. Please read the post and then comment :) You may be commenting on the title alone.
replies(1): >>44006771 #
25. aatd86 ◴[] No.44006745{3}[source]
Because you really think the AI can predict the perfect UX for human consumption out of the blue, instead of simply using human-made components?

AI or not won't change these sorts of UIs very much.

26. aatd86 ◴[] No.44006771{5}[source]
No, I read the post. I had actually read it before, I think. But I am not convinced by the on-demand part.

Isn't on-demand what chat LLMs already do nowadays, btw?

Point being that generating visual UI components is easy. ChatGPT does it. Server-driven UI does it.

But multimodal interaction is something else that goes further.

replies(1): >>44006875 #
27. fka ◴[] No.44006875{6}[source]
Well, AI might ask you to choose a color. Now, is it better to show a color picker UI or just ask for the name?

You might say naming the color is enough, but in reality, a color picker is the more natural way to interact.

As humans, we don’t communicate only through words. Other forms of interaction matter too.
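
A minimal sketch of that choice (hypothetical shapes, not from the post): the model requests a component when the data type warrants one, and plain text otherwise.

    // Hypothetical: the model answers with a UI request instead of a plain question.
    type UiRequest =
      | { kind: "text"; prompt: string }
      | { kind: "colorPicker"; prompt: string; defaultValue: string };

    // For a color, a picker is a more natural interaction than naming a shade.
    const ask: UiRequest = {
      kind: "colorPicker",
      prompt: "Pick an accent color for your dashboard",
      defaultValue: "#3366ff",
    };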

replies(1): >>44007055 #
28. ics ◴[] No.44006960{3}[source]
Very neat paper, thanks for sharing. Being able to interact with a model through, say, a Jupyter Notebook in this way would be especially amazing.
29. aatd86 ◴[] No.44007055{7}[source]
Yes, but the AI is not creating these components from zero, is it (the on-demand part)?

It will probably have access to a list of components with their specifications, especially the type of data each component can represent (mutably or not).

Or it will respond to a query from a database by presenting a graph automatically.

But the hard part, in my opinion, is turning natural language into a SQL query. It's not really the choice of data representation, which is heavily informed by the data itself (type and value) and doesn't require much inference.

replies(1): >>44007172 #
30. fka ◴[] No.44007172{8}[source]
I still do think you haven’t read the post :D
31. wddlz ◴[] No.44007424[source]
Related to this: Here is some recently published research we did at Microsoft Research on generating UX for prompt refinements based on the user prompt and other context (case study: https://www.iandrosos.me/promptly.html, paper link also in intro).

We found it lowered barriers to providing context to AI, improved user perception of control over AI, and provided users guidance for steering AI interactions.

32. bhj ◴[] No.44008096{3}[source]
Yes, there’s a video where Michael Okuda (with Adam Savage, I think?) recalls the TNG cast being worried about where to tap, and his response was essentially “you can’t press a wrong button“.
33. sheo ◴[] No.44008186[source]
I think the example in the article is not a good use case for this technology. It would be better, cheaper, and less error-prone to have prebuilt forms that the LLM can call like tools, at least for things like changing a shipping address.

Shipping forms usually need address verification; sometimes they even include a map.

Especially if, on the other end, the data entered in this form is stored in a traditional DB.
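
A rough sketch of that prebuilt-form-as-a-tool alternative (a hypothetical tool definition, not something from the article): the form itself stays fixed and validated, and the LLM only decides when to invoke it.

    // Hypothetical tool definition: the LLM opens a prebuilt, validated form
    // instead of generating the shipping-address UI itself.
    const changeShippingAddressTool = {
      name: "show_change_shipping_address_form",
      description: "Opens the prebuilt shipping address form, with address verification and a map.",
      parameters: {
        type: "object",
        properties: {
          orderId: { type: "string", description: "The order whose address should change" },
        },
        required: ["orderId"],
      },
    };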

A much better use case would be something that is dynamic by nature. For example, an advanced prompt generator for image-generation models (sliders for the size of objects in a scene; dropdown menus with variants of backgrounds or styles, instead of the usual lists).

replies(1): >>44009287 #
34. aziaziazi ◴[] No.44008426[source]
> this is the future

For sure! UIs are also most of the past and present way we interact with a computer, offline or online. Even Hacker News - which is mostly text - has some UI to vote, navigate, flag…

Imagine the mess of a text-field-only interface where you had to type "upvote the upper ActionHank message" or "open the third article's comments on the front page, the one that talks about on-demand UI generation…" and then press enter.

Don't get me wrong: LLMs are great and it's fascinating to see experiments with them. Kudos to the author.

35. jmull ◴[] No.44008875[source]
This seems much worse than the typical pre-AI mechanism of navigating to and clicking on a "Change Delivery Address" button.

I don't know why you wouldn't develop whatever forms you want to support upfront and make them available to the agent (and hopefully provide old-fashioned search). You can still use AI to develop and maintain the forms. Since the output can be reused as many times as you want, you can probably use more expensive/capable models to develop the forms, rather than the cheaper/faster but less capable models you're probably limited to for customer service.

36. ◴[] No.44009045[source]
37. cjcenizal ◴[] No.44009287[source]
You make a good point! There are many common input configurations that will come up again and again, in forms and other types of input (like maps, as you mentioned). How can we solve for that?

Maybe a solution would look like the server expressing a more general intent -- "shipping address" -- and leaving it to the client to determine the best UI component for capturing that information. Then the server will need to do its own validation of the user's input, perhaps asking for confirmation that it understood correctly.
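
A sketch of that split (hypothetical message shape): the server names only the intent, the client maps it to whatever local component captures that information best, and the server still validates the result.

    // Hypothetical: the server sends an abstract input intent, the client picks the widget.
    interface InputIntent {
      intent: "shipping_address" | "date_range" | "color";
      prompt: string;
    }

    // Client-side mapping from intent to its best local component.
    const componentFor: Record<InputIntent["intent"], string> = {
      shipping_address: "AddressFormWithMap",
      date_range: "DateRangePicker",
      color: "ColorPicker",
    };
    // e.g. componentFor["shipping_address"] === "AddressFormWithMap"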

38. jFriedensreich ◴[] No.44009639{3}[source]
Thanks for bringing this up, I totally forgot the connection, even though I looked at it before and also remember the Adam Savage interview.