Most active commenters
  • fka(11)
  • aatd86(5)
  • ActionHank(3)

69 points fka | 39 comments
1. exe34 ◴[] No.44004118[source]
I was hoping to do this over IRC but never got around to implementing it. I hate the idea of implementing a whole website/chat system when they already exist. I'd like to use it for my (currently nonexistent) home automation communication.
replies(1): >>44004409 #
2. maxcan ◴[] No.44004181[source]
Video isn't loading.
replies(1): >>44004291 #
3. casey2 ◴[] No.44004277[source]
If it could have been done, it would have been by now.
replies(2): >>44004413 #>>44004733 #
4. fka ◴[] No.44004291[source]
I think it’s because of the video format.

https://x.com/fkadev/status/1923102445799927818?s=46

5. fka ◴[] No.44004409[source]
Perfect home automation never exists.
6. fka ◴[] No.44004413[source]
You can say this about all kinds of inventions and new ideas.
7. utku1337 ◴[] No.44004470[source]
looks very useful
8. joshstrange ◴[] No.44004489[source]
Related, it’s crazy to me that OpenAI hasn’t already done something like this for Deep Research.

After your initial question, it always follows up with some clarifying questions, but it's completely up to the user to format their responses, and I always wonder whether the LLM gets confused when people are sloppy. It would make much more sense for OpenAI to break out each question and give it a dedicated answer box. That way the user's responses are consistent and there's less of a chance they make a mistake or forget to answer a question.
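
A rough sketch of what that could look like (hypothetical shapes, not OpenAI's actual API): the model emits its clarifying questions as structured data, and the client renders a dedicated input per question.

    // Hypothetical shape -- not OpenAI's actual API -- for structured clarifying questions.
    interface ClarifyingQuestion {
      id: string;
      label: string;            // the question shown to the user
      kind: "text" | "select";  // dedicated input type
      options?: string[];       // only used when kind === "select"
    }

    // One dedicated input per question; answers come back keyed by id,
    // so the model never has to parse a free-form reply.
    const questions: ClarifyingQuestion[] = [
      { id: "scope", label: "Which markets should the research cover?", kind: "text" },
      {
        id: "depth",
        label: "How detailed should the report be?",
        kind: "select",
        options: ["summary", "standard", "exhaustive"],
      },
    ];

    type Answers = Record<string, string>; // e.g. { scope: "EU only", depth: "standard" }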

replies(2): >>44004520 #>>44006374 #
9. fka ◴[] No.44004520[source]
OpenAI would implement this within a minute or something, I guess.
10. revskill ◴[] No.44004733[source]
Startups don't put enough effort into improving UX; that is why we have Jira.
11. aatd86 ◴[] No.44005353[source]
That's not a very innovative idea, or even better UX. I think the future will have to do with voice commands, with MCPs as the backend, exposing capabilities.
replies(2): >>44005653 #>>44006450 #
12. ActionHank ◴[] No.44005653[source]
Because we are all going to be in our open-plan offices, shouting into the void, hoping it poops out the app we want?
replies(1): >>44006745 #
13. ActionHank ◴[] No.44005677[source]
I really believe this is the future.

Conversations are error prone and noisy.

UI distills down the mode of interaction into something defined and well understood by both parties.

Humans have been able to speak to each other for a long time, but we fill out forms for anything formal.

replies(3): >>44005809 #>>44006013 #>>44008426 #
14. banga ◴[] No.44005717[source]
Semantic clarity of written prose is hard, but this approach seems like making it easier for the machines rather than the other way around.
15. fka ◴[] No.44005809[source]
Exactly! LLMs can generate UIs according to user needs. E.g. they can generate simplified or translated ones, on demand. No need for preset forms or long ones. Just the required ones.
16. visarga ◴[] No.44006013[source]
> Conversations are error prone and noisy.

I thought you'd say not being able to reload the form at a later time from the same URL is bad. This would be a "quantum UI" slightly different every time you load it.

replies(1): >>44006271 #
17. jFriedensreich ◴[] No.44006098[source]
I was working on exactly this in the GPT-3 days and still believe ad hoc generation of super specific, contextually relevant UIs will solve a lot of the problems and friction that purely textual or speech-based conversational interfaces pose, especially if UI elements like sliders provide some form of live feedback of their effect and can be scrolled back to or pinned so changes can be made at any time.
replies(2): >>44006257 #>>44009045 #
18. WillAdams ◴[] No.44006257[source]
This always felt like something which the LCARS interface addressed, at least conceptually (though I've never seen an implementation which was more than just a skin).

I'd love to see folks finding the same sort of energy and innovation which was driving early projects such as Momenta and PenPoint and so forth.

replies(2): >>44008096 #>>44009639 #
19. ActionHank ◴[] No.44006271{3}[source]
I think that there will be ways to achieve this.

If you look at many of the current innovations around working with LLMs and agents, they are largely about constraining and tracking context in a structured way. There will likely be emergent patterns for these sorts of things over time; I am implementing my own approach for now, with hopefully good abstractions to allow future portability.

20. wddlz ◴[] No.44006374[source]
Sorry for the shameless plug, but we recently published this research on 'Dynamic Prompt Middleware' (https://www.iandrosos.me/images/chiwork25-27.pdf) as a potential approach for this. Basically, based on the user's prompt (and some other bits of context), we generate UX containing prompt refinements that users can quickly select answers from, and that then does the prompting for the user.
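
Roughly, and only as a sketch of the general idea rather than the paper's actual implementation: derive a small set of refinement questions from the prompt, render them as quick-select controls, and fold the chosen answers back into the prompt.

    // Sketch of the general "prompt refinement UI" idea; not the paper's implementation.
    interface Refinement {
      question: string;   // e.g. "Which programming language should the examples use?"
      choices: string[];  // quick-select answers rendered as buttons or a dropdown
    }

    // After the user picks answers in the generated UI, fold them back into the prompt.
    function applyRefinements(prompt: string, selected: Record<string, string>): string {
      const lines = Object.entries(selected).map(([q, a]) => `- ${q} ${a}`);
      return `${prompt}\n\nAdditional context:\n${lines.join("\n")}`;
    }
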
replies(2): >>44006433 #>>44006960 #
21. fka ◴[] No.44006433{3}[source]
Didn't read the paper, but it sounds like a similar idea.
22. fka ◴[] No.44006450[source]
We don't do most of our jobs with our voice. "Click" interaction is still an important one.
replies(1): >>44006690 #
23. aatd86 ◴[] No.44006690{3}[source]
There is no benefit in it being AI-generated, though. There is a closed set of interaction behaviors.

When you want to order a pizza, you won't have to click. Just browse and ask the AI assistant to place an order, as you would in a restaurant. Better UX.

replies(1): >>44006704 #
24. fka ◴[] No.44006704{4}[source]
Yep, that's why it's "on-demand". With LLMs, you won't need to fill out the form; it's an optional interaction that makes your UX process better. Please read the post and then comment :) You may be commenting on the title alone.
replies(1): >>44006771 #
25. aatd86 ◴[] No.44006745{3}[source]
Because you really think the AI can predict the perfect UX for human consumption out of the blue, instead of simply using human-made components?

AI or not won't change these sorts of UIs very much.

26. aatd86 ◴[] No.44006771{5}[source]
No, I read the post. I had actually read it before, I think. But I am not convinced by the on-demand part.

Isn't on-demand what chat LLMs already do nowadays, btw?

Point being that generating visual UI components is easy. ChatGPT does it. Server-driven UI does it.

But multimodal interaction is something else that goes further.

replies(1): >>44006875 #
27. fka ◴[] No.44006875{6}[source]
Well, AI might ask you to choose a color. Now, is it better to show a color picker UI or just ask for the name?

You might say naming the color is enough, but in reality, a color picker is the more natural way to interact.

As humans, we don’t communicate only through words. Other forms of interaction matter too.
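
A minimal sketch of that choice (hypothetical shapes, not from the post): the model requests a component when the data type warrants one, and plain text otherwise.

    // Hypothetical: the model answers with a UI request instead of a plain question.
    type UiRequest =
      | { kind: "text"; prompt: string }
      | { kind: "colorPicker"; prompt: string; defaultValue: string };

    // For a color, a picker is a more natural interaction than naming a shade.
    const ask: UiRequest = {
      kind: "colorPicker",
      prompt: "Pick an accent color for your dashboard",
      defaultValue: "#3366ff",
    };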

replies(1): >>44007055 #
28. ics ◴[] No.44006960{3}[source]
Very neat paper, thanks for sharing. Being able to interact with a model through, say, a Jupyter Notebook in this way would be especially amazing.
29. aatd86 ◴[] No.44007055{7}[source]
Yes, but the AI is not creating these components from zero, is it (the on-demand part)?

It will probably have access to a list of components with their specifications, especially the type of data each component can represent (mutably or not).

Or it will respond to a query from a database by presenting a graph automatically.

But the hard part, in my opinion, is turning natural language into a SQL query. It's not really the choice of data representation, which is heavily informed by the data itself (type and value) and doesn't require much inference.

replies(1): >>44007172 #
30. fka ◴[] No.44007172{8}[source]
I still do think you haven’t read the post :D
31. wddlz ◴[] No.44007424[source]
Related to this: Here is some recently published research we did at Microsoft Research on generating UX for prompt refinements based on the user prompt and other context (case study: https://www.iandrosos.me/promptly.html, paper link also in intro).

We found it lowered barriers to providing context to AI, improved user perception of control over AI, and provided users guidance for steering AI interactions.

32. bhj ◴[] No.44008096{3}[source]
Yes, there’s a video where Michael Okuda (with Adam Savage, I think?) recalls the TNG cast being worried about where to tap, and his response was essentially “you can’t press a wrong button“.
33. sheo ◴[] No.44008186[source]
I think the example in the article is not a good use case for this technology. It would be better, cheaper, and less error-prone to have prebuilt forms that the LLM can call like tools, at least for things like changing a shipping address.

Shipping forms usually need address verification; sometimes they even include a map.

Especially if, on the other end, the data entered in this form is stored in a traditional DB.
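
A rough sketch of that prebuilt-form-as-a-tool alternative (a hypothetical tool definition, not something from the article): the form itself stays fixed and validated, and the LLM only decides when to invoke it.

    // Hypothetical tool definition: the LLM opens a prebuilt, validated form
    // instead of generating the shipping-address UI itself.
    const changeShippingAddressTool = {
      name: "show_change_shipping_address_form",
      description: "Opens the prebuilt shipping address form, with address verification and a map.",
      parameters: {
        type: "object",
        properties: {
          orderId: { type: "string", description: "The order whose address should change" },
        },
        required: ["orderId"],
      },
    };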

A much better use case would be something that is dynamic by nature. For example, an advanced prompt generator for image-generation models (sliders for the size of objects in a scene; dropdown menus with variants of backgrounds or styles, instead of the usual lists).

replies(1): >>44009287 #
34. aziaziazi ◴[] No.44008426[source]
> this is the future

For sure! UIs are also most of the past and present way we interact with a computer, offline or online. Even Hacker News - which is mostly text - has some UI to vote, navigate, flag…

Imagine the mess of a text-field-only interface where you had to type "upvote the upper ActionHank message" or "open the third article's comments on the front page, the one that talks about on-demand UI generation…" and then press enter.

Don't get me wrong: LLMs are great and it's fascinating to see experiments with them. Kudos to the author.

35. jmull ◴[] No.44008875[source]
This seems much worse than the typical pre-AI mechanism of navigating to and clicking on a "Change Delivery Address" button.

I don't know why you wouldn't develop whatever forms you want to support upfront and make them available to the agent (and hopefully provide old-fashioned search). You can still use AI to develop and maintain the forms. Since the output can be reused as many times as you want, you can probably use more expensive/capable models to develop the forms, rather than the cheaper/faster but less capable models you're probably limited to for customer service.

36. ◴[] No.44009045[source]
37. cjcenizal ◴[] No.44009287[source]
You make a good point! There are many common input configurations that will come up again and again, in forms and other types of input (like maps, as you mentioned). How can we solve for that?

Maybe a solution would look like the server expressing a more general intent -- "shipping address" -- and leaving it to the client to determine the best UI component for capturing that information. Then the server will need to do its own validation of the user's input, perhaps asking for confirmation that it understood correctly.
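
A sketch of that split (hypothetical message shape): the server names only the intent, the client maps it to whatever local component captures that information best, and the server still validates the result.

    // Hypothetical: the server sends an abstract input intent, the client picks the widget.
    interface InputIntent {
      intent: "shipping_address" | "date_range" | "color";
      prompt: string;
    }

    // Client-side mapping from intent to its best local component.
    const componentFor: Record<InputIntent["intent"], string> = {
      shipping_address: "AddressFormWithMap",
      date_range: "DateRangePicker",
      color: "ColorPicker",
    };
    // e.g. componentFor["shipping_address"] === "AddressFormWithMap"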

38. jFriedensreich ◴[] No.44009639{3}[source]
Thanks for bringing this up, I totally forgot the connection, even though I looked at it before and also remember the Adam Savage interview.