
1122 points felixrieseberg | 1 comment
jl6 No.43907000
IIRC, Clippy's most famous feature was interrupting you to offer advice. The advice was usually basic, useless, or annoying, hence Clippy's reputation, but a powerful LLM could actually make the original concept work. It would not simply be a chatbot that responds to text; rather, it would observe your screen, understand it through a vision model, and give appropriate advice: things like "did you know there's an easier way to do what you're doing?". I don't think the necessary trust exists yet to do this via public LLM APIs, nor does the hardware exist to do it locally, but crack either of those and I could see ClipGPT being genuinely useful.
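
[A minimal sketch of the loop described above, assuming an OpenAI-style vision API (the official openai Python package) and Pillow's ImageGrab for screen capture. The model name, prompt, and 30-second interval are placeholder assumptions, not a tested configuration:]

    # Sketch of the "LLM Clippy" loop: watch the screen, ask a vision
    # model for advice, surface it only when it has something useful.
    import base64
    import io
    import time

    from openai import OpenAI      # assumes the official openai package
    from PIL import ImageGrab      # assumes Pillow (Windows/macOS capture)

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    ADVICE_PROMPT = (
        "You can see the user's screen. If there is a clearly easier way "
        "to do what they appear to be doing, explain it in one sentence. "
        "Otherwise reply with exactly: NOTHING."
    )

    def screenshot_b64() -> str:
        """Capture the full screen and return it as base64-encoded PNG."""
        buf = io.BytesIO()
        ImageGrab.grab().save(buf, format="PNG")
        return base64.b64encode(buf.getvalue()).decode()

    def ask_for_advice() -> str | None:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder: any vision-capable model
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": ADVICE_PROMPT},
                    {"type": "image_url", "image_url": {
                        "url": f"data:image/png;base64,{screenshot_b64()}"}},
                ],
            }],
        )
        text = (resp.choices[0].message.content or "").strip()
        return None if text == "NOTHING" else text

    while True:
        if advice := ask_for_advice():
            print(f"📎 {advice}")   # stand-in for a Clippy-style popup
        time.sleep(30)              # don't interrupt too often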
vunderba No.43907133
We are probably getting closer to that with the newer multimodal LLMs, but you'd almost need to take screenshots at intervals and feed them directly to the LLM, providing a sort of chronological context that helps it understand what the user is trying to do and gauge the user's intentions (a rough sketch of that loop follows below).

As you say though, I don't know how many people would be comfortable having screenshots of their computer sent arbitrarily to a non-local LLM.
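
[A sketch combining the two ideas in this comment: keep a rolling buffer of interval screenshots for chronological context, but send them to a locally hosted multimodal model, here via Ollama's HTTP API, which accepts base64-encoded images, so nothing leaves the machine. The model name, prompt, and timings are assumptions:]

    # Rolling screenshot history fed to a *local* vision model via Ollama,
    # addressing the privacy concern about non-local LLMs.
    import base64
    import collections
    import io
    import time

    import requests                 # assumes requests is installed
    from PIL import ImageGrab       # assumes Pillow

    OLLAMA_URL = "http://localhost:11434/api/generate"
    HISTORY = collections.deque(maxlen=5)  # last 5 screenshots of activity

    def capture() -> str:
        """Grab the screen and return it as base64-encoded PNG."""
        buf = io.BytesIO()
        ImageGrab.grab().save(buf, format="PNG")
        return base64.b64encode(buf.getvalue()).decode()

    def infer_intent() -> str:
        """Send the screenshot history, oldest first, to the local model."""
        resp = requests.post(OLLAMA_URL, json={
            "model": "llava",       # placeholder: any locally pulled vision model
            "prompt": ("These screenshots were taken 30 seconds apart, "
                       "oldest first. What is the user trying to accomplish, "
                       "and is there an easier way to do it?"),
            "images": list(HISTORY),
            "stream": False,
        }, timeout=120)
        return resp.json()["response"]

    while True:
        HISTORY.append(capture())
        if len(HISTORY) == HISTORY.maxlen:
            print(infer_intent())
        time.sleep(30)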

johnisgood No.43908782
> I don't know how many people would be comfortable having screenshots of their computer sent arbitrarily to a non-local LLM

shudders.