
1122 points | felixrieseberg
jl6 ◴[] No.43907000[source]
IIRC correctly, Clippy’s most famous feature was interrupting you to offer advice. The advice was usually basic/useless/annoying, hence Clippy’s reputation, but a powerful LLM could actually make the original concept work. It would not be simply a chatbot that responds to text, but rather would observe your screen, understand it through a vision model, and give appropriate advice. Things like “did you know there’s an easier way to do what you’re doing”. I don’t think the necessary trust exists yet to do this using public LLM APIs, nor does the hardware to do it locally, but crack either of those and I could see ClipGPT being genuinely useful.
replies(10): >>43907133 #>>43907138 #>>43907168 #>>43907265 #>>43907418 #>>43907981 #>>43908398 #>>43908908 #>>43909895 #>>43913051 #
vunderba ◴[] No.43907133[source]
We are probably getting closer to that with the newer multimodal LLMs, but you'd almost need to take screenshots at intervals and feed them directly to the LLM, providing a sort of chronological context that helps it understand what the user is trying to do and gauge the user's intentions.

As you say though, I don't know how many people would be comfortable having screenshots of their computer sent arbitrarily to a non-local LLM.
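To make the idea concrete, here is a minimal sketch of that screenshot-on-an-interval loop. Nothing here is specified by the comments above: it assumes the mss screenshot library and OpenAI's Python SDK with a vision-capable model, and the prompt, the 30-second interval, and the "NOTHING" sentinel are all placeholders.

    import base64
    import time

    import mss
    import mss.tools
    from openai import OpenAI  # assumes OPENAI_API_KEY is set in the environment

    client = OpenAI()

    def grab_screen_png() -> bytes:
        # Capture the primary monitor and return it as PNG bytes.
        with mss.mss() as sct:
            shot = sct.grab(sct.monitors[1])
            return mss.tools.to_png(shot.rgb, shot.size)

    def advise(history: list[str], png: bytes) -> str:
        # Send the latest frame plus a rolling summary of earlier frames,
        # so the model gets the chronological context described above.
        data_url = "data:image/png;base64," + base64.b64encode(png).decode()
        resp = client.chat.completions.create(
            model="gpt-4o",  # any vision-capable model works here
            messages=[
                {"role": "system", "content": (
                    "You watch a user's screen. If there is an obviously "
                    "easier way to do what they are doing, say so in one "
                    "sentence. Otherwise reply exactly: NOTHING")},
                {"role": "user", "content": [
                    {"type": "text",
                     "text": "Recent activity: " + " | ".join(history[-5:])},
                    {"type": "image_url", "image_url": {"url": data_url}},
                ]},
            ],
        )
        return resp.choices[0].message.content or "NOTHING"

    history: list[str] = []
    while True:
        tip = advise(history, grab_screen_png())
        if tip != "NOTHING":
            print("Clippy says:", tip)
        history.append(tip)
        time.sleep(30)  # screenshot interval; tune for cost vs. latency

Shipping full-resolution frames every 30 seconds gets expensive and leaks everything on screen, which is exactly the trust problem raised above; gating frames through a small local vision model before anything leaves the machine would be the obvious mitigation.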

replies(5): >>43907196 #>>43907413 #>>43907760 #>>43908782 #>>43913893 #
walrus01 ◴[] No.43907413[source]
I think we're well into the paradigm of hidden employee activity monitoring software already taking periodic screenshots and sending them to an LLM somewhere, which then generates aggregate performance metrics and dashboards for managers. I've heard of multiple companies working on this for $bigcorp environments, customer service/call center workstation PCs, etc.
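The aggregation half of such a pipeline is mundane once per-screenshot labels exist. A hypothetical sketch in Python (the label set, the Frame schema, and the one-frame-per-minute sampling rate are all invented for illustration; the vision-LLM classification step is stubbed out):

    from collections import Counter
    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class Frame:
        user: str
        taken_at: datetime
        label: str  # e.g. "crm", "email", "browser-other", "idle"

    def classify_screenshot(png: bytes) -> str:
        # Placeholder for the vision-LLM call that assigns an activity label.
        raise NotImplementedError

    def dashboard_rows(frames: list[Frame]) -> dict[tuple[str, str], float]:
        # Roll per-frame labels up into hours per (user, activity),
        # assuming one sampled frame per minute.
        counts = Counter((f.user, f.label) for f in frames)
        return {key: n / 60.0 for key, n in counts.items()}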