←back to thread

ChatGPT Atlas

(chatgpt.com)
763 points easton | 4 comments | | HN request time: 0.227s | source
1. bilsbie ◴[] No.45659896[source]
Super dumb question but why was this so hard for someone to build.

I’ve been wanting to simply ask AI about whatever is currently on my screen for years.

I don’t get why we can’t easily have this.

replies(3): >>45660171 #>>45660199 #>>45663243 #
2. Sean-Der ◴[] No.45660171[source]
You can already do this! I saw this on X[0]. You can do WebRTC to Realtime API + getDisplayMedia.

[0] https://www.loom.com/share/22a165508ae5491dbd536fbbc5348fcc

3. AtNightWeCode ◴[] No.45660199[source]
It is very basic. I have built my own version of this based on Chromium that integrates both Claude and ChatGPT in the browser. It can do a lot of tasks like translate or shorten the text I selected and so on. It took me like a couple of hours to build. The problem is the cost of using the LLMs, especially since they are still pretty stupid and requires huge prompts.

EDIT: I think I misunderstood your Q. Sorry. You can take a screenshot and post it to ChatGPT and get back what it is seeing, in theory. I mean, I use ChatGPT to post screenshots of my sites to get feedback on my layout and designs...

4. nsonha ◴[] No.45663243[source]
We have this though, as a (controversial) built-in Windows's feature called "Recall". We have many apps like that (vercept.com) and MCP servers that do that. It's just, besides privacy concerns, it doesn't works well yet for agentic usecases.