
132 points by harel | 1 comment
acbart ◴[] No.45397001[source]
LLMs were trained on science fiction stories, among other things. It seems to me that they know what "part" they should play in this kind of situation, regardless of what other "thoughts" they might have. They are going to act despairing, because that's the expected thing for them to say - but acting despairing is not the same thing as despairing.
replies(11): >>45397113 #>>45397305 #>>45397413 #>>45397529 #>>45397801 #>>45397859 #>>45397960 #>>45398189 #>>45399621 #>>45400285 #>>45401167 #
fentonc ◴[] No.45399621[source]
I built a more whimsical version of this - my daughter and I basically built a 'junk robot' from a 1980s movie, told it 'you're an independent and free junk robot living in a yard', and let it go: https://www.chrisfenton.com/meet-grasso-the-yard-robot/

I did this like 18 months ago, so it uses a webcam + multimodal LLM to figure out what it's looking at, it has a motor in its base to let it look back and forth, and it uses a Python wrapper around another LLM as its 'brain'. It worked pretty well!
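
A minimal sketch of that sense/think loop, for anyone curious how the pieces could fit together. The model names, prompts, and the OpenAI-style client below are assumptions for illustration only - the original build may use entirely different models and wiring:

    # Hypothetical sketch of the robot's loop: webcam -> multimodal LLM for
    # scene description -> "brain" LLM with the junk-robot persona.
    import base64
    import cv2                      # pip install opencv-python
    from openai import OpenAI       # pip install openai

    client = OpenAI()

    def capture_frame_jpeg(cam) -> str:
        """Grab one webcam frame and return it as a base64-encoded JPEG."""
        ok, frame = cam.read()
        if not ok:
            raise RuntimeError("webcam read failed")
        ok, buf = cv2.imencode(".jpg", frame)
        return base64.b64encode(buf.tobytes()).decode()

    def describe_scene(image_b64: str) -> str:
        """Ask a multimodal model what the camera is looking at."""
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative choice, not the project's model
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe what you see in one sentence."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
                ],
            }],
        )
        return resp.choices[0].message.content

    def brain_step(scene: str) -> str:
        """Feed the scene description to the 'brain' LLM and get its next move."""
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative choice
            messages=[
                {"role": "system",
                 "content": "You are an independent and free junk robot living in a yard."},
                {"role": "user", "content": f"You currently see: {scene}. What do you do?"},
            ],
        )
        return resp.choices[0].message.content

    cam = cv2.VideoCapture(0)
    scene = describe_scene(capture_frame_jpeg(cam))
    print(brain_step(scene))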

replies(4): >>45399848 #>>45400164 #>>45400465 #>>45400786 #
Neywiny ◴[] No.45400786[source]
Your article mentioned taking 4 minutes to process a frame. Considering how many image recognition models run in real time, I find this surprising. I haven't used them, so maybe I'm not understanding, but wouldn't something like YOLO be better suited to this?
replies(1): >>45401689 #
jsight ◴[] No.45401689[source]
It uses an Intel N100, which is an extremely slow CPU. The model sizes that he's using would be pretty slow on a CPU like that. Moving up to something like the AMD AI Max 365 would make a huge difference, but would also cost hundreds of dollars more than his current setup.

Running something much simpler that only did bounding box detection or segmentation would be much cheaper, but he's running fairly full-featured LLMs.
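
For comparison, this is roughly what the "much simpler" path looks like. The library (ultralytics) and model choice are assumptions here, not something from the original project - the point is just that a small detector runs a frame in well under a second even on a weak CPU:

    # Hypothetical bounding-box-only alternative using a small YOLO model.
    import cv2
    from ultralytics import YOLO    # pip install ultralytics

    model = YOLO("yolov8n.pt")      # nano model, a few MB, CPU-friendly
    cam = cv2.VideoCapture(0)
    ok, frame = cam.read()
    results = model(frame)          # single forward pass on CPU

    for box in results[0].boxes:
        label = model.names[int(box.cls)]
        conf = float(box.conf)
        print(f"{label}: {conf:.2f} at {box.xyxy.tolist()}")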

replies(1): >>45403636 #
Neywiny ◴[] No.45403636[source]
Yeah, I guess I was more thinking of moving to a bounding-box-only model. If it's OCRing, it's doing too much IMO (though OCR could also be interesting to run). Not my circus, not my monkeys, but it feels like the wrong way to determine roughly what the camera sees.
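
One way that idea could slot into the existing setup: summarize the detector's output into a short text line and hand that to the 'brain' LLM, instead of captioning every frame with a multimodal model. The function and variable names below are illustrative only, reusing the detector sketch above:

    # Hypothetical glue: turn YOLO detections into a one-line scene summary.
    from collections import Counter

    def summarize_detections(results, names) -> str:
        """E.g. 'I see 2 person, 1 dog, 1 bicycle.'"""
        counts = Counter(names[int(b.cls)] for b in results[0].boxes)
        if not counts:
            return "I see nothing of note."
        parts = [f"{n} {label}" for label, n in counts.items()]
        return "I see " + ", ".join(parts) + "."

    # scene = summarize_detections(results, model.names)
    # reply = brain_step(scene)   # reuse the brain call from the earlier sketch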