We accidentally solved robotics by watching 1M hours of YouTube

(ksagar.bearblog.dev)

209 points alexcos | 2 comments | 29 Jun 25 16:08 UTC

dchftcs ◴[30 Jun 25 03:53 UTC] No.44419191[source]▶

Pure vision will never be enough because it does not contain information about the physical feedback like pressure and touch, or the strength required to perform a task.

For example, a robot needs that feedback to avoid crushing a human when giving a massage (while still pressing hard), or to apply the right amount of force (and finesse?) to skin a fish fillet without cutting the skin itself.

Practically, in the near term, it's hard to sample failure examples from videos on YouTube, such as when food accidentally spills out of the pot. Studying simple tasks only through the happy path makes it hard to get the robot to figure out how to keep trying until it succeeds, a problem that can appear even in relatively simple jobs like shuffling garbage.

With that said, I suppose a robot can be made to practice in real life after learning something from vision.

replies(4): >>44419561 #>>44419692 #>>44420011 #>>44426961 #

carlosdp ◴[30 Jun 25 06:16 UTC] No.44420011[source]▶

>>44419191 #

> Pure vision will never be enough because it does not contain information about the physical feedback like pressure and touch, or the strength required to perform a task.

I'm not sure that's necessarily true for a lot of tasks.

A good way to measure this in your head is this:

"If you were given remote control of two robot arms, and just one camera to look through, how many different tasks do you think you could complete successfully?"

When you start thinking about it, you realize there are a lot of things you could do with just the arms and one camera, because you as a human have really good intuition about the world.

It therefore follows that robots should be able to learn with just RGB images too! Counterexamples would be things like grabbing an egg without crushing, perhaps. Though I suspect that could also be done with just vision.

replies(9): >>44420219 #>>44420289 #>>44420630 #>>44420695 #>>44420919 #>>44421236 #>>44423275 #>>44425473 #>>44427030 #

suddenlybananas ◴[30 Jun 25 08:06 UTC] No.44420695[source]▶

>>44420011 #

Humans have innate knowledge that helps them interact with the world and can learn from physical interaction for the rest. RGB images aren't enough.

replies(1): >>44420722 #

whatever1 ◴[30 Jun 25 08:12 UTC] No.44420722[source]▶

>>44420695 #

Video games have shown that we can control characters pretty darn well in virtual worlds whose physics we have never experienced. We just look at a 2D monitor and, using a joystick/keyboard, we manage to figure it out.

replies(2): >>44421108 #>>44421256 #

1. suddenlybananas ◴[30 Jun 25 09:11 UTC] No.44421108[source]▶

>>44420722 #

Yeah, but we already have a prior conception of what physics should be, and that helps us enormously. It's not like game designers are coming up with stuff that intentionally breaks our naïve physics.

replies(1): >>44427148 #

2. godelski ◴[30 Jun 25 19:54 UTC] No.44427148[source]▶

>>44421108 (TP) #

I mean they do but we often have generalized (to some degree) world models. So when they do things like change gravity, flip things upside down, or even more egregious changes we can adapt. Because we have contractual counterfactual models. But yeah, they could change things so much that you'd really have to relearn and that could be very very difficult if not impossible (I wonder if anyone has created a playable game with a physics that's impossible for humans to learn, at least without "pen and paper". I think you could do this by putting the game in higher dimensions.)
