301 points | SerCe
lelag | No.43113916
Really interesting model; I'm looking forward to playing with it.

But what I want is a multimodal agent model capable of generating embeddings for a humanoid control model like Meta motivo[0] rather than directly outputting coordinates.
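That interface already exists in Motivo, by the way: the policy is conditioned on a latent z rather than on explicit coordinates, so an upstream agent only needs to produce z. A minimal sketch from memory of the metamotivo README (names like FBcprModel, sample_z and act, and the 358-dim observation, may be off, so double-check against the repo):

    import torch
    from metamotivo.fb_cpr.huggingface import FBcprModel

    # Pretrained behavioral foundation model (per the repo README)
    model = FBcprModel.from_pretrained("facebook/metamotivo-S-1")

    # The low-level policy is conditioned on a latent embedding z,
    # not on joint coordinates; an agent model would emit this z.
    z = model.sample_z(1)
    obs = torch.zeros(1, 358)  # humanoid proprioception; 358 is the env's obs dim, IIRC
    action = model.act(obs, z, mean=True)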

Meta Motivo is still a toy model: it was trained on the SMPL skeleton, which lacks fingers, and that limits its usefulness beyond having some fun with it. They could have used a more advanced base model, SMPL-X, which includes fingers, but there isn't enough open motion data with precise finger motion to train a robust manipulation model anyway.

Most existing motion datasets come from academic motion-capture setups, which are complex, not focused on manipulation tasks, and also pretty old. I believe this gap will be filled by improvements in 3D human pose estimation (HPE) from 2D video. With access to thousands of hours of video, we can build large-scale motion datasets covering a wide range of real-world interactions.
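Roughly what I have in mind, with estimate_smplx_sequence as a stand-in for whatever HPE front-end you pick (purely hypothetical, just to show the shape of the pipeline):

    from pathlib import Path
    import numpy as np

    def estimate_smplx_sequence(video_path):
        """Hypothetical 3D HPE front-end: lift a 2D video to per-frame
        SMPL-X parameters (global orient, body pose, hand poses)."""
        raise NotImplementedError  # plug in your pose estimator here

    def build_motion_dataset(video_dir, out_file):
        clips = [estimate_smplx_sequence(v) for v in Path(video_dir).glob("*.mp4")]
        # One big array of SMPL-X trajectories, ready for imitation learning
        np.save(out_file, np.concatenate(clips, axis=0))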

This will enable training the two components needed for dexterous humanoid robots: an agentic model that decides what actions to take and emits embeddings, and a control model that reads those embeddings and accurately drives hand and finger joint movement.
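To make the split concrete, something like this (all names and dimensions made up; the point is the interface between the two):

    import torch

    class AgentModel(torch.nn.Module):
        """High-level multimodal policy: perception + task in, embedding out."""
        def __init__(self, obs_dim=512, z_dim=256):
            super().__init__()
            self.net = torch.nn.Linear(obs_dim, z_dim)

        def forward(self, obs_features):
            return self.net(obs_features)  # z: "what motion to perform"

    class ControlModel(torch.nn.Module):
        """Low-level z-conditioned controller over body/finger joints."""
        def __init__(self, state_dim=358, z_dim=256, action_dim=69):
            super().__init__()
            self.net = torch.nn.Linear(state_dim + z_dim, action_dim)

        def forward(self, state, z):
            return self.net(torch.cat([state, z], dim=-1))  # joint targets

    agent, ctrl = AgentModel(), ControlModel()
    z = agent(torch.randn(1, 512))         # decided once per skill/segment
    action = ctrl(torch.randn(1, 358), z)  # executed at control frequency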

Given the rapid progress in SoTA 3D HPE from 2D video, and the vast amount of video online (YouTube), I expect we will see humanoid robots with good manipulation capabilities in the not-so-distant future.

[0]: https://github.com/facebookresearch/metamotivo

replies(2): >>43114115, >>43114172
michaelbuckbee | No.43114115
Trying to wrap my head around this - are you saying that those models are trained around the concept of fingers (some kind of physical manipulators with set dimensions)?
replies(1): >>43114208
lelag | No.43114208
The SMPL-X body model, a standard in this academic field, does model fingers: https://smpl-x.is.tue.mpg.de/
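If you download the model files from that site, the smplx Python package makes the difference visible: SMPL-X takes explicit hand pose parameters that plain SMPL simply doesn't have. A sketch based on the package's documented interface (the path is a placeholder):

    import torch
    import smplx

    # "models/" must contain the SMPL-X files from smpl-x.is.tue.mpg.de
    model = smplx.create("models/", model_type="smplx", use_pca=False)

    output = model(
        body_pose=torch.zeros(1, 21 * 3),       # 21 body joints, axis-angle
        left_hand_pose=torch.zeros(1, 15 * 3),  # 15 finger joints per hand...
        right_hand_pose=torch.zeros(1, 15 * 3), # ...which SMPL lacks entirely
    )
    print(output.joints.shape)  # output skeleton includes the finger joints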

The issue is that there are far fewer datasets available for it than for the simpler SMPL model.

Regarding fingers, you already have "dumb" models like https://github.com/google-deepmind/mujoco_mpc which can control finger movement to achieve specific tasks.
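Its Python bindings give you a planner agent you can step in a loop; a rough sketch from memory (the task id and model path are my guesses, see the repo's python examples for the real ones):

    import mujoco
    from mujoco_mpc import agent as agent_lib

    # Path and task id are placeholders; the repo ships several hand tasks
    model = mujoco.MjModel.from_xml_path("mjpc/tasks/hand/task.xml")
    data = mujoco.MjData(model)
    agent = agent_lib.Agent(task_id="Hand", model=model)

    for _ in range(1000):
        # feed the planner the current state, replan, apply its action
        agent.set_state(time=data.time, qpos=data.qpos, qvel=data.qvel)
        agent.planner_step()
        data.ctrl = agent.get_action()
        mujoco.mj_step(model, data)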

Look at this video to see it in action: https://www.youtube.com/watch?v=2xVN-qY78P4&t=387s

Pretty cool stuff.