The current best neural networks only have around 60% success rates on short-horizon tasks (think 10-20 seconds, e.g. pick up an apple). That is why there are so many cut motions in this video. The future will be awesome, but it will take time; a lot of research still needs to happen (e.g. robust hands, tactile sensing, how to even collect large-scale data, RL).
"Building Figure won’t be an easy win; it will require decades of commitment and ingenuity."
"Our focus is on what we can achieve 5, 10, 20+ years from now, not the near-term wins."
At least it's not Musk's forever "next year".
Indeed, all the videos/examples are marketing pieces.
I would love to see a video like this "Logistics"[0] one, that shows this new iteration doing some household tasks. There is no way that it's not clunky and prone to all kinds of accidents and failures. Not that it's a bad thing - it would simply be nice to see.
Maybe they will do another video? Would love that.
The problem with the principled approach to high-uncertainty projects is that if you slowly execute on a sequential multi-year plan, you will almost certainly find out in year 9 that multiple of the late-stage tasks are much harder than you thought.
You just don't know ahead of time. Just look at how many corporations and research labs had decades-long strategies to build human-like AI that went nowhere. And then some guys came up with a novel architecture and all of a sudden you can ask your computer to write an essay about penguins.
Musk's approach is that if you have an infinite supply of fresh grads who really believe in you and are willing to work crazy hours, giving them a "next year" deadline is more likely to give you what you want than telling them "here's your slow-paced project you're gonna be working on for the next decade". And I guess he thinks to himself that some of them are going to burn out, but it's a sacrifice he's willing to make.
We are nowhere near the same for autonomous robots, and it's not even funny. To continue to use the internet as an analogy for LLMs, we are pre-ARPANET, pre-ASCII, pre-transistor. We don't even have the sensors that would make safe household humanoid robots possible. Any theater from robot companies about trying to train a neural net on motion capture is laughably foolish. At the current rate of progress, we are more than decades away.
Neural networks for motion control are very clearly producing some incredible capability in a relatively short amount of time vs. the more traditional control hierarchies used in something like Boston Dynamics. Look at Unitree's G1:
https://www.youtube.com/shorts/mP3Exb1YC8o
https://www.youtube.com/watch?v=bPSLMX_V38E
It's like an agile idiot, very physically capable but no purpose.
The next domain is going to be incorporating goals, intent, and short/long-term chains of causality into the model, and for that it seems we're presently missing quite a bit of usable training data. That will clearly evolve over time, as will the fidelity of simulations that can be used to train the model and the learned experience of deployed robots.
As someone who worked in the robotics industry, 90% of the demos and videos are cherry-picked, or even blatantly fake. That's why for any new robot in the market, my criteria is: Can I buy it? If it's affordable and the consumer can buy it and find it useful in day to day life, then this robot is useful and has potential; other than that, it's just an investor money grab PR hype.
I'm not sure that task needs a humanoid robot, but the ability to grab and manipulate all those packages and recover from failures is pretty good.
The video shows several glitches. From the comments:
14:18 the Fall
28:40 the Fall 2
41:23 the Fall 3
Also, many of the packages on the left are there throughout the video. But then I think a lot of this can be solved in software, and having seen how LLMs have advanced in the last few years, I'd not be surprised to see these robots useful in 5 years.
This feels incredibly generous. I'm pretty sure his approach is that he needs to keep the hype cycle going for as long as possible. I also believe it's partially his willingness to believe his own bullshit.
I’m sure they could pretty easily spin up a site with 200 of these processing packages of most sizes (they have a limited number of standardized package sizes) nonstop. Remove ones that it gets right 99.99% of the time and keep training on the more difficult ones, then move to individual items.
Caveat: I have no idea what I’m talking about.
https://rodneybrooks.com/why-todays-humanoids-wont-learn-dex...
In short, he makes the case that unlike text and images, human dexterity is based on sensory inputs that we barely understand, that these robots don't have, and it will take a long time to get the right sensors in, get the right data recorded, and only then train them to the level of a human. He is very skeptical that they can learn from video-only data, which is what the companies are doing.
They for sure did not anticipate that the user would backflip into their robot and knock it (and himself) out :D
If you can make it look believable on camera for 15 seconds under controlled studio conditions... it's probable you can do it autonomously in 10-15 years. I don't think anyone is going to be casually buying these for their house by this time next year, but it certainly demonstrates what is realistically possible.
If they can provably make these things safe, it will have huge implications for in home care in advanced age, where instead of living in an assisted living home at $huge expense for 20+ years, you might be able to live on your own for most of that time.
I am cautiously optimistic.
Would asking the robot for a seahorse emoji leave you in a puddle of blood?
An industrial robot arm with air-powered suction cups would do the trick... https://bostondynamics.com/products/stretch/ ...
... So the task they work best at is the task there are already cheaper, better robots specialized for.
Tasks left for human "sorters" to do are:
- put packages on conveyor belt so the scanner can read the label (as done by the robot in the video)
- deal with damaged or unreadable packages that can't be processed automatically
- when a package gets jammed and forces the conveyor belt to stop, remove the offending package before restarting
- receive packages at the other end and load them into vehicles
Generally the difficulty with all of these is dealing with variability and humans act as variability absorbers so the machines can operate smoothly.
Like LLMs being used to pick values out of JSON objects when jq would do the job 1000x more efficiently.
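For the JSON case, the deterministic tool really is a one-liner; a quick sketch (the payload and field names here are invented for illustration):

```shell
# Extract one field deterministically with jq -- no model call, no tokens.
# Payload and key names are made up for illustration.
echo '{"order": {"id": 42, "status": "shipped"}}' | jq -r '.order.status'
```

The `-r` flag strips the JSON quoting so the value drops straight into a pipeline.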
This is what this whole field feels like right now. Let's spend lots of time and energy to create a humanoid robot to do the things humans already decided humans were inefficient at and solved with specialised tools.
Like people saying "oh it can wash my dishes for me". Well, I haven't washed dishes in years, there's a thing called a dishwasher which does one thing and does it well.
"Oh it can do the vacuuming". We have robot vacuums which already do that.
An obvious application, if this robot could do it, is retail store shelf restocking. That's a reasonably constrained pick and place task, some mobility is necessary, and the humanoid form is appropriate working in aisles and shelves spaced for humans. How close is that?
It's been tried before, in 2020.[1] And again in 2022.[2] That one runs on a track, is closer to a traditional industrial robot, and is used by 7-11 Japan.
Robots that just cruise around stores and inspect the shelves visually are in moderately wide use. They just compare the shelf images with the planogram; they don't handle the merchandise. So there are already systems to help plan the restocking task.
TU Delft says their group should be able to do this in five years.[3] (From when? No date on the press release.)
[1] https://www.youtube.com/watch?v=cHgdW1HYLbM
[2] https://blogs.nvidia.com/blog/telexistence-convenience-store...
[3] https://www.tudelft.nl/en/stories/articles/shelf-stocking-ro...
Perhaps this is a bit pedantic, but what about the probable eventual proliferation of useful humanoid robots will make the future awesome? What does an awesome future look like compared to today, to you?
The fabric wrap is idiotic. Insanely stupid. Let's have an expensive fabric-covered robot wash dishes covered in food. Genius. It's a good thing those "dirty dishes" were already perfectly clean. I doubt this machine could handle anything more. Put it in a real commercial kitchen and have it scrape oven pans and I'll be impressed.
I'm so glad I left robotics. I don't want to have anything to do with this very silly bubble.
This only highlights the fact that making a cool prototype do a few cool things on video is far, far easier than making a commercial product that can consistently do these things reliably. It often takes decades to move from the former to the latter. And Figure hasn't even shown us particularly impressive things from its prototypes yet.
With a hefty subscription to make it do anything useful.
They already have. We just don't hold the perpetrators accountable.
If I could just type it into my shell, that would be nice. I’m sure there’s some command (or one could be trivially made) to evaluate an equation, but then you get to play games with shell expansions and quotes.
In Emacs I have to contort the equation into prefix notation.
All minor stuff, but it adds up.
It's hard to find decent general purpose help these days and they would pay good money for a halfway useful helper.
Once it's able to weld... That's going to be a massive game changer, and I can see that coming 'round the corner right quickly.
bc
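For the quoting problem the parent mentions, a single-quoted string reaches bc untouched by the shell; a minimal sketch:

```shell
# Single quotes keep * and parentheses away from shell expansion;
# scale sets the number of decimal places bc carries through division.
echo 'scale=4; (355/113) * 2' | bc

# For integer-only math, the shell itself can evaluate expressions:
echo $(( (3 + 4) * 5 ))
```

No prefix notation required, and nothing to install on most systems.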
And the Unitree R1 already only costs $6k.
All the necessary pieces are aligning, very rapidly, and as James Burke has pointed out, that's when Connections happen.
And if you needed it programmable, well, an FPGA was still almost as general and far more efficient than a microprocessor.
Guess what won.
In a world with 500 million humanoid robots, parts are plentiful, they're easier to work on due to not weighing 5,000 pounds, and, like the other person said, economies of scale.
All with much improved privacy, reliability, order of magnitude lower cost, no risk of robbery/SA, etc. 24/7 operation even on holidays. Imagine service staff just sitting waiting for you to need them, always and everywhere.
Nevermind how much human lifespan will be freed from the tyranny of these mindless jobs.
What it IS, however, is a remarkable achievement of commoditization: getting a toy like that with those kinds of motors would have been prohibitively expensive anywhere else in the world. But much like the Chinese 20k EV, it's not really a reliable marker of the actual future; in fact, bottomed-out pricing is more so an indicator of the phase of industrialization that country is in.
Only because it's not yet attached to a reasonable AI, which is my point. It's not going to do any heavy lifting, but it could easily do basic house chores like cleaning up, folding laundry, etc if it were. The actuators and body platform are there, and economies of scale already at work.
I guess some folks just can't or won't put 2 and 2 together to predict the near future.
It's Moore's law that largely drove what you describe.
Moore's law only applies to semiconductors.
Gears, motors and copper wire are not going to get 10x faster/cheaper every 18 months or whatever.
10 years from now gears will cost more, they will cost what they cost now plus inflation.
I've literally heard super smart YC founders say they just assume some sort of "Moore's law for hardware" will magically make their idea workable next year.
Computing power gets, and will continue to get, cheaper every day. Hardware (gears, nuts, bolts) doesn't.
An arm moving against gravity has a higher draw, the arc itself creates characteristic signatures, and a motion or force against the arm or fingers generates a change in draw -- a superintelligence would need only an ammeter to master proprioception, because human researchers can do this in a lab and they're nowhere near the bar of 'hypergenius superintelligence'.
I'm not surprised that a Honda Civic can't navigate the Dakar Rally route.
People keep parroting this line, but it's not a given, especially for such an ill-defined metric as "better". If I ask an LLM how its day was, there's no one right answer. (Users anthropomorphizing the LLM is a given these days, no matter how you may feel about that.)
They didn't go nowhere; they just didn't result in human-like AI. They gave us lots of breakthroughs, useful basic knowledge, and knowledge infrastructure that could be built on for related and unrelated projects. Plenty of shoot-for-the-moon corporations didn't result in human-like AI either, and probably did go nowhere, since they were focused on an all-or-nothing strategy. The ones that did succeed at a moonshot relied on those breakthroughs from decades-long research.
I'm not going to get into what Musk has been doing, because I'm just not.
If you want to replace the human the best bet is to redesign the work so that it can be done with machine assistance, which is what we’ve been doing since the industrial revolution.
There’s a reason the motor car (which is the successful mass market personal transportation machine) doesn’t look anything like the horse that it replaced.
It is those things that are bottlenecking the price of robots.
The price of something tends towards the marginal cost, and the marginal cost of software is close to $0. Robots cost a lot more than that (what's the price of this robot?).
Edit: In fact, the Figure 03 announcement implies marginal costs matter:
"Mass manufacturing: Figure 03 was engineered from the ground-up for high-volume manufacturing"
The essay was long, so I can't claim I read it in detail. One question in my mind is whether humanoids need to do dexterity the same way that humans do. Yes, they don't have skin and tiny receptors, but maybe there is another way to develop dexterity?
I am impressed by Unitree, but the problem that needs to be solved here is not just better software. Better hardware needs to come down in cost and weight to make the generalized robot argument more convincing.
There is still a long way to go for a humanoid to be a reasonable product, and that's not just a software issue.
Once software is "done" (we all know software is never done), you can just copy it and distribute it. It costs a negligible amount to do so.
Once hardware is done, you have to manufacture each and every piece of hardware with the same care, detail, and reliability as the first one. You can't just click copy.
Oftentimes you have to completely redesign the product to go from low-volume, high-cost manufacturing to high-volume, low-cost. A hand-made McLaren is very different from an F-150.
The two simply scale differently, by the nature of the beast.
It's not quite a Star Trek replicator, but it's much closer to that than the US view of manufacturing, where you have your union guy sitting in front of the machine to pull the lever.
If it's just checking or adding labels, it's silly to even use that.
You can control the happy path when the whole thing is your box.
Is it supposed to be taking packages and placing them label face down?
I cannot understand how a robot doing this is cheaper than a second scanner so you can read the label face down or face up. I mean you could do that with a mirror.
But I'm not convinced it is even doing that. Several packages are already "label side down" and it just moves them along. Do those packages even have labels? Clearly the behavior learned is "label not on top", not "label side down". No way is that the intended behavior.
If the bar code is the issue, then why not switch to a QR code or some other format? There's not much information you need in shipping so the QR code can have lots of redundancy, making it readable from many different angles and even if significantly damaged.
The video description also says "approaching human-level dexterity and speed". No way. I'd wager I could do this task at least 10x its speed, if not 20x, and that I'd do it better. I mean, I watched a few minutes at 2x speed, and man is it slow. Sure, this thing might be able to run 24/7 without breaks, but if I'm running 10-20x faster, what does that matter? I could just come in a few hours a day and blow through its quota. I'd really like to see an actual human worker for comparison.
But if we did want something to do this very narrow task 24/7, I'm pretty sure there are a hundred cheaper ways to do it. If there aren't, it's because there are some edge cases that are pretty important. And without knowing that, we can't properly evaluate this video. Besides, this video seems like a pretty simple ideal case. I'm not sure what an actual Amazon sorting process looks like, but I suspect not like this.
Regardless, the results look pretty cool and I'm pretty impressed with Figure even if it is an over-simplified case.