(microsoft.github.io)

305 points SerCe | 1 comments | 20 Feb 25 02:11 UTC

sorz [20 Feb 25 08:48 UTC] No.43112514

In the mug-scrubbing video, the person is clearly only pretending to wash the cup; they don't seem to want to get their hands wet. I'm curious when models will be able to pick up on subtle things like that.

1. funnyAI [20 Feb 25 08:52 UTC] No.43112536

My guess is that it's all probabilistic: the model assigns probabilities to a set of candidate actions for the same video. Even a pretended action may look more like real washing than like anything else, so the "washing" label still gets the highest probability.
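A minimal sketch of that reasoning, assuming a classifier that scores a fixed set of candidate action labels and normalizes the scores with softmax. The labels and logit values are hypothetical and invented for illustration; they are not Magma's actual outputs or API.

```python
import math

def softmax(scores):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - m) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Hypothetical logits a video-action classifier might emit for one clip.
# The numbers are made up for illustration; they are not Magma outputs.
logits = {
    "scrub mug": 2.0,
    "pretend to scrub mug": 0.5,
    "hold mug": 0.0,
}

probs = softmax(logits)
best = max(probs, key=probs.get)  # pick the most probable label
print(best)  # scrub mug
```

Taking the argmax reports whichever label looks closest, so a pretended action comes out as the real one unless the model has learned features that separate the two.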

Magma: A foundation model for multimodal AI agents