World Emulation via Neural Network

(madebyoll.in)

1. quantumHazer ◴[25 Apr 25 23:09 UTC] No.43799333[source]▶

>>43798757 (OP) #

Is this a solo/personal project? If it is, it is indeed very cool.

Is OP the blog’s author? In the post, the author says the purpose of the project is to show why NNs are truly special, and I would like a more detailed explanation of why he/she thinks that. Good work anyway!

replies(2): >>43799344 #>>43799549 #

2. treesciencebot ◴[25 Apr 25 23:11 UTC] No.43799344[source]▶

>>43799333 #

author is: https://x.com/madebyollin

3. alain94040 ◴[25 Apr 25 23:33 UTC] No.43799461[source]▶

>>43798757 (OP) #

Appreciate this article that shows some failures on the way to a great result. Too many times, people only show the polished end result: look, I trained this AI and it produces these great results. The world dissolving was very interesting to see, even if I'm not sure I understand how it got fixed.

replies(1): >>43799620 #

4. puchatek ◴[25 Apr 25 23:46 UTC] No.43799541[source]▶

>>43798757 (OP) #

This is great but I think I'll stick to mushrooms.

replies(3): >>43800019 #>>43800771 #>>43801285 #

5. ollin ◴[25 Apr 25 23:47 UTC] No.43799549[source]▶

>>43799333 #

Yes! This was a solo project done in my free time :) to learn about WMs and get more practice training GANs.

The special aspect of NNs (in the context of simulating worlds) is that NNs can mimic entire worlds from videos alone, without access to the source code (in the case of pokemon) or even without the source code having existed (as is the case for the real-world forest trail mimicked in this post). They mimic the entire interactive behavior of the world, not just the geometry (note e.g. the not-programmed-in autoexposure that appears when you look at the sky).

Although the neural world in the post is a toy project, and quite far from generating photorealistic frames with "trees that bend in the wind, lilypads that bob in the rain, birds that sing to each other", I think getting better results is mostly a matter of scale. See e.g. the GAIA-2 results (https://wayve.ai/wp-content/uploads/2025/03/generalisation_0..., https://wayve.ai/wp-content/uploads/2025/03/unsafe_ego_01_le...) for an example of what WMs can do without the realtime-rendering-in-a-browser constraints :)

replies(2): >>43799900 #>>43801326 #

6. ollin ◴[25 Apr 25 23:57 UTC] No.43799620[source]▶

>>43799461 #

Thanks! My favorite failure mode (not mentioned in the post - I think it was during the first round of upgrades?) was a "dry" form of soupification where the texture detail didn't fully disappear https://imgur.com/c7gVRG0

7. throwaway314155 ◴[26 Apr 25 00:19 UTC] No.43799763[source]▶

>>43798757 (OP) #

Really cool. How much compute did you require to successfully train these models? Is it in the ballpark of something you could do with a single gaming GPU? Or did you spin up something fancier?

edit: I see now that you mention a price point of 100 GPU-hours, roughly $100. My mistake.

8. janalsncm ◴[26 Apr 25 00:44 UTC] No.43799900{3}[source]▶

>>43799549 #

You mentioned it took 100 GPU hours. What GPU did you train on?

replies(1): >>43800142 #

9. bitwize ◴[26 Apr 25 00:46 UTC] No.43799904[source]▶

>>43798757 (OP) #

I want to see a spiritual successor to LSD: Dream Emulator based on this.

https://en.m.wikipedia.org/wiki/LSD:_Dream_Emulator

10. udia ◴[26 Apr 25 00:51 UTC] No.43799933[source]▶

>>43798757 (OP) #

Very nice work. Seems very similar to the Oasis Minecraft simulator.

https://oasis.decart.ai/

replies(1): >>43800231 #

11. AndrewKemendo ◴[26 Apr 25 01:03 UTC] No.43800004[source]▶

>>43798757 (OP) #

I think this is very interesting because you seem to have reinvented NeRF, if I’m understanding it correctly. I only did one pass through but it looks at first glance like a different approach entirely.

More interesting is that you made an easy to use environment authoring tool that (I haven’t tried it yet) seems really slick.

Both of those are impressive alone but together that’s very exciting.

replies(1): >>43801073 #

12. bongodongobob ◴[26 Apr 25 01:07 UTC] No.43800019[source]▶

>>43799541 #

Yeah, the similarities between some of this stuff and psychedelics are remarkable.

replies(1): >>43800737 #

13. tehsauce ◴[26 Apr 25 01:30 UTC] No.43800121[source]▶

>>43798757 (OP) #

I love this! Your results seem comparable to the counter strike or minecraft models from a bit ago with massively less compute and data. It's particularly cool that it uses real world data. I've been wanting to do something like this for a while, like capturing a large dataset while backpacking in the cascades :)

I didn't see it in an obvious place on your github, do you have any plans to open source the training code?

14. ilaksh ◴[26 Apr 25 01:32 UTC] No.43800133[source]▶

>>43798757 (OP) #

This seems incredibly powerful.

Imagine a similar technique but with productivity software.

And a pre-trained network that adapts quickly.

15. ollin ◴[26 Apr 25 01:36 UTC] No.43800142{4}[source]▶

>>43799900 #

Mostly 1xA10 (though I switched to 1xGH200 briefly at the end, lambda has a sale going). The network used in the post is very tiny, but I had to train a really long time w/ large batch to get somewhat-stable results.

16. ollin ◴[26 Apr 25 01:54 UTC] No.43800231[source]▶

>>43799933 #

Yup, definitely similar! There are a lot of video-game-emulation World Models floating around now, https://worldarcade.gg had a list. In the self-driving & robotics literature there have also been many WMs created for policy training and evaluation. I don't remember a prior WM built on first-person cell-phone video, but it's a simple enough concept that someone has probably done it for a student project or something :)

17. ilaksh ◴[26 Apr 25 03:41 UTC] No.43800737{3}[source]▶

>>43800019 #

It makes me think that maybe our visual perception is similar to what this program is doing in some ways.

I wonder if there are any computer vision projects that take a similar world emulation approach?

Imagine you collected the depth data also.

replies(1): >>43802006 #

18. LoganDark ◴[26 Apr 25 03:48 UTC] No.43800771[source]▶

>>43799541 #

For some reason, psilocybin causes me to randomly just lose consciousness, and LSD doesn't. Weird stuff.

19. gitroom ◴[26 Apr 25 04:58 UTC] No.43801069[source]▶

>>43798757 (OP) #

Gotta say, I've always wanted to try building something like this myself. That kind of grind pays off way more than shiny announcements imo.

20. bjornsing ◴[26 Apr 25 04:59 UTC] No.43801073[source]▶

>>43800004 #

NeRF is a more complex and constrained approach, based on a kind of ray tracing. But results are obviously similar.

replies(1): >>43802755 #

21. bjornsing ◴[26 Apr 25 05:00 UTC] No.43801077[source]▶

>>43798757 (OP) #

What used to be cutting edge research not so long ago is now a fun hobby project. I love it.

22. ulrikrasmussen ◴[26 Apr 25 05:56 UTC] No.43801285[source]▶

>>43799541 #

I also thought those wooden guard rails looked pretty spot on how they would look on 2C-B. The only thing that's missing is the overlay of geometric patterns on even surfaces.

23. attilakun ◴[26 Apr 25 06:08 UTC] No.43801326{3}[source]▶

>>43799549 #

Amazing project. This has the same feel as Karpathy’s classic “The Unreasonable Effectiveness of Recurrent Neural Networks” blog post. I think in 10 years’ time we will look back and say “wow, this is how it started.”

24. Valk3_ ◴[26 Apr 25 06:21 UTC] No.43801365[source]▶

>>43798757 (OP) #

This might be a vague question, but what kind of intuition or knowledge do you need to work with these kinds of things, say if you want to make your own model? Is it just having experience with image generation and trying to incorporate relevant inputs that you would expect in a 3D world, like the control information you added for instance?

replies(1): >>43806473 #

25. nopakos ◴[26 Apr 25 07:15 UTC] No.43801565[source]▶

>>43798757 (OP) #

Next we should try "Excel emulation via Neural Network". We get rid of a lot of intermediate steps, calculations, user interface etc!

What could go wrong?

Jokes aside, this is insanely cool!

replies(1): >>43801815 #

26. ◴[26 Apr 25 07:47 UTC] No.43801681[source]▶

>>43798757 (OP) #

27. downboots ◴[26 Apr 25 08:17 UTC] No.43801815[source]▶

>>43801565 #

or for a large dataset of math identities and have the user draw one side

28. titouanch ◴[26 Apr 25 08:37 UTC] No.43801893[source]▶

>>43798757 (OP) #

This is very impressive for a hobby project. I was wondering if you were planning to release the source code. Being able to create client-hosted, low-requirement neural networks for world generation could be really useful for game dev or artistic projects.

replies(1): >>43802560 #

29. das_keyboard ◴[26 Apr 25 08:44 UTC] No.43801938[source]▶

>>43798757 (OP) #

> So, if traditional game worlds are paintings, neural worlds are photographs. Information flows from sensor to screen without passing through human hands.

I don't get this analogy at all. Instead of passing through a human, the information flows through a neural network, which alters it.

> Every lifelike detail in the final world is only there because my phone recorded it.

I might be wrong here but I don't think this is true. It might also be there because the network inferred that it is there based on previous data.

Imo this just takes the human out of an artistic process - creating video game worlds and I'm not sure if this is worth archiving.

replies(2): >>43801996 #>>43805318 #

30. Imanari ◴[26 Apr 25 08:48 UTC] No.43801957[source]▶

>>43798757 (OP) #

Amazing work. Could you elaborate on the model architecture and the process that led you to using this architecture?

replies(1): >>43802248 #

31. ajb ◴[26 Apr 25 09:01 UTC] No.43801996[source]▶

>>43801938 #

>I don't get this analogy at all. Instead of a human information flows through a neural network which alters the information.

These days most photos are also stored using lossy compression which alters the information.

You can think of this as a form of highly lossy compression of an image of this forest in time and space.

Most lossy compression is 'subtractive' in that detail is subtracted from the image in order to compress it, so the kind of alterations are limited. However there have been previous non-subtractive forms of compression (eg, fractal compression) that have been criticised on the basis of making up details, which is certainly something that a neural network will do. However if the network is only trained on this forest data, rather than being also trained on other data and then fine tuned, then in some sense it does only represent this forest rather than giving an 'informed impression' like a human artist would.
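
To make the "subtractive" idea above concrete, here is a minimal sketch using plain quantization as a stand-in for lossy compression. This is not how JPEG or any neural network works; the `quantize` helper and the random test image are assumptions made purely for illustration. The point is that every output value is at or below the original, so detail is only removed and never invented.

```python
import numpy as np

def quantize(image, levels):
    """Toy subtractive compression: snap each pixel down to one of `levels` values."""
    step = 256 // levels           # width of each quantization bucket
    return (image // step) * step  # round every pixel down to its bucket floor

# Hypothetical 4x4 grayscale "photo" with 8-bit pixel values.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 4), dtype=np.uint8)

compressed = quantize(image, levels=8)

# The error is never negative: information is only subtracted from the image.
error = image.astype(int) - compressed.astype(int)
print("distinct values kept:", len(np.unique(compressed)))
print("largest detail removed:", error.max())
```

A generative model has no such guarantee: its output can contain values and structures that were never present in the input, which is the "making up details" criticism mentioned above.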

replies(1): >>43805118 #

32. voidspark ◴[26 Apr 25 09:03 UTC] No.43802006{4}[source]▶

>>43800737 #

Yes the model is a U-Net, which is a type of Convolutional Neural Network (CNN), which is inspired by the structure of the visual cortex.

https://en.wikipedia.org/wiki/Convolutional_neural_network#H...

33. montebicyclelo ◴[26 Apr 25 09:28 UTC] No.43802128[source]▶

>>43798757 (OP) #

Awesome work / demo / blog

Link to the demo in case people miss it [1]

> using a customized camera app which also recorded my phone’s motion

Using phone's gyro as a proxy for "controls" is very clever

[1] https://madebyoll.in/posts/world_emulation_via_dnn/demo/

34. Macuyiko ◴[26 Apr 25 09:56 UTC] No.43802248[source]▶

>>43801957 #

The model seems to be viewable here:

https://netron.app/?url=https://madebyoll.in/posts/world_emu...

35. thenthenthen ◴[26 Apr 25 11:04 UTC] No.43802560[source]▶

>>43801893 #

Yes please! I would love to try and use this on disappearing neighbourhoods, the results are so dreamlike, or like memories!

36. AndrewKemendo ◴[26 Apr 25 11:45 UTC] No.43802755{3}[source]▶

>>43801073 #

Right, which is why I said it’s an entirely different approach, but it results in almost the same kind of output.

37. stormfather ◴[26 Apr 25 13:10 UTC] No.43803293[source]▶

>>43798757 (OP) #

It's a time capsule, among other things. I want to take many, many videos of my grandpa's farm, and be able to walk around in it in VR using something like this in the future.

replies(1): >>43807713 #

38. alekseiprokopev ◴[26 Apr 25 13:42 UTC] No.43803551[source]▶

>>43798757 (OP) #

It would be quite interesting to try to mess with the neural representations to add or remove the images of some objects there. I'm also curious if the topology of the actual place is similar to the topology of the embedding space.

39. andai ◴[26 Apr 25 16:42 UTC] No.43805118{3}[source]▶

>>43801996 #

>These days most photos are also stored using lossy compression which alters the information.

I noticed this in some photos I see online starting maybe 5-10 years ago.

I'd click through to a high res version of the photo, and instead of sensor noise or jpeg artefacts, I'd see these bizarre snakelike formations, as though the thing had been put through style transfer.

40. Legend2440 ◴[26 Apr 25 17:08 UTC] No.43805318[source]▶

>>43801938 #

>It might also be there because the network inferred that it is there based on previous data.

There is no previous data. This network is exclusively trained on the data he collected from the scene.

41. ollin ◴[26 Apr 25 19:40 UTC] No.43806473[source]▶

>>43801365 #

I think https://diamond-wm.github.io is a reasonable place to start (they have public world-model training code, and people have successfully adapted their codebase to other games e.g. https://derewah.dev/projects/ai-mariokart). Most modern world models are essentially image generators with additional inputs (past-frames + controls) added on, so understanding how Diffusion/IADB/Flow Matching work would definitely help.
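
As a rough illustration of the "image generator with past frames + controls added on" framing, the sketch below shows the autoregressive loop such a model runs at inference time. Everything here is hypothetical: `toy_world_model` is a hand-written stand-in rather than a trained network, and the frame size, context length, and control dimension are arbitrary assumptions. It is not the code from the post or from DIAMOND.

```python
import numpy as np

def toy_world_model(past_frames, controls, rng):
    """Hypothetical stand-in for a trained network that predicts the next frame.

    past_frames: (context, H, W, 3) floats in [0, 1]
    controls:    (control_dim,) floats, e.g. camera motion
    """
    # Blend the context frames, nudge brightness by the control signal, and add
    # a little noise. A real model would be a learned image generator instead.
    blended = past_frames.mean(axis=0)
    shift = 0.01 * controls.sum()
    noise = rng.normal(scale=0.01, size=blended.shape)
    return np.clip(blended + shift + noise, 0.0, 1.0)

def rollout(initial_frames, control_sequence, seed=0):
    """Autoregressive loop: each predicted frame is fed back in as context."""
    rng = np.random.default_rng(seed)
    context = initial_frames.copy()
    generated = []
    for controls in control_sequence:
        next_frame = toy_world_model(context, controls, rng)
        generated.append(next_frame)
        # Drop the oldest context frame and append the newest prediction.
        context = np.concatenate([context[1:], next_frame[None]], axis=0)
    return np.stack(generated)

initial = np.full((4, 8, 8, 3), 0.5)  # 4 context frames of 8x8 RGB
controls = np.zeros((10, 2))          # 10 steps of a 2-D control input
frames = rollout(initial, controls)
print(frames.shape)
```

Because the model only ever sees its own previous outputs after the first few steps, small errors can compound over a long rollout, which is one plausible reading of the "soupification" failures discussed earlier in the thread.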

replies(1): >>43807374 #

42. Valk3_ ◴[26 Apr 25 21:28 UTC] No.43807374{3}[source]▶

>>43806473 #

Thanks!

43. foxglacier ◴[26 Apr 25 22:17 UTC] No.43807713[source]▶

>>43803293 #

You can do it using the more classic technique of photogrammetry. There are commercial products used by real estate salesmen to produce high quality "games" where you walk around inside a house, but they're more like Google Streetview where you swoosh between points where a 360 degree photo was taken. All those things will be more faithful than neurally generating next frames based on previous frames and control input.

44. Jotalea ◴[28 Apr 25 11:35 UTC] No.43820125[source]▶

>>43798757 (OP) #

It's a really interesting project, reminds me of the 360° videos I used to watch on my phone, back in 2015.

But there's one thing that I'm a little bit worried about: I was getting around 8 stable FPS on my 3-year-old flagship phone. My concern is that these models are not optimized to run on this type of hardware, which may or may not lead to hardware obsolescence quicker than planned. And it's not like these phones aren't powerful, they really are.

replies(1): >>43822024 #

45. ollin ◴[28 Apr 25 14:40 UTC] No.43822024[source]▶

>>43820125 #

Curious, which device/OS/browser? I did all my testing on 4-year-old hardware (iPhone 13 Pro, M1 Pro MBP), and the model itself is extremely tiny (~1GFLOP) so I'm optimistic that performance issues would be solvable with a better software stack (e.g. native app).
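
For a sense of scale, here is the back-of-the-envelope arithmetic behind that optimism. It assumes the ~1 GFLOP figure is the cost of generating one frame, which the comment does not state explicitly, and the 30 FPS target is an arbitrary choice for illustration.

```python
def required_gflops_per_second(gflop_per_frame, fps):
    """Sustained compute needed to generate `fps` frames every second."""
    return gflop_per_frame * fps

GFLOP_PER_FRAME = 1.0  # assumption: ~1 GFLOP is the per-frame cost

observed = required_gflops_per_second(GFLOP_PER_FRAME, fps=8)   # rate reported above
target = required_gflops_per_second(GFLOP_PER_FRAME, fps=30)    # hypothetical smooth rate

print(f"8 FPS needs about {observed:.0f} GFLOP/s of useful compute")
print(f"30 FPS needs about {target:.0f} GFLOP/s of useful compute")
```

Under that assumption, the gap between the two rates is a matter of tens of GFLOP/s of sustained throughput, so overhead in the software stack is a plausible bottleneck alongside raw hardware capability.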

replies(1): >>43828019 #

46. Jotalea ◴[29 Apr 25 01:55 UTC] No.43828019{3}[source]▶

>>43822024 #

I was on my Samsung Galaxy S21FE (Snapdragon 888), on the latest version of the Firefox browser for Android (138.0), on One UI 6.1 (Android 14). It is possibly the most powerful device I own, that's why I was concerned.

replies(1): >>43834437 #

47. ollin ◴[29 Apr 25 16:00 UTC] No.43834437{4}[source]▶

>>43828019 #

Got it, that makes sense! In terms of raw compute capability, a Snapdragon 888's GPU should have more than enough power to run this demo smoothly. I think I just need to optimize the inference setup better (maybe switch to WebGPU if the platform supports it?) and do targeted testing on Firefox/Android.
