497 points samplank2 | 10 comments

Hi HN! I love imagining the past, so I made Time Portal, a game where you are dropped into a historical event and see AI video footage from that moment. You have to guess where you are in time and on the map. It’s like GeoGuessr (and heavily inspired by it!) but for historical events.

The videos are all created with AI. It’s a pipeline of Flux (images), Kling (video), and mmaudio (audio). The videos aren’t always historically accurate to the last detail. They might incorporate elements of folklore or have details from popular beliefs about the way things looked rather than the latest academic research on how they looked.

I’m thinking a lot about how to make the game more interactive. One thing that makes GeoGuessr so fun for me is that you can move infinitely and always find more details to help you pinpoint the location. I want Time Portal to have a similar quality. I have a few ideas to try soon that will hopefully make the game more interactive and infinite.

1. lukev ◴[] No.43348450[source]
yeah it's a cool concept, but knowing what I know about the ability of generative AI to accurately replicate specific moments of history, it falls flat.

The whole point of this kind of thing should be to reward people who can recognize "that architectural style wasn't invented until the 13th century" but that's precisely the sort of thing image models cannot do reliably.

replies(1): >>43348481 #
2. samplank2 ◴[] No.43348481[source]
I agree that it's not possible to have them do 13th century architectural style perfectly right now. But I believe it will be soon. The image/video models are improving, but so are the reasoning models, and they can check for and fix anachronisms.
replies(4): >>43348538 #>>43348539 #>>43348558 #>>43349657 #
3. lukev ◴[] No.43348538[source]
I hope you're right. Are you aware of any image-gen models that apply chain-of-thought style reasoning (either agentic or via reinforcement learning) to shape outputs?

For example, consider this imagery from today's challenge: https://firebasestorage.googleapis.com/v0/b/fastab-f08e9.app...

These are some incredible monoliths: if they were real, I feel like I would have heard about them? And if they did exist... that's so cool. But because it's AI generated, I have very low confidence that this ever existed at all. Which is sad.

replies(2): >>43348612 #>>43352026 #
4. nl ◴[] No.43348539[source]
Reasoning models aren't needed for this. The loss function for the image models needs to take year into account.

This is entirely possible, as the incredible accuracy[1] of non-generative picture location models (a very similar problem) shows.

[1] https://paperswithcode.com/sota/image-based-localization-on-...

5. littlestymaar ◴[] No.43348558[source]
Why not use img2vid starting from a historically accurate picture or painting?
replies(1): >>43348841 #
6. samplank2 ◴[] No.43348612{3}[source]
No, not aware of image models that do chain-of-thought reasoning. But there are vision models that do it, so you can have them review the generated images and iterate on the prompts.
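
A rough sketch of that review-and-iterate loop, purely illustrative: `generate` and `review` are hypothetical callables standing in for an image model and a vision model (neither is a real API, and this is not necessarily how Time Portal does it). The loop generates an image, asks the reviewer for anachronisms, and folds them back into the prompt until the reviewer is satisfied or the round budget runs out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Review:
    # Anachronisms flagged by the (hypothetical) vision-model reviewer.
    # An empty list means the image was accepted.
    anachronisms: List[str]


def refine_prompt(
    prompt: str,
    generate: Callable[[str], bytes],         # hypothetical: prompt -> image bytes
    review: Callable[[bytes, str], Review],   # hypothetical: (image, prompt) -> Review
    max_rounds: int = 3,
) -> Tuple[str, bytes, List[str]]:
    """Generate, review, and revise the prompt up to max_rounds times.

    Returns the final prompt, the final image, and the issues the
    reviewer still reported for that final image (empty if accepted).
    """
    attempt = 0
    while True:
        image = generate(prompt)
        issues = review(image, prompt).anachronisms
        if not issues or attempt == max_rounds:
            return prompt, image, issues
        # Feed the reviewer's findings back into the next prompt.
        prompt = prompt + " Avoid: " + "; ".join(issues) + "."
        attempt += 1
```

Appending "Avoid: ..." to the prompt is just the simplest possible revision strategy; a real setup would probably have the reviewer rewrite the prompt instead.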
7. samplank2 ◴[] No.43348841{3}[source]
This does use img2vid but with AI generated images. Using real pictures or paintings could definitely be fun too.
8. peishang ◴[] No.43349657[source]
You might look into era-specific LoRAs if they exist, and if not, consider training a few to better capture architectural detail from that specific time frame.
replies(1): >>43349726 #
9. samplank2 ◴[] No.43349726{3}[source]
good idea! It would be fun to have a ton of LoRAs for different places x eras
10. tralarpa ◴[] No.43352026{3}[source]
[Spoiler] I guess it's this: https://madainproject.com/northern_stelae_park

Which is funny, because the monoliths in the AI video look more eroded than the real ones today.

This looked like a nice idea at first glance. At second glance, it's really bad because you have to assume that everything you see in these videos can be wrong or misleading.