Google Titans architecture, helping AI have long-term memory

(research.google)

okdood64 ◴[07 Dec 25 14:05 UTC] No.46181759[source]▶

>>46181231 (OP) #

From the blog:

https://arxiv.org/abs/2501.00663

https://arxiv.org/pdf/2504.13173

Is there any other company that's openly publishing their research on AI at this level? Google should get a lot of credit for this.

replies(12): >>46181829 #>>46182057 #>>46182168 #>>46182358 #>>46182633 #>>46183087 #>>46183462 #>>46183546 #>>46183827 #>>46184875 #>>46186114 #>>46189989 #

1. mapmeld ◴[07 Dec 25 15:02 UTC] No.46182168[source]▶

>>46181759 #

Well, it's cool that they released a paper, but at this point it's been 11 months and you can't download Titans-architecture model code or weights anywhere. That would put a lot of companies ahead of them (Meta's Llama, Qwen, DeepSeek). The closest you can get is an unofficial implementation of the paper: https://github.com/lucidrains/titans-pytorch

replies(7): >>46182351 #>>46182946 #>>46184154 #>>46185017 #>>46186942 #>>46187280 #>>46188385 #

2. informal007 ◴[07 Dec 25 15:25 UTC] No.46182351[source]▶

>>46182168 (TP) #

I don't think model code is a big deal compared to the idea. If the public had recognized the value of the idea 11 months ago, they could have implemented the code quickly, because there are so many smart engineers in the AI field.

replies(2): >>46182445 #>>46183173 #

3. jstummbillig ◴[07 Dec 25 15:37 UTC] No.46182445[source]▶

>>46182351 #

If that is true, does it follow that this idea does not actually have a lot of value?

replies(2): >>46182827 #>>46183206 #

4. ◴[07 Dec 25 16:20 UTC] No.46182827{3}[source]▶

>>46182445 #

5. alyxya ◴[07 Dec 25 16:34 UTC] No.46182946[source]▶

>>46182168 (TP) #

The hardest part about making a new architecture is that even if it is just better than transformers in every way, it’s very difficult to both prove a significant improvement at scale and gain traction. Until Google puts a lot of resources into training a scaled-up version of this architecture, I believe there’s plenty of low-hanging fruit in improving existing architectures, such that it’ll always take the back seat.

replies(5): >>46183227 #>>46184404 #>>46184696 #>>46186138 #>>46186853 #

6. mapmeld ◴[07 Dec 25 17:06 UTC] No.46183173[source]▶

>>46182351 #

Well, we have the idea and the next best thing to official code, but if this was a big revelation, where are all of the Titans models? If this were public, I think we'd have a few attempts at variants (all of the Mamba SSMs, etc.) and get a better sense of whether this is valuable or not.

7. fancy_pantser ◴[07 Dec 25 17:09 UTC] No.46183206{3}[source]▶

>>46182445 #

Student: Look, there’s a hundred dollar bill on the ground! Economist: No there isn’t. If there were, someone would have picked it up already.

To wit, it's dangerous to assume the value of this idea based on the lack of public implementations.

replies(3): >>46183726 #>>46185648 #>>46190464 #

8. UltraSane ◴[07 Dec 25 17:11 UTC] No.46183227[source]▶

>>46182946 #

Yes. The path dependence for current attention-based LLMs is enormous.

replies(1): >>46184174 #

9. lukas099 ◴[07 Dec 25 18:15 UTC] No.46183726{4}[source]▶

>>46183206 #

If the hundred dollar bill was in an accessible place and the fact of its existence had been transmitted to interested parties worldwide, then yeah, the economist would probably be right.

10. root_axis ◴[07 Dec 25 19:07 UTC] No.46184154[source]▶

>>46182168 (TP) #

I don't think the comparison is valid. Releasing code and weights for an architecture that is widely known is a lot different than releasing research about an architecture that could mitigate fundamental problems that are common to all LLM products.

11. patapong ◴[07 Dec 25 19:10 UTC] No.46184174{3}[source]▶

>>46183227 #

At the same time, there is now a ton of data for training models to act as useful assistants, and benchmarks to compare different assistant models. The wide availability and ease of obtaining new RLHF training data will make it more feasible to build models on new architectures I think.

12. p1esk ◴[07 Dec 25 19:36 UTC] No.46184404[source]▶

>>46182946 #

> Until google puts in a lot of resources into training a scaled up version of this architecture

If Google is not willing to scale it up, then why would anyone else?

replies(1): >>46187379 #

13. tyre ◴[07 Dec 25 20:11 UTC] No.46184696[source]▶

>>46182946 #

Google is large enough, well-funded enough, and the opportunity is great enough to run experiments.

You don't necessarily have to prove it out on large foundation models first. Can it beat out a 32b parameter model, for example?

replies(1): >>46185008 #

14. swatcoder ◴[07 Dec 25 20:48 UTC] No.46185008{3}[source]▶

>>46184696 #

Do you think there might be an approval process to navigate when experiment costs might run seven or eight digits and require months of reserved resources?

While they do have lots of money and many people, they don't have infinite money and specifically only have so much hot infrastructure to spread around. You'd expect they have to gradually build up the case that a large scale experiment is likely enough to yield a big enough advantage over what's already claiming those resources.

replies(2): >>46189610 #>>46191181 #

15. innagadadavida ◴[07 Dec 25 20:49 UTC] No.46185017[source]▶

>>46182168 (TP) #

Just keep in mind it is performance review time for all the tech companies. Their promotion of these seems to be directly correlated with that event.

16. NavinF ◴[07 Dec 25 22:02 UTC] No.46185648{4}[source]▶

>>46183206 #

That day the student was the 100th person to pick it up, realize it's fake, and drop it

17. nickpsecurity ◴[07 Dec 25 22:52 UTC] No.46186138[source]▶

>>46182946 #

But it's companies like Google that made tools like JAX and TPUs, saying we can throw together models with cheap, easy scaling. Their paper's math is probably harder to put together than an alpha-level prototype, which they need anyway.

So, I think they could default to doing it for small demonstrators.

18. m101 ◴[08 Dec 25 00:15 UTC] No.46186853[source]▶

>>46182946 #

Prove it beats models of different architectures trained under identical limited resources?

19. SilverSlash ◴[08 Dec 25 00:29 UTC] No.46186942[source]▶

>>46182168 (TP) #

The newer one is from late May: https://arxiv.org/abs/2505.23735

20. AugSun ◴[08 Dec 25 01:27 UTC] No.46187280[source]▶

>>46182168 (TP) #

Gemini 3 _is_ that architecture.

replies(1): >>46187485 #

21. 8note ◴[08 Dec 25 01:44 UTC] No.46187379{3}[source]▶

>>46184404 #

ChatGPT is an example of why.

replies(1): >>46193265 #

22. FpUser ◴[08 Dec 25 02:03 UTC] No.46187485[source]▶

>>46187280 #

I've read many very positive reviews about Gemini 3. I tried using it, including Pro, and to me it looks very inferior to ChatGPT. What was very interesting, though, was that when I caught it bullshitting me and called its BS, Gemini exhibited very human-like behavior. It did try to weasel its way out, degenerated down to "true Scotsman" level, but finally admitted that it was full of it. This is kind of impressive / scary.

23. mupuff1234 ◴[08 Dec 25 04:39 UTC] No.46188385[source]▶

>>46182168 (TP) #

> it's been 11 months

Is that supposed to be a long time? Seems fair that companies don't rush to open up their models.

24. dpe82 ◴[08 Dec 25 08:01 UTC] No.46189610{4}[source]▶

>>46185008 #

I would imagine they do not want their researchers unnecessarily wasting time fighting for resources - within reason. And at Google, "within reason" can be pretty big.

replies(1): >>46190731 #

25. dotancohen ◴[08 Dec 25 09:58 UTC] No.46190464{4}[source]▶

>>46183206 #

In my opinion, a refined analogy would be:

Student: Look, a well-known financial expert placed what could potentially be a hundred dollar bill on the ground, and other well-known financial experts just leave it there!

26. howdareme ◴[08 Dec 25 10:34 UTC] No.46190731{5}[source]▶

>>46189610 #

I mean, looking at Antigravity, Jules & Gemini CLI, they have no problem with their developers fighting for resources

27. nl ◴[08 Dec 25 11:45 UTC] No.46191181{4}[source]▶

>>46185008 #

I mean you'd think so, but...

> In fact, the UL2 20B model (at Google) was trained by leaving the job running accidentally for a month.

https://www.yitay.net/blog/training-great-llms-entirely-from...

28. falcor84 ◴[08 Dec 25 15:23 UTC] No.46193265{4}[source]▶

>>46187379 #

You think that this might be another ChatGPT/Docker/Hadoop case, where Google comes up with the technology but doesn't care to productize it?
