
584 points by Alifatisk | 2 comments
okdood64:
From the blog:

https://arxiv.org/abs/2501.00663

https://arxiv.org/pdf/2504.13173

Is there any other company that's openly publishing their research on AI at this level? Google should get a lot of credit for this.

mapmeld:
Well, it's cool that they released a paper, but at this point it's been 11 months and you can't download code or weights for a Titans-architecture model anywhere. That puts a lot of companies ahead of them (Meta's Llama, Qwen, DeepSeek). The closest you can get is an unofficial implementation of the paper: https://github.com/lucidrains/titans-pytorch
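For anyone who wants a feel for what the paper actually proposes: the core is a small MLP "memory" whose weights are updated at inference time by gradient descent on an associative loss, with a momentum term (the paper's "surprise") and a forgetting gate. Here's a minimal, self-contained PyTorch sketch of that idea; the module names, hyperparameters, and token-by-token loop are my own simplifications for illustration, not the paper's exact configuration or the lucidrains repo's API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NeuralLongTermMemory(nn.Module):
    """Rough sketch of the Titans test-time memory idea (arXiv:2501.00663).

    A small MLP acts as the memory; its weights are updated *during inference*
    by gradient descent on an associative loss ||M(k_t) - v_t||^2, with a
    momentum buffer ("surprise") and a forgetting gate.
    """

    def __init__(self, dim, hidden=256, lr=0.1, momentum=0.9, forget=0.01):
        super().__init__()
        self.to_k = nn.Linear(dim, dim, bias=False)  # key projection
        self.to_v = nn.Linear(dim, dim, bias=False)  # value projection
        self.to_q = nn.Linear(dim, dim, bias=False)  # query projection
        self.memory = nn.Sequential(                 # the memory M(.)
            nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim)
        )
        self.lr, self.mom, self.forget = lr, momentum, forget
        self._surprise = [torch.zeros_like(p) for p in self.memory.parameters()]

    @torch.no_grad()
    def _update_memory(self, grads):
        # S_t = eta * S_{t-1} - theta * grad ;  M_t = (1 - alpha) * M_{t-1} + S_t
        for p, s, g in zip(self.memory.parameters(), self._surprise, grads):
            s.mul_(self.mom).add_(g, alpha=-self.lr)
            p.mul_(1.0 - self.forget).add_(s)

    def forward(self, x):
        # x: (seq_len, dim); one token at a time for clarity, not speed
        outputs = []
        for x_t in x:
            k, v, q = self.to_k(x_t), self.to_v(x_t), self.to_q(x_t)
            loss = F.mse_loss(self.memory(k), v.detach())          # associative recall loss
            grads = torch.autograd.grad(loss, list(self.memory.parameters()))
            self._update_memory(grads)                             # write at test time
            outputs.append(self.memory(q).detach())                # read with updated weights
        return torch.stack(outputs)


if __name__ == "__main__":
    mem = NeuralLongTermMemory(dim=64)
    tokens = torch.randn(16, 64)
    print(mem(tokens).shape)  # torch.Size([16, 64])
```

In the full architecture this memory sits alongside (or feeds context into) regular attention over a local window, which is the part this sketch leaves out.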
alyxya:
The hardest part about introducing a new architecture is that even if it really is better than transformers in every way, it's very difficult both to prove a significant improvement at scale and to gain traction. Until Google puts serious resources into training a scaled-up version of this architecture, I believe there's enough low-hanging fruit in improving existing architectures that it will always take a back seat.
UltraSane:
Yes. The path dependence around current attention-based LLMs is enormous.
patapong:
At the same time, there is now a ton of data for training models to act as useful assistants, and there are benchmarks for comparing different assistant models. The wide availability of RLHF training data, and the ease of obtaining more, will make it more feasible to build models on new architectures, I think.
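To make that concrete: preference data is just (prompt, chosen, rejected) records plus an objective that only needs sequence log-probabilities, so nothing about it assumes attention. A rough sketch using the DPO objective as a stand-in for the RLHF step; the example record and variable names are made up for illustration, not from the thread.

```python
import torch.nn.functional as F

# Illustrative only: architecture-agnostic preference data looks like this.
preference_example = {
    "prompt":   "Summarize the Titans paper in one sentence.",
    "chosen":   "It adds a neural long-term memory that is updated at test time.",
    "rejected": "It is just a bigger transformer.",
}


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss (Rafailov et al., 2023).

    It only consumes sequence log-probabilities, so any model that can score
    a token sequence (transformer, Titans, SSM, ...) can train on the same data.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    return -F.logsigmoid(margin).mean()
```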