
262 points | rain1 | 3 comments
kamranjon No.44444101
This is somehow missing the Gemma and Gemini series of models from Google. I also think that not mentioning the T5 series of models is strange from a historical perspective, since they pioneered many of the concepts in transfer learning and kicked off quite a bit of the interest in this space.
replies(1): >>44444690 #
1. rain1 No.44444690
The Gemma models are too small to be included in this list.

You're right that the T5 models are very important historically, but they're 11B and below and I don't have much to say about them. Definitely a very interesting and important set of models, though.

replies(2): >>44445159 #>>44448467 #
2. tantalor No.44445159
> too small

Eh?

* Gemma 1 (2024): 2B, 7B

* Gemma 2 (2024): 2B, 9B, 27B

* Gemma 3 (2025): 1B, 4B, 12B, 27B

This is the same range as some Llama models which you do mention.

> important historically

Aren't you trying to give a historical perspective? What's the point of this?

3. kamranjon No.44448467
Since you included GPT-2, I would think everything from Google, including T5, would qualify for the list.