Mercury: Ultra-fast language models based on diffusion

(arxiv.org)


Alifatisk ◴[07 Jul 25 20:49 UTC] No.44494543[source]▶

Love the UI in the playground, it reminds me of Qwen chat.

We have reached a point where the bottlenecks in genAI are not the knowledge or accuracy, but the context window and speed.

Luckily, Google (and Meta?) have pushed the limits of the context window to about 1 million tokens, which is incredible. But I feel like today's options are still stuck at a ~128k-token window per chat, and after that the model starts to forget.

Another issue is the time it takes for inference AND reasoning. dLLMs are an interesting approach to this. I know we have Groq's hardware as well.

I do wonder, can this be combined with Groq's hardware? Would the response be instant then?

How many tokens can each chat handle in the playground? I couldn't find much info about it.

Which model is it using for inference?

Also, is the training the same for dLLMs as for standard autoregressive LLMs? Or are the weights and models completely different?

replies(4): >>44495048 #>>44495371 #>>44496876 #>>44497661 #

kadushka ◴[07 Jul 25 22:48 UTC] No.44495371[source]▶

>>44494543 #

> We have reached a point where the bottlenecks in genAI are not the knowledge or accuracy, but the context window and speed.

You’re joking, right? I’m using o3 and it couldn’t do half of the coding tasks I tried.

replies(1): >>44499361 #

Alifatisk ◴[08 Jul 25 12:32 UTC] No.44499361[source]▶

>>44495371 #

I've been in similar situations. I've realized that you can make these LLMs accomplish lots of difficult tasks if you prompt them correctly, and that is a form of art! A colleague of mine, who's incredible at prompting, did impressive things with gpt-3, so I am sure o3 can do even wilder stuff.

replies(1): >>44500021 #

kadushka ◴[08 Jul 25 13:57 UTC] No.44500021[source]▶

>>44499361 #

What did he do with gpt-3?

replies(1): >>44500159 #

Alifatisk ◴[08 Jul 25 14:09 UTC] No.44500159[source]▶

>>44500021 #

It was mostly coding-related tasks we had.

replies(1): >>44502234 #

kadushka ◴[08 Jul 25 17:38 UTC] No.44502234[source]▶

>>44500159 #

gpt-3 could not do any coding tasks.

replies(1): >>44504616 #

Alifatisk ◴[08 Jul 25 22:29 UTC] No.44504616[source]▶

>>44502234 #

What makes you say so?

replies(1): >>44505729 #

kadushka ◴[09 Jul 25 02:08 UTC] No.44505729[source]▶

>>44504616 #

Because gpt-3 was not trained to do coding tasks. It could do a simple autocomplete. Perhaps you are confusing it with gpt-3.5?

replies(1): >>44507176 #

Alifatisk ◴[09 Jul 25 07:25 UTC] No.44507176[source]▶

>>44505729 #

I vividly remember it being in the same period as OpenAI's Codex.

replies(1): >>44516246 #

kadushka ◴[10 Jul 25 01:03 UTC] No.44516246[source]▶

>>44507176 #

The Codex paper confirms that GPT-3 could not do any coding tasks. It's right there in the abstract: https://arxiv.org/abs/2107.03374

replies(1): >>44520029 #

Alifatisk ◴[10 Jul 25 12:03 UTC] No.44520029[source]▶

>>44516246 #

Might be GPT-3.5 then, but I am certain this was before the GPT-4 era. But that's beside the point: prompting it correctly has a huge effect on the outcome and on its ability to meet your needs. So the claim that o3 is unusable is hard to believe, in my experience.

replies(1): >>44521478 #

kadushka ◴[10 Jul 25 14:25 UTC] No.44521478[source]▶

>>44520029 #

o3 is definitely usable; as I said, it solved about half of the coding tasks I tried. My problem with your original comment was "bottlenecks in genAI are not the knowledge or accuracy". Knowledge and accuracy are absolutely the main bottlenecks for LLMs today. Hallucination rates for the o3 and o4-mini models have doubled (compared to o1), and OpenAI does not understand why. If my AI model is not accurate, and if it makes up fake knowledge, I don't care how fast it is: I will have to spend more time double-checking its output than the time I saved by getting that output faster.
