
114 points by cmcconomy | 4 comments
aliljet ◴[] No.42175062[source]
This is fantastic news. I've been using Qwen2.5-Coder-32B-Instruct with Ollama locally and it's honestly such a breath of fresh air. I wonder if any of you have had a moment to try this newer context length locally?

BTW, I can't run this effectively on my 2080 Ti, so I've just loaded the machine up with classic RAM. It's not going to win any races, but as they say, it's not the speed that matters, it's the quality of the effort.
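
For anyone wanting to try the same setup, here's a minimal sketch of requesting a larger context window and offloading part of the model to system RAM through Ollama's HTTP API. The model tag, context size, and layer count are illustrative guesses, not a tested configuration.

# Minimal sketch: ask a locally served Qwen2.5-Coder model for a larger
# context window via Ollama's HTTP API, offloading only part of the model
# to the GPU so the rest runs from system RAM. The model tag, num_ctx, and
# num_gpu values below are illustrative, not a verified configuration.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",  # Ollama's default local endpoint
    json={
        "model": "qwen2.5-coder:32b",        # assumed local model tag
        "prompt": "Write a Python function that parses an ISO 8601 date.",
        "stream": False,
        "options": {
            "num_ctx": 32768,  # request a 32k context window
            "num_gpu": 20,     # offload only ~20 layers to the 2080 Ti;
                               # the remainder stays in system RAM (slow)
        },
    },
    timeout=600,
)
print(resp.json()["response"])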

replies(3): >>42175226 #>>42176314 #>>42177831 #
1. lukev ◴[] No.42177831[source]
I ran a couple of needle-in-a-haystack-style queries with just a 32k context length and was very much not impressed. It often failed to find facts buried in the middle of the prompt, even when they were stated almost identically to the question being asked.

It's cool that these models are getting such long contexts, but performance clearly degrades as the context grows, and I haven't seen that degradation characterized or quantified well anywhere.
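
For reference, the kind of needle-in-a-haystack check described above can be sketched in a few lines: bury one fact in the middle of a long block of filler and ask for it back. The Ollama endpoint, model tag, filler text, and pass/fail check below are assumptions for illustration; the actual prompts used weren't shared.

# Rough sketch of a needle-in-a-haystack check: hide one fact mid-prompt
# inside filler text and ask the model to retrieve it. Endpoint, model tag,
# and the string match used for scoring are assumptions, not the real test.
import requests

FILLER = "The sky was grey and the meeting ran long. " * 2000  # ~20k tokens of noise
NEEDLE = "The access code for the staging server is 7391."
QUESTION = "What is the access code for the staging server?"

half = len(FILLER) // 2
prompt = FILLER[:half] + NEEDLE + " " + FILLER[half:] + "\n\n" + QUESTION

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen2.5-coder:32b",
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": 32768},
    },
    timeout=600,
)
print("PASS" if "7391" in resp.json()["response"] else "FAIL")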

replies(1): >>42179500 #
2. zackangelo ◴[] No.42179500[source]
Would you care to share your prompts?

They posted a haystack benchmark in the blog post that seems too good to be true.

replies(2): >>42182251 #>>42188720 #
3. busssard ◴[] No.42182251[source]
Yeah, when I saw that they show 100% coverage at 1M tokens, I thought it must be a placeholder image for when the actual results come in.

There's no variation at all.

4. lukev ◴[] No.42188720[source]
I wasn't scientific about it, unfortunately. My searches were natural language rather than token-based, though.