
361 points | mseri | 4 comments
nickreese
I'm just now moving my main workflows off OpenAI to local models, and I'm starting to find that these smaller models' main failure mode is accepting edge cases in order to be helpful.

Especially in extraction tasks, this shows up as inventing data or rationalizing around clear roadblocks.

My biggest hack so far is giving them an out named "edge_case" and telling them it is REALLY helpful if they identify edge cases. Simply renaming "fail_closed" or "dead_end" options to "edge_case" with helpful wording causes Qwen models to adhere to their prompting more.
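A minimal sketch of that trick, assuming a local OpenAI-compatible server (LM Studio-style) that supports JSON-schema structured output; the endpoint, model id, prompt, and field names here are illustrative, not the commenter's actual setup:

    # Give the model a positively framed "edge_case" escape hatch instead of a
    # fail_closed/dead_end option, so flagging a problem reads as being helpful
    # rather than as a failure.
    import requests

    schema = {
        "name": "extraction",
        "schema": {
            "type": "object",
            "properties": {
                "result": {"type": ["string", "null"]},
                "edge_case": {"type": ["string", "null"]},  # the "out"
            },
            "required": ["result", "edge_case"],
        },
    }

    resp = requests.post(
        "http://localhost:1234/v1/chat/completions",  # assumed local endpoint
        json={
            "model": "qwen3-32b",  # placeholder model id
            "messages": [
                {
                    "role": "system",
                    "content": (
                        "Extract the invoice total. If the document is ambiguous "
                        "or data is missing, it is REALLY helpful to describe the "
                        "problem in edge_case instead of guessing a value."
                    ),
                },
                {"role": "user", "content": "...document text..."},
            ],
            "response_format": {"type": "json_schema", "json_schema": schema},
        },
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])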

It feels like there are hundreds of these small hacks that people must have discovered... why isn't there a centralized place where people record these learnings?

1. alach11
Just curious - are you using Open WebUI or LibreChat as a local frontend, or are all your workflows calling the models directly without a UI?
2. nickreese
I run LM Studio for ease of use on several Mac Studios, fronted by a small token-aware router that estimates resource usage on each machine.

Lots of optimization left there, but the systems are pinned most of the time, so I'm not focused on that at the moment; the GPUs are the issue, not the queuing.

3. grosswait
I would like to hear more about your setup if you're willing. Is the token-aware router you're using publicly available, or something you've written yourself?
4. nickreese
It isn't open... but drop me an email and I can send it to you. Basically it just tracks a list of known LM Studio instances on the network, queries their loaded models every 15 seconds, and routes requests in a FIFO queue to whichever instance has the requested model loaded, tracking the number of tokens per model (my servers are uniform... M4 Max 128GB Studios, but it could also track per server) and routing to the one that has just finished. I used to have it queue the next request just as one was expected to finish, but I ran into timeout issues due to an edge case.
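A rough sketch of that routing loop, under stated assumptions: LM Studio's OpenAI-compatible /v1/models endpoint, uniform servers, and a least-in-flight-tokens heuristic standing in for "routes to the one that has just finished". Host addresses and numbers are illustrative, not the commenter's code:

    # Poll each LM Studio instance for loaded models every 15 seconds, then
    # dispatch each request to a host that has the requested model loaded and
    # the fewest in-flight tokens.
    import threading
    import time

    import requests

    HOSTS = ["10.0.0.11:1234", "10.0.0.12:1234"]   # assumed LM Studio servers
    loaded = {h: set() for h in HOSTS}             # host -> loaded model ids
    inflight_tokens = {h: 0 for h in HOSTS}        # rough per-host load
    lock = threading.Lock()

    def poll_models():
        while True:
            for host in HOSTS:
                try:
                    r = requests.get(f"http://{host}/v1/models", timeout=5)
                    models = {m["id"] for m in r.json().get("data", [])}
                except requests.RequestException:
                    models = set()
                with lock:
                    loaded[host] = models
            time.sleep(15)

    def pick_host(model, est_tokens):
        with lock:
            candidates = [h for h in HOSTS if model in loaded[h]]
            if not candidates:
                raise RuntimeError(f"no host has {model} loaded")
            host = min(candidates, key=lambda h: inflight_tokens[h])
            inflight_tokens[host] += est_tokens
            return host

    def release(host, est_tokens):
        # Call after a completion returns, so the host becomes eligible again.
        with lock:
            inflight_tokens[host] -= est_tokens

    threading.Thread(target=poll_models, daemon=True).start()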