(blog.abdellatif.io)

548 points tifa2up | 2 comments | 20 Oct 25 15:55 UTC

swyx ◴[21 Oct 25 07:21 UTC] No.45653321[source]▶

> LLM: GPT 4.1 -> GPT 5 -> GPT 4.1, covered by Azure credits

What's this roundtrip? Also, the chronology of the LLM (4.1) doesn't match the rest of the stack (text-embedding-large-3), which feels weird.

replies(1): >>45653457 #

1. tifa2up ◴[21 Oct 25 07:47 UTC] No.45653457[source]▶

>>45653321 #

OP. We migrated to GPT-5 when it came out but found that it performs worse than 4.1 when you pass lots of context (up to 100K tokens in some cases). We found that it:

a) has worse instruction following; it doesn't follow the system prompt
b) produces very long answers, which resulted in a bad UX
c) has a 125K context window, so extreme cases resulted in an error

Again, these were only observed in RAG when you pass lots of chunks; GPT-5 is probably a better model for other tasks.
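
For anyone hitting the same context-window error as in c), here is a minimal sketch (not taken from the thread) of one way to guard against it: trim the ranked chunks to a token budget before building the prompt. The function names, the `reserved_tokens` parameter, and the 4-characters-per-token estimate are all assumptions for illustration; a real setup would plug in the model's own tokenizer.

```python
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    """Rough token estimate (assumption: about 4 characters per token).

    Swap in the model's real tokenizer for accurate counts.
    """
    return max(1, (len(text) + 3) // 4)


def fit_chunks_to_budget(
    chunks: List[str],
    context_window: int,
    reserved_tokens: int,
    count_tokens: Callable[[str], int] = estimate_tokens,
) -> List[str]:
    """Keep chunks in ranked order until the token budget is used up.

    context_window: the model's total context size in tokens.
    reserved_tokens: tokens set aside for the system prompt,
        the user question, and the generated answer.
    """
    budget = context_window - reserved_tokens
    if budget <= 0:
        raise ValueError("reserved_tokens leaves no room for retrieved chunks")

    kept: List[str] = []
    used = 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > budget:
            break  # drop this chunk and every lower-ranked one
        kept.append(chunk)
        used += cost
    return kept
```

With the 125K window mentioned above, a call might look like `fit_chunks_to_budget(ranked_chunks, context_window=125_000, reserved_tokens=8_000)`, where the 8,000 reserve is an arbitrary illustrative value. Stopping at the first chunk that does not fit keeps the highest-ranked chunks contiguous; skipping it and trying smaller ones further down is another reasonable choice.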

Production RAG: what I learned from processing 5M+ documents