> LLM: GPT 4.1 -> GPT 5 -> GPT 4.1, covered by Azure credits
What's this round trip? Also, the chronology of the LLM (4.1) doesn't match the rest of the stack (text-embedding-3-large); feels weird.
replies(1):
GPT-5: a) has worse instruction following and doesn't follow the system prompt; b) produces very long answers, which resulted in bad UX; c) has a 125K context window, so extreme cases resulted in an error.
Again, these were only observed in RAG when you pass lots of chunks; GPT-5 is probably a better model for other tasks.
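
For anyone hitting point (c), one common workaround is to cap the retrieved chunks at a token budget before building the prompt. Below is a minimal sketch of that idea, not what this stack actually does: `estimate_tokens` and `fit_chunks` are hypothetical helpers, and the 4-characters-per-token ratio is a rough assumption rather than a real tokenizer count.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token (assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def fit_chunks(chunks: List[str], context_window: int, reserved: int) -> List[str]:
    """Keep chunks in their ranked order until the token budget is used up.

    context_window: the model's total token limit.
    reserved: tokens set aside for the system prompt, question, and answer.
    """
    budget = context_window - reserved
    kept: List[str] = []
    used = 0
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget:
            break  # stop at the first chunk that would overflow the budget
        kept.append(chunk)
        used += cost
    return kept
```

Usage would look something like `fit_chunks(ranked_chunks, context_window=125_000, reserved=8_000)`, where the 125K comes from the comment above and the reserved amount is a made-up placeholder. Swapping the heuristic for the model's actual tokenizer would give a tighter bound.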