129 points | ericciarla | 1 comment
OutOfHere ◴[] No.40719915[source]
I have developed multiple multi-step LLM workflows, expressible as both conditional and parallel DAGs, using mostly plain Python, and I still don't understand why these langchain-type libraries feel the need to exist. Plain Python is quite sufficient for advanced LLM workflows if you know how to use it.
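
For concreteness, here is a minimal sketch of the kind of thing I mean, assuming a hypothetical call_llm coroutine wrapping whatever provider SDK you use; asyncio gives you the conditional branching and parallel fan-out, and the step names are made up:

    import asyncio

    async def call_llm(prompt: str) -> str:
        # Stand-in for a real provider call (OpenAI, Anthropic, etc.).
        await asyncio.sleep(0)
        return f"response to: {prompt[:40]}"

    async def classify(doc: str) -> str:
        return await call_llm(f"Classify this document as 'legal' or 'other':\n{doc}")

    async def summarize(doc: str) -> str:
        return await call_llm(f"Summarize:\n{doc}")

    async def extract_clauses(doc: str) -> str:
        return await call_llm(f"List the key clauses:\n{doc}")

    async def pipeline(doc: str) -> dict:
        label = await classify(doc)  # conditional node
        if "legal" in label.lower():
            # two independent nodes run in parallel
            summary, clauses = await asyncio.gather(summarize(doc), extract_clauses(doc))
            return {"label": label, "summary": summary, "clauses": clauses}
        return {"label": label, "summary": await summarize(doc)}

    print(asyncio.run(pipeline("Example contract text...")))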

LLMs are innately unreliable, and they require a lot of hand-holding and prompt-tuning to get them to work well. Getting into the low-level details of the prompts is essential. I don't want any libraries to get in the way, because I have to be able to find and cleverly prevent the failure cases that happen just 1 in 500 times.
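
The hand-holding usually looks something like the sketch below: validate the output and retry with a corrective hint when the model misbehaves (call_llm is again a stand-in for the actual provider call):

    import json

    def call_llm(prompt: str) -> str:
        # Stand-in for the actual provider call.
        raise NotImplementedError

    def get_json(prompt: str, required_keys: set, max_attempts: int = 3) -> dict:
        last_error = ""
        for attempt in range(max_attempts):
            hint = (f"\n\nYour previous answer was invalid ({last_error}). "
                    "Return only valid JSON.") if attempt else ""
            raw = call_llm(prompt + hint)
            try:
                data = json.loads(raw)
                missing = required_keys - data.keys()
                if not missing:
                    return data
                last_error = f"missing keys: {sorted(missing)}"
            except json.JSONDecodeError as exc:
                last_error = f"not valid JSON: {exc}"
        raise RuntimeError(f"output failed validation after {max_attempts} attempts: {last_error}")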

These libraries seem to mainly just advertise each other. If I am missing something, I don't know what it is.

replies(3): >>40720149 #>>40720202 #>>40720512 #
etse ◴[] No.40720512[source]
If you wanted to compare OpenAI models against Anthropic's or Google's, wouldn't a framework help a lot? Breaking APIs are more a symptom of bad framework development than a problem with frameworks in general.

I think frameworks tend to provide an escape hatch. LlamaIndex comes to mind. It seems to me that by not learning and using an existing framework, you're building your own, which is a calculated tradeoff.
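
As a rough sketch of the comparison point, the adapter below assumes the current OpenAI and Anthropic Python SDKs with API keys read from the environment; the model names are examples only:

    from openai import OpenAI
    import anthropic

    def ask_openai(prompt: str, model: str = "gpt-4o") -> str:
        resp = OpenAI().chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}]
        )
        return resp.choices[0].message.content

    def ask_anthropic(prompt: str, model: str = "claude-3-5-sonnet-20240620") -> str:
        resp = anthropic.Anthropic().messages.create(
            model=model, max_tokens=1024, messages=[{"role": "user", "content": prompt}]
        )
        return resp.content[0].text

    PROVIDERS = {"openai": ask_openai, "anthropic": ask_anthropic}

    def compare(prompt: str) -> dict:
        # Run the same prompt against every provider for a side-by-side look.
        return {name: ask(prompt) for name, ask in PROVIDERS.items()}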

replies(1): >>40721015 #
OutOfHere ◴[] No.40721015[source]
That is a good use case and it's a good problem to have, certainly the kind I wanted to hear, but it's not a problem I have had yet.

Moreover, I absolutely expect to have to update my prompts if I have to support a different model, even if it's a different model by the same provider. For example, there is a difference in the behavior of gpt-4-turbo vs gpt-4o even though both are by OpenAI.

Specific LLMs have specific tendencies and preferences which one has to work with. What I'm saying is that the framework will help, but it's not as simple as switching the model class.
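
Concretely, I end up keeping tuned prompt variants per model, along the lines of the sketch below; the model names and wording differences are only illustrative:

    SUMMARY_PROMPTS = {
        "gpt-4-turbo": (
            "Summarize the text below in exactly three bullet points.\n\n{text}"
        ),
        "gpt-4o": (
            "Summarize the text below in exactly three bullet points. "
            "Do not add any preamble or closing remarks.\n\n{text}"
        ),
    }

    def build_prompt(model: str, text: str) -> str:
        template = SUMMARY_PROMPTS.get(model)
        if template is None:
            raise KeyError(f"no tuned prompt for {model!r}; tune one before switching models")
        return template.format(text=text)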

replies(1): >>40725205 #
etse ◴[] No.40725205[source]
I'm not quite understanding how different prompts for different models reduce the attractiveness of a framework. A framework could theoretically include an LLM evals package to run continuous experiments of all prompts across all models.
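
As a sketch of that idea, where ask() and score() are hypothetical hooks into your provider adapter and your scoring metric:

    from itertools import product
    from typing import Callable, Dict, List, Tuple

    def run_eval_matrix(
        prompts: Dict[str, str],
        models: List[str],
        ask: Callable[[str, str], str],   # (model, prompt) -> completion
        score: Callable[[str], float],    # completion -> quality score
    ) -> Dict[Tuple[str, str], float]:
        # Score every prompt variant against every model.
        results = {}
        for (prompt_name, prompt), model in product(prompts.items(), models):
            results[(prompt_name, model)] = score(ask(model, prompt))
        return results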

Also theoretically, an LLM framework could estimate costs, count tokens, offer a variety of chunking strategies, and unify the more sophisticated APIs like tools or agents, all of which can vary from provider to provider.
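
For example, token counting and a rough cost estimate take only a few lines with tiktoken; the per-1K-token prices below are placeholders, not current published rates:

    import tiktoken

    # Placeholder USD prices per 1K input tokens; check the provider's price list.
    PRICE_PER_1K_INPUT = {"gpt-4o": 0.005, "gpt-4-turbo": 0.01}

    def estimate_input_cost(prompt: str, model: str) -> float:
        enc = tiktoken.get_encoding("cl100k_base")  # close enough for an estimate
        return len(enc.encode(prompt)) / 1000 * PRICE_PER_1K_INPUT[model]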

Admittedly, this view came just from doing early product explorations, but a framework was helpful for most of the above reasons (I didn't find an evals framework that I liked).

You mentioned not having this problem yet. What kind of problems have you been running across? I'm wondering if I'm missing some other context.