LLM that can call multiple tool APIs with one request

1. OutOfHere ◴[18 Jun 24 17:01 UTC] No.40719915[source]▶

I have developed multiple multi-step LLM workflows, expressible as both conditional and parallel DAGs, using mostly plain Python, and I still don't understand why these langchain-type libraries feel the need to exist. Plain Python is quite sufficient for advanced LLM workflows if you know how to use it.

LLMs are innately unreliable, and they require a lot of hand-holding and prompt-tuning to get them to work well. Getting into the low-level details of the prompts is too essential. I don't want any libraries to come in the way because I have to be able to find and cleverly prevent the failure cases that happen just 1 in 500 times.

These libraries seem to mainly just advertise each other. If I am missing something, I don't know what it is.

replies(3): >>40720149 #>>40720202 #>>40720512 #

2. leobg ◴[18 Jun 24 17:28 UTC] No.40720149[source]▶

>>40719915 (TP) #

Always felt the same way, but could never put it in words as eloquently as you just did. Python (or any other programming language) already is the best glue. With these frameworks, you just waste brain cycles on learning APIs that change and break every couple of months.

3. SeriousStorm ◴[18 Jun 24 17:34 UTC] No.40720202[source]▶

>>40719915 (TP) #

Every time I look into building a workflow with langchain it seems unnecessarily complex. So I end up stopping.

Are you just running an LLM server (Ollama, llama.cpp, etc) and then making API calls to that server with plain Python or is it more than that?

replies(1): >>40720334 #

4. OutOfHere ◴[18 Jun 24 17:47 UTC] No.40720334[source]▶

>>40720202 #

I suppose ollama and llama.cpp, or at least any corresponding Python SDKs, would be good for using self-hosted models, especially if they support parallel GPU use. If it's something custom, Pytorch would come into the picture. In production workflows, it can obviously be useful to run certain LLM prompts in parallel to hasten the job.

For now I have used only cloud APIs with their Python SDKs, including the prompt completion, TTS, and embedding endpoints. They allow me to run many jobs in parallel which is useful for complex workflows or if facing heavy user demand. For caching of responses, I have used a local disk caching library, although I guess one can alternatively use a standalone or embedded database. I have used threading via `concurrent.futures` for concurrent jobs, although asyncio too would work.

The one simple external Python library I found so far is `semantic-text-splitter` for splitting long texts using token counts, but this too I could have done by myself with a bit of effort. I think langchain has something for it too.

5. etse ◴[18 Jun 24 18:06 UTC] No.40720512[source]▶

>>40719915 (TP) #

If you wanted to compare OpenAI models against Anthropic or Google, wouldn't the framework help a lot? Breaking APIs is more about bad framework development than frameworks in general.

I think frameworks tend to provide an escape hatch. LlamaIndex comes to mind. It seems to me that by not learning and using an existing framework, you're building your own, which is a calculated tradeoff.

replies(1): >>40721015 #

6. OutOfHere ◴[18 Jun 24 19:05 UTC] No.40721015[source]▶

>>40720512 #

That is a good use case and it's a good problem to have, certainly the kind I wanted to hear, but it's not a problem I have had yet.

Moreover, I absolutely expect to have to update my prompts if I have to support a different model, even if its a different model by the same provider. For example, there is a difference in the behavior of gpt4-turbo vs gpt4-o even though both are by OpenAI.

Specific LLMs have specific tendencies and preferences which one has to work with. What I'm saying is that the framework will help, but it's not as simple as switching the model class.

replies(1): >>40725205 #

7. etse ◴[19 Jun 24 05:44 UTC] No.40725205{3}[source]▶

>>40721015 #

I'm not quite understanding how different prompts for different models reduces the attractiveness of a framework. A framework could theoretically have an LLM evals package to run continuous experiments of all prompts against across all models.

Also theoretically, an LLM framework could estimate costs, count tokens, offer a variety of chunking strategies, unify the more sophisticated APIs, like tools or agents–all of which could vary from provider to provider.

Admittedly, this view came just from doing early product explorations, but a framework was helpful for most of the above reasons (I didn't find an evals framework that I liked).

You mentioned not having this problem yet. What kind of problems have you been running across? I'm wondering if I'm missing some other context.