
469 points ghuntley | 1 comments
losvedir No.45004504
Can someone confirm my understanding of how tool use works behind the scenes? Claude, ChatGPT, etc., through the API offer "tools" and give responses that ask for tool invocations, which you then perform and send the result back. However, the underlying model is a strictly text-based medium, so I'm wondering how exactly the model APIs are turning the model's response into these different sorts of API responses. I'm assuming there's been a fine-tuning step with lots of examples that put desired tool invocations into some sort of delineated block, which the Claude/ChatGPT server understands? Is there any documentation about how this works exactly, and what those internal delineation tokens are? How do they ensure that the user's text doesn't mess with it and inject "semantic" markers like that?
replies(3): >>45004657 #>>45004890 #>>45005147 #
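The round trip described above can be sketched in code. This is a hypothetical, simplified version of the server-side loop: the delimiter strings, tool names, and JSON call format here are invented for illustration (real providers fine-tune on reserved special tokens, not plain-text markers, and the exact format is provider-specific):

```python
import json
import re

# Hypothetical delimiters the model was fine-tuned to emit around tool
# calls; real providers use reserved special tokens, not text markers.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

# Toy "tool" the application exposes to the model.
def get_weather(city):
    return {"city": city, "temp_c": 21}

TOOLS = {"get_weather": get_weather}

def handle_model_output(text):
    """Scan a model completion for a tool-call block; if found, run the
    named tool and return (tool_name, result), else (None, None)."""
    m = TOOL_CALL_RE.search(text)
    if not m:
        return None, None
    call = json.loads(m.group(1))
    result = TOOLS[call["name"]](**call["arguments"])
    return call["name"], result

completion = ('Let me check.'
              '<tool_call>{"name": "get_weather",'
              ' "arguments": {"city": "Oslo"}}</tool_call>')
name, result = handle_model_output(completion)
print(name, result)  # → get_weather {'city': 'Oslo', 'temp_c': 21}
```

In the real APIs the serving layer does this extraction for you and returns the call as structured JSON (e.g. a `tool_use` content block in Anthropic's Messages API); you run the tool and send the result back as the next message.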
1. jedimastert No.45004657
Here are some docs from Anthropic about their implementation:

https://docs.anthropic.com/en/docs/agents-and-tools/tool-use...

The disconnect here is that models aren't really "text" based but token based, much as a compiler doesn't operate on the source text itself but on a stream of tokens: keywords, brackets, and so on. The output can include ordinary words, but also metadata such as reserved special tokens that delimit tool calls and that user-supplied text can never tokenize into.
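That last point answers the injection question in the parent comment. A toy sketch of the idea, with an invented vocabulary (real tokenizers use BPE and provider-specific reserved IDs, so treat this as illustrative only): control tokens live in a part of the vocabulary that encoding user text can never reach, so a user typing the literal delimiter string just gets ordinary text tokens.

```python
# Reserved control-token IDs exist only in the model's vocabulary;
# hypothetical IDs chosen for illustration.
SPECIAL = {"<TOOL_CALL>": 50001, "</TOOL_CALL>": 50002}

def encode_user_text(text):
    # User input is encoded as ordinary tokens (here: raw bytes, 0-255).
    return list(text.encode("utf-8"))

def encode_control(token):
    # Only the serving layer, never user text, maps to reserved IDs.
    return [SPECIAL[token]]

user_ids = encode_user_text("<TOOL_CALL>")   # ordinary byte tokens
control_ids = encode_control("<TOOL_CALL>")  # the reserved ID
print(all(i < 256 for i in user_ids), control_ids)  # → True [50001]
```

Because the two encodings can never collide, a prompt containing the string "<TOOL_CALL>" is just eleven ordinary characters to the model, not a tool-call boundary.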