Show HN: A private, flat monthly subscription for open-source LLMs

(synthetic.new)

29 points reissbaker | 2 comments | 28 Aug 25 19:03 UTC

Hey HN! We've run our privacy-focused open-source inference company for a while now, and we're launching a flat monthly subscription similar to Anthropic's. It should work with Cline, Roo, KiloCode, Aider, etc.; any OpenAI-compatible API client should do. The rate limits at every tier are higher than Claude's, so even if you prefer Claude, it can be a useful, low-priced backup for when you're rate limited. Let me know if you have any feedback!

1. cofob_ ◴[29 Aug 25 06:35 UTC] No.45060904[source]▶

>>45055763 (OP) #

Cool!

How are messages counted? For example, in Cursor, one request is 25 tool calls. Do 100 messages in a subscription here mean 100 tool calls, or 100 requests each with 25 tool calls?

I also have some questions about privacy. It says that requests can only be used for debugging, but it later mentions a license to use requests to improve the platform, which would allow uses beyond debugging.

replies(1): >>45068532 #

2. reissbaker ◴[29 Aug 25 19:44 UTC] No.45068532[source]▶

>>45060904 (TP) #

Oh to be clear, the API prompts/completions can't be stored longer than 14 days or used for anything other than debugging — the data retention section takes priority over everything else. I believe the other requests mentioned refer to general web traffic requests and web UI data. Thank you for the great question!

For requests, it depends on the agent framework to a certain extent: we just count API requests. For frameworks that support parallel tool calls, assuming they use the standard OpenAI parallel tool call API, the entire parallel batch counts as one request, since it comes from a single API request. I don't know exactly how Cursor structures it, but I'd be surprised if they were making 100 API requests per message. I assume they use the normal parallel tool call API to send all tool calls in a single batch, which takes only 1 request from your quota.
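
To make the counting rule concrete, here is a minimal sketch of how request usage might differ between an agent that batches tool calls in parallel and one that issues them one at a time. It is only an illustration of "we just count API requests": the `AssistantTurn` class and both helper functions are hypothetical names, not part of any real client or of the service's API, and it assumes every model response (including the final one with no tool calls) costs exactly one request.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AssistantTurn:
    """Hypothetical model of one API response from an OpenAI-compatible endpoint."""
    tool_calls: int  # number of tool calls carried by this single response


def count_api_requests(turns: list[AssistantTurn]) -> int:
    # Assumption: each response is produced by exactly one API request,
    # no matter how many tool calls it contains.
    return len(turns)


def count_tool_calls(turns: list[AssistantTurn]) -> int:
    # Total tool calls across all responses in the conversation.
    return sum(turn.tool_calls for turn in turns)


# Agent A: one response carries 25 parallel tool calls, then one final answer.
parallel_agent = [AssistantTurn(tool_calls=25), AssistantTurn(tool_calls=0)]

# Agent B: 25 responses with one tool call each, then one final answer.
sequential_agent = [AssistantTurn(tool_calls=1) for _ in range(25)]
sequential_agent.append(AssistantTurn(tool_calls=0))

for name, turns in [("parallel", parallel_agent), ("sequential", sequential_agent)]:
    print(name, "tool calls:", count_tool_calls(turns),
          "API requests:", count_api_requests(turns))
```

Under these assumptions both agents run the same 25 tool calls, but the parallel agent uses 2 requests of quota while the sequential one uses 26, so how much a subscription covers depends on how the agent framework batches its calls.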