    216 points veggieroll | 15 comments
    1. ed ◴[] No.41860918[source]
    3b is API-only, so you won’t be able to run it on-device, which is the killer app for these smaller edge models.

    I’m not opposed to licensing but “email us for a license” is a bad sign for indie developers, in my experience.

    8b weights are here https://huggingface.co/mistralai/Ministral-8B-Instruct-2410

    Commercial entities aren’t permitted to use or distribute the 8b weights. From the agreement (which allows research purposes only):

    "Research Purposes": means any use of a Mistral Model, Derivative, or Output that is solely for (a) personal, scientific or academic research, and (b) for non-profit and non-commercial purposes, and not directly or indirectly connected to any commercial activities or business operations. For illustration purposes, Research Purposes does not include (1) any usage of the Mistral Model, Derivative or Output by individuals or contractors employed in or engaged by companies in the context of (a) their daily tasks, or (b) any activity (including but not limited to any testing or proof-of-concept) that is intended to generate revenue, nor (2) any Distribution by a commercial entity of the Mistral Model, Derivative or Output whether in return for payment or free of charge, in any medium or form, including but not limited to through a hosted or managed service (e.g. SaaS, cloud instances, etc.), or behind a software layer.

    replies(8): >>41861229 #>>41861251 #>>41862331 #>>41862714 #>>41862802 #>>41863345 #>>41865597 #>>41866472 #
    2. mark_l_watson ◴[] No.41861229[source]
    You are correct; convenience for trying many new models is important. For me, that means being able to run them with Ollama.
    3. diggan ◴[] No.41861251[source]
    > I’m not opposed to licensing but “email us for a license” is a bad sign for indie developers, in my experience.

    At least they're not claiming it's Open Source / Open Weights; I'm kind of happy about that, since other companies didn't get the memo that lying or misleading about stuff like that is bad.

    replies(1): >>41861795 #
    4. talldayo ◴[] No.41861795[source]
    Yeah, a real silver lining on the API-only access for a model that is intentionally designed for edge devices. As a user I honestly only care about the weights being open: I'm not going to reimplement their training code, and I don't need or want redistributed training data; both already exist elsewhere. There is no benefit, for my uses, to having an "open source" model when I could have weights and finetunes instead.

    There's nothing to be happy about when businesses try to wall off a feature to make you salivate over it more. You're within your rights to nitpick licensing differences, but unless everyone gets government-subsidized H100s in their garage, I don't think the code will be of use to anyone except moneyed competitors that want to undermine foundational work.

    5. tarruda ◴[] No.41862331[source]
    Isn't 3b the kind of size you'd expect to be able to run on the edge? What is the point of using 3b via API when you can use larger and more capable models?
    replies(1): >>41862987 #
    6. cjtrowbridge ◴[] No.41862714[source]
    They released it on Hugging Face.
    7. wg0 ◴[] No.41862802[source]
    Genuine question: if I release only a model's weights, with restrictions on commercial usage, and someone then deploys that model and operates it commercially, what are the ways to identify that it is my model doing the online per-token slavery over an HTTP endpoint?
    replies(1): >>41866561 #
    8. littlestymaar ◴[] No.41862987[source]
    GP misunderstood: 3b will be available for running on edge devices, but you must sign a deal with Mistral to get access to the weights.

    I don't think that can work without a significant lobbying push towards models running on the edge, but who knows (especially since they have a former French Minister on the founding team).

    replies(1): >>41863187 #
    9. ed ◴[] No.41863187{3}[source]
    > GP misunderstood

    I don’t think it’s fair to claim the weights are available if you need to hammer out a custom agreement with Mistral’s sales team first.

    If they had a self-serve process, or some sort of shrink-wrapped deal up to, say, 500k users, that would be great. But bespoke contracts are rarely cheap or easy to get. This comes from my experience building a bunch of custom infra for Flux1-dev, only to find I wasn’t big enough for a custom agreement, because, duh, the service doesn’t exist yet. Mistral is not BFL, but sales teams don’t like speculating on usage numbers for a product that hasn’t been released yet. Which is a bummer, considering most innovation happens at a small scale initially.

    replies(1): >>41866847 #
    10. DreamGen ◴[] No.41863345[source]
    From what I have heard, getting a license from them is also far from guaranteed. They are selective about who they want to do business with -- understandable, but something to keep in mind.
    11. moralestapia ◴[] No.41865597[source]
    Lol, the whole point of Edge models is to be able to run them locally.
    12. csomar ◴[] No.41866472[source]
    Thanks, I was confused for a bit. The 3b comparison with Llama 3.2 is useless. If I can't run it on my laptop, it's no longer comparable to open models.
    13. dest ◴[] No.41866561[source]
    There are ways to watermark the output, by slightly altering the choice of tokens in a recognizable pattern.
    replies(1): >>41866657 #
    14. wg0 ◴[] No.41866657{3}[source]
    Within the model? Like as part of training, or some fine-tuning afterwards?
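
    [Editor's note: a minimal sketch of the decode-time approach dest describes, in the spirit of published greenlist watermarking proposals (e.g., Kirchenbauer et al., 2023). It happens at sampling time, not during training or fine-tuning. The vocabulary size, bias strength, and hashing below are illustrative assumptions, not Mistral's or any vendor's actual scheme.]

        # Illustrative decode-time watermark sketch (greenlist logit bias).
        # All constants and names are made up for illustration.
        import hashlib
        import numpy as np

        VOCAB_SIZE = 32_000   # assumed vocabulary size
        GREEN_FRACTION = 0.5  # share of the vocab favored at each step
        BIAS = 2.0            # logit bonus added to "green" tokens

        def green_ids(prev_token_id: int) -> np.ndarray:
            # The previous token deterministically seeds which part of the
            # vocabulary counts as "green" for the next step.
            seed = int.from_bytes(
                hashlib.sha256(str(prev_token_id).encode()).digest()[:4], "big")
            rng = np.random.default_rng(seed)
            return rng.permutation(VOCAB_SIZE)[: int(VOCAB_SIZE * GREEN_FRACTION)]

        def biased_sample(logits: np.ndarray, prev_token_id: int) -> int:
            # Generation side: nudge green-token logits upward, then sample as usual.
            logits = logits.copy()
            logits[green_ids(prev_token_id)] += BIAS
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()
            return int(np.random.default_rng().choice(VOCAB_SIZE, p=probs))

        def green_rate(token_ids: list[int]) -> float:
            # Detection side: using the same hashing, count how often each token
            # fell in its step's green list. Unwatermarked text hovers near the
            # GREEN_FRACTION baseline; watermarked text sits well above it.
            hits = sum(t in set(green_ids(p).tolist())
                       for p, t in zip(token_ids, token_ids[1:]))
            return hits / max(len(token_ids) - 1, 1)

    [Nothing about the weights changes in this sketch; it only perturbs sampling at serving time, which is why a provider can switch it on or off per deployment.]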
    15. littlestymaar ◴[] No.41866847{4}[source]
    I'm not defending Mistral here; I don't think it's a good idea. I just wanted to point out that there is no paradox of the sort there would be if the 3b model were API-only.