
602 points emrah | 8 comments
1. perching_aix No.43744332
This is my first time trying to locally host a model - gave both the 12B and 27B QAT models a shot.

I was both impressed and disappointed. Setup was piss easy, and the models are great conversationalists. I have a 12 GB card available, and the 12B model ran nice and swift.
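
For the curious, "setup" here was just Ollama; a minimal sketch, assuming the QAT builds are still published under these tags:

    ollama pull gemma3:12b-it-qat   # the 27B variant is gemma3:27b-it-qat
    ollama run gemma3:12b-it-qat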

However, they're seemingly terrible at actually assisting with stuff. I tried something very basic: asked for a PowerShell one-liner to get the native block size of my disks. It ended up hallucinating fields, then sent me off into the deep end: first elevating to admin, then using WMI, then reaching for IOCTLs. Pretty unfortunate. Not sure I'll be able to put it to actual meaningful use as a result.
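
For reference, what I was after is a one-liner along these lines (assuming the Storage module cmdlets on Windows 8 / Server 2012 or later; no WMI or IOCTL gymnastics needed):

    # logical/physical sector size per disk
    Get-Disk | Select-Object Number, FriendlyName, LogicalSectorSize, PhysicalSectorSize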

replies(4): >>43744568 >>43744683 >>43747309 >>43748148
2. parched99 No.43744568
I think PowerShell is a bad test. I've noticed all local models have trouble providing accurate responses to PowerShell-related prompts. Strangely, even Microsoft's own model, Phi 4, is bad at answering these questions without careful prompting. Then again, MS can't even provide accurate PS docs.

My best guess is that there's not enough discussion/development related to PowerShell in the training data.

replies(1): >>43746262
3. terhechte No.43744683
Because of their smaller size, local models favor popular languages even more than big cloud models do. They work fantastically for JavaScript, Python, and Bash, but much worse for less popular languages like Clojure, Nim, or Haskell. PowerShell is probably on the less popular side compared to JS or Bash.

If this is your main use case, you can always try to fine-tune a model. I maintain a small LLM benchmark across different programming languages, and on some smaller models the performance difference between, say, Python and Rust is up to 70%.

replies(1): >>43744769
4. perching_aix No.43744769
How accessible and viable is model fine-tuning? I'm not in the loop at all, unfortunately.
replies(1): >>43752517
5. fragmede No.43746262
Which, like, you'd think Microsoft would have an entire team whose purpose is to generate good PowerShell for it to train on.
6. HachiWari8 No.43747309
I tried the 27B QAT model and it hallucinates like crazy. When I ask it for information about some made-up person, restaurant, place name, etc., it never says "I don't know about that" and instead seems eager to just make up details. Larger local models like the older Llama 3.3 70B seem better at this, but they're also too big to fit on a 24 GB GPU.
replies(1): >>43751992
7. jayavanth No.43748148
You should set a lower temperature.
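
For example, through Ollama's generate endpoint (a minimal sketch; the model tag and the 0.2 value are illustrative, not a recommendation):

    Invoke-RestMethod -Method Post -Uri http://localhost:11434/api/generate -ContentType 'application/json' -Body (@{
        model   = 'gemma3:12b-it-qat'
        prompt  = 'PowerShell one-liner to get the physical sector size of each disk'
        stream  = $false
        options = @{ temperature = 0.2 }  # lower = more deterministic output
    } | ConvertTo-Json)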
8. terhechte No.43752517
This is a very accessible way of playing around with the topic: https://transformerlab.ai