(github.com)

1311 points msoad | 1 comments | 31 Mar 23 20:37 UTC

abujazar ◴[31 Mar 23 22:31 UTC] No.35394638[source]▶

>>35393284 (OP) #

I love how LLMs have got the attention of proper programmers such that the Python mess is getting cleaned up.

replies(2): >>35395088 #>>35398259 #

faitswulff ◴[31 Mar 23 23:18 UTC] No.35395088[source]▶

>>35394638 #

How so?

replies(2): >>35395298 #>>35399707 #

seydor ◴[01 Apr 23 12:26 UTC] No.35399707[source]▶

>>35395088 #

C has an almost infinite horizon for optimization. Python is good for prototypes, but we are beyond that stage now.

replies(1): >>35400522 #

lostmsu ◴[01 Apr 23 14:21 UTC] No.35400522[source]▶

>>35399707 #

99% of LLM evaluation with PyTorch was already done in C++.

These .cpp projects don't improve anything for performance. They just drop dependencies necessary for training and experimentation.

replies(1): >>35400556 #

seydor ◴[01 Apr 23 14:29 UTC] No.35400556[source]▶

>>35400522 #

Optimization isn't just about speed. As you said, dropping dependencies makes it portable, embeddable, and more versatile.

replies(1): >>35405181 #

jart ◴[01 Apr 23 23:16 UTC] No.35405181[source]▶

>>35400556 #

It's also nice to not lose your mind over how crazy Python and Docker are, when all you want to do is run inference in a shell script as though it were the `cat` command. That sacred cow is going to have to come out of the temple sooner or later, and when that happens, people are going to think, wow, it's just a cow.

replies(1): >>35442417 #

1. Max-Limelihood ◴[04 Apr 23 16:43 UTC] No.35442417[source]▶

>>35405181 #

Have you tried Julia for this instead?

Llama.cpp 30B runs with only 6GB of RAM now