←back to thread

1311 points msoad | 4 comments | | HN request time: 0s | source
Show context
jart ◴[] No.35393615[source]
Author here. For additional context, please read https://github.com/ggerganov/llama.cpp/discussions/638#discu... The loading time performance has been a huge win for usability, and folks have been having the most wonderful reactions after using this change. But we don't have a compelling enough theory yet to explain the RAM usage miracle. So please don't get too excited just yet! Yes things are getting more awesome, but like all things in science a small amount of healthy skepticism is warranted.
replies(24): >>35393868 #>>35393942 #>>35394089 #>>35394097 #>>35394107 #>>35394203 #>>35394208 #>>35394244 #>>35394259 #>>35394288 #>>35394408 #>>35394881 #>>35395091 #>>35395249 #>>35395858 #>>35395995 #>>35397318 #>>35397499 #>>35398037 #>>35398083 #>>35398427 #>>35402974 #>>35403334 #>>35468946 #
intelVISA ◴[] No.35394288[source]
Didn't expect to see two titans today: ggerganov AND jart. Can ya'll slow down you make us mortals look bad :')

Seeing such clever use of mmap makes me dread to imagine how much Python spaghetti probably tanks OpenAI's and other "big ML" shops' infra when they should've trusted in zero copy solutions.

Perhaps SWE is dead after all, but LLMs didn't kill it...

replies(11): >>35395112 #>>35395145 #>>35395165 #>>35395404 #>>35396298 #>>35397484 #>>35398972 #>>35399367 #>>35400001 #>>35400090 #>>35456064 #
MontyCarloHall ◴[] No.35395145[source]
>how much Python spaghetti probably tanks OpenAI's and other "big ML" shops' infra when they should've trusted in zero copy solutions

Probably not all that much. All of the Python numeric computing frameworks (Numpy, PyTorch, TensorFlow, etc.) are basically just wrappers for lower level C++/C/Fortran code. Unless you’re doing something boneheaded and converting framework-native tensors to Python objects, passing tensors around within a framework essentially just passes a pointer around, which has marginal overhead even when encapsulated in a bloated Python object.

Indeed, a huge number of PyTorch operations are explicitly zero copy: https://pytorch.org/docs/stable/tensor_view.html

replies(2): >>35396982 #>>35408170 #
oceanplexian ◴[] No.35396982[source]
It’s not that the performance is the issue, it’s that it’s unmaintainable and prone to break. Exceptions aren’t handled right, dependencies are a disaster (Proprietary NVIDIA drivers+CUDA+PyTorch+ the various versions of stuff are a complete disaster)

This leads to all sorts of bugs and breaking changes that are cool in an academic or hobbyist setting but a total headache on a large production system.

replies(3): >>35397515 #>>35397551 #>>35398182 #
CurrentB ◴[] No.35397551[source]
Yeah, I've been using python for the first time in a while to try out some of the llm stuff and I can't believe how bad the dependency hell is. It's probably particularly bad due to the pace of change in this field. But I spend an hour getting dependencies fixed every time I touch anything. 80% of the Google Collabs I find are just outright broken. I wish there were other viable non python options to try out these things.
replies(3): >>35398107 #>>35399048 #>>35424595 #
bboygravity ◴[] No.35398107[source]
No idea what a Google Collab is, but does the code come with an environment or at least a specifications of which packages and versions to use (requirements.txt)?

It sounds unnecessarily weird to me that people would share Python code that simply doesn't work out at all out of the box.

replies(2): >>35398243 #>>35402417 #
version_five ◴[] No.35398243[source]
Its rarely as easy as sharing a requirements.txt. There are lots of things that can still break - for examples you get weird situations where different modules require different versions of a third module. Or all the Cuda toolkit version issues thsy seem to come up with gpu stuff. When we share python, we tend to share a docker image, and even this isn't foolproof. A big problem I think is that it doesn't incentivize building something portable. And it's very hard to test across different machines. Add to that all the different practices re virtual environments, venv, conda, etc, everyone tries to install the dependencies differently or is starting from some nonstandard state. It's a mess.
replies(1): >>35400039 #
pablo1107 ◴[] No.35400039{3}[source]
Maybe using Nix it's a better experience for creating such an environment where you depending also on system utilities.
replies(1): >>35401252 #
superkuh ◴[] No.35401252{4}[source]
Everyone is using llama.cpp because we reject the idea of giving up on system libraries like nix does. That kind of tomfoolery (at least in the desktop context) is only required when you use software projects that use libraries/languages which break forwards compatibility every 3 years.

If you just write straight c++ (without c++xx, or anything like it) you can compile the code on machines from decades ago if you want.

replies(2): >>35402510 #>>35405209 #
remexre ◴[] No.35405209{5}[source]
What's c++xx?
replies(1): >>35407666 #
opless ◴[] No.35407666{6}[source]
C++11, and greater.
replies(1): >>35416499 #
1. remexre ◴[] No.35416499{7}[source]
Huh, I was proficient in Rust before "properly" learning C++, so maybe that accounts for it, but I didn't realize C++11 was controversial. Is it just move semantics, or are there some library things that are hard to implement?
replies(1): >>35431561 #
2. int_19h ◴[] No.35431561[source]
I think what OP is saying is that decades-old systems wouldn't have C++11-compatible compilers on them.
replies(1): >>35441542 #
3. bboygravity ◴[] No.35441542[source]
And maybe that "C++" is now basically a bunch of different incompatible languages instead of just 1 language, depending on what "xx" is (11, 14, 17, 20, 23, etc).

It's like Python 2 vs Python 3 except even worse.

replies(1): >>35444029 #
4. int_19h ◴[] No.35444029{3}[source]
In my experience, C++03 code works just fine without changes on a C++11 and C++14 compilers, so no, it's not at all like Python 2/3. The few features that were ripped out were exactly the stuff that pretty much no-one was using for good reasons (e.g. throw-specifications).