
1311 points msoad | 1 comments
jart ◴[] No.35393615[source]
Author here. For additional context, please read https://github.com/ggerganov/llama.cpp/discussions/638#discu... The loading time performance has been a huge win for usability, and folks have been having the most wonderful reactions after using this change. But we don't have a compelling enough theory yet to explain the RAM usage miracle. So please don't get too excited just yet! Yes, things are getting more awesome, but like all things in science, a small amount of healthy skepticism is warranted.
replies(24): >>35393868 #>>35393942 #>>35394089 #>>35394097 #>>35394107 #>>35394203 #>>35394208 #>>35394244 #>>35394259 #>>35394288 #>>35394408 #>>35394881 #>>35395091 #>>35395249 #>>35395858 #>>35395995 #>>35397318 #>>35397499 #>>35398037 #>>35398083 #>>35398427 #>>35402974 #>>35403334 #>>35468946 #
intelVISA ◴[] No.35394288[source]
Didn't expect to see two titans today: ggerganov AND jart. Can y'all slow down? You make us mortals look bad :')

Seeing such clever use of mmap makes me dread to imagine how much Python spaghetti probably tanks OpenAI's and other "big ML" shops' infra when they should've trusted in zero copy solutions.

Perhaps SWE is dead after all, but LLMs didn't kill it...

replies(11): >>35395112 #>>35395145 #>>35395165 #>>35395404 #>>35396298 #>>35397484 #>>35398972 #>>35399367 #>>35400001 #>>35400090 #>>35456064 #
somesortofsystm ◴[] No.35399367[source]
> such clever use of mmap

Just wanna say that mmap() is used cleverly in this context, but it should be acknowledged as a widely accepted, industry-standard practice for getting higher performance, particularly in embedded applications but also in performance-oriented apps such as digital audio workstations, video editing systems, and so on.
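
For anyone who hasn't seen the pattern, here is a minimal sketch of it in Python. It is not llama.cpp's actual loader: the file layout (a flat array of native-endian float32 values) and the helper names `write_weights` and `open_weights` are assumptions made up for illustration. The point is only that the mapped file is viewed in place rather than read into a second buffer.

```python
import mmap
import os
import struct
import tempfile


def write_weights(path, values):
    """Write a hypothetical weights file: a flat array of native float32."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"{len(values)}f", *values))


def open_weights(path):
    """Map the file read-only and return (mapping, float32 view).

    The view indexes directly into the mapped pages; no copy of the
    file contents is made in Python-level buffers.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file. The mapping stays valid after
        # the file object is closed.
        mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Reinterpret the raw bytes as 4-byte floats without copying.
    view = memoryview(mapping).cast("f")
    return mapping, view


if __name__ == "__main__":
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "weights.bin")
    write_weights(path, [0.5, 1.5, -2.0, 3.25])

    mapping, weights = open_weights(path)
    print(len(weights), weights[2])

    # Release the view before closing the mapping; closing while a
    # view is still exported raises BufferError.
    weights.release()
    mapping.close()
    os.remove(path)
    os.rmdir(tmpdir)
```

Pages are faulted in by the OS only when they are touched, which is why opening a large file this way can be nearly instant.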

replies(1): >>35400131 #
jart ◴[] No.35400131[source]
Just because mmap() is commonly used doesn't mean it's commonly understood. Yes, it powers just about everything important in terms of the skeletons of our local systems. So why has the thought of using it occurred to so few people until now? Almost a whole generation has passed since things like mmap() were relegated to the "the work's been done!" category of computing. People moved on to caring about things like My Browser and The Cloud where mmap() doesn't exist. Most people don't know about it. The ones who do, are reluctant to use it. Scientific computing projects are totally devoted to supporting MSVC (since you just know data scientists are secretly using those GPUs for gaming) so any thought devs may have had previously about using mmap() would have certainly triggered fears w.r.t. WIN32 before any chance to fully consider the true depth of its value would kick in. Plus data migrations are very difficult to pull off. It worked here due to the outpouring of community support, since people were blocked on this. But for a corporation with tons of cash to burn, it's a harder sell.
replies(1): >>35411574 #
1. somesortofsystm ◴[] No.35411574[source]
The Cloud has been with us since the birth of computing. What is happening is, the computing industry goes through waves of attrition, whereby the schools push everyone up the Brand New Stack, while industry, frustrated with generations of programmers who can't program, just Builds Another Stack.

Repeat, ad infinitum. In the cracks you'll find people re-learning things they should've known, if only they weren't slagging off the grey beards .. or, even worse .. as grey beards not paying attention to the discoveries of youth.

>Most people don't know about it. The ones who do, are reluctant to use it.

Not so sure about this. The reluctance is emotional, it's not technical. Nobody is killing POSIX under all of this - it is deployed. Therefore, learn it.

>so any thought devs may have had previously about using mmap() would have certainly triggered fears w.r.t. WIN32

Does not compute. Own up, you're an AI.