1311 points by msoad | 1 comment
jart ◴[] No.35393615[source]
Author here. For additional context, please read https://github.com/ggerganov/llama.cpp/discussions/638#discu... The loading time performance has been a huge win for usability, and folks have been having the most wonderful reactions after using this change. But we don't have a compelling enough theory yet to explain the RAM usage miracle. So please don't get too excited just yet! Yes things are getting more awesome, but like all things in science a small amount of healthy skepticism is warranted.
1. StillBored ◴[] No.35397499[source]
Took a look at it. Did you try MAP_HUGETLB? This looks like the kind of application that can gain very large runtime advantages from avoiding TLB pressure. The mmap() might take a bit longer (or fail entirely) on machines where you can't reserve enough huge pages, but attempting it (or first probing for free huge pages via /proc/meminfo) and then falling back to a normal mapping costs little, and taking an order of magnitude fewer TLB misses (assuming you can get 1G pages) might well be worth it.
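The try-then-fall-back pattern described above could be sketched roughly like this. This is a hypothetical helper, not the actual llama.cpp code; note also that on Linux, MAP_HUGETLB for file-backed mappings generally requires the file to live on hugetlbfs, so for an ordinary model file on a regular filesystem the fallback branch will usually be taken:

```c
#include <stdio.h>
#include <sys/mman.h>

/* Hypothetical helper: try to map a file with explicit huge pages,
 * falling back to a normal mapping if the kernel refuses.
 *
 * MAP_HUGETLB needs preallocated huge pages (check HugePages_Free in
 * /proc/meminfo) and, for file-backed mappings, a hugetlbfs file, so
 * failure is the common case on a default-configured machine. The
 * fallback keeps the program working either way. */
static void *map_weights(int fd, size_t size) {
#ifdef MAP_HUGETLB
    void *p = mmap(NULL, size, PROT_READ,
                   MAP_PRIVATE | MAP_HUGETLB, fd, 0);
    if (p != MAP_FAILED)
        return p;  /* got huge pages: far fewer TLB entries needed */
#endif
    /* Fallback: ordinary (typically 4 KiB) pages. */
    return mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
}
```

For transparent huge pages (THP) rather than explicit ones, `madvise(addr, size, MADV_HUGEPAGE)` on an existing mapping is the usual alternative, though kernel support for THP on file-backed memory is more limited than for anonymous memory.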