←back to thread

1311 points msoad | 1 comments | | HN request time: 0s | source
Show context
jart ◴[] No.35393615[source]
Author here. For additional context, please read https://github.com/ggerganov/llama.cpp/discussions/638#discu... The loading time performance has been a huge win for usability, and folks have been having the most wonderful reactions after using this change. But we don't have a compelling enough theory yet to explain the RAM usage miracle. So please don't get too excited just yet! Yes things are getting more awesome, but like all things in science a small amount of healthy skepticism is warranted.
replies(24): >>35393868 #>>35393942 #>>35394089 #>>35394097 #>>35394107 #>>35394203 #>>35394208 #>>35394244 #>>35394259 #>>35394288 #>>35394408 #>>35394881 #>>35395091 #>>35395249 #>>35395858 #>>35395995 #>>35397318 #>>35397499 #>>35398037 #>>35398083 #>>35398427 #>>35402974 #>>35403334 #>>35468946 #
nynx ◴[] No.35393868[source]
Why is it behaving sparsely? There are only dense operations, right?
replies(2): >>35394105 #>>35399718 #
w1nk ◴[] No.35394105[source]
I also have this question, yes it should be. The forward pass should require accessing all the weights AFAIK.
replies(2): >>35394210 #>>35396386 #
1. ◴[] No.35396386[source]