It's true that with small files, my primary interest is simply avoiding unnecessary wear on my disk. However, I do also often work on large files, usually for local data processing.
"This optimization [of putting files directly into RAM instead of trusting the buffers] is unnecessary" was an interesting claim, so I decided to put it to the test with `time`.
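For concreteness, the optimization under test is just staging a copy of the file in tmpfs and reading from that copy. A minimal sketch, with a tiny sample file standing in for the real 3.5 GB dump:

```shell
# Stage a file in /dev/shm (tmpfs, RAM-backed) and read from that copy.
# A small sample file stands in for the real dump here.
printf 'line one\nline two\nline three\n' > /tmp/sample.jsonl
cp /tmp/sample.jsonl /dev/shm/sample.jsonl
wc -l /dev/shm/sample.jsonl   # served from RAM, no disk I/O
rm /dev/shm/sample.jsonl /tmp/sample.jsonl
```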
$ # Drop any disk caches first.
$ sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
$
$ # Read a 3.5 GB JSON Lines file from disk.
$ time wc -l /home/andrew/Downloads/kaikki.org-dictionary-Finnish.jsonl
255111 /home/andrew/Downloads/kaikki.org-dictionary-Finnish.jsonl
real 0m2.249s
user 0m0.048s
sys 0m0.809s
$ # Now read the disk copy again, this time served from the page cache.
$ time wc -l /home/andrew/Downloads/kaikki.org-dictionary-Finnish.jsonl
255111 /home/andrew/Downloads/kaikki.org-dictionary-Finnish.jsonl
real 0m0.528s
user 0m0.028s
sys 0m0.500s
$
$ # Drop caches again, just to be certain.
$ sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
$
$ # Read that same 3.5 GB JSON Lines file from /dev/shm.
$ time wc -l /dev/shm/kaikki.org-dictionary-Finnish.jsonl
255111 /dev/shm/kaikki.org-dictionary-Finnish.jsonl
real 0m0.453s
user 0m0.049s
sys 0m0.404s
Compared to the first read there is indeed a large speedup, from 2.2s down to under 0.5s. Once the first `wc -l` had pulled the file into the page cache, however, the gap narrowed: /dev/shm was only about 20% faster. Still significant, but not game-changingly so.
I'll probably come back to this and run more tests with some of the more complex `jq` queries I use, to see whether that ~20% gap holds, or whether it widens or narrows.
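If anyone wants to try that follow-up themselves, a sketch of the comparison might look like this (the `.pos`/`.word` fields are made up for illustration, not the actual kaikki.org schema, and a tiny sample file stands in for the real dump):

```shell
# Build a small stand-in JSON Lines file and stage a copy in tmpfs.
printf '%s\n' \
  '{"word":"talo","pos":"noun"}' \
  '{"word":"juosta","pos":"verb"}' > /tmp/sample.jsonl
cp /tmp/sample.jsonl /dev/shm/sample.jsonl

# Time the same jq filter against both copies.
time jq -r 'select(.pos == "verb") | .word' /tmp/sample.jsonl
time jq -r 'select(.pos == "verb") | .word' /dev/shm/sample.jsonl

rm /dev/shm/sample.jsonl /tmp/sample.jsonl
```

One caveat: on many distros /tmp is itself tmpfs, so for a real comparison the "disk" copy should live on an actual disk-backed filesystem, as with the Downloads path above.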