Dealing with some minified json, switching to iTerm, doing `pbpaste | json_pp | pbcopy` and having a clean output is _so_ nice.
Dealing with some minified json, switching to iTerm, doing `pbpaste | json_pp | pbcopy` and having a clean output is _so_ nice.
Instead, they open a file descriptor and pass that.
Tiny difference but there you go.
When you run "cmd < file", the command reads from stdin, which pulls directly from the file. When you do "cat file | cmd", "cat" opens the file, reads from there, and writes to a pipe. Then "cmd" reads from its stdin, which is a pipe.
And ... ?
In short, I straight up don't care.
Another sometimes-important difference is that if there are multiple input files, `somecommand file1 file2 file3` can tell what data is coming from which file; with `cat file1 file2 file3 | somecommand` they're all mashed together, and the program has no idea what's coming from where.
In general, though, I think it's mostly a matter of people's expertise level in using the shell. If you're a beginner, it makes sense to learn one very general way to do things (`cat |`), and use it everywhere. But as you gain expertise, you learn other ways of doing it, and will choose the best method for each specific situation. While `cat |` is usually an ok method to read from a file, it's almost never the best method, so expert shell users will almost never use it.
[1] http://redsymbol.net/articles/unofficial-bash-strict-mode/
Even the Wikipedia page on cat has a section about that, titled eponymously.
copy_file_range allows a user land program to copy data between two files without doing any user space work. Instead of reading data into a buffer and writing it back out to the destination, the kernel will somehow manage to move the data for you.
I think this will prevent any extra copies from occurring in situations where it can be used.
https://man.archlinux.org/man/copy_file_range.2
https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/cat...