To add, searching for “useless use of cat” will yield several results for those interested in learning more. Other examples include “useless use of echo” and “useless use of ls *”.

4. adrianmonk [27 Jun 23 15:04 UTC] No.36493906

>>36493372

Not just that, but also all the bytes have to go through an extra pipe. Presumably they're copied an extra time because of this.

When you run "cmd < file", the command reads from stdin, which pulls directly from the file. When you do "cat file | cmd", "cat" opens the file, reads from there, and writes to a pipe. Then "cmd" reads from its stdin, which is a pipe.
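Here is a minimal way to see the difference described above. It assumes a POSIX shell on a system that exposes `/dev/stdin` (Linux, macOS and the BSDs do). `describe_stdin` is a hypothetical helper written only for this illustration, and it just reports what kind of object file descriptor 0 refers to:

```shell
#!/bin/sh
# describe_stdin: hypothetical helper that reports what stdin is.
# test -p / -f follow /dev/stdin to the object behind fd 0.
describe_stdin() {
  if [ -p /dev/stdin ]; then
    echo "pipe"
  elif [ -f /dev/stdin ]; then
    echo "regular file"
  else
    echo "other"
  fi
}

file=$(mktemp)
trap 'rm -f "$file"' EXIT
printf 'line 1\nline 2\n' > "$file"

describe_stdin < "$file"        # prints "regular file": stdin is the file itself
cat "$file" | describe_stdin    # prints "pipe": stdin is a pipe that cat fills
```

In the first form the command's stdin is the file. In the second it is the read end of a pipe, and every byte passes through cat first.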

5. omginternets [27 Jun 23 15:53 UTC] No.36494613

>>36493372

>They avoid an unnecessary invocation of the cat executable.

And ... ?

6. wpm [27 Jun 23 16:07 UTC] No.36494783

>>36493372

I teach shell scripting. Cat invocations are cheap and help learners understand and keep clear where input is coming from, and where it is going. There are no awards or benefits to reducing the number of lines, commands invoked, or finding the shortest possible way to perform a task in a script. There are plenty of detriments to reading and understanding though when we try to obfuscate this to save 1ms of execution time on a script that is going to execute near instantaneously anyways.

In short, I straight up don't care.

7. derefr [27 Jun 23 16:17 UTC] No.36494926

>>36493144 (TP)

If the command is meant to stream through something really fast by using a large buffer size, then prepending a cat(1) will limit the incoming buffer size to ~4k.

8. revscat [27 Jun 23 16:21 UTC] No.36494997

>>36494783

I 100% agree with you. My only defense of OP is that `<` is something that tends to be forgotten. Like everyone else in this thread, I go to `cat` first for things like this. But sometimes I forget that `<` even exists, and the callout is a nice reminder.

9. gdavisson [27 Jun 23 19:50 UTC] No.36497774

>>36493144 (TP)

It usually doesn't matter much, but there are some situations where it can matter a lot. For one thing, you can't use seek() on a pipe, so e.g. `cat bigfile | tail` has to read through the entire file to find the end, but `tail bigfile` will read the file backward from the end, completely skipping the irrelevant beginning and middle. With `pv bigfile | whatever`, pv (which is basically a pipeline progress indicator) can tell how big the file is and tell you how far through it you are as a percentage; with `cat bigfile | pv | whatever`, it has no idea (unless you add a flag to tell it). Also, `cat bigfile | head` will end up killing cat with a SIGPIPE signal after head exits; if you're using something like "Unofficial bash strict mode" [1], this will cause your script to exit prematurely.
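The following bash sketch illustrates two of these points. It assumes `seq`, `head` and `tail` are available. The sample file and variable names are made up for the example. `set -o pipefail` is the part of strict mode that turns cat's SIGPIPE into a failing pipeline:

```shell
#!/usr/bin/env bash
# Assumes bash: pipefail is not available in every /bin/sh.
set -o pipefail

big=$(mktemp)
trap 'rm -f "$big"' EXIT
seq 1 1000000 > "$big"   # several MB of sample data

# Same last line either way; only the first form lets tail seek to the end.
tail -n 1 "$big"
cat "$big" | tail -n 1

# head exits after one line while cat is still writing, so cat is killed
# by SIGPIPE (or fails with EPIPE) and pipefail reports the pipeline as failed.
cat "$big" | head -n 1 > /dev/null
piped_status=$?

# head reads the file itself: no upstream writer, nothing to kill.
head -n 1 "$big" > /dev/null
direct_status=$?

echo "cat | head: $piped_status"    # non-zero (typically 141 = 128 + SIGPIPE)
echo "head file:  $direct_status"   # 0
```

Under `set -e` as well, the first pipeline would end the script at that line. That is the premature exit described above.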

Another sometimes-important difference is that if there are multiple input files, `somecommand file1 file2 file3` can tell what data is coming from which file; with `cat file1 file2 file3 | somecommand` they're all mashed together, and the program has no idea what's coming from where.
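The multiple-file point can be shown with a tool everyone has. `grep` prefixes each match with its file name when it is given several files, but it cannot do that once cat has merged them. The file names here are placeholders:

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'apple\nbanana\n'   > "$dir/file1"
printf 'cherry\napricot\n' > "$dir/file2"
cd "$dir" || exit 1

# grep opens each file itself, so it knows where every match came from.
grep '^a' file1 file2          # file1:apple, file2:apricot

# After cat, grep sees one anonymous stream on stdin.
cat file1 file2 | grep '^a'    # apple, apricot (origin lost)
```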

In general, though, I think it's mostly a matter of people's expertise level in using the shell. If you're a beginner, it makes sense to learn one very general way to do things (`cat |`), and use it everywhere. But as you gain expertise, you learn other ways of doing it, and will choose the best method for each specific situation. While `cat |` is usually an ok method to read from a file, it's almost never the best method, so expert shell users will almost never use it.

[1] http://redsymbol.net/articles/unofficial-bash-strict-mode/

10. vram22 [27 Jun 23 23:19 UTC] No.36500068

>>36494926

Interesting.

Maybe use dd with one of its blocksize options, then?
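Here is a sketch of the dd idea. It has not been tested against the case derefr describes. `bs=` is a standard dd operand. The byte count is spelled out in full because size suffixes are not portable across dd implementations. Note that a read from a pipe returns at most what is currently buffered in it. So this controls how large dd's writes are, not the pipe's capacity, which is OS-dependent:

```shell
#!/bin/sh
src=$(mktemp)
trap 'rm -f "$src"' EXIT
seq 1 100000 > "$src"

# Read the input in 1 MiB blocks (bs=1048576) and feed it to the pipe.
# dd prints transfer statistics on stderr, hidden here.
dd if="$src" bs=1048576 2>/dev/null | wc -l
```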

Not at a terminal, can't check.

11. vram22 [27 Jun 23 23:30 UTC] No.36500160

>>36493489

Yes.

Even the Wikipedia page on cat has a section about that, titled eponymously.

https://en.m.wikipedia.org/wiki/Cat_(Unix)

12. chlorion [28 Jun 23 00:56 UTC] No.36500885

>>36493906

GNU cat will use the copy_file_range syscall when possible!

copy_file_range allows a userland program to copy data between two files without doing any user-space work. Instead of reading data into a buffer and writing it back out to the destination, the kernel will somehow manage to move the data for you.

I think this will prevent any extra copies from occurring in situations where it can be used.

https://man.archlinux.org/man/copy_file_range.2

https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/cat...
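You can check this on your own machine: `strace` shows whether cat issues `copy_file_range` when stdout is a regular file and when it is a pipe. This hedged sketch assumes Linux with strace installed. It falls back to a plain copy when strace is missing or not allowed to trace. No particular trace output is claimed here, because it depends on your coreutils and kernel versions:

```shell
#!/bin/sh
src=$(mktemp); dst=$(mktemp); log=$(mktemp)
trap 'rm -f "$src" "$dst" "$log"' EXIT
printf 'some data\n' > "$src"

# Only trace if strace exists, may ptrace, and knows this syscall name.
if command -v strace >/dev/null 2>&1 &&
   strace -o /dev/null -e trace=copy_file_range true 2>/dev/null; then
  # Case 1: stdout is a regular file.
  strace -o "$log" -e trace=copy_file_range cat "$src" > "$dst"
  echo "--- cat file > file ---"; cat "$log"

  # Case 2: stdout is a pipe, as in "cat file | cmd".
  strace -o "$log" -e trace=copy_file_range cat "$src" | cat > /dev/null
  echo "--- cat file | cmd ---"; cat "$log"
else
  echo "strace unavailable or not permitted; skipping trace" >&2
  cat "$src" > "$dst"
fi
```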

macOS command-line tools you might not know about