2039 points Gadiguibou | 43 comments
1. omginternets ◴[] No.36492265[source]
If I had a nickel for each `cat foo.json | jq | pbcopy`, I'd be a rich man :)
replies(1): >>36492862 #
2. maleldil ◴[] No.36492862[source]
That's a useless use of cat. You can use `jq . foo.json | pbcopy` or `jq < foo.json | pbcopy`.
replies(6): >>36492960 #>>36493144 #>>36493370 #>>36494558 #>>36495101 #>>36495918 #
3. jdbartee ◴[] No.36492960[source]
Speaking for myself, the first form is more natural, even if it’s a useless cat, because I’m always cat-ing files to see their structure, then progressively tacking on different transforms, and finally putting the result wherever I want as output.

It’s so ingrained, I’m more likely than not to just write it out that way even when I know exactly what I’m doing from the outset.

replies(5): >>36493350 #>>36493487 #>>36494377 #>>36496238 #>>36497947 #
4. nicky0 ◴[] No.36493144[source]
In what way do you see those alternatives as superior?
replies(3): >>36493372 #>>36494926 #>>36497774 #
5. paulddraper ◴[] No.36493350{3}[source]
You could consider

    < foo.json jq | pbcopy
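
A minimal sketch of why the redirect-first form works: in a POSIX shell, a redirection may appear anywhere in a simple command, so all three spellings below feed the same bytes to the command. The scratch file and the use of `tr` (standing in for `jq`, which may not be installed) are assumptions for illustration.

```shell
sample=$(mktemp)                 # hypothetical scratch file
printf 'a,b,c\n' > "$sample"

# Redirection before the command name, after it, and the cat form.
redirect_first=$(< "$sample" tr , ' ')
redirect_last=$(tr , ' ' < "$sample")
with_cat=$(cat "$sample" | tr , ' ')

echo "redirect first: $redirect_first"
echo "redirect last:  $redirect_last"
echo "with cat:       $with_cat"
rm -f "$sample"
```

The redirect-first spelling keeps the left-to-right "source, then transforms" reading order without starting an extra process.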
6. ◴[] No.36493370[source]
7. paulddraper ◴[] No.36493372{3}[source]
They avoid an unnecessary invocation of the cat executable.

Instead, they open a file descriptor and pass that.

Tiny difference but there you go.

replies(4): >>36493489 #>>36493906 #>>36494613 #>>36494783 #
8. jonnycomputer ◴[] No.36493487{3}[source]
Yes, this iterative procedure is often why "useless" cats get put into it. It's a very effective way of processing regular text information.

e.g.

I need to grab some info from textfile.txt to use as arguments to a function.

    cat textfile.txt

Looks like it's comma delimited.

    cat textfile.txt | cut -d, -f 2-5

Ah, it's the third and fourth columns I need.

    cat textfile.txt | cut -d, -f 3-4 | grep '123456'

Perfect.

    cat textfile.txt | cut -d, -f 3-4 | grep 123456 | tr , ' '

    myfunc $(cat textfile.txt | cut -d, -f 3-4 | grep 123456 | tr , ' ')

replies(1): >>36494503 #
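For comparison, here is a sketch of the finished pipeline from the comment above next to a cat-free equivalent that passes the file name to `cut`. The sample rows are invented for illustration; only the commands come from the thread.

```shell
dir=$(mktemp -d)                 # hypothetical scratch directory
printf '%s\n' \
    'id,name,123456,alpha,x' \
    'id,name,999999,beta,y' > "$dir/textfile.txt"

# The pipeline as built up step by step, starting from cat.
with_cat=$(cat "$dir/textfile.txt" | cut -d, -f 3-4 | grep 123456 | tr , ' ')

# The same pipeline with the file handed to the first command directly.
without_cat=$(cut -d, -f 3-4 "$dir/textfile.txt" | grep 123456 | tr , ' ')

echo "with cat:    $with_cat"
echo "without cat: $without_cat"
rm -rf "$dir"
```

Both forms select the same fields; the difference is only in how the first stage gets its input.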
9. latexr ◴[] No.36493489{4}[source]
To add, searching for “useless use of cat” will yield several results for those interested in learning more. Other examples include “useless use of echo” and “useless use of ls *”.
replies(1): >>36500160 #
10. adrianmonk ◴[] No.36493906{4}[source]
Not just that, but also all the bytes have to go through an extra pipe. Presumably they're copied an extra time because of this.

When you run "cmd < file", the command reads from stdin, which pulls directly from the file. When you do "cat file | cmd", "cat" opens the file, reads from there, and writes to a pipe. Then "cmd" reads from its stdin, which is a pipe.

replies(1): >>36500885 #
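The difference described above can be observed from inside the command. This is a rough sketch, assuming a system where `/dev/stdin` reflects file descriptor 0 (Linux, or bash on most platforms); `stdin_kind` is a hypothetical helper, not a standard tool.

```shell
# Report what kind of object this process's stdin is connected to.
stdin_kind() {
    if [ -p /dev/stdin ]; then
        echo pipe
    elif [ -f /dev/stdin ]; then
        echo file
    else
        echo other
    fi
}

sample=$(mktemp)                 # hypothetical scratch file
printf 'a,b,c\n' > "$sample"

redirect_kind=$(stdin_kind < "$sample")
piped_kind=$(cat "$sample" | stdin_kind)

echo "cmd < file      -> stdin is a $redirect_kind"
echo "cat file | cmd  -> stdin is a $piped_kind"
rm -f "$sample"
```

With the redirect, the command holds the file itself, so it can also seek or ask for the file's size; behind `cat` it only ever sees a pipe.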
11. jamespullar ◴[] No.36494377{3}[source]
I've been using bat as a cat replacement for a while now. It includes paging, syntax highlighting, line numbers, and is generally very performant.

https://github.com/sharkdp/bat

12. gumby ◴[] No.36494503{4}[source]
> cat textfile.txt

> looks like it's comma delimited.

Interesting; why wouldn't you use `head`? Who knows how big textfile.txt is?

replies(4): >>36494862 #>>36494869 #>>36494913 #>>36495260 #
13. nojs ◴[] No.36494558[source]
The “useless cat” meme needs to die. Everyone is aware that most commands accept a file argument, but looking up the arguments and their ordering is annoying and using cat for things like this is just fine.
replies(4): >>36494590 #>>36494899 #>>36496086 #>>36496387 #
14. omginternets ◴[] No.36494590{3}[source]
This. "Useless cat" is more useful than "useless file-arg".
15. omginternets ◴[] No.36494613{4}[source]
>They avoid an unnecessary invocation of the cat executable.

And ... ?

16. wpm ◴[] No.36494783{4}[source]
I teach shell scripting. Cat invocations are cheap and help learners understand and keep clear where input is coming from, and where it is going. There are no awards or benefits to reducing the number of lines, commands invoked, or finding the shortest possible way to perform a task in a script. There are plenty of detriments to reading and understanding though when we try to obfuscate this to save 1ms of execution time on a script that is going to execute near instantaneously anyways.

In short, I straight up don't care.

replies(1): >>36494997 #
17. rovr138 ◴[] No.36494862{5}[source]
`file` will tell you too
replies(1): >>36494911 #
18. ◴[] No.36494869{5}[source]
19. epcoa ◴[] No.36494899{3}[source]
The redirect always works though - that is not a program argument, that is handled by the shell. Apparently not everyone is aware of that.
20. jonnycomputer ◴[] No.36494911{6}[source]
Won't tell you the delimiter.
21. yrro ◴[] No.36494913{5}[source]
Don't forget to pipe head into 'cat -v'... that text file could contain _anything_!
replies(2): >>36495824 #>>36496116 #
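A small sketch of what `cat -v` buys you here: control characters are printed in visible caret notation instead of being interpreted by the terminal. The input line, which embeds an ANSI color escape, is made up for the example.

```shell
# One line containing an escape sequence (ESC [ 3 1 m) that a terminal would act on.
visible=$(printf 'ok\033[31mred\n' | head -n 1 | cat -v)

# cat -v renders the ESC byte as ^[ so the sequence is shown, not executed.
echo "$visible"
```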
22. derefr ◴[] No.36494926{3}[source]
If the command is meant to stream through something really fast by using a large buffer size, then prepending a cat(1) will limit the incoming buffer size to ~4k.
replies(1): >>36500068 #
23. revscat ◴[] No.36494997{5}[source]
I 100% agree with you. My only defense of OP is that `<` is something that tends to be forgotten. Like everyone else in this thread I go to `cat` first for things like this. But sometimes I forget that `<` even exists, and the callout is a nice reminder.
24. Someone ◴[] No.36495101[source]
Is there any shell that has cat as a built-in?

Such a shell could remove some of the more common cases.

replies(1): >>36497117 #
25. jonnycomputer ◴[] No.36495260{5}[source]
Generally speaking, if you don't have an idea of how big the file is, or it would take up too much real estate in your terminal window, sure. 100%. It was just an example.

Lots of times we sort of know what we are working with, but don't remember the particulars.

26. gunapologist99 ◴[] No.36495824{6}[source]
Thank you for pointing this out! This is much safer.
27. cratermoon ◴[] No.36495918[source]
https://porkmail.org/era/unix/award
28. hdb2 ◴[] No.36496086{3}[source]
granted, it is a little snarky and maybe the snark isn't appropriate in today's tech environment. but no, things like "useless use of cat" do not need to go away, because they make me better at what I do in little ways. those little ways add up over time.

> but looking up the arguments and their ordering is annoying

you seem to be arguing for complacency. taking your idea to an extreme, why learn to do _anything_ well?

29. lelandbatey ◴[] No.36496116{6}[source]
I really recommend folks use "less" over cat, especially keyboard-oriented folks. Different terminal emulators don't always have the scroll behavior I want, nor do they always allow me to search the file I'm looking at. "less" does all those things, in nearly every environment no matter the terminal emulator, and has other wonderful options to boot (chopping long lines so they don't wrap can be nice for logs, line numbers can be VITAL, etc).

I still uselessly use cat though, it's such a nice way to build a pipeline.

replies(2): >>36498355 #>>36499959 #
30. patrec ◴[] No.36496238{3}[source]
If you're using zsh, you can just replace any instance of

    $ cat somefile ...
with

    $ <somefile ...
For bash, this only works if you have at least one `|`.
replies(1): >>36502079 #
31. burnished ◴[] No.36496387{3}[source]
Everyone is not aware, new people are joining all the time.
32. hnlmorg ◴[] No.36497117{3}[source]
All of them do. Including bash. It’s just not the same syntax (ie ‘< filename’).

But I honestly think people who try to optimise away ‘cat’ are optimising the wrong thing. If one extra fork() is that detrimental then don’t use a shell scripting language.

For a lot of people, “useless” ‘cat’ enables them to write a pipeline in the order that their brain farts out the requirements for the pipeline. So they’ve optimised for human productivity. And given the human brain is slower than a few extra fork()s, I think optimising for one’s brain makes more sense here.

replies(1): >>36498644 #
33. gdavisson ◴[] No.36497774{3}[source]
It usually doesn't matter much, but there are some situations where it can matter a lot. For one thing, you can't use seek() on a pipe, so e.g. `cat bigfile | tail` has to read through the entire file to find the end, but `tail bigfile` will read the file backward from the end, completely skipping the irrelevant beginning and middle. With `pv bigfile | whatever`, pv (which is basically a pipeline progress indicator) can tell how big the file is and tell you how far through you are as a percentage; with `cat bigfile | pv | whatever`, it has no idea (unless you add a flag to tell it). Also, `cat bigfile | head` will end up killing cat with a SIGPIPE signal after head exits; if you're using something like "Unofficial bash strict mode" [1], this will cause your script to exit prematurely.
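
The SIGPIPE case can be reproduced with a sketch like the following. It assumes `bash` and `seq` are installed; `pipefail` is run under bash explicitly because not every `sh` supports it, and the scratch file is hypothetical.

```shell
big=$(mktemp)                    # hypothetical scratch file
seq 1 200000 > "$big"            # roughly 1.3 MB, far larger than a pipe buffer

# With pipefail, a pipeline reports the failure of any stage, including cat.
piped_status=$(bash -c 'set -o pipefail; cat "$1" | head -n 1 >/dev/null; echo $?' _ "$big")
redirect_status=$(bash -c 'set -o pipefail; head -n 1 < "$1" >/dev/null; echo $?' _ "$big")

# The piped status is typically 141 (128 + signal 13, SIGPIPE), but may differ
# if the environment ignores SIGPIPE.
echo "cat big | head : pipeline status $piped_status"
echo "head < big     : status $redirect_status"
rm -f "$big"
```

Under `set -e` together with `pipefail`, that non-zero pipeline status is what ends the script early; the redirect form has no upstream process to be killed.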

Another sometimes-important difference is that if there are multiple input files, `somecommand file1 file2 file3` can tell what data is coming from which file; with `cat file1 file2 file3 | somecommand` they're all mashed together, and the program has no idea what's coming from where.
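
As a small illustration of the multiple-file point, here `grep` stands in for `somecommand`; the file names and contents are invented for the example.

```shell
dir=$(mktemp -d)                 # hypothetical scratch directory
printf 'hay\n'    > "$dir/file1"
printf 'needle\n' > "$dir/file2"

# Given file names, grep can say which file each match came from.
named=$(cd "$dir" && grep needle file1 file2)

# Behind cat, the files arrive as one anonymous stream.
mashed=$(cd "$dir" && cat file1 file2 | grep needle)

echo "grep needle file1 file2       -> $named"
echo "cat file1 file2 | grep needle -> $mashed"
rm -rf "$dir"
```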

In general, though, I think it's mostly a matter of people's expertise level in using the shell. If you're a beginner, it makes sense to learn one very general way to do things (`cat |`), and use it everywhere. But as you gain expertise, you learn other ways of doing it, and will choose the best method for each specific situation. While `cat |` is usually an ok method to read from a file, it's almost never the best method, so expert shell users will almost never use it.

[1] http://redsymbol.net/articles/unofficial-bash-strict-mode/

34. fastaguy88 ◴[] No.36497947{3}[source]
As a scientist who cares about reproducibility, the big difference between the "useless cat" and providing the input file name on the command line is that, in the latter case, the program can capture that file name and reproduce it. That is harder when using stdin.

Many of my programs and scripts start output with the line: # cmd arg1 arg2 arg3 ...

and simply echo back lines that start with '#'. That way, I have an internal record of the program that was run and the data file that was read (as well as previous parts of the analysis chain).

And, 'R' ignores lines starting with '#', so the record is there, but does not affect later analyses.
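
One way the provenance convention described above might look in a script is sketched below. The `upcase` function, the file name, and the sample data are all hypothetical; the point is only that a tool given its input by name can record that name, and can pass earlier `#` lines through untouched.

```shell
# Hypothetical tool: uppercases data lines and keeps a provenance trail.
upcase() {
    # Record this invocation, including the input file name.
    printf '# upcase %s\n' "$*"
    # Echo back earlier '#' lines unchanged; transform everything else.
    awk '/^#/ { print; next } { print toupper($0) }' "$1"
}

dir=$(mktemp -d)                 # hypothetical scratch directory
printf '# generate seed=1\nacgt\n' > "$dir/data.txt"

result=$(cd "$dir" && upcase data.txt)
echo "$result"
rm -rf "$dir"
```

Had the data arrived on stdin via `cat data.txt | upcase`, the function would have no file name to write into its header line.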

35. jmhammond ◴[] No.36498355{7}[source]
My useless cat is that I always use `cat file | less` when I could just `less file`.

I've been typing cat for over 25 years. Old habits die hard.

36. Someone ◴[] No.36498644{4}[source]
> All of them do. Including bash.

Are you sure? https://unix.stackexchange.com/questions/208615/is-cat-a-she... disagrees and neither https://manpages.ubuntu.com/manpages/jammy/man7/bash-builtin... nor https://zsh.sourceforge.io/Doc/Release/Shell-Builtin-Command... mention it

replies(1): >>36499220 #
37. hnlmorg ◴[] No.36499220{5}[source]
Literally the next sentence after the one you quoted explains my point:

> It’s just not the same syntax (ie ‘< filename’).

Reading from a file isn’t a hard problem. Having a good UX for doing that is where most shells fall apart. And that’s basically what ‘cat’ offers here: an improved UX.

Having ‘cat’ as a shell builtin wouldn’t really solve the complaints raised by “useless use of” anyway because you’d still be piping (and in some cases, fork()ing too). You couldn’t really use ‘cat’ as syntactic sugar for ‘<’ because things start to get really weird if you want to pass flags to ‘cat’ or even redirect the output to something other than a pipe. And given ‘cat’ is POSIX (https://en.m.wikipedia.org/wiki/List_of_Unix_commands#/media...) the current behaviour of shells is, in my opinion, correct. This is why my own shell has a differently named builtin that approximately serves the purpose of ‘cat’ but for instances when you need the command built into the shell and it can’t just be passing a file handle to the next command (in my case, because I wanted to pass metadata out-of-band as well as the file contents).

38. rconti ◴[] No.36499959{7}[source]
I hate that when I use `less`, then quit, the output goes away.
replies(1): >>36508860 #
39. vram22 ◴[] No.36500068{4}[source]
Interesting.

Maybe use dd with one of its blocksize options, then?

Not at a terminal, can't check.

40. vram22 ◴[] No.36500160{5}[source]
Yes.

Even the Wikipedia page on cat has a section about that, titled eponymously.

https://en.m.wikipedia.org/wiki/Cat_(Unix)

41. chlorion ◴[] No.36500885{5}[source]
GNU cat will use the copy_file_range syscall when possible!

copy_file_range allows a user land program to copy data between two files without doing any user space work. Instead of reading data into a buffer and writing it back out to the destination, the kernel will somehow manage to move the data for you.

I think this will prevent any extra copies from occurring in situations where it can be used.

https://man.archlinux.org/man/copy_file_range.2

https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/cat...

42. ddingus ◴[] No.36502079{4}[source]
I did this last time I saw it come up and was surprised! Doing it makes perfect sense in hindsight. Neato!
43. LgWoodenBadger ◴[] No.36508860{8}[source]
You can run "less -X" for that, but it may have other problems depending on how you use less (e.g. scrolling up, etc.)