Dealing with some minified json, switching to iTerm, doing `pbpaste | json_pp | pbcopy` and having a clean output is _so_ nice.
Dealing with some minified json, switching to iTerm, doing `pbpaste | json_pp | pbcopy` and having a clean output is _so_ nice.
It’s so ingrained, I’m more likely than not to just write it out that way even when I know exactly what I’m doing from the onset.
< foo.json jq | pbcopy
Instead, they open a file descriptor and pass that.
Tiny difference but there you go.
e.g.
I need to grab some info from textfile.txt to use as arguments to a function.
cat textfile.txt
looks like its comma delimited.
cat textfile.txt | cut -d, -f 2-5
ah, its the third and fourth column i need
cat textfile.txt | cut -d, -f 3-4 | grep '123456'
perfect
cat textfile.txt | cut -d, -f 3-4 | grep 123456 | tr , ' '
myfunc $(cat textfile.txt | cut -d, -f 3-4 | grep 123456 | tr , ' ')
When you run "cmd < file", the command reads from stdin, which pulls directly from the file. When you do "cat file | cmd", "cat" opens the file, reads from there, and writes to a pipe. Then "cmd" reads from its stdin, which is a pipe.
And ... ?
In short, I straight up don't care.
lot's of times we sort of know what we are working with, but don't remember the particulars especially
> but looking up the arguments and their ordering is annoying
you seem to be arguing for complacency. taking your idea to an extreme, why learn to do _anything_ well?
I still uselessly use cat though, it's such a nice way to build a pipeline.
But I honestly think people who try to optimise away ‘cat’ are optimising the wrong thing. If one extra fork() is that detrimental then don’t use a shell scripting language.
For a lot of people, “useless” ‘cat’ enables them to write a pipeline in the order that their brain farts out the requirements for the pipeline. So they’ve optimised for human productivity. And given the human brain is slower than a few extra fork()s, I think optimising for one’s brain makes more sense here.
Another sometimes-important difference is that if there are multiple input files, `somecommand file1 file2 file3` can tell what data is coming from which file; with `cat file1 file2 file3 | somecommand` they're all mashed together, and the program has no idea what's coming from where.
In general, though, I think it's mostly a matter of people's expertise level in using the shell. If you're a beginner, it makes sense to learn one very general way to do things (`cat |`), and use it everywhere. But as you gain expertise, you learn other ways of doing it, and will choose the best method for each specific situation. While `cat |` is usually an ok method to read from a file, it's almost never the best method, so expert shell users will almost never use it.
[1] http://redsymbol.net/articles/unofficial-bash-strict-mode/
Many of my programs and scripts start output with the line: # cmd arg1 arg2 arg3 ...
and simply echo back lines that start with '#'. That way, I have an internal record of the program that was run and the data file that was read (as well as previous parts of the analysis chain).
And, 'R' ignores lines starting with '#', so the record is there, but does not affect later analyses.
Are you sure? https://unix.stackexchange.com/questions/208615/is-cat-a-she... disagrees and neither https://manpages.ubuntu.com/manpages/jammy/man7/bash-builtin... nor https://zsh.sourceforge.io/Doc/Release/Shell-Builtin-Command... mention it
> It’s just not the same syntax (ie ‘< filename’).
Reading from a file isn’t a hard problem. Having a good UX for doing that is where most shells fall apart. And that’s basically what ‘cat’ offers here: an improved UX.
Having ‘cat’ as a shell builtin wouldn’t really solve the complaints raised by “useless use of” anyway because you’d still be piping (and in some cases, fork()ing too). You couldnt really use ‘cat’ as syntactic sugar for ‘<‘ because things start to get really weird if you want to pass flags to ‘cat’ or even redirect the output to something other than a pipe. And given ‘cat’ is POSIX (https://en.m.wikipedia.org/wiki/List_of_Unix_commands#/media...) the current behaviour of shells is, in my opinion, correct. This is why my own shell has a differently named builtin that approximately serves the purpose of ‘cat’ but for instances when you need the command built into the shell and it can’t just be passing a file handle to the next command (in my case, because i wanted to pass metadata out-of-band as well as the file contents)
Even the Wikipedia page on cat has a section about that, titled eponymously.
copy_file_range allows a user land program to copy data between two files without doing any user space work. Instead of reading data into a buffer and writing it back out to the destination, the kernel will somehow manage to move the data for you.
I think this will prevent any extra copies from occurring in situations where it can be used.
https://man.archlinux.org/man/copy_file_range.2
https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/cat...