Why doesn't shell automatically fix “useless use of cat”? [on hold]
Many people use one-liners and scripts containing code along the lines of
cat "$MYFILE" | command1 | command2 > "$OUTPUT"
The first cat is often called "useless use of cat" because technically it requires starting a new process (often /usr/bin/cat) where this could be avoided if the command had been
< "$MYFILE" command1 | command2 > "$OUTPUT"
because then the shell only needs to start command1 and simply point its stdin to the given file.
Why doesn't the shell do this conversion automatically? I feel that the "useless use of cat" syntax is easier to read, and the shell should have enough information to get rid of the useless cat automatically. cat is defined in the POSIX standard, so the shell should be allowed to implement it internally instead of using a binary in the path. The shell could even implement only the exactly-one-argument version and fall back to the binary in the path otherwise.
shell-script performance posix
put on hold as primarily opinion-based by muru, Mr Shunz, Dmitry Grigoryev, X Tian, Jeff Schaller♦ Apr 13 at 2:46
Many good questions generate some degree of opinion based on expert experience, but answers to this question will tend to be almost entirely based on opinions, rather than facts, references, or specific expertise. If this question can be reworded to fit the rules in the help center, please edit the question.
22
Those commands are not actually equivalent, since in one case stdin is a file, and in the other it's a pipe, so it wouldn't be a strictly safe conversion. You could make a system that did it, though.
– Michael Homer
Apr 11 at 7:32
14
That you can't imagine a use case doesn't mean that an application isn't allowed to rely on the specified behaviour uselessly. Getting an error from lseek is still defined behaviour and could cause a different outcome, the different blocking behaviour can be semantically meaningful, etc. It would be allowable to make the change if you knew what the other commands were and knew they didn't care, or if you just didn't care about compatibility at that level, but the benefit is pretty small. I do imagine the lack of benefit drives the situation more than the conformance cost.
– Michael Homer
Apr 11 at 7:57
3
The shell absolutely is allowed to implement cat itself, though, or any other utility. It's also allowed to know how the other utilities that belong to the system work (e.g. it can know how the external grep implementation that came with the system behaves). This is completely viable to do, so it's entirely fair to wonder why they don't.
– Michael Homer
Apr 11 at 8:04
6
@MichaelHomer "e.g. it can know how the external grep implementation that came with the system behaves" So the shell now has a dependency on the behavior of grep. And sed. And awk. And du. And how many hundreds if not thousands of other utilities?
– Andrew Henle
Apr 11 at 11:01
19
It would be pretty uncool of my shell to edit my commands for me.
– Azor Ahai
Apr 11 at 16:35
11 Answers
The two commands are not equivalent; consider error handling:
cat <file that doesn't exist> | less
will produce an empty stream that will be passed to the piped program... as such you end up with a display showing nothing.
< <file that doesn't exist> less
will fail to open the file, and then not start less at all.
Attempting to change the former to the latter could break any number of scripts that expect to run the program with a potentially blank input.
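To see this difference for yourself, here is a quick sketch (missing-file is assumed not to exist):
cat missing-file | wc -l     # cat complains on stderr, but wc still runs and prints 0
wc -l < missing-file         # the shell reports the error and never starts wc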
1
I'll mark your response as accepted because I think this is the most important difference between both syntaxes. The variant with cat will always execute the second command in the pipeline whereas the variant with just input redirection will not execute the command at all if the input file is missing.
– Mikko Rantalainen
Apr 13 at 8:43
However, note that <"missing-file" grep foo | echo 2 will not execute grep but will execute echo.
– Mikko Rantalainen
12 hours ago
"Useless use of cat
" is more about how you write your code than about what actually runs when you execute the script. It's a sort of design anti-pattern, a way of going about something that could probably be done in a more efficient manner. It's a failure in understanding of how to best combine the given tools to create a new tool. I'd argue that stringing several sed
and/or awk
commands together in a pipeline also could be said to be a symptom of this same anti-pattern.
Fixing instances of "useless use of cat" in a script is primarily a matter of fixing the source code of the script manually. A tool such as ShellCheck can help with this by pointing them out:
$ cat script.sh
#!/bin/sh
cat file | cat
$ shellcheck script.sh
In script.sh line 2:
cat file | cat
^-- SC2002: Useless cat. Consider 'cmd < file | ..' or 'cmd file | ..' instead.
Getting the shell to do this automatically would be difficult due to the nature of shell scripts. The way a script executes depends on the environment inherited from its parent process, and on the specific implementation of the available external commands.
The shell does not necessarily know what cat
is. It could potentially be any command from anywhere in your $PATH
, or a function.
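You can ask the shell what a name currently resolves to, for example:
type cat          # e.g. "cat is /usr/bin/cat", "cat is a function", or "cat is aliased to ..."
command -V cat    # similar question, answered in a human-readable form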
If it were a built-in command (which it may be in some shells), it would have the ability to reorganise the pipeline, as it would know the semantics of its built-in cat command. Before doing that, it would additionally have to make assumptions about the next command in the pipeline, after the original cat.
Note that reading from standard input behaves slightly differently when it's connected to a pipe and when it's connected to a file. A pipe is not seekable, so depending on what the next command in the pipeline does, it may or may not behave differently if the pipeline were rearranged (it may detect whether the input is seekable and decide to do things differently depending on the answer; in either case it would then behave differently).
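As a small Linux-specific illustration of that difference (it relies on /proc being available):
cat /etc/hostname | readlink /proc/self/fd/0    # stdin is a pipe: prints something like pipe:[123456]
readlink /proc/self/fd/0 < /etc/hostname        # stdin is the file itself: prints /etc/hostname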
This question is similar (in a very general sense) to "Are there any compilers that attempt to fix syntax errors on their own?" (at the Software Engineering StackExchange site), although that question is obviously about syntax errors, not useless design patterns. The idea about automatically changing the code based on intent is largely the same though.
It's perfectly conformant for a shell to know what cat is, and the other commands in the pipeline (the as-if rule), and behave accordingly; they just don't here because it's pointless and too hard.
– Michael Homer
Apr 11 at 7:40
4
@MichaelHomer Yes. But it's also allowed to overload a standard command with a function of the same name.
– Kusalananda♦
Apr 11 at 7:42
2
@PhilipCouling It's absolutely conformant as long as it's known that none of the pipeline commands care. The shell is specifically allowed to replace utilities with builtins or shell functions and those have no execution environment restrictions, so as long as the external result is indistinguishable it's permitted. For your case, cat /dev/tty is the interesting one that would be different with <.
– Michael Homer
Apr 11 at 9:28
1
@MichaelHomer "so as long as the external result is indistinguishable it's permitted" That means the behavior of the entire set of utilities optimized in such a manner can never change. That has to be the ultimate dependency hell.
– Andrew Henle
Apr 11 at 10:58
2
@MichaelHomer As the other comments said, of course it's perfectly conformant for the shell to know that; but given the OP's input it is impossible to tell what the cat command actually does without executing it. For all you (and the shell) know, the OP has a command cat in her path which is an interactive cat simulation, "myfile" is just the stored game state, and command1 and command2 are postprocessing some statistics about the current playing session...
– alephzero
Apr 11 at 12:44
Because it's not useless.
In the case of cat file | cmd
, the fd 0
(stdin) of cmd
will be a pipe, and in the case of cmd <file
it may be a regular file, device, etc.
A pipe has different semantics from a regular file, and its semantics are not a subset of those of a regular file:
- a regular file cannot be select(2)ed or poll(2)ed on in a meaningful way; a select(2) on it will always return "ready". Advanced interfaces like epoll(2) on Linux will simply not work with regular files.
- on Linux there are system calls (splice(2), vmsplice(2), tee(2)) which only work on pipes [1]
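Seekability is another difference in the same vein: a reader can lseek(2) past data in a regular file but has to consume it from a pipe. A rough sketch, with big.file standing in for some large file:
dd if=big.file bs=1M skip=1 count=1 of=/dev/null      # lseek()s straight past the first MiB
cat big.file | dd bs=1M skip=1 count=1 of=/dev/null   # has to read and discard that MiB instead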
Since cat is so widely used, it could be implemented as a shell built-in, which would avoid an extra process, but once you start down that path, the same thing could be done with most commands, transforming the shell into a slower & clunkier perl or python. It's probably better to write another scripting language with an easy-to-use pipe-like syntax for continuations instead ;-)
[1] If you want a simple example not made up for the occasion, you can look at my "exec binary from stdin" git gist with some explanations in the comment here. Implementing cat
inside it in order to make it work without UUoC would have made it 2 or 3 times bigger.
2
In fact, ksh93 does implement some external commands like cat internally.
– jrw32982
Apr 11 at 19:40
3
cat /dev/urandom | cpu_bound_program runs the read() system calls in a separate process. On Linux for example, the actual CPU work of generating more random numbers (when the pool is empty) is done in that system call, so using a separate process lets you take advantage of a separate CPU core to generate random data as input. e.g. in What's the fastest way to generate a 1 GB text file containing random digits?
– Peter Cordes
Apr 12 at 1:00
4
More importantly for most cases, it means lseek won't work. cat foo.mp4 | mpv - will work, but you can't seek backward further than mpv's or mplayer's cache buffer. But with input redirected from a file, you can. cat | mpv - is one way to check if an MP4 has its moov atom at the start of the file, so it can be played without seeking to the end and back (i.e. if it's suitable for streaming). It's easy to imagine other cases where you want to test a program for non-seekable files by running it on /dev/stdin with cat vs. a redirect.
– Peter Cordes
Apr 12 at 1:03
This is even more true when using xargs cat | somecmd. If file paths extend beyond the command buffer limit, xargs can run cat multiple times resulting in a continuous stream, while using xargs somecmd directly often fails because somecmd cannot be run multiple times to achieve a seamless result.
– tasket
Apr 13 at 0:08
Because detecting a useless cat is really, really hard.
I had a shell script where I wrote
cat | (somecommand <<!
...
/proc/self/fd/3
...
!) 0<&3
The shell script failed in production if the cat
was removed because it was invoked via su -c 'script.sh' someuser
. The apparently superfluous cat
caused the owner of standard input to change to the user the script was running as so that reopening it via /proc
worked.
This case would be pretty easy because it clearly does not follow the simple model of cat followed by exactly one parameter, so the shell should use the real cat executable instead of the optimized shortcut. Good point on possibly different credentials or non-standard stdin for real processes, though.
– Mikko Rantalainen
Apr 13 at 8:47
tl;dr: Shells don't do it automatically because the costs exceed the likely benefits.
Other answers have pointed out the technical difference between stdin being a pipe and it being a file. Keeping that in mind, the shell could do one of:
- Implement cat as a builtin, still preserving the file vs. pipe distinction. This would save the cost of an exec and maybe, possibly, a fork.
- Perform a full analysis of the pipeline with knowledge of the various commands used to see if file/pipe matters, then act based on that.
Next you have to consider the costs and benefits of each approach. The benefits are simple enough:
- In either case, avoid an exec (of cat).
- In the second case, when redirect substitution is possible, avoidance of a fork.
- In cases where you have to use a pipe, it might be possible sometimes to avoid a fork/vfork, but often not. That's because the cat-equivalent needs to run at the same time as the rest of the pipeline.
So you save a little CPU time & memory, especially if you can avoid the fork. Of course, you only save this time & memory when the feature is actually used. And you're only really saving the fork/exec time; with larger files, the time is mostly I/O time (i.e., cat reading the file from disk). So you have to ask: how often is cat used (uselessly) in shell scripts where the performance actually matters? Compare it to other common shell builtins like test; it's hard to imagine cat is used (uselessly) even a tenth as often as test is used in places that matter. That's a guess; I haven't measured, which is something you'd want to do before any attempt at implementation. (Or similarly, before asking someone else to implement it in, e.g., a feature request.)
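If you wanted data instead of a guess, a crude comparison is easy to run (numbers vary a lot between systems; /etc/hostname just stands in for a small file):
time for i in $(seq 1 500); do cat /etc/hostname | wc -c; done > /dev/null
time for i in $(seq 1 500); do wc -c < /etc/hostname; done > /dev/null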
Next you ask: what are the costs? The two costs that come to mind are (a) additional code in the shell, which increases its size (and thus possibly memory use), requires more maintenance work, is another spot for bugs, etc.; and (b) backwards-compatibility surprises: POSIX cat omits a lot of features of, e.g., GNU coreutils cat, so you'd have to be careful about exactly what the cat builtin would implement.
The additional builtin option probably isn't that bad — adding one more builtin where a bunch already exist. If you had profiling data showing it'd help, you could probably convince your favorite shell's authors to add it.
As for analyzing the pipeline, I don't think shells do anything like this currently (a few recognize the end of a pipeline and can avoid a fork). Essentially you'd be adding a (primitive) optimizer to the shell; optimizers often turn out to be complicated code and the source of a lot of bugs. And those bugs can be surprising — slight changes in the shell script could wind up avoiding or triggering the bug.
Postscript: You can apply a similar analysis to your useless uses of cat. Benefits: easier to read (though if command1 will take a file as an argument, probably not). Costs: extra fork and exec (and if command1 can take a file as an argument, probably more confusing error messages). If your analysis tells you to uselessly use cat, then go ahead.
The cat
command can accept -
as a marker for stdin. (POSIX, "If a file is '-', the cat utility shall read from the standard input at that point in the sequence.") This allows simple handling of a file or stdin where otherwise this would be disallowed.
Consider these two trivial alternatives, where the shell argument $1
is -
:
cat "$1" | nl # Works completely transparently
nl < "$1" # Fails with 'bash: -: No such file or directory'
Another time cat
is useful is where it's intentionally used as a no-op simply to maintain shell syntax:
file="$1"
reader=cat
[[ $file =~ \.gz$ ]] && reader=zcat
[[ $file =~ \.bz2$ ]] && reader=bzcat
"$reader" "$file"
Finally, I believe the only time that UUOC can really be correctly called out is when cat is used with a filename that is known to be a regular file (i.e. not a device or named pipe), and no flags are given to the command:
cat file.txt
In any other situation the properties of cat itself may be required.
The cat command can do things that the shell can't necessarily do ( or at least, can't do easily). For example, suppose you want to print characters that might otherwise be invisible, such as tabs, carriage returns, or newlines. There *might* be a way to do so with only shell builtin commands, but I can't think of any off the top of my head. The GNU version of cat can do so with the -A
argument or the -v -E -T
arguments (IDK about other versions of cat, though). You could also prefix each line with a line number using -n
(again, IDK if non-GNU versions can do this).
Another advantage of cat is that it can easily read multiple files. To do so, one can simply type cat file1 file2 file3. To do the same with a shell, things would get tricky, although a carefully-crafted loop could most likely achieve the same result. That said, do you really want to take the time to write such a loop, when such a simple alternative exists? I don't!
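For illustration, a shell-only approximation of cat file1 file2 file3 might look like this (a sketch; note that it drops a final line lacking a trailing newline, which cat would preserve):
for f in file1 file2 file3; do
    while IFS= read -r line; do
        printf '%s\n' "$line"
    done < "$f"
done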
Reading files with cat would probably use less CPU than the shell would, since cat is a pre-compiled program (the obvious exception is any shell that has a builtin cat). When reading a large group of files, this might become apparent, but I have never done so on my machines, so I can't be sure.
The cat command can also be useful for forcing a command to accept standard input in instances it might not. Consider the following:
echo 8 | sleep
The number "8" will be not accepted by the "sleep" command, since it was never really meant to accept standard input. Thus, sleep will disregard that input, complain about a lack of arguments, and exit. However, if one types:
echo 8 | sleep $(cat)
Many shells will expand this to sleep 8
, and sleep will wait for 8 seconds before exiting. You can also do something similar with ssh:
command | ssh 1.2.3.4 'cat >> example-file'
This command will append to example-file on the machine with the address 1.2.3.4 whatever is output from "command".
And that's (probably) just scratching the surface. I'm sure I could find more example of cat being useful if I wanted to, but this post is long enough as it is. So, I'll conclude by saying this: asking the shell to anticipate all of these scenarios (and several others) is not really feasible.
I would end the last sentence with "is not easily feasible"
– Basile Starynkevitch
Apr 13 at 5:55
Remember that a user could have a cat
in his $PATH
which is not exactly the POSIX cat
(but perhaps some variant which could log something somewhere). In that case, you don't want the shell to remove it.
The PATH
could change dynamically, and then cat
is not what you believe it is. It would be quite difficult to write a shell doing the optimization you dream of.
Also, in practice, cat
is quite a quick program. There are few practical reasons (except aesthetics) to avoid it.
See also the excellent Parsing POSIX [s]hell talk by Yann Regis-Gianas at FOSDEM2018. It gives other good reasons to avoid attempting doing what you dream of in a shell.
If performance was really an issue for shells, someone would have proposed a shell which uses sophisticated whole program compiler optimization, static source code analysis, and just-in-time compilation techniques (all these three domains have decades of progress and scientific publications and dedicated conferences, e.g. under SIGPLAN). Sadly, even as an interesting research topic, that is not currently funded by research agencies or venture capitalists, and I am deducing that it is simply not worth the effort. In other words, there is probably no significant market for optimizing shells. If you have half a million euro to spend on such research, you'll easily find someone to do it, and I believe it would give worthwhile results.
On the practical side, rewriting a small (a hundred lines or so) shell script in a better scripting language (Python, AWK, Guile, ...) to improve its performance is commonly done. And it is not reasonable (for many software engineering reasons) to write large shell scripts: when you are writing a shell script exceeding a hundred lines, you do need to consider rewriting it (even for readability and maintenance reasons) in some more suitable language, because as a programming language the shell is a very poor one. However, there are many large generated shell scripts, and for good reasons (e.g. GNU autoconf generated configure scripts).
Regarding huge textual files, passing them to cat as a single argument is not good practice, and most sysadmins know that (when any shell script takes more than a minute to run, you begin considering optimizing it). For large, multi-gigabyte files, cat is never the right tool to process them.
3
"Quite few practical reasons to avoid it" -- anyone who's waited forcat some-huge-log | tail -n 5
to run (wheretail -n 5 some-huge-log
could jump straight to the end, whereascat
reads only front-to-back) would disagree.
– Charles Duffy
Apr 12 at 22:22
Comment above checks out: cat-ing a large text file in the tens-of-GB range (which was created for testing) takes a kinda long time. Wouldn't recommend.
– Sergiy Kolodyazhnyy
Apr 13 at 1:03
1
BTW, re: "no significant market for optimizing shells" -- ksh93 is an optimizing shell, and a quite good one. It was, for a while, successfully sold as a commercial product. (Sadly, being commercially licensed also made it sufficiently niche that poorly-written clones and other less-capable-but-free-of-cost successors took over the world outside of those sites willing to pay for a license, leading to the situation we have today).
– Charles Duffy
Apr 13 at 18:36
(not using the specific techniques you note, but frankly, those techniques don't make sense given the process model; the techniques it does apply are, well, well applied and to good effect).
– Charles Duffy
Apr 13 at 18:42
Adding to @Kusalananda's answer (and @alephzero's comment), cat could be anything:
alias cat='gcc -c'
cat "$MYFILE" | command1 | command2 > "$OUTPUT"
or
echo 'echo 1' > /usr/bin/cat
cat "$MYFILE" | command1 | command2 > "$OUTPUT"
There is no reason that cat (on its own) or /usr/bin/cat on the system is actually cat, the concatenation tool.
3
Other than that the behaviour of cat is defined by POSIX and so shouldn't be wildly different.
– roaima
Apr 11 at 14:13
2
@roaima: PATH=/home/Joshua/bin:$PATH cat ... Are you sure you know what cat does now?
– Joshua
Apr 11 at 17:53
1
@Joshua it doesn't really matter. We both know cat can be overridden, but we also both know it shouldn't be wantonly replaced with something else. My comment points out that POSIX mandates a particular (subset of) behaviour that can reasonably be expected to exist. I have, at times, written a shell script that extends behaviour of a standard utility. In this case the shell script acted and behaved just like the tool it replaced, except that it had additional capabilities.
– roaima
Apr 11 at 21:12
@Joshua: On most platforms, shells know (or could know) which directories hold executables that implement POSIX commands. So you could just defer the substitution until after alias expansion and path resolution, and only do it for /bin/cat. (And you'd make it an option you could turn off.) Or you'd make cat a shell built-in (which maybe falls back to /bin/cat for multiple args?) so users could control whether or not they wanted the external version the normal way, with enable cat. Like for kill. (I was thinking that bash's command cat would work, but that doesn't skip builtins)
– Peter Cordes
Apr 12 at 1:13
If you provide an alias, the shell will know that cat in that environment no longer refers to the usual cat. Obviously, the optimization should be implemented after the aliases have been processed. I consider shell built-ins to represent commands in a virtual directory that is always prepended to your path. If you want to avoid the shell built-in version of any command (e.g. test) you have to use a variant with a path.
Apr 13 at 8:52
Two "useless" uses for cat:
sort file.txt | cat header.txt - footer.txt | less
...here cat
is used to mix file and piped input.
find . -name '*.info' -type f | sh -c 'xargs cat' | sort
...here xargs
can accept a virtually infinite number of filenames and run cat
as many times as needed while making it all behave like one stream. So this works for large file lists where direct use of xargs sort
does not.
Both of these use cases would be trivially avoided by making the shell built-in only step in if cat is called with exactly one argument. Especially in the case where sh is passed a string and xargs will call cat directly, there's no way the shell could use its built-in implementation.
– Mikko Rantalainen
Apr 13 at 8:55
Aside from other things, cat
-check would add additional performance overhead and confusion as to which use of cat
is actually useless, IMHO, because such checks can be inefficient and create problems with legitimate cat
usage.
When commands deal with the standard streams, they only have to care about reading/writing to the standard file descriptors. Commands can know if stdin is seekable/lseekable or not, which indicates a pipe or file.
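For example, a script can already make that distinction itself; a Linux-specific sketch relying on /dev/stdin:
if [ -p /dev/stdin ]; then
    echo "stdin is a pipe"
elif [ -f /dev/stdin ]; then
    echo "stdin is a regular file (seekable)"
fi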
If we add to the mix checking what process actually provides that stdin content, we will need to find the process on the other side of the pipe and apply the appropriate optimization. This can be done in terms of the shell itself, as shown in the SuperUser post by Kyle Jones, and in shell terms that's
(find /proc -type l | xargs ls -l | fgrep 'pipe:[20043922]') 2>/dev/null
as shown in the linked post. This is 3 more commands (so extra fork()s and exec()s) and recursive traversals (so a whole lot of readdir() calls).
In terms of C and shell source code, the shell already knows the child process, so there's no need for recursion, but how do we know when to optimize and when cat is actually useless? There are in fact useful uses of cat, such as
# adding header and footer to file
( cmd; cat file; cmd ) | cmd
# tr command does not accept files as arguments
cat log1 log2 log3 | tr '[:upper:]' '[:lower:]'
It would probably be a waste and unnecessary overhead to add such optimization to the shell. As Kusalananda's answer already mentioned, UUOC is more about the user's own lack of understanding of how best to combine commands for best results.
add a comment |
11 Answers
11
active
oldest
votes
11 Answers
11
active
oldest
votes
active
oldest
votes
active
oldest
votes
The 2 commands are not equivalent: consider error handling:
cat <file that doesn't exist> | less
will produce an empty stream that will be passed to the piped program... as such you end up with a display showing nothing.
< <file that doesn't exist> less
will fail to open bar, and then not open less at all.
Attempting to change the former to the latter could break any number of scripts that expect to run the program with a potentially blank input.
New contributor
1
I'll mark your response as accepted because I think this is the most important difference between both syntaxes. The variant withcat
will always execute the second command in the pipeline whereas the variant with just input redirection will not execute the command at all if the input file is missing.
– Mikko Rantalainen
Apr 13 at 8:43
However, note that<"missing-file" grep foo | echo 2
will not executegrep
but will executeecho
.
– Mikko Rantalainen
12 hours ago
add a comment |
The 2 commands are not equivalent: consider error handling:
cat <file that doesn't exist> | less
will produce an empty stream that will be passed to the piped program... as such you end up with a display showing nothing.
< <file that doesn't exist> less
will fail to open bar, and then not open less at all.
Attempting to change the former to the latter could break any number of scripts that expect to run the program with a potentially blank input.
New contributor
1
I'll mark your response as accepted because I think this is the most important difference between both syntaxes. The variant withcat
will always execute the second command in the pipeline whereas the variant with just input redirection will not execute the command at all if the input file is missing.
– Mikko Rantalainen
Apr 13 at 8:43
However, note that<"missing-file" grep foo | echo 2
will not executegrep
but will executeecho
.
– Mikko Rantalainen
12 hours ago
add a comment |
The 2 commands are not equivalent: consider error handling:
cat <file that doesn't exist> | less
will produce an empty stream that will be passed to the piped program... as such you end up with a display showing nothing.
< <file that doesn't exist> less
will fail to open bar, and then not open less at all.
Attempting to change the former to the latter could break any number of scripts that expect to run the program with a potentially blank input.
New contributor
The 2 commands are not equivalent: consider error handling:
cat <file that doesn't exist> | less
will produce an empty stream that will be passed to the piped program... as such you end up with a display showing nothing.
< <file that doesn't exist> less
will fail to open bar, and then not open less at all.
Attempting to change the former to the latter could break any number of scripts that expect to run the program with a potentially blank input.
New contributor
New contributor
answered Apr 11 at 10:52
UKMonkeyUKMonkey
31614
31614
New contributor
New contributor
1
I'll mark your response as accepted because I think this is the most important difference between both syntaxes. The variant withcat
will always execute the second command in the pipeline whereas the variant with just input redirection will not execute the command at all if the input file is missing.
– Mikko Rantalainen
Apr 13 at 8:43
However, note that<"missing-file" grep foo | echo 2
will not executegrep
but will executeecho
.
– Mikko Rantalainen
12 hours ago
add a comment |
1
I'll mark your response as accepted because I think this is the most important difference between both syntaxes. The variant withcat
will always execute the second command in the pipeline whereas the variant with just input redirection will not execute the command at all if the input file is missing.
– Mikko Rantalainen
Apr 13 at 8:43
However, note that<"missing-file" grep foo | echo 2
will not executegrep
but will executeecho
.
– Mikko Rantalainen
12 hours ago
1
1
I'll mark your response as accepted because I think this is the most important difference between both syntaxes. The variant with
cat
will always execute the second command in the pipeline whereas the variant with just input redirection will not execute the command at all if the input file is missing.– Mikko Rantalainen
Apr 13 at 8:43
I'll mark your response as accepted because I think this is the most important difference between both syntaxes. The variant with
cat
will always execute the second command in the pipeline whereas the variant with just input redirection will not execute the command at all if the input file is missing.– Mikko Rantalainen
Apr 13 at 8:43
However, note that
<"missing-file" grep foo | echo 2
will not execute grep
but will execute echo
.– Mikko Rantalainen
12 hours ago
However, note that
<"missing-file" grep foo | echo 2
will not execute grep
but will execute echo
.– Mikko Rantalainen
12 hours ago
add a comment |
"Useless use of cat
" is more about how you write your code than about what actually runs when you execute the script. It's a sort of design anti-pattern, a way of going about something that could probably be done in a more efficient manner. It's a failure in understanding of how to best combine the given tools to create a new tool. I'd argue that stringing several sed
and/or awk
commands together in a pipeline also could be said to be a symptom of this same anti-pattern.
Fixing instances of "useless use of cat
" in a script is a primarily matter of fixing the source code of the script manually. A tool such as ShellCheck can help with this by pointing them out:
$ cat script.sh
#!/bin/sh
cat file | cat
$ shellcheck script.sh
In script.sh line 2:
cat file | cat
^-- SC2002: Useless cat. Consider 'cmd < file | ..' or 'cmd file | ..' instead.
Getting the shell to do this automatically would be difficult due to the nature of shell scripts. The way a script executes depends on the environment inherited from its parent process, and on the specific implementation of the available external commands.
The shell does not necessarily know what cat
is. It could potentially be any command from anywhere in your $PATH
, or a function.
If it was a built-in command (which it may be in some shells), it would have the ability to reorganise the pipeline as it would know of the semantics of its built-in cat
command. Before doing that, it would additionally have to make assumptions about the next command in the pipeline, after the original cat
.
Note that reading from standard input behaves slightly differently when it's connected to a pipe and when it's connected to a file. A pipe is not seekable, so depending on what the next command in the pipeline does, it may or may not behave differently if the pipeline was rearranged (it may detect whether the input is seekable and decide to do things differently if it is or if it isn't, in any case it would then behave differently).
This question is similar (in a very general sense) to "Are there any compilers that attempt to fix syntax errors on their own?" (at the Software Engineering StackExchange site), although that question is obviously about syntax errors, not useless design patterns. The idea about automatically changing the code based on intent is largely the same though.
It's perfectly conformant for a shell to know whatcat
is, and the other commands in the pipeline, (the as-if rule) and behave accordingly, they just don't here because it's pointless and too hard.
– Michael Homer
Apr 11 at 7:40
4
@MichaelHomer Yes. But it's also allowed to overload a standard command with a function of the same name.
– Kusalananda♦
Apr 11 at 7:42
2
@PhilipCouling It’s absolutely conformant as long as it’s known that none of the pipeline commands care. The shell is specifically allowed to replace utilities with builtins or shell functions and those have no execution environment restrictions, so as long as the external result is indistinguishable it’s permitted. For your case,cat /dev/tty
is the interesting one that would be different with<
.
– Michael Homer
Apr 11 at 9:28
1
@MichaelHomer so as long as the external result is indistinguishable it’s permitted That means the behavior of the entire set of utilities optimized in such a manner can never change. That has to be the ultimate dependency hell.
– Andrew Henle
Apr 11 at 10:58
2
@MichaelHomer As the other comments said, of course it's perfectly comformant for the shell to know that given the OP's input it is impossible to tell what thecat
command actually does without executing it. For all you (and the shell) know, the OP has a commandcat
in her path which is an interactive cat simulation, "myfile" is just the stored game state, andcommand1
andcommand2
are postprocessing some statistics about the current playing session...
– alephzero
Apr 11 at 12:44
|
show 4 more comments
"Useless use of cat
" is more about how you write your code than about what actually runs when you execute the script. It's a sort of design anti-pattern, a way of going about something that could probably be done in a more efficient manner. It's a failure in understanding of how to best combine the given tools to create a new tool. I'd argue that stringing several sed
and/or awk
commands together in a pipeline also could be said to be a symptom of this same anti-pattern.
Fixing instances of "useless use of cat
" in a script is a primarily matter of fixing the source code of the script manually. A tool such as ShellCheck can help with this by pointing them out:
$ cat script.sh
#!/bin/sh
cat file | cat
$ shellcheck script.sh
In script.sh line 2:
cat file | cat
^-- SC2002: Useless cat. Consider 'cmd < file | ..' or 'cmd file | ..' instead.
Getting the shell to do this automatically would be difficult due to the nature of shell scripts. The way a script executes depends on the environment inherited from its parent process, and on the specific implementation of the available external commands.
The shell does not necessarily know what cat
is. It could potentially be any command from anywhere in your $PATH
, or a function.
If it was a built-in command (which it may be in some shells), it would have the ability to reorganise the pipeline as it would know of the semantics of its built-in cat
command. Before doing that, it would additionally have to make assumptions about the next command in the pipeline, after the original cat
.
Note that reading from standard input behaves slightly differently when it's connected to a pipe and when it's connected to a file. A pipe is not seekable, so depending on what the next command in the pipeline does, it may or may not behave differently if the pipeline was rearranged (it may detect whether the input is seekable and decide to do things differently if it is or if it isn't, in any case it would then behave differently).
This question is similar (in a very general sense) to "Are there any compilers that attempt to fix syntax errors on their own?" (at the Software Engineering StackExchange site), although that question is obviously about syntax errors, not useless design patterns. The idea about automatically changing the code based on intent is largely the same though.
It's perfectly conformant for a shell to know whatcat
is, and the other commands in the pipeline, (the as-if rule) and behave accordingly, they just don't here because it's pointless and too hard.
– Michael Homer
Apr 11 at 7:40
4
@MichaelHomer Yes. But it's also allowed to overload a standard command with a function of the same name.
– Kusalananda♦
Apr 11 at 7:42
2
@PhilipCouling It’s absolutely conformant as long as it’s known that none of the pipeline commands care. The shell is specifically allowed to replace utilities with builtins or shell functions and those have no execution environment restrictions, so as long as the external result is indistinguishable it’s permitted. For your case,cat /dev/tty
is the interesting one that would be different with<
.
– Michael Homer
Apr 11 at 9:28
1
@MichaelHomer so as long as the external result is indistinguishable it’s permitted That means the behavior of the entire set of utilities optimized in such a manner can never change. That has to be the ultimate dependency hell.
– Andrew Henle
Apr 11 at 10:58
2
@MichaelHomer As the other comments said, of course it's perfectly comformant for the shell to know that given the OP's input it is impossible to tell what thecat
command actually does without executing it. For all you (and the shell) know, the OP has a commandcat
in her path which is an interactive cat simulation, "myfile" is just the stored game state, andcommand1
andcommand2
are postprocessing some statistics about the current playing session...
– alephzero
Apr 11 at 12:44
|
show 4 more comments
"Useless use of cat
" is more about how you write your code than about what actually runs when you execute the script. It's a sort of design anti-pattern, a way of going about something that could probably be done in a more efficient manner. It's a failure in understanding of how to best combine the given tools to create a new tool. I'd argue that stringing several sed
and/or awk
commands together in a pipeline also could be said to be a symptom of this same anti-pattern.
Fixing instances of "useless use of cat
" in a script is a primarily matter of fixing the source code of the script manually. A tool such as ShellCheck can help with this by pointing them out:
$ cat script.sh
#!/bin/sh
cat file | cat
$ shellcheck script.sh
In script.sh line 2:
cat file | cat
^-- SC2002: Useless cat. Consider 'cmd < file | ..' or 'cmd file | ..' instead.
Getting the shell to do this automatically would be difficult due to the nature of shell scripts. The way a script executes depends on the environment inherited from its parent process, and on the specific implementation of the available external commands.
The shell does not necessarily know what cat
is. It could potentially be any command from anywhere in your $PATH
, or a function.
If it was a built-in command (which it may be in some shells), it would have the ability to reorganise the pipeline as it would know of the semantics of its built-in cat
command. Before doing that, it would additionally have to make assumptions about the next command in the pipeline, after the original cat
.
Note that reading from standard input behaves slightly differently when it's connected to a pipe and when it's connected to a file. A pipe is not seekable, so depending on what the next command in the pipeline does, it may or may not behave differently if the pipeline was rearranged (it may detect whether the input is seekable and decide to do things differently if it is or if it isn't, in any case it would then behave differently).
This question is similar (in a very general sense) to "Are there any compilers that attempt to fix syntax errors on their own?" (at the Software Engineering StackExchange site), although that question is obviously about syntax errors, not useless design patterns. The idea about automatically changing the code based on intent is largely the same though.
"Useless use of cat
" is more about how you write your code than about what actually runs when you execute the script. It's a sort of design anti-pattern, a way of going about something that could probably be done in a more efficient manner. It's a failure in understanding of how to best combine the given tools to create a new tool. I'd argue that stringing several sed
and/or awk
commands together in a pipeline also could be said to be a symptom of this same anti-pattern.
Fixing instances of "useless use of cat
" in a script is a primarily matter of fixing the source code of the script manually. A tool such as ShellCheck can help with this by pointing them out:
$ cat script.sh
#!/bin/sh
cat file | cat
$ shellcheck script.sh
In script.sh line 2:
cat file | cat
^-- SC2002: Useless cat. Consider 'cmd < file | ..' or 'cmd file | ..' instead.
Getting the shell to do this automatically would be difficult due to the nature of shell scripts. The way a script executes depends on the environment inherited from its parent process, and on the specific implementation of the available external commands.
The shell does not necessarily know what cat
is. It could potentially be any command from anywhere in your $PATH
, or a function.
If it was a built-in command (which it may be in some shells), it would have the ability to reorganise the pipeline as it would know of the semantics of its built-in cat
command. Before doing that, it would additionally have to make assumptions about the next command in the pipeline, after the original cat
.
Note that reading from standard input behaves slightly differently when it's connected to a pipe and when it's connected to a file. A pipe is not seekable, so depending on what the next command in the pipeline does, it may or may not behave differently if the pipeline was rearranged (it may detect whether the input is seekable and decide to do things differently if it is or if it isn't, in any case it would then behave differently).
This question is similar (in a very general sense) to "Are there any compilers that attempt to fix syntax errors on their own?" (at the Software Engineering StackExchange site), although that question is obviously about syntax errors, not useless design patterns. The idea about automatically changing the code based on intent is largely the same though.
edited Apr 11 at 20:52
answered Apr 11 at 7:36
Kusalananda♦Kusalananda
142k18266441
142k18266441
It's perfectly conformant for a shell to know whatcat
is, and the other commands in the pipeline, (the as-if rule) and behave accordingly, they just don't here because it's pointless and too hard.
– Michael Homer
Apr 11 at 7:40
4
@MichaelHomer Yes. But it's also allowed to overload a standard command with a function of the same name.
– Kusalananda♦
Apr 11 at 7:42
2
@PhilipCouling It’s absolutely conformant as long as it’s known that none of the pipeline commands care. The shell is specifically allowed to replace utilities with builtins or shell functions and those have no execution environment restrictions, so as long as the external result is indistinguishable it’s permitted. For your case,cat /dev/tty
is the interesting one that would be different with<
.
– Michael Homer
Apr 11 at 9:28
1
@MichaelHomer so as long as the external result is indistinguishable it’s permitted That means the behavior of the entire set of utilities optimized in such a manner can never change. That has to be the ultimate dependency hell.
– Andrew Henle
Apr 11 at 10:58
2
@MichaelHomer As the other comments said, of course it's perfectly comformant for the shell to know that given the OP's input it is impossible to tell what thecat
command actually does without executing it. For all you (and the shell) know, the OP has a commandcat
in her path which is an interactive cat simulation, "myfile" is just the stored game state, andcommand1
andcommand2
are postprocessing some statistics about the current playing session...
– alephzero
Apr 11 at 12:44
|
show 4 more comments
Because it's not useless.
In the case of cat file | cmd, the fd 0 (stdin) of cmd will be a pipe, and in the case of cmd <file it may be a regular file, device, etc.
A pipe has different semantics from a regular file, and its semantics are not a subset of those of a regular file:
- a regular file cannot be select(2)ed or poll(2)ed on in a meaningful way; a select(2) on it will always return "ready". Advanced interfaces like epoll(2) on Linux will simply not work with regular files.
- on Linux there are system calls (splice(2), vmsplice(2), tee(2)) which only work on pipes [1]
Since cat is so much used, it could be implemented as a shell built-in which will avoid an extra process, but once you started on that path, the same thing could be done with most commands -- transforming the shell into a slower & clunkier perl or python. It's probably better to write another scripting language with an easy-to-use pipe-like syntax for continuations instead ;-)
[1] If you want a simple example not made up for the occasion, you can look at my "exec binary from stdin" git gist, with some explanations in the comment here. Implementing cat inside it in order to make it work without UUoC would have made it 2 or 3 times bigger.
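As a hedged illustration of the distinction (assuming a system that provides /dev/stdin, such as Linux; the function name is made up), the next command in the pipeline can observe whether its standard input is a pipe or a file, so the two spellings are distinguishable:
describe_stdin() {
    # test -p reports whether the given path refers to a FIFO/pipe
    if [ -p /dev/stdin ]; then
        echo "stdin is a pipe"
    else
        echo "stdin is not a pipe"
    fi
}
cat "$MYFILE" | describe_stdin    # prints: stdin is a pipe
describe_stdin < "$MYFILE"        # prints: stdin is not a pipe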
answered Apr 11 at 9:33 – mosvy
2
In fact, ksh93 does implement some external commands like cat internally.
– jrw32982
Apr 11 at 19:40
3
cat /dev/urandom | cpu_bound_program runs the read() system calls in a separate process. On Linux for example, the actual CPU work of generating more random numbers (when the pool is empty) is done in that system call, so using a separate process lets you take advantage of a separate CPU core to generate random data as input. e.g. in What's the fastest way to generate a 1 GB text file containing random digits?
– Peter Cordes
Apr 12 at 1:00
4
More importantly for most cases, it means lseek won't work. cat foo.mp4 | mpv - will work, but you can't seek backward further than mpv's or mplayer's cache buffer. But with input redirected from a file, you can. cat | mpv - is one way to check if an MP4 has its moov atom at the start of the file, so it can be played without seeking to the end and back (i.e. if it's suitable for streaming). It's easy to imagine other cases where you want to test a program for non-seekable files by running it on /dev/stdin with cat vs. a redirect.
– Peter Cordes
Apr 12 at 1:03
This is even more true when using xargs cat | somecmd. If file paths extend beyond the command buffer limit, xargs can run cat multiple times, resulting in a continuous stream, while using xargs somecmd directly often fails because somecmd cannot be run in multiples to achieve a seamless result.
– tasket
Apr 13 at 0:08
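A hedged illustration of that last point (the paths and pattern are invented for the example; -print0 and -0 are common GNU/BSD extensions): even if xargs splits a long file list across several cat invocations, the consumer still sees one continuous stream on the pipe:
find /var/log -type f -name '*.log' -print0 | xargs -0 cat | grep -c ERROR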
Because detecting useless cat is really, really hard.
I had a shell script where I wrote
cat | (somecommand <<!
...
/proc/self/fd/3
...
!) 0<&3
The shell script failed in production if the cat was removed, because it was invoked via su -c 'script.sh' someuser. The apparently superfluous cat caused the owner of standard input to change to the user the script was running as, so that reopening it via /proc worked.
edited Apr 11 at 20:19 – jlliagre
answered Apr 11 at 17:53 – Joshua
This case would be pretty easy because it clearly does not follow the simple model of cat followed by exactly one parameter, so the shell should use the real cat executable instead of the optimized shortcut. Good point on possibly different credentials or non-standard stdin for real processes, though.
– Mikko Rantalainen
Apr 13 at 8:47
tl;dr: Shells don't do it automatically because the costs exceed the likely benefits.
Other answers have pointed out the technical difference between stdin being a pipe and it being a file. Keeping that in mind, the shell could do one of:
- Implement cat as a builtin, still preserving the file v. pipe distinction. This would save the cost of an exec and maybe, possibly, a fork.
- Perform a full analysis of the pipeline with knowledge of the various commands used to see if the file/pipe distinction matters, then act based on that.
Next you have to consider the costs and benefits of each approach. The benefits are simple enough:
- In either case, avoid an exec (of cat).
- In the second case, when redirect substitution is possible, avoidance of a fork.
- In cases where you have to use a pipe, it might sometimes be possible to avoid a fork/vfork, but often not. That's because the cat-equivalent needs to run at the same time as the rest of the pipeline.
So you save a little CPU time & memory, especially if you can avoid the fork. Of course, you only save this time & memory when the feature is actually used. And you're only really saving the fork/exec time; with larger files, the time is mostly the I/O time (i.e., cat reading the file from disk). So you have to ask: how often is cat used (uselessly) in shell scripts where the performance actually matters? Compare it to other common shell builtins like test — it's hard to imagine cat is used (uselessly) even a tenth as often as test is used in places that matter. That's a guess, and I haven't measured, which is something you'd want to do before any attempt at implementation. (Or, similarly, before asking someone else to implement it, e.g., in a feature request.)
Next you ask: what are the costs? The two costs that come to mind are (a) additional code in the shell, which increases its size (and thus possibly memory use), requires more maintenance work, and is another spot for bugs, etc.; and (b) backwards-compatibility surprises: POSIX cat omits a lot of features of, e.g., GNU coreutils cat, so you'd have to be careful about exactly what the cat builtin would implement.
The additional builtin option probably isn't that bad — adding one more builtin where a bunch already exist. If you had profiling data showing it'd help, you could probably convince your favorite shell's authors to add it.
As for analyzing the pipeline, I don't think shells do anything like this currently (a few recognize the end of a pipeline and can avoid a fork). Essentially you'd be adding a (primitive) optimizer to the shell; optimizers often turn out to be complicated code and the source of a lot of bugs. And those bugs can be surprising — slight changes in the shell script could wind up avoiding or triggering the bug.
Postscript: You can apply a similar analysis to your useless uses of cat. Benefits: easier to read (though if command1 will take a file as an argument, probably not). Costs: an extra fork and exec (and, if command1 can take a file as an argument, probably more confusing error messages). If your analysis tells you to uselessly use cat, then go ahead.
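As a rough, hedged illustration of the "you're only really saving the fork/exec time" point (results will vary; /etc/hostname is just a convenient small file on most Linux systems):
time sh -c 'i=0; while [ "$i" -lt 1000 ]; do cat /etc/hostname | wc -c; i=$((i+1)); done >/dev/null'
time sh -c 'i=0; while [ "$i" -lt 1000 ]; do wc -c </etc/hostname; i=$((i+1)); done >/dev/null'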
edited Apr 11 at 20:18
answered Apr 11 at 20:13 – derobert
The cat command can accept - as a marker for stdin. (POSIX: "If a file is '-', the cat utility shall read from the standard input at that point in the sequence.") This allows simple handling of a file or stdin where otherwise this would be disallowed.
Consider these two trivial alternatives, where the shell argument $1 is -:
cat "$1" | nl    # Works completely transparently
nl < "$1"        # Fails with 'bash: -: No such file or directory'
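For comparison, a hedged sketch of what you would have to write yourself to get the same "-" handling when using a redirection instead of cat:
if [ "$1" = "-" ]; then
    nl          # "-" means: read the script's own standard input
else
    nl < "$1"
fi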
Another time cat is useful is where it's intentionally used as a no-op simply to maintain shell syntax:
file="$1"
reader=cat
[[ $file =~ \.gz$ ]] && reader=zcat
[[ $file =~ \.bz2$ ]] && reader=bzcat
"$reader" "$file"
Finally, I believe the only time that UUOC can really be correctly called out is when cat is used with a filename that is known to be a regular file (i.e. not a device or named pipe), and no flags are given to the command:
cat file.txt
In any other situation the properties of cat itself may be required.
edited Apr 12 at 7:54
answered Apr 11 at 14:11 – roaima
The cat command can do things that the shell can't necessarily do (or at least, can't do easily). For example, suppose you want to print characters that might otherwise be invisible, such as tabs, carriage returns, or newlines. There *might* be a way to do so with only shell builtin commands, but I can't think of any off the top of my head. The GNU version of cat can do so with the -A argument or the -v -E -T arguments (IDK about other versions of cat, though). You could also prefix each line with a line number using -n (again, IDK if non-GNU versions can do this).
Another advantage of cat is that it can easily read multiple files. To do so, one can simply type cat file1 file2 file3. To do the same with a shell, things would get tricky, although a carefully-crafted loop (sketched below) could most likely achieve the same result. That said, do you really want to take the time to write such a loop, when such a simple alternative exists? I don't!
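A hedged sketch of that "carefully-crafted loop" (the file names are placeholders): roughly what it takes to concatenate several text files with builtins only, and it is still slower than cat and mangles binary data:
for f in file1 file2 file3; do
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
    done < "$f"
done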
Reading files with cat would probably use less CPU than the shell would, since cat is a pre-compiled program (the obvious exception is any shell that has a builtin cat). When reading a large group of files, this might become apparent, but I have never done so on my machines, so I can't be sure.
The cat command can also be useful for forcing a command to accept standard input in instances where it might not. Consider the following:
echo 8 | sleep
The number "8" will not be accepted by the "sleep" command, since it was never really meant to accept standard input. Thus, sleep will disregard that input, complain about a lack of arguments, and exit. However, if one types:
echo 8 | sleep $(cat)
many shells will expand this to sleep 8, and sleep will wait for 8 seconds before exiting. You can also do something similar with ssh:
command | ssh 1.2.3.4 'cat >> example-file'
This command will append to example-file on the machine with the address 1.2.3.4 whatever is output by "command".
And that's (probably) just scratching the surface. I'm sure I could find more examples of cat being useful if I wanted to, but this post is long enough as it is. So, I'll conclude by saying this: asking the shell to anticipate all of these scenarios (and several others) is not really feasible.
answered Apr 11 at 22:35 – TSJNachos117
I would end the last sentence with "is not easily feasible"
– Basile Starynkevitch
Apr 13 at 5:55
Remember that a user could have a cat in his $PATH which is not exactly the POSIX cat (but perhaps some variant which could log something somewhere). In that case, you don't want the shell to remove it.
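A hedged illustration of that point (the logging side effect is hypothetical): a shell function shadowing cat makes the same problem even more direct, since an "optimizing" shell that silently dropped it would also drop whatever the user relies on it doing:
cat() {
    printf '%s\n' "cat called on: $*" >> "$HOME/.cat_audit"   # hypothetical logging
    command cat "$@"
}
cat "$MYFILE" | command1   # removing "cat" here would silently skip the logging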
The PATH could change dynamically, and then cat is not what you believe it is. It would be quite difficult to write a shell doing the optimization you dream of.
Also, in practice, cat is quite a quick program. There are few practical reasons (except aesthetics) to avoid it.
See also the excellent Parsing POSIX [s]hell talk by Yann Regis-Gianas at FOSDEM 2018. It gives other good reasons to avoid attempting to do what you dream of in a shell.
If performance were really an issue for shells, someone would have proposed a shell which uses sophisticated whole-program compiler optimization, static source code analysis, and just-in-time compilation techniques (all three of these domains have decades of progress, scientific publications, and dedicated conferences, e.g. under SIGPLAN). Sadly, even as an interesting research topic, that is not currently funded by research agencies or venture capitalists, and I deduce that it is simply not worth the effort. In other words, there is probably no significant market for optimizing shells. If you have half a million euros to spend on such research, you'll easily find someone to do it, and I believe it would give worthwhile results.
On the practical side, rewriting a small (around a hundred lines) shell script in a better scripting language (Python, AWK, Guile, ...) to improve its performance is commonly done. And it is not reasonable (for many software engineering reasons) to write large shell scripts: when you are writing a shell script exceeding a hundred lines, you do need to consider rewriting it (even for readability and maintenance reasons) in some more suitable language: as a programming language the shell is a very poor one. However, there are many large generated shell scripts, and for good reasons (e.g. GNU autoconf-generated configure scripts).
Regarding huge text files, passing them to cat as a single argument is not good practice, and most sysadmins know that (when any shell script takes more than a minute to run, you begin considering optimizing it). For large, multi-gigabyte files, cat is never the right tool to process them.
edited Apr 13 at 5:59
answered Apr 12 at 11:38
Basile Starynkevitch
8,1912041
3
"Quite few practical reasons to avoid it" -- anyone who's waited for cat some-huge-log | tail -n 5 to run (where tail -n 5 some-huge-log could jump straight to the end, whereas cat reads only front-to-back) would disagree.
– Charles Duffy
Apr 12 at 22:22
Comment checks out ^ cat-ing a large text file in the tens-of-GB range (which was created for testing) takes kind of a long time. Wouldn't recommend.
– Sergiy Kolodyazhnyy
Apr 13 at 1:03
1
BTW, re: "no significant market for optimizing shells" -- ksh93 is an optimizing shell, and a quite good one. It was, for a while, successfully sold as a commercial product. (Sadly, being commercially licensed also made it sufficiently niche that poorly-written clones and other less-capable-but-free-of-cost successors took over the world outside of those sites willing to pay for a license, leading to the situation we have today.)
– Charles Duffy
Apr 13 at 18:36
(Not using the specific techniques you note, but frankly, those techniques don't make sense given the process model; the techniques it does apply are, well, well applied and to good effect.)
– Charles Duffy
Apr 13 at 18:42
Adding to @Kusalananda's answer (and @alephzero's comment), cat could be anything:
alias cat='gcc -c'
cat "$MYFILE" | command1 | command2 > "$OUTPUT"
or
echo 'echo 1' > /usr/bin/cat
cat "$MYFILE" | command1 | command2 > "$OUTPUT"
There is no guarantee that cat (on its own) or /usr/bin/cat on the system is actually cat, the concatenation tool.
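A quick way to see which of these situations you are in is to ask the shell what cat currently resolves to (illustrative; the output obviously varies per system):
type cat          # reports an alias, a function, a builtin, or the path that would be executed
command -v cat    # prints the resolved name or path
Both only report what cat means right now; an alias, a PATH change, or a new file can change the answer a moment later.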
answered Apr 11 at 14:01
Rob
211
New contributor
3
Other than that the behaviour of cat is defined by POSIX and so shouldn't be wildly different.
– roaima
Apr 11 at 14:13
2
@roaima: PATH=/home/Joshua/bin:$PATH cat ... Are you sure you know what cat does now?
– Joshua
Apr 11 at 17:53
1
@Joshua it doesn't really matter. We both know cat can be overridden, but we also both know it shouldn't be wantonly replaced with something else. My comment points out that POSIX mandates a particular (subset of) behaviour that can reasonably be expected to exist. I have, at times, written a shell script that extends the behaviour of a standard utility. In this case the shell script acted and behaved just like the tool it replaced, except that it had additional capabilities.
– roaima
Apr 11 at 21:12
@Joshua: On most platforms, shells know (or could know) which directories hold executables that implement POSIX commands. So you could just defer the substitution until after alias expansion and path resolution, and only do it for /bin/cat. (And you'd make it an option you could turn off.) Or you'd make cat a shell built-in (which maybe falls back to /bin/cat for multiple args?) so users could control whether or not they wanted the external version the normal way, with enable cat. Like for kill. (I was thinking that bash command cat would work, but that doesn't skip builtins.)
– Peter Cordes
Apr 12 at 1:13
If you provide an alias, the shell will know that cat in that environment no longer refers to the usual cat. Obviously, the optimization should be implemented after the aliases have been processed. I consider shell built-ins to represent commands in a virtual directory that is always prepended to your path. If you want to avoid the shell built-in version of any command (e.g. test) you have to use a variant with a path.
– Mikko Rantalainen
Apr 13 at 8:52
Two "useless" uses for cat:
sort file.txt | cat header.txt - footer.txt | less
...here cat
is used to mix file and piped input.
find . -name '*.info' -type f | sh -c 'xargs cat' | sort
...here xargs
can accept a virtually infinite number of filenames and run cat
as many times as needed while making it all behave like one stream. So this works for large file lists where direct use of xargs sort
does not.
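To make the first example concrete, here is a tiny self-contained demo of the "-" placeholder (the file names and contents are made up):
printf 'HEADER\n' > header.txt
printf 'FOOTER\n' > footer.txt
printf 'b\na\n'   > file.txt
sort file.txt | cat header.txt - footer.txt
# output: HEADER, a, b, FOOTER (each on its own line)
Here "-" tells cat to read its standard input, i.e. the sorted data coming from the pipe, at exactly that position between the two files.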
answered Apr 13 at 0:20
tasket
807
Both of these use cases would be trivially avoided by making the shell built-in step in only if cat is called with exactly one argument. Especially in the case where sh is passed a string and xargs will call cat directly, there's no way the shell could use its built-in implementation.
– Mikko Rantalainen
Apr 13 at 8:55
Aside from other things, a cat check would add performance overhead and confusion as to which use of cat is actually useless, IMHO, because such checks can be inefficient and create problems with legitimate cat usage.
When commands deal with the standard streams, they only have to care about reading from and writing to the standard file descriptors. A command can know whether its stdin is seekable (lseek-able) or not, which tells it whether it is reading from a file or from a pipe.
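As a rough illustration of that distinction, here is a minimal sketch relying on the Linux-style /dev/stdin symlink (the function name is made up; other systems may need a different check):
describe_stdin() {
    if [ -p /dev/stdin ]; then
        echo "stdin is a pipe (not seekable)"
    elif [ -f /dev/stdin ]; then
        echo "stdin is a regular file (seekable)"
    else
        echo "stdin is something else (terminal, socket, ...)"
    fi
}
describe_stdin < /etc/hosts      # regular file (seekable)
echo hello | describe_stdin      # pipe (not seekable)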
If we add to the mix checking what process actually provides that stdin content, we will need to find the process on the other side of the pipe and apply the appropriate optimization. This can be done from the shell itself, as shown in the SuperUser post by Kyle Jones; in shell terms that's
(find /proc -type l | xargs ls -l | fgrep 'pipe:[20043922]') 2>/dev/null
as shown in the linked post. That is 3 more commands (so extra fork()s and exec()s) and recursive traversals (so a whole lot of readdir() calls).
In terms of C and the shell's source code, the shell already knows its child processes, so there's no need for recursion, but how do we know when to optimize and when cat is actually useless? There are in fact useful uses of cat, such as
# adding a header and footer to a file
( cmd; cat file; cmd ) | cmd
# tr does not accept files as arguments
cat log1 log2 log3 | tr '[:upper:]' '[:lower:]'
It would probably be a waste and unnecessary overhead to add such an optimization to the shell. As Kusalananda's answer already mentioned, UUOC is more about the user's own lack of understanding of how best to combine commands for best results.
answered Apr 13 at 1:30
Sergiy Kolodyazhnyy
10.7k42765
3
The shell absolutely is allowed to implement cat itself, though, or any other utility. It's also allowed to know how the other utilities that belong to the system work (e.g. it can know how the external grep implementation that came with the system behaves). This is completely viable to do, so it's entirely fair to wonder why they don't.
– Michael Homer
Apr 11 at 8:04
6
@MichaelHomer "e.g. it can know how the external grep implementation that came with the system behaves" So the shell now has a dependency on the behavior of grep. And sed. And awk. And du. And how many hundreds if not thousands of other utilities?
– Andrew Henle
Apr 11 at 11:01
19
It would be pretty uncool of my shell to edit my commands for me.
– Azor Ahai
Apr 11 at 16:35