Many commands in Linux are designed to work in pipelines. This means you can chain them like this:
command1 | command2 | … | commandN
In this case the standard output (stdout) of command1 goes to the standard input (stdin) of command2 and so on. A command may or may not use its stdin/stdout, depending on its design and the options+operands you provide. You can read the manuals to tell which commands can work like this.
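For instance, a simple pipeline of common tools (this particular combination is only an illustration) could look like this:

# ls writes file names to stdout, grep filters what it reads from its stdin,
# wc counts the lines it reads from its stdin
ls /etc | grep conf | wc -l

Each | connects the stdout of the command on its left to the stdin of the command on its right.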
The pipe symbol (|) belongs to the shell syntax. It's the shell that arranges connections (pipes) between commands.
In the above example command1 has no predecessor and commandN has no successor in the pipe. In such a case stdin of command1 and stdout of commandN get connected to whatever the shell considers its stdin and stdout respectively. Usually it's the terminal.
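You can observe this with a single command run on its own (just an illustration):

# with no pipes and no redirections, cat's stdin and stdout are the terminal:
# it echoes back every line you type (press Ctrl+D to finish)
cat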
You can change it on demand. Example:
command1 <input | command2 | … | commandN >output
input and output are files (note this is a broad term). Now if command1 reads from its stdin then it reads from input; if commandN writes to its stdout then it writes to output. (More precisely: if command1 does anything to its stdin then it does it to input, etc.; in general it's even possible to write to stdin or read from stdout because these are just file descriptors; this is rather rare, don't worry about it.)
I prefer syntax that shows the data flow more clearly (compare this answer):
<input command1 | command2 | … | commandN >output
If each command is designed (and configured with proper options+operands) to read from its stdin and write to its stdout, then data will flow from input to output. Each command will do something to the data stream (in this context "no change" is also "something") and pass it further down the pipeline. This way you can chain many "effects" that affect the data. Commands working like this are called filters (especially when they process text data).
In the Unix philosophy a tool should do one thing and do it well. The idea is to combine many simple, well-defined tools to achieve what you want. Often this means chaining many filters to get a complex filter.
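A sketch of such a chain (the file names and the particular transformation are made up; any filters would do):

# lowercase the text, sort the lines, count repeated lines, sort by the count
<words.txt tr 'A-Z' 'a-z' | sort | uniq -c | sort -rn >counts.txt

Each stage is a simple filter; together they form one more complex filter.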
On the other end of the spectrum there's a situation where the number of commands is one. You can have just one command reading from input, doing something and writing to output:

<input command1 >output
# or equivalently
command1 <input >output
# or (extra spaces allowed)
command1 < input > output
Again, the command needs to be designed to work like this. Many are, some are not. Some commands read from their stdin if you don't specify a path to any input file (as a command line operand) or the path specified is literally - (a convention). Read the respective documentation to learn how a particular command behaves.
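As an illustration of that convention, here is how wc behaves (other commands may differ, so check their manuals):

# read the named file
wc -l /etc/passwd
# no file operand: read stdin
wc -l </etc/passwd
# the operand is literally "-": also read stdin
wc -l - </etc/passwd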
The command in question
pv < /dev/sdX > /dev/sdY
is equivalent to
</dev/sdX pv >/dev/sdY
and pv is designed to work like this. You may have thought < /dev/sdX > is some form (syntax) of providing the /dev/sdX operand to pv. It's not. < /dev/sdX and > /dev/sdY are two separate redirections; the actual command is pv alone, without arguments.
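A quick way to convince yourself: the shell lets you place redirections anywhere in a simple command, so all of these run the very same pv with no operands:

pv </dev/sdX >/dev/sdY
</dev/sdX pv >/dev/sdY
</dev/sdX >/dev/sdY pv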
Thanks to how pv is able to work with its stdin and stdout, the whole line means "read from /dev/sdX, do something to the data stream (whatever pv does) and write to /dev/sdY".
pv would also understand this:
pv /dev/sdX >/dev/sdY
In this case /dev/sdX is a command line argument to pv. The tool will read from the specified file and ignore its stdin (only because it's designed to work this way), so the effect will basically be the same.
Your goal is to clone, so the desired "something to do" is "no change"; you want to "read from /dev/sdX, change nothing and write to /dev/sdY". The most basic program (command) that does "no change" is cat. Why pv then?
There are many commands that can do "no change". For a particular command, "no change" may be possible because:
- "no change" is a special case of the task the command is designed to do ("do one thing and do it well", remember?); with proper options (or lack of them) the task becomes "no change";
- the command is designed to pass data unchanged while doing another thing on the side; this may be:
- forking the data stream,
- providing a buffer (in memory or on disk),
- affecting throughput,
- showing throughput, progress or other statistics (via a separate channel: standard error, stderr, so this doesn't interfere with stdout), or logging to a file,
- something else.
Examples of commands that perform "no change":
</dev/sdX cat >/dev/sdY
The main purpose of cat is to concatenate many files (data streams). When there is no file specified (like in this case), stdin is used. When there is just one input stream, you get "no change".
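For comparison, here's concatenation (cat's primary job) next to the single-stream case (the file names are just placeholders):

# two inputs get joined into one output stream
cat part1.img part2.img >whole.img
# one input stream: a plain, unchanged copy
<part.img cat >copy.img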
</dev/sdX tee >/dev/sdY

The main purpose of tee is to fork the stream. One copy gets to stdout and each specified file gets a copy as well. When there is no file specified (like in this case), you get just one copy to stdout, "no change".
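If you did name a file, tee would let you clone the device and keep an extra image at the same time (the image path here is made up):

# one copy goes to /dev/sdY via stdout, another one to the named file
</dev/sdX tee /root/sdX-backup.img >/dev/sdY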
</dev/sdX dd >/dev/sdY

dd is a common tool for cloning block devices (compare this answer), but there are quirks. Traditionally one would use if=/dev/sdX and of=/dev/sdY; without them stdin and stdout are used respectively. And you want to use bs= for performance. Some options or circumstances may make the output stream differ from the input, still usually you get "no change" (plus some statistics to stderr when dd exits).
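A sketch of the more traditional dd invocation with an explicit block size (the bs= value is only an example, and status=progress requires GNU dd):

# read /dev/sdX, write /dev/sdY, 64 KiB blocks, progress report on stderr
dd if=/dev/sdX of=/dev/sdY bs=64K status=progress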
</dev/sdX pv >/dev/sdY

This will show you progress and throughput. With proper options you can limit the throughput. The data stream is not tampered with, so you get "no change".
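For example pv's rate limit option can cap the throughput (assuming your pv build supports -L; check its manual):

# copy at roughly 10 MiB/s at most, progress shown on stderr
</dev/sdX pv -L 10M >/dev/sdY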
</dev/sdX mbuffer >/dev/sdY
Somewhat similar to pv; additionally you can choose the size of a buffer it uses (in memory or in a file). The data stream is not tampered with, so you get "no change".
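For example (the buffer size is just an illustration; see mbuffer's manual for the exact options your version accepts):

# use a 1 GiB memory buffer between the two devices
</dev/sdX mbuffer -m 1G >/dev/sdY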
Note there is yet another tool that can "read from input, change nothing and write to output". It's cp. Normally you specify the files like this:
cp input output
This may not be obvious, but cp can clone /dev/sdX to /dev/sdY:
cp /dev/sdX /dev/sdY
The tool is not designed to use its stdin or stdout; it needs paths. This will not work:
</dev/sdX cp >/dev/sdY
In modern Linux you can make cp use its stdin and stdout like this:
<input cp /dev/stdin /dev/stdout >output
# or
<input cp /proc/self/fd/0 /proc/self/fd/1 >output
Where /dev/stdin and /dev/stdout link to /proc/self/fd/0 and /proc/self/fd/1 respectively. These are special files provided by the kernel. If any process tries to open /proc/self/fd/0, it will get its own stdin. Similarly with /proc/self/fd/1 and stdout. This way you can force some not-pipeline-friendly processes to work with redirections or in a pipeline. It's hardly ever useful and there may be some problems (usually commands are not-pipeline-friendly for reasons); I won't elaborate.