Channel: User Kamil Maciorowski - Super User

Answer by Kamil Maciorowski for Cloning an SSD with pv command


Many commands in Linux are designed to work in pipelines. This means you can chain them like this:

command1 | command2 | … | commandN

In this case the standard output (stdout) of command1 goes to the standard input (stdin) of command2, and so on. A command may or may not use its stdin and stdout, depending on its design and on the options and operands you provide. Read the manual of a command to tell whether it can work like this.
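A minimal sketch of a three-stage pipeline you can paste into a POSIX shell. The particular filters (printf, sort, tr) and the sample words are arbitrary choices for illustration; any commands that read stdin and write stdout would do:

```shell
set -eu

# printf writes three lines to its stdout; the shell connects that stdout
# to the stdin of sort, and the stdout of sort to the stdin of tr.
result=$(printf 'banana\napple\ncherry\n' | sort | tr 'a-z' 'A-Z')

# Prints APPLE, BANANA and CHERRY on separate lines.
printf '%s\n' "$result"
```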

The pipe symbol (|) belongs to the shell syntax. It's the shell that arranges the connections (pipes) between commands.

In the above example command1 has no predecessor and commandN has no successor in the pipeline. In such a case the stdin of command1 and the stdout of commandN get connected to whatever the shell considers its own stdin and stdout, respectively. Usually it's the terminal.

You can change this on demand with redirections. Example:

command1 <input | command2 | … | commandN >output

input and output are files (note this is a broad term). Now if command1 reads from its stdin, it reads from input; if commandN writes to its stdout, it writes to output. (More generally: whatever command1 does to its stdin, it does to input, and so on. For example, it's possible in general to write to stdin or read from stdout, because these are just file descriptors; this is rather rare, don't worry about it.)

I prefer syntax that shows the data flow more clearly (compare this answer):

<input command1 | command2 | … | commandN >output
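A minimal sketch, assuming a throwaway temporary directory and a hypothetical input file, that runs the same two-command pipeline (sort, then uniq) with both placements of the input redirection and checks that the results match:

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical input with duplicate lines.
printf 'b\na\nb\nc\na\n' >"$tmp/input"

# Input redirection written after the first command...
sort <"$tmp/input" | uniq >"$tmp/output1"
# ...and written before it; the shell treats both the same way.
<"$tmp/input" sort | uniq >"$tmp/output2"

cmp "$tmp/output1" "$tmp/output2" && echo "same result"
```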

If each command is designed (and configured with proper options+operands) to read from its stdin and write to its stdout then data will flow from input to output. Each command will do something to the data stream (in this context "no change" is also "something") and pass it further down the pipeline. This way you can chain many "effects" that affect data. Commands working like this are called filters (especially when they process text data).

In Unix philosophy a tool should do one thing and do it well. The idea is to use many simple, well-defined tools to achieve what you want. Often this means chaining many filters to get a complex filter.
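As a small illustration (the sample sentence and the choice of filters are arbitrary), three simple filters chained together form a more complex one that lists the distinct words of a text in lowercase:

```shell
set -eu
text='The cat saw the dog and the dog saw the cat'

# tr ' ' '\n'     puts one word per line
# tr 'A-Z' 'a-z'  lowercases every word
# sort -u         sorts the words and drops duplicates
words=$(printf '%s\n' "$text" | tr ' ' '\n' | tr 'A-Z' 'a-z' | sort -u)

printf '%s\n' "$words"
```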

At the other end of the spectrum there's a situation where the pipeline consists of just one command. You can have a single command reading from input, doing something and writing to output:

<input command1 >output
# or equivalently
command1 <input >output
# or (extra spaces allowed)
command1 < input > output
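A minimal sketch of the three equivalent forms, using tr because it only ever reads stdin and writes stdout (it takes no file operands). The temporary file names are hypothetical:

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'hello\n' >"$tmp/input"

# All three lines run the same single command with the same redirections.
<"$tmp/input" tr 'a-z' 'A-Z' >"$tmp/out1"
tr 'a-z' 'A-Z' <"$tmp/input" >"$tmp/out2"
tr 'a-z' 'A-Z' < "$tmp/input" > "$tmp/out3"

cat "$tmp/out1"
```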

Again, the command needs to be designed to work like this. Many are, some are not. Some commands read from their stdin if you don't specify a path to any input file (as a command line operand) or the path specified is literally - (a convention). Read the respective documentation to learn how a particular command behaves.
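For example, cat follows both conventions. The sketch below (file names hypothetical) feeds text to cat through a pipe, once with no operands and once with - placed between two file operands:

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'first\n' >"$tmp/a"
printf 'last\n' >"$tmp/b"

# No operand: cat reads its stdin.
r1=$(printf 'from stdin\n' | cat)

# "-" as an operand: cat reads its stdin at that position.
r2=$(printf 'middle\n' | cat "$tmp/a" - "$tmp/b")

printf '%s\n---\n%s\n' "$r1" "$r2"
```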


The command in question

pv < /dev/sdX > /dev/sdY

is equivalent to

</dev/sdX pv >/dev/sdY

and pv is designed to work like this. You may have thought < /dev/sdX > is some form (syntax) of providing the /dev/sdX operand to pv. It's not. < /dev/sdX and > /dev/sdY are two separate redirections; the actual command is just pv, without arguments.

Thanks to how pv is able to work with its stdin and stdout, the whole line means "read from /dev/sdX, do something to the data stream (whatever pv does) and write to /dev/sdY".
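One way to see that the redirections never reach the command is a hypothetical helper function, count_args, that only reports how many arguments it received. It stands in for pv here so the sketch runs without any block devices:

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'data\n' >"$tmp/src"

# Hypothetical helper: prints the number of arguments it was given.
count_args() { printf '%s\n' "$#"; }

# The shell performs both redirections itself, so count_args gets no
# arguments; "0" ends up in dst.
count_args <"$tmp/src" >"$tmp/dst"

# Here the path is a real operand, so one argument is passed.
n=$(count_args "$tmp/src")
```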

pv would also understand this:

pv /dev/sdX >/dev/sdY

In this case /dev/sdX is a command line argument to pv. The tool will read from the specified file and ignore its stdin (only because it's designed to work this way), so the effect will basically be the same.
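pv is not installed everywhere, so this sketch swaps in cat, which treats a file operand the same way: it reads the named file and does not read its stdin, even when something is piped in. The file names are hypothetical:

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'from file\n' >"$tmp/src"

# cat gets the path as an operand, so the piped-in text is ignored.
printf 'from stdin\n' | cat "$tmp/src" >"$tmp/dst"

cat "$tmp/dst"
```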

Your goal is to clone, so the desired "something to do" is "no change"; you want to "read from /dev/sdX, change nothing and write to /dev/sdY". The most basic program (command) that does "no change" is cat. Why pv then?

There are many commands that can do "no change". A particular command may be able to perform "no change" because:

  • "no change" is a special case of the task the command is designed to do ("do one thing and do it well", remember?); with proper options (or lack of them) the task becomes "no change";
  • the command is designed to pass data unchanged while doing another thing on the side; this may be:
    • forking the data stream,
    • providing a buffer (in memory or on disk),
    • affecting throughput,
    • showing throughput, progress or other statistics (via a separate channel: standard error, stderr, so this doesn't interfere with stdout), or logging to a file,
    • something else.

Examples of commands that perform "no change":

  • </dev/sdX cat >/dev/sdY

    The main purpose of cat is to concatenate many files (data streams). When there is no file specified (like in this case), stdin is used. When there is just one input stream, you get "no change".

  • </dev/sdX tee >/dev/sdY

    The main purpose of tee is to fork the stream. One copy gets to stdout and each specified file gets a copy as well. When there is no file specified (like in this case), you get just one copy to stdout, "no change".

  • </dev/sdX dd >/dev/sdY

    dd is a common tool for cloning block devices (compare this answer), but it has quirks. Traditionally one would use if=/dev/sdX and of=/dev/sdY; without them, stdin and stdout are used respectively. You also want to use bs= for performance. Some options or circumstances may make the output stream differ from the input; still, usually you get "no change" (plus some statistics printed to stderr when dd exits).

  • </dev/sdX pv >/dev/sdY

    This will show you progress and throughput. With proper options you can also limit the throughput. The data stream is not altered, so you get "no change".

  • </dev/sdX mbuffer >/dev/sdY

    Somewhat similar to pv; additionally, you can choose the size of the buffer it uses (in memory or in a file). The data stream is not altered, so you get "no change".
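A sketch that tries several of the above on ordinary files instead of real disks, so nothing gets overwritten. The 1 MiB image and the file names are assumptions made for the demo, as is the availability of head -c and /dev/urandom; pv and mbuffer are only run if they happen to be installed, with -q to silence their status output. Each copy is then compared with the source byte by byte:

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# A 1 MiB file of random bytes stands in for /dev/sdX.
src="$tmp/sdX.img"
head -c 1048576 /dev/urandom >"$src"

<"$src" cat >"$tmp/cat.img"
<"$src" tee >"$tmp/tee.img"
# dd reports its statistics on stderr, which is kept apart from the data.
<"$src" dd bs=64k >"$tmp/dd.img" 2>"$tmp/dd.log"

# Optional tools: run them only if present.
for tool in pv mbuffer; do
    if command -v "$tool" >/dev/null 2>&1; then
        <"$src" "$tool" -q >"$tmp/$tool.img"
    fi
done

for img in "$tmp"/cat.img "$tmp"/tee.img "$tmp"/dd.img; do
    cmp "$src" "$img" && echo "$img: no change"
done
```

On real devices you would replace the stand-in paths with /dev/sdX and /dev/sdY, double-checking which is which, since the target is overwritten.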


Note there is yet another tool that can "read from input, change nothing and write to output". It's cp. Normally you specify the files like this:

cp input output

This may not be obvious, but cp can clone /dev/sdX to /dev/sdY:

cp /dev/sdX /dev/sdY

The tool is not designed to use its stdin or stdout; it needs paths. This will not work:

</dev/sdX cp >/dev/sdY
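A sketch of both cases on hypothetical files: cp with two path operands makes an identical copy, while cp with only redirections fails because it has nothing to copy. Note that the shell still creates the (empty) output file before cp even starts:

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
head -c 65536 /dev/urandom >"$tmp/src.img"

# Paths as operands: this works.
cp "$tmp/src.img" "$tmp/dst.img"

# Redirections only: cp reports an error about missing operands and exits
# with a non-zero status; its stdin and stdout are simply not used.
if <"$tmp/src.img" cp >"$tmp/nope.img" 2>/dev/null; then
    cp_failed=no
else
    cp_failed=yes
fi
echo "cp without operands failed: $cp_failed"
```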

In modern Linux you can make cp use its stdin and stdout like this:

<input cp /dev/stdin /dev/stdout >output
# or
<input cp /proc/self/fd/0 /proc/self/fd/1 >output

Here /dev/stdin and /dev/stdout link to /proc/self/fd/0 and /proc/self/fd/1, respectively. The latter are special files provided by the kernel: if any process opens /proc/self/fd/0, it gets its own stdin. Similarly with /proc/self/fd/1 and stdout. This way you can force some processes that are not pipeline-friendly to work with redirections or in a pipeline. It's hardly ever useful and there may be problems (usually commands are not pipeline-friendly for a reason); I won't elaborate.
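A minimal, Linux-only sketch (it relies on /proc) of the second form, run on hypothetical temporary files and checked with cmp:

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
head -c 65536 /dev/urandom >"$tmp/input"

# The shell opens input as fd 0 and output as fd 1; cp then opens its own
# stdin and stdout by path through /proc/self/fd.
<"$tmp/input" cp /proc/self/fd/0 /proc/self/fd/1 >"$tmp/output"

cmp "$tmp/input" "$tmp/output" && echo "copied unchanged"
```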

