Channel: User Kamil Maciorowski - Super User
Answer by Kamil Maciorowski for How can I find what files differ from corresponding files in a larger set?

This other answer that uses diff -q … | grep -v "^Only in" is fine, but…

The fact that in general pathnames may contain newline characters makes the output of diff -q ambiguous. Even if you accept this (because e.g. you know all the filenames and with this knowledge you are able to properly interpret the output where it looks kinda ambiguous), an unfortunate name containing a newline character immediately followed by Only in will make grep -v remove lines that are just parts of a message saying some files differ.
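The edge case can be reproduced without running diff at all. The sketch below only simulates the report, assuming the message has the form "Files A and B differ" and that pathnames are printed unquoted (some diff versions may quote them); the file name is hypothetical.

```shell
# Simulated `diff -q` report for ONE differing file whose name is
# "foo<newline>Only in bar" (assumed format: "Files A and B differ").
name='foo
Only in bar'
report=$(printf 'Files subset/%s and superset/%s differ\n' "$name" "$name")

# The single message spans three lines; two of them start with "Only in",
# so grep -v removes them and leaves a truncated, misleading line.
kept=$(printf '%s\n' "$report" | grep -v '^Only in')
printf '%s\n' "$kept"   # prints: Files subset/foo
```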

This is an edge case, but still. The point of my answer is to provide a way to do the comparison and present or use the result unambiguously.

With a powerful enough find, this is how you find files in subset that differ from their counterparts in superset:

( cd subset && find . ! -type d ! -exec cmp -s {} ../superset/{} \; -print )

(The purpose of ( ) is to allow our code to cd without affecting whatever follows in the shell where you run the whole command; the current working directory of the shell will not change.)
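To try the command safely, here is a minimal sketch that builds a throwaway tree and runs it there. The directory layout and file names are made up for illustration, and it assumes a find that expands {} inside ../superset/{} (GNU find does).

```shell
# Build a scratch tree (hypothetical file names) under a temporary directory.
demo=$(mktemp -d)
mkdir -p "$demo/subset/dir" "$demo/superset/dir"
printf 'same\n'  > "$demo/subset/a.txt"
printf 'same\n'  > "$demo/superset/a.txt"
printf 'old\n'   > "$demo/subset/dir/b.txt"
printf 'new\n'   > "$demo/superset/dir/b.txt"
printf 'extra\n' > "$demo/superset/c.txt"   # only in superset, never visited

# Run the command from the answer; only the differing file is printed.
differing=$(
  cd "$demo" &&
  ( cd subset && find . ! -type d ! -exec cmp -s {} ../superset/{} \; -print )
)
printf '%s\n' "$differing"   # prints: ./dir/b.txt
rm -rf "$demo"
```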

If a file from subset does not have a counterpart in superset (so the directories are not really a subset and a superset to each other) then cmp will fail and the pathname will be printed as if files differed. To check if a counterpart exists you may add -exec test -e ../superset/{} \; ! -exec test -d ../superset/{} \; just after ! -type d. The rest of this answer assumes subset is really a subset of superset.
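The effect of the extra tests can be seen in a small sketch like the following, where orphan.txt (a hypothetical name) exists only in subset. It again assumes a find that expands {} inside a longer argument; sort is there only to make the order predictable.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/subset" "$demo/superset"
printf 'old\n' > "$demo/subset/changed.txt"
printf 'new\n' > "$demo/superset/changed.txt"
printf 'x\n'   > "$demo/subset/orphan.txt"    # no counterpart in superset

# Without the check, the orphan is reported as if it differed.
without_check=$(cd "$demo/subset" &&
  find . ! -type d ! -exec cmp -s {} ../superset/{} \; -print | sort)

# With the check, only files that have a non-directory counterpart reach cmp.
with_check=$(cd "$demo/subset" &&
  find . ! -type d \
    -exec test -e ../superset/{} \; ! -exec test -d ../superset/{} \; \
    ! -exec cmp -s {} ../superset/{} \; -print | sort)

printf 'without check:\n%s\nwith check:\n%s\n' "$without_check" "$with_check"
rm -rf "$demo"
```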

To find files that do not differ, just remove the ! that immediately precedes -exec cmp (keep the one in ! -type d).

All the output comes from -print, lines are newline-terminated, so the output is still ambiguous because pathnames may contain newline characters. This can easily be fixed. If your find supports -print0 then you can use it instead of -print and then the output will be unambiguous (not really when you read it in a terminal, but certainly when you pipe it to a tool that expects null-terminated lines, like xargs -0). If your find does not support -print0 then you can still use something like -exec printf '%s\0' {} \;. Regardless, if you need to do something to files that pass test(s) inside find, often you don't need to parse the output at all (so you don't need to care whether the output is ambiguous or not), because you can build -exec … {} … \; or -exec … {} + that does the job from the inside of find.
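A sketch of the difference, using a hypothetical file name that contains a newline. It assumes find supports -print0, xargs supports -0, and {} is expanded inside ../superset/{}; count-sh is just an arbitrary name for $0 of the inner shell.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/subset" "$demo/superset"
# One differing file whose name contains a newline.
tricky='two
lines.txt'
printf 'old\n' > "$demo/subset/$tricky"
printf 'new\n' > "$demo/superset/$tricky"

# -print: one file shows up as two lines.
lines=$(cd "$demo/subset" &&
  find . ! -type d ! -exec cmp -s {} ../superset/{} \; -print | wc -l)

# -print0: exactly one NUL-terminated record.
records=$(cd "$demo/subset" &&
  find . ! -type d ! -exec cmp -s {} ../superset/{} \; -print0 |
  tr -cd '\0' | wc -c)

# xargs -0 hands the pathname over as a single argument, newline intact.
args=$(cd "$demo/subset" &&
  find . ! -type d ! -exec cmp -s {} ../superset/{} \; -print0 |
  xargs -0 sh -c 'echo "$#"' count-sh)
rm -rf "$demo"
```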

I wrote "powerful enough find" in the first place, because by the POSIX standard find may or may not expand {} inside ../superset/{} we used. If your find only expands {} given as a single word then instead of -exec cmp -s {} ../superset/{} \; you need something like -exec sh -c 'cmp -s "$1" "../superset/$1"' find-sh {} \;.
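As a quick check of the sh -c form, the sketch below runs it on a scratch tree with made-up file names. Here {} appears only as a standalone argument, so any POSIX find should expand it; the pathname reaches the inner shell as $1 and find-sh merely fills $0.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/subset" "$demo/superset"
printf 'same\n' > "$demo/subset/a.txt"
printf 'same\n' > "$demo/superset/a.txt"
printf 'old\n'  > "$demo/subset/b.txt"
printf 'new\n'  > "$demo/superset/b.txt"

# {} is a separate word; the inner shell builds ../superset/$1 itself.
portable=$(cd "$demo/subset" &&
  find . ! -type d \
    ! -exec sh -c 'cmp -s "$1" "../superset/$1"' find-sh {} \; -print)
printf '%s\n' "$portable"   # prints: ./b.txt
rm -rf "$demo"
```

Passing the pathname as a positional parameter, rather than embedding {} in the shell code, also keeps unusual characters in names from being interpreted as shell syntax.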

As far as I know, the support for -print0 in find and the support of -0 in xargs were added to the POSIX standard in 2024; in the wild you can still find find and xargs that lack the support.

If your find is powerful enough then use the command already given, with -print0 if needed. For basic implementations of find that have not been updated to the 2024 standard, here is a command that should work with arbitrary pathnames:

( cd subset && find . ! -type d ! -exec sh -c 'cmp -s "$1" "../superset/$1"' find-sh {} \; -exec … {} + )

where you shall replace … with printf '%s\0' if you want null-terminated strings that make the output unambiguous. Note, however, that a minimal POSIX toolset without extensions is not well equipped to handle null-terminated strings as input: tools like sort or head may still not support such strings, even after 2024. Alternatively, replace … with whatever you want to do to the files that pass the tests.
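For illustration, here is the command with printf '%s\0' substituted in, run on a scratch tree with hypothetical file names. It is only a sketch: it relies on the external printf utility repeating its format for every argument, and it counts the NUL bytes to show one record per differing file.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/subset" "$demo/superset"
printf 'old\n'  > "$demo/subset/plain.txt"
printf 'new\n'  > "$demo/superset/plain.txt"
printf 'old\n'  > "$demo/subset/with space"
printf 'new\n'  > "$demo/superset/with space"
printf 'same\n' > "$demo/subset/same.txt"
printf 'same\n' > "$demo/superset/same.txt"

# Two files differ, so two NUL-terminated pathnames are expected.
nul_count=$(cd "$demo/subset" &&
  find . ! -type d \
    ! -exec sh -c 'cmp -s "$1" "../superset/$1"' find-sh {} \; \
    -exec printf '%s\0' {} + | tr -cd '\0' | wc -c)
rm -rf "$demo"
```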

Like above, the command detects files in subset that differ from their counterparts in superset. To find files that do not differ, just remove the ! that immediately precedes -exec sh (keep the one in ! -type d).

Notes:

