Channel: User Kamil Maciorowski - Super User

Answer by Kamil Maciorowski for `find`-like tool pruning most directories along a golden path

The FreeBSD implementation of find supports -depth n (note that this has nothing to do with the portable -depth without a parameter, which is also supported).

-depth n

True if the depth of the file relative to the starting point of the traversal is n.

This implementation also supports -mindepth and -maxdepth.

If you want to find all files named info at depth 6, each in a directory named r1 at depth 5, each in a directory named q1 at depth 4, each in a directory named p1 at depth 3, then this is what you can do with FreeBSD find:

find . \
  -mindepth 3 \
  -maxdepth 6 \
     -depth 3 ! -name p1   -prune \
  -o -depth 4 ! -name q1   -prune \
  -o -depth 5 ! -name r1   -prune \
  -o -depth 6   -name info -print

(I have not tested it, though.)

Unfortunately, many implementations of find (including GNU find) do not support -depth n. -mindepth and -maxdepth are also not portable (but I can see you can use them). Portably you can "implement" -mindepth, -maxdepth or -depth n by using -path (or its less portable equivalent -wholename) and patterns that include * and /.

Our -mindepth 3 will be like:

find . -path '*/*/*/*' \( … \)

where … is the rest of the expression. It may contain -o, which is why we need the parentheses.

Our -maxdepth 6 will be an additional test like:

\( ! -path '*/*/*/*/*/*/*' -o -prune \)

which prunes every file at depth 6 (and deeper, but because of -prune at depth 6 we will never get there anyway). The reason we use ! -path … -o -prune and not just -path … -prune is we need this test to always succeed, which the former form does and the latter doesn't.
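Both replacements can be checked on a throwaway tree. The following is only a sketch: the directory names d1 … d8 are made up, and it assumes mktemp -d is available (it is common but not required by POSIX).

```shell
# Build a disposable chain of directories, one at each depth from 1 to 8.
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/d1/d2/d3/d4/d5/d6/d7/d8"

# Emulated "-mindepth 3 -maxdepth 6":
#   the first -path is the lower bound ("depth 3 or deeper"),
#   the parenthesized test prunes at depth 6 and is always true.
found=$(cd "$tmp" && find . \
  -path '*/*/*/*' \
  \( ! -path '*/*/*/*/*/*/*' -o -prune \) \
  -print | LC_ALL=C sort)

printf '%s\n' "$found"
rm -rf "$tmp"
```

Only the pathnames ending in d3, d4, d5 and d6 should be listed; d7 and d8 are never visited because of -prune at depth 6.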

Notes:

  • * in -path may match / character(s), so our -path '*/*/*/*' means "depth 3 or deeper" which nicely corresponds to -mindepth 3, similarly our -path '*/*/*/*/*/*/*' means "depth 6 or deeper".

  • In general the patterns we need to use depend on how many slashes there are in the starting point (. in our example, zero slashes). If the starting point was /foo/bar then we would need to increase the number of slashes in our patterns by two. If the starting point ended with / then we wouldn't be able to tell apart depth 0 (the starting point itself) from depth 1 just by counting slashes. If we used two (or more) starting points with different numbers of slashes then replacing -mindepth, -maxdepth or -depth n with -path would get complicated and ugly.

  • A pattern containing * may "misbehave" if there are invalid characters in some pathname(s). More information: here.
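The dependence on the starting point can be made explicit with a small helper. This is a sketch only: count_slashes and min_depth_glob are hypothetical names introduced here, and the starting point is assumed not to end with / (the case the note above says cannot be handled by counting slashes).

```shell
# count_slashes STRING: print the number of "/" characters in STRING.
count_slashes() {
    only_slashes=$(printf '%s' "$1" | tr -cd '/')
    printf '%s\n' "${#only_slashes}"
}

# min_depth_glob START N: print a -path pattern meaning "depth N or deeper"
# for the starting point START (assumed not to end with "/").
min_depth_glob() {
    total=$(( $(count_slashes "$1") + $2 ))
    pattern='*'
    while [ "$total" -gt 0 ]; do
        pattern="$pattern/*"
        total=$((total - 1))
    done
    printf '%s\n' "$pattern"
}

min_depth_glob . 3          # prints */*/*/*
min_depth_glob /foo/bar 3   # prints */*/*/*/*/*
```

For /foo/bar the pattern has two more slashes than for ., which is exactly the adjustment described in the note.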

All the above notes apply when we create a replacement for -depth 3:

-path '*/*/*/*' ! -path '*/*/*/*/*'
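To see that this pair of tests selects depth 3 only, it can be tried on a small disposable tree. The names a, b, c, c2, d and e are made up for this sketch, and mktemp -d is assumed to be available.

```shell
# Disposable tree: two entries at depth 3 (c and c2), more below and above.
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/a/b/c/d/e" "$tmp/a/b/c2"

# "depth 3 or deeper" AND NOT "depth 4 or deeper" = exactly depth 3.
exact3=$(cd "$tmp" && find . \
  -path '*/*/*/*' ! -path '*/*/*/*/*' \
  -print | LC_ALL=C sort)

printf '%s\n' "$exact3"
rm -rf "$tmp"
```

Here find still descends below depth 3 (nothing is pruned), but ./a/b/c and ./a/b/c2 are the only pathnames that pass both tests.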

Similarly for other depths. Now we can make our command portable:

find . \
  -path '*/*/*/*' \
  \( ! -path '*/*/*/*/*/*/*' -o -prune \) \
  \( \
     -path '*/*/*/*'       ! -path '*/*/*/*/*'         ! -name p1   -prune \
  -o -path '*/*/*/*/*'     ! -path '*/*/*/*/*/*'       ! -name q1   -prune \
  -o -path '*/*/*/*/*/*'   ! -path '*/*/*/*/*/*/*'     ! -name r1   -prune \
  -o -path '*/*/*/*/*/*/*' ! -path '*/*/*/*/*/*/*/*'     -name info -print \
  \)
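A way to sanity-check the portable command is to run it in a disposable tree that contains both a matching golden path and a few decoys. The tree layout below is invented for this sketch (a, b, c, x, r2 and other are arbitrary names), and mktemp -d is assumed to be available.

```shell
# Disposable tree: two golden paths plus decoys that must not be reported.
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/a/b/p1/q1/r1" "$tmp/a/b/p1/q1/r2" \
         "$tmp/a/b/x/q1/r1"  "$tmp/a/c/p1/q1/r1"
touch "$tmp/a/b/p1/q1/r1/info" "$tmp/a/b/p1/q1/r1/other" \
      "$tmp/a/b/p1/q1/r2/info" "$tmp/a/b/x/q1/r1/info" \
      "$tmp/a/c/p1/q1/r1/info"

# The portable command, unchanged, run from the top of the tree.
result=$(cd "$tmp" && find . \
  -path '*/*/*/*' \
  \( ! -path '*/*/*/*/*/*/*' -o -prune \) \
  \( \
     -path '*/*/*/*'       ! -path '*/*/*/*/*'         ! -name p1   -prune \
  -o -path '*/*/*/*/*'     ! -path '*/*/*/*/*/*'       ! -name q1   -prune \
  -o -path '*/*/*/*/*/*'   ! -path '*/*/*/*/*/*/*'     ! -name r1   -prune \
  -o -path '*/*/*/*/*/*/*' ! -path '*/*/*/*/*/*/*/*'     -name info -print \
  \) | LC_ALL=C sort)

printf '%s\n' "$result"
rm -rf "$tmp"
```

Only ./a/b/p1/q1/r1/info and ./a/c/p1/q1/r1/info should be printed: the info files under x and r2 sit in pruned directories, and other fails -name info.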

Your find does support -mindepth and -maxdepth, so you don't really need to replace them; it probably does not support -depth n though.

The code is somewhat redundant, i.e. not minimal. In particular:

  • Because of the first -path '*/*/*/*' (or -mindepth 3), -path '*/*/*/*' in the line with -name p1 is not needed.
  • Because we prune at depth 6 (or use -maxdepth 6), ! -path '*/*/*/*/*/*/*/*' in the line with -name info is not needed.

I left the redundant parts, so the structure is clearly shown and nothing will be missing if you decide to add lines for depth 2 and/or depth 7 (of course you will need to adjust -mindepth and/or -maxdepth accordingly).
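For comparison, here is a sketch of the command with the two redundant tests removed, run against a disposable tree built to exercise exactly those two spots: a p1/q1/r1/info chain that starts too shallow, and an info directory at depth 6 that contains another info at depth 7. The tree is invented for this check, and mktemp -d is assumed to be available.

```shell
# Disposable tree with traps just outside the depth range 3..6.
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/a/b/p1/q1/r1/info" "$tmp/p1/q1/r1"
touch "$tmp/a/b/p1/q1/r1/info/info"   # depth 7, below a pruned directory
touch "$tmp/p1/q1/r1/info"            # depth 4, r1 is at depth 3 and is not p1

# Same expression, minus the leading -path in the p1 line and
# minus the trailing ! -path in the info line.
minimal=$(cd "$tmp" && find . \
  -path '*/*/*/*' \
  \( ! -path '*/*/*/*/*/*/*' -o -prune \) \
  \( \
                           ! -path '*/*/*/*/*'         ! -name p1   -prune \
  -o -path '*/*/*/*/*'     ! -path '*/*/*/*/*/*'       ! -name q1   -prune \
  -o -path '*/*/*/*/*/*'   ! -path '*/*/*/*/*/*/*'     ! -name r1   -prune \
  -o -path '*/*/*/*/*/*/*'                               -name info -print \
  \) | LC_ALL=C sort)

printf '%s\n' "$minimal"
rm -rf "$tmp"
```

The only pathname reported should be ./a/b/p1/q1/r1/info, the directory at depth 6. The depth 7 info is never reached, and the shallow chain is cut off when r1 at depth 3 is pruned for not being named p1.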

In the question you mentioned regex. Our code uses -name to decide what to prune at what depth, but if your find supports -regex then use it at will. Keep in mind -regex operates on the whole pathname like -path, not on the basename like -name.

-regex can be used to build replacements of -depth n, so n will appear explicitly. In GNU find 4.9.0 in Debian 12 the following works for me:

find . \
  -regextype egrep \
  -mindepth 3 \
  -maxdepth 6 \
     -regex '([^/]*/){3}[^/]*' ! -name p1   -prune \
  -o -regex '([^/]*/){4}[^/]*' ! -name q1   -prune \
  -o -regex '([^/]*/){5}[^/]*' ! -name r1   -prune \
  -o -regex '([^/]*/){6}[^/]*'   -name info -print
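If your find has no -regex, you can still see which pathnames such a regex accepts by feeding candidate pathnames to grep. This is only a stand-in, not find itself: it assumes that grep -E understands this simple pattern the same way as the egrep regex type, and it uses -x (match the whole line) to mimic the fact that -regex must match the whole pathname. The sample pathnames are made up.

```shell
# Which of these pathnames have exactly 3 slashes, i.e. depth 3 below "."?
depth3=$(printf '%s\n' ./a ./a/b ./a/b/c ./a/b/c/d |
    grep -E -x '([^/]*/){3}[^/]*')

printf '%s\n' "$depth3"   # prints ./a/b/c
```

Each repetition of ([^/]*/) consumes one pathname component together with its slash, so the number in braces is the depth n when the starting point is . (zero slashes).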

There may be a difference in performance between replacements of -depth n with -path and replacements with -regex. Do your own tests.