Solution
With GNU toolset:
find . -type f -exec sh -c 'LC_ALL=C stat --printf="%.Y|%y|%n\0" -- "$@"' find-sh {} + | LC_ALL=C sort -zr -t '|' -k 1,1 | LC_ALL=C sort -zsu -t '|' -k 2.1,2.7 | cut -d '|' -zf 3- | tr '\0' '\n'
Adjust the invocation of find (up to but excluding -exec) to your needs.
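To try the command without touching real data, it can be run in a throwaway directory. The following is only a sketch: it assumes GNU coreutils and findutils, the file names are hypothetical, and TZ is pinned to UTC so that the month each file falls into does not depend on the machine.

```shell
#!/bin/sh
# Hypothetical sandbox: three files with known modification times.
export TZ=UTC                      # assumption: pin the timezone for repeatable months
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch -d '2024-02-10 12:00:00' feb-old
touch -d '2024-02-20 12:00:00' feb-new
touch -d '2024-03-05 12:00:00' mar-only

# The pipeline from the solution, split over several lines for readability.
result=$(
  find . -type f -exec sh -c 'LC_ALL=C stat --printf="%.Y|%y|%n\0" -- "$@"' find-sh {} + |
    LC_ALL=C sort -zr -t '|' -k 1,1 |
    LC_ALL=C sort -zsu -t '|' -k 2.1,2.7 |
    cut -d '|' -zf 3- |
    tr '\0' '\n'
)
printf '%s\n' "$result"

# Remove the sandbox again.
cd / && rm -rf "$dir"
```

With these three files the output should be ./feb-new followed by ./mar-only: one pathname per month, and for February the more recently modified of the two.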
Explanation
For each file that gets to -exec, LC_ALL=C stat --printf='%.Y|%y|%n\0' is run. Its output consists of lines like
1711542530.762649374|2024-03-27 13:28:50.762649374 +0100|./path to/something
where the first |-separated field is the time of last data modification, seconds since Epoch (with precision); the second field is the time of last data modification, human-readable. Each line is null-terminated, so newlines (if any) in the pathname should be safe. Only the first two | characters will matter later, they both come from the format for sure, so | (if any) in the pathname should also be safe (see the explanation of cut below).
I used LC_ALL=C to make the format independent from your current locale. Note LC_ALL=C find … would affect find and everything it runs, in general this may be unwanted; so instead of -exec stat … I used -exec sh -c … and this way I was able to set LC_ALL=C only for stat.
Then the first sort sorts lines according to the first |-separated field. Lines associated with files recently modified will end up first. Our format is so strict that the default way of sorting in the C locale will work.
The second sort considers only the YYYY-MM (year-month) part of the second field (2024-03 in the example) and because of -u (--unique) it passes only one line per YYYY-MM. With -s (--stable) this is the line associated with the most recently modified file per YYYY-MM, because the first sort has already placed most recent files first.
Then for each line cut prints |-separated fields from the 3rd one to the last one. In each line this is a pathname. Formally a pathname containing (one or more) | characters will form the 3rd, 4th and possibly later fields, but as the fields in the output will also be separated by |, the output will be the exact pathname anyway.
Finally tr converts null bytes to newlines, just to make the output human-readable (but also potentially ambiguous).
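The stages after stat can also be observed on fixed input, without any files. The sketch below feeds hand-written records in the same format through the two sort invocations and cut; the records and the pathnames in them (including one containing |) are made up for illustration, and GNU sort and cut are assumed.

```shell
#!/bin/sh
# Hypothetical null-terminated records, deliberately not in time order.
records() {
  printf '%s\0' \
    '1709300000.000000000|2024-03-01 14:33:20.000000000 +0100|./c' \
    '1707000000.000000000|2024-02-03 23:40:00.000000000 +0100|./d' \
    '1711542530.762649374|2024-03-27 13:28:50.762649374 +0100|./a|b'
}

# Stage 1: newest first, by the first field (seconds since Epoch).
newest_first=$(records | LC_ALL=C sort -zr -t '|' -k 1,1 | tr '\0' '\n')
printf '%s\n' "$newest_first"

# Stages 1-3: keep the first record per YYYY-MM, then strip fields 1 and 2.
picked=$(
  records |
    LC_ALL=C sort -zr -t '|' -k 1,1 |
    LC_ALL=C sort -zsu -t '|' -k 2.1,2.7 |
    cut -d '|' -zf 3- |
    tr '\0' '\n'
)
printf '%s\n' "$picked"
```

The final output should be ./d (the only February record) followed by ./a|b (the newest March record), with the | inside the pathname preserved by cut.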
Notes
If you want to process the result further, keep it in the form of null-terminated strings if possible. In other words: a tool expecting null-terminated strings (e.g. xargs -r0 …) placed instead of tr is better than a tool expecting newline-terminated strings placed after tr.
Linux timestamps are just numbers, without the notion of timezone. Your stat will "translate" them to your current timezone. In particular it will assign files to YYYY-MM according to your current timezone and this can give different results in different timezones. E.g. a file modified (no matter where) around 2024-04-01 00:00:00 UTC will be assigned to 2024-04 if you are in India (the file was modified when India had already experienced a few hours of the new month), but to 2024-03 if you are in Mexico (the file was modified when Mexico had a few hours of the old month yet to come).
You may wonder if we really need two sorts. At first glance we don't need %.Y from stat, sorting by the well-defined %y should be enough. Well, it's not. Consider these two lines:
1698542400.000000000|2023-10-29 02:20:00.000000000 +0100|./newer
1698540000.000000000|2023-10-29 02:40:00.000000000 +0200|./older
This example is in the Europe/Warsaw timezone. The newer file is indeed newer than the older file, seconds since Epoch show this and the order is like from our first sort: newest first. But if I sorted by the second |-separated field and tried to achieve "newest first", then it would appear the other way around. The truth is 02:40 for the older file happened before my clocks were set from 03:00 back to 02:00 due to the end of Daylight Saving Time that year; 02:20 for the newer file happened after. There is no ambiguity, the strings +0200 and +0100 carry the information; but sort does not understand the format. This is why in the solution we first sort by seconds since Epoch, then we use the second sort to pick the newest (and I mean really newest) file per YYYY-MM.
I think GNU date --reference can be used instead of stat to get the mtime of a file. I chose stat though.
If you are interested in the result for a certain YYYY-MM, place grep between find and the first sort. E.g. for 2024-02 it may be:
LC_ALL=C grep -Zza '^[^|]*|2024-02'
An empty result means there is no file modified that month. With such grep the entire solution should pass at most one null-terminated line to tr. More than one null-terminated line passed to tr means my solution is buggy.
Your locale probably uses UTF-8, but in general pathnames may contain sequences of bytes that are invalid in UTF-8. I used LC_ALL=C grep -a, so grep should not complain.
find-sh is explained here: What is the second sh in sh -c 'some shell code' sh?
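The Daylight Saving Time example from the notes above can be reproduced with GNU date. This is a sketch under assumptions: the POSIX TZ rule below is used as a stand-in for Europe/Warsaw so that the snippet does not depend on an installed timezone database, and the names newer and older are just the labels from the example.

```shell
#!/bin/sh
# Assumption: this POSIX TZ rule approximates Europe/Warsaw
# (DST from the last Sunday of March to the last Sunday of October).
tz='CET-1CEST,M3.5.0,M10.5.0/3'

# Print one record: epoch|local time with UTC offset|label.
row() {
  printf '%s|%s|%s\n' "$1" "$(TZ=$tz date -d "@$1" '+%Y-%m-%d %H:%M:%S %z')" "$2"
}

lines=$(row 1698542400 newer; row 1698540000 older)
printf '%s\n' "$lines"

# "Newest first" by seconds since Epoch versus by the human-readable field.
by_epoch=$(printf '%s\n' "$lines" | LC_ALL=C sort -r -t '|' -k 1,1 | cut -d '|' -f 3 | head -n 1)
by_text=$(printf '%s\n' "$lines" | LC_ALL=C sort -r -t '|' -k 2,2 | cut -d '|' -f 3 | head -n 1)
printf 'newest by epoch: %s\nnewest by text:  %s\n' "$by_epoch" "$by_text"
```

Sorting by the first field should report newer as the newest file, while sorting by the human-readable field should wrongly report older, which is why the solution sorts by seconds since Epoch first.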