Solution
With GNU toolset:
    find . -type f -exec sh -c 'LC_ALL=C stat --printf="%.Y|%y|%n\0" -- "$@"' find-sh {} + | LC_ALL=C sort -zr -t '|' -k 1,1 | LC_ALL=C sort -zsu -t '|' -k 2.1,2.7 | cut -d '|' -zf 3- | tr '\0' '\n'

Adjust the invocation of `find` (up to but excluding `-exec`) to your needs.
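As an illustration of adjusting `find`, here is a minimal sketch that restricts the search to `*.log` files. The throwaway directory, the file names, the modification times and the `TZ=UTC` setting are all assumptions made only to keep the run reproducible; none of them is part of the solution itself.

```shell
# Build a throwaway tree with known modification times (all names are hypothetical).
dir=$(mktemp -d)
TZ=UTC touch -d '2024-02-03 10:00:00' "$dir/a.log"
TZ=UTC touch -d '2024-03-01 10:00:00' "$dir/b.log"
TZ=UTC touch -d '2024-03-27 10:00:00' "$dir/c.log"
TZ=UTC touch -d '2024-03-28 10:00:00' "$dir/d.txt"   # newest in March, but not a .log

# Same pipeline as in the solution; only the tests before -exec were changed.
newest_logs=$(
  cd "$dir" &&
  TZ=UTC find . -type f -name '*.log' \
    -exec sh -c 'LC_ALL=C stat --printf="%.Y|%y|%n\0" -- "$@"' find-sh {} + |
  LC_ALL=C sort -zr -t '|' -k 1,1 |
  LC_ALL=C sort -zsu -t '|' -k 2.1,2.7 |
  cut -d '|' -zf 3- |
  tr '\0' '\n'
)
printf '%s\n' "$newest_logs"
rm -rf -- "$dir"
```

With these made-up files the result should be `./a.log` (for 2024-02) and `./c.log` (for 2024-03); `d.txt` is newer but is filtered out by `-name '*.log'`.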
Explanation
For each file that gets to `-exec`, `LC_ALL=C stat --printf='%.Y|%y|%n\0'` is run. Its output consists of lines like

    1711542530.762649374|2024-03-27 13:28:50.762649374 +0100|./path to/something

where the first `|`-separated field is the time of last data modification, seconds since Epoch (with precision); the second field is the time of last data modification, human-readable. Each line is null-terminated, so newlines (if any) in the pathname should be safe. Only the first two `|` characters will matter later, they both come from the format for sure, so `|` (if any) in the pathname should also be safe (see the explanation of `cut` below).

I used `LC_ALL=C` to make the format independent from your current locale. Note `LC_ALL=C find …` would affect `find` and everything it runs, in general this may be unwanted; so instead of `-exec stat …` I used `-exec sh -c …` and this way I was able to set `LC_ALL=C` only for `stat`.

Then the first `sort` sorts lines according to the first `|`-separated field. Lines associated with files recently modified will end up first. Our format is so strict that the default way of sorting in the `C` locale will work.

The second `sort` considers only the `YYYY-MM` (year-month) part of the second field (`2024-03` in the example) and because of `-u` (`--unique`) it passes only one line per `YYYY-MM`. With `-s` (`--stable`) this is the line associated with the most recently modified file per `YYYY-MM`, because the first `sort` has already placed most recent files first.

Then for each line `cut` prints `|`-separated fields from the 3rd one to the last one. In each line this is a pathname. Formally a pathname containing (one or more) `|` characters will form the 3rd, 4th and possibly later fields, but as the fields in the output will also be separated by `|`, the output will be the exact pathname anyway.

Finally `tr` converts null bytes to newlines, just to make the output human-readable (but also potentially ambiguous).
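The interplay of the two `sort`s and `cut` can be tried without touching the filesystem. The sketch below feeds the pipeline three made-up records in the same `%.Y|%y|%n\0` shape; the timestamps and pathnames are hypothetical, and one pathname deliberately contains a `|`.

```shell
# Synthetic records in the shape that stat emits in the solution.
# Timestamps and pathnames are made up for illustration.
result=$(
  printf '%s\0' \
    '1709300000.000000000|2024-03-01 14:33:20.000000000 +0100|./march old' \
    '1711542530.762649374|2024-03-27 13:28:50.762649374 +0100|./march|new' \
    '1707000000.000000000|2024-02-03 23:40:00.000000000 +0100|./february' |
  LC_ALL=C sort -zr -t '|' -k 1,1 |       # newest first, by seconds since Epoch
  LC_ALL=C sort -zsu -t '|' -k 2.1,2.7 |  # keep the first record per YYYY-MM
  cut -d '|' -zf 3- |                     # drop the two timestamp fields
  tr '\0' '\n'                            # for display only
)
printf '%s\n' "$result"
```

The output should be `./february` followed by `./march|new`: the older March record is dropped by `-u`, and the `|` inside the surviving pathname comes through `cut` intact.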
Notes
If you want to process the result further, keep it in the form of null-terminated strings if possible. In other words: a tool expecting null-terminated strings (e.g. `xargs -r0 …`) placed instead of `tr` is better than a tool expecting newline-terminated strings placed after `tr`.

Linux timestamps are just numbers, without the notion of timezone. Your `stat` will "translate" them to your current timezone. In particular it will assign files to `YYYY-MM` according to your current timezone, and this can give different results in different timezones. E.g. a file modified (no matter where) around `2024-04-01 00:00:00 UTC` will be assigned to `2024-04` if you are in India (the file was modified when India had already experienced a few hours of the new month), but to `2024-03` if you are in Mexico (the file was modified when Mexico had a few hours of the old month yet to come).

You may wonder if we really need two `sort`s. At first glance we don't need `%.Y` from `stat`; sorting by the well-defined `%y` should be enough. Well, it's not. Consider these two lines:

    1698542400.000000000|2023-10-29 02:20:00.000000000 +0100|./newer
    1698540000.000000000|2023-10-29 02:40:00.000000000 +0200|./older

This example is in the `Europe/Warsaw` timezone. The `newer` file is indeed newer than the `older` file; seconds since Epoch show this, and the order is the one our first `sort` produces: newest first. But if I sorted by the second `|`-separated field and tried to achieve "newest first", then the files would appear the other way around. The truth is `02:40` for the `older` file happened before my clocks were set from `03:00` back to `02:00` due to the end of Daylight Saving Time that year; `02:20` for the `newer` file happened after. There is no ambiguity, the strings `+0200` and `+0100` carry the information; but `sort` does not understand the format. This is why in the solution we first sort by seconds since Epoch, then we use the second `sort` to pick the newest (and I mean really newest) file per `YYYY-MM`.

I think GNU `date --reference` can be used instead of `stat` to get the mtime of a file. I chose `stat` though.

If you are interested in the result for a certain `YYYY-MM`, place `grep` between `find` and the first `sort`. E.g. for `2024-02` it may be:

    LC_ALL=C grep -Zza '^[^|]*|2024-02'

An empty result means there is no file modified that month. With such `grep` the entire solution should pass at most one null-terminated line to `tr`. More than one null-terminated line passed to `tr` means my solution is buggy.

Your locale probably uses UTF-8, but in general pathnames may contain sequences of bytes that are invalid in UTF-8. I used `LC_ALL=C grep -a`, so `grep` should not complain.

`find-sh` is explained here: What is the second sh in `sh -c 'some shell code' sh`?
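The Daylight Saving Time pitfall from the notes can be reproduced with the two `Europe/Warsaw` records quoted there. This is only a sketch: the helper function `records` and the variable names are hypothetical, and the records are printed directly instead of coming from `stat`.

```shell
# The two records from the note about the end of DST in 2023.
records() {
  printf '%s\0' \
    '1698540000.000000000|2023-10-29 02:40:00.000000000 +0200|./older' \
    '1698542400.000000000|2023-10-29 02:20:00.000000000 +0100|./newer'
}

# Newest first by seconds since Epoch (field 1): the first record is ./newer.
by_epoch=$(records | LC_ALL=C sort -zr -t '|' -k 1,1 |
  cut -d '|' -zf 3- | tr '\0' '\n' | head -n 1)

# "Newest first" by the human-readable time (field 2): the first record is ./older.
by_text=$(records | LC_ALL=C sort -zr -t '|' -k 2,2 |
  cut -d '|' -zf 3- | tr '\0' '\n' | head -n 1)

printf 'by epoch: %s\nby text:  %s\n' "$by_epoch" "$by_text"
```

Here `by_epoch` should come out as `./newer` and `by_text` as `./older`, because a plain string comparison ranks `02:40` above `02:20` and ignores the `+0200`/`+0100` offsets.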