The Unix find utility is well suited to searching for files by pattern. This article shows one way of using it to organize or clean up a messy file system.
Introduction
This tip shows how the Unix find utility can move files from a messy source folder into a target folder structure, based on regular expressions stored in that structure.
Background
This article was inspired by a Stack Exchange post and by a script that phunehehe made available on GitHub.
Using the Code
The script regorg.sh below searches a target folder structure for configuration files named regex.conf. For each configuration file, it passes the regular expression in that file to find and moves all matching files from a source folder structure into the folder that contains the configuration file. An example after the script illustrates the usage.
Script regorg.sh
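#!/bin/bash
# regorg.sh: move files matching each regex.conf into that file's folder.
# Usage: ./regorg.sh <file-search-root> <conf-search-root>
FILE_SEARCH_ROOT=$1
echo "File search root is: $FILE_SEARCH_ROOT"
CONF_SEARCH_ROOT=$2
echo "Conf search root is: $CONF_SEARCH_ROOT"
# Collect every configuration file below the target folder structure.
CONF_FILE_LIST=$(find "$CONF_SEARCH_ROOT" -name regex.conf)
printf 'Looping:\n%s\n' "$CONF_FILE_LIST"
# Note: this loop relies on word splitting and breaks on paths containing whitespace.
for CONF_FILE in $CONF_FILE_LIST
do
    CONF=$(<"$CONF_FILE")
    CONF_PATH=$(dirname "$CONF_FILE")
    echo "Searching matches using regex: $CONF from: $CONF_FILE excluding path: $CONF_PATH"
    # -E (BSD find) enables extended regular expressions; -regex matches the whole path.
    find -E "$FILE_SEARCH_ROOT" -type f -regex "$CONF" -not -path "$CONF_PATH/*" \
        -print -exec mv -i '{}' "$CONF_PATH" \;
done
echo "Looping done."
echo "Clean Up: $FILE_SEARCH_ROOT done."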
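Before moving anything, it can help to preview which files each configuration file would claim. The following is a minimal sketch of such a dry run, not part of regorg.sh: preview_moves is a hypothetical helper, and it swaps find's -regex test for grep -E -x, which likewise requires the extended regular expression to match the whole path. Small differences between the two regex implementations are possible, so treat the preview as an approximation.

```shell
#!/bin/sh
# Dry run: print "<file> -> <target folder>" for every file that a
# regex.conf would claim, without moving anything.
# preview_moves is a hypothetical helper, not part of regorg.sh.
preview_moves() {
    file_root=$1
    conf_root=$2
    find "$conf_root" -name regex.conf | while IFS= read -r conf_file; do
        regex=$(cat "$conf_file")
        conf_path=$(dirname "$conf_file")
        # List candidate files, skipping anything already in the target folder.
        # grep -E -x keeps only the paths that the regex matches in full.
        find "$file_root" -type f ! -path "$conf_path/*" |
            grep -E -x -e "$regex" |
            while IFS= read -r match; do
                echo "$match -> $conf_path"
            done
    done
}
```

Calling preview_moves ./download ./home prints one line per file that would be moved. Reading the paths line by line also avoids the word splitting that the for loop in the script depends on.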
Example
Let's say you have a folder download with different file types as shown below:
$ ls -1 ./download/
file1.jpg
file1.mp3
file1.txt
file2.doc
file2.jpg
file2.mp3
file3.bmp
file3.pdf
file3.wav
To clean this up, follow these steps:
- Set up a target folder structure.
- In each folder of the target folder structure, create a configuration file that contains the regular expression to be tested by the find utility.
- Run the script to loop over the target folder structure, collect the configuration files and evaluate each regular expression against the messy source folder.
Set Up Target Folder Structure
In our example, we want to organize the files by type and therefore create the folders below inside the folder home:
$ cd home/
$ mkdir docs
$ mkdir music
$ mkdir pics
Create Configuration Files
The script regorg.sh expects the configuration files to be named regex.conf. We therefore create a regex.conf file in each directory of the target folder structure with the content below:
$ cat ./home/docs/regex.conf
.*\.(txt|doc|pdf)
$ cat ./home/pics/regex.conf
.*\.(gif|jpeg|bmp|tiff|jpg)
$ cat ./home/music/regex.conf
.*\.(mp3|wav)
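The three configuration files can also be written from the command line instead of an editor. The sketch below assumes the target folders from this example; TARGET_ROOT is a variable introduced only for this illustration and defaults to ./home.

```shell
#!/bin/sh
# Write one regex.conf per target folder.
# TARGET_ROOT is an assumed variable for this sketch; it defaults to ./home.
TARGET_ROOT=${TARGET_ROOT:-./home}
mkdir -p "$TARGET_ROOT/docs" "$TARGET_ROOT/pics" "$TARGET_ROOT/music"
# Single quotes keep the backslash, parentheses and pipes literal.
printf '%s\n' '.*\.(txt|doc|pdf)' > "$TARGET_ROOT/docs/regex.conf"
printf '%s\n' '.*\.(gif|jpeg|bmp|tiff|jpg)' > "$TARGET_ROOT/pics/regex.conf"
printf '%s\n' '.*\.(mp3|wav)' > "$TARGET_ROOT/music/regex.conf"
```

Using printf with a '%s\n' format avoids any interpretation of the backslash in the pattern, which echo may perform in some shells.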
Run the Script
Before running the script, the directory structure in the example would look like this:
$ ls -1Rp
download/
home/
regorg.sh
./download:
file1.jpg
file1.mp3
file1.txt
file2.doc
file2.jpg
file2.mp3
file3.bmp
file3.pdf
file3.wav
./home:
docs/
music/
pics/
./home/docs:
regex.conf
./home/music:
regex.conf
./home/pics:
regex.conf
Running the script produces the following output:
$ ./regorg.sh ./download ./home
File search root is: ./download
Conf search root is: ./home
Looping:
./home/docs/regex.conf
./home/music/regex.conf
./home/pics/regex.conf
Searching matches using regex: .*\.(txt|doc|pdf) from: ./home/docs/regex.conf excluding path: ./home/docs
./download/file1.txt
./download/file2.doc
./download/file3.pdf
Searching matches using regex: .*\.(mp3|wav) from: ./home/music/regex.conf excluding path: ./home/music
./download/file1.mp3
./download/file2.mp3
./download/file3.wav
Searching matches using regex: .*\.(gif|jpeg|bmp|tiff|jpg) from: ./home/pics/regex.conf excluding path: ./home/pics
./download/file1.jpg
./download/file2.jpg
./download/file3.bmp
Looping done.
Clean Up: ./download done.
After running the script, the directory structure in the example would look like this:
$ ls -1Rp
download/
home/
regorg.sh
./download:
./home:
docs/
music/
pics/
./home/docs:
file1.txt
file2.doc
file3.pdf
regex.conf
./home/music:
file1.mp3
file2.mp3
file3.wav
regex.conf
./home/pics:
file1.jpg
file2.jpg
file3.bmp
regex.conf
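After the script has run, all files from download have been moved to the target directories.
Because the script calls mv with the -i option, mv asks for confirmation before it overwrites a file. Extra care is still needed when the target directory structure already contains files with the same names as files in the source, since confirming the prompt replaces the existing file.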
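One way to rule out overwriting without relying on an interactive answer is to refuse the move whenever the target already holds a file of that name. The following is a minimal sketch under that assumption; safe_move is a hypothetical helper that is not part of regorg.sh, and the -n option it uses is supported by both GNU and BSD mv.

```shell
#!/bin/sh
# safe_move is a hypothetical helper: it moves a file only when the target
# folder has no entry with the same name, and reports skipped files on stderr.
safe_move() {
    src=$1
    dest_dir=$2
    if [ -e "$dest_dir/$(basename "$src")" ]; then
        echo "skipped (already exists): $src" >&2
        return 1
    fi
    # -n tells mv not to overwrite an existing file, which covers the
    # short gap between the check above and the move itself.
    mv -n "$src" "$dest_dir"/
}
```

A skipped file stays in the source folder, so it can be renamed or compared by hand afterwards.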
Points of Interest
The above version matches on the file extension using case-sensitive regular expressions. You can, however, change the script to use -iregex, which is case insensitive, or change the configuration files to contain any other pattern that matches the whole file path.
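As a small illustration of the case-insensitive variant, the sketch below lists files whose extension is jpg in any mix of upper and lower case. The function name find_jpg_any_case is hypothetical. The pattern deliberately avoids alternation, so it should work with find's default regular expression syntax and needs no extended-regex switch.

```shell
#!/bin/sh
# find_jpg_any_case is a hypothetical helper: it lists regular files below
# the given root whose path ends in .jpg, ignoring case (.JPG, .Jpg, ...).
find_jpg_any_case() {
    # -iregex behaves like -regex but compares case-insensitively;
    # like -regex, the pattern has to match the whole path.
    find "$1" -type f -iregex '.*\.jpg'
}
```

In regorg.sh itself, the corresponding change would be to replace -regex with -iregex in the find call.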
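The above script was developed on macOS (Darwin). The -E option is specific to BSD find, so on Linux distributions that ship GNU find the call needs to be adapted, for example by using -regextype posix-extended instead.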
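One way to support both variants is a small wrapper that probes which syntax the installed find accepts. The sketch below assumes that only BSD find (which takes -E before the search root) and GNU find (which takes -regextype posix-extended) need to be covered; find_ere is a hypothetical name, and other find implementations may accept neither form.

```shell
#!/bin/sh
# find_ere is a hypothetical wrapper: it runs find with extended regular
# expressions using whichever syntax the installed find accepts.
# Usage: find_ere <root> <find tests and actions...>
find_ere() {
    root=$1
    shift
    if find -E . -maxdepth 0 >/dev/null 2>&1; then
        # BSD find (for example on macOS): -E goes before the search root.
        find -E "$root" "$@"
    else
        # Assumed to be GNU find: -regextype must precede the -regex test.
        find "$root" -regextype posix-extended "$@"
    fi
}
```

With this wrapper, a call such as find_ere ./download -type f -regex '.*\.(txt|doc|pdf)' is intended to list the same files on macOS and on Linux.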
History
- 14th October, 2020: Initial version