This is a short tutorial on how to use file name databases on macOS, with tips and tricks for getting around the pitfalls.
Introduction
Like most Unix systems, macOS supports file name databases. They are provided by the locate command, a companion to the Unix find command. File name databases are relatively unknown to occasional Unix users, but they provide some useful features that are worth exploring.
Background
I came across the file name database feature while searching the internet for ways to organize the files on my Mac. File name databases do not organize your files for you, but they are quite useful for finding files by name.
Using the Code
Usually, you would use the Unix find command to search for files by name. Say you want to find all files with the extension jpg, starting in the current directory and searching all its subfolders. You would enter at the Unix prompt (the quotes keep the shell from expanding the pattern before find sees it):
$ find . -name "*.jpg" -print
This works perfectly well, but find does a full scan of the directory tree it is asked to search each time it is invoked. If there are not too many directories to search, this is still efficient. However, if you repeatedly need to search for files in a larger directory structure, it is more efficient to use a file name database.
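One detail about the find command above is worth trying out: whether the pattern is quoted. The following is a minimal sketch that works in a scratch directory; the file names are made up for the demonstration.

```shell
# Scratch directory so the demonstration does not touch real files.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub"
touch "$demo_dir/a.jpg" "$demo_dir/sub/b.jpg" "$demo_dir/notes.txt"

# Quoted pattern: find receives *.jpg literally and matches in every subfolder.
jpg_count=$(cd "$demo_dir" && find . -name "*.jpg" -print | wc -l | tr -d ' ')
echo "quoted pattern found $jpg_count files"

# Unquoted pattern: the shell expands *.jpg to a.jpg first, so find only
# looks for files named a.jpg and misses sub/b.jpg.
unquoted_count=$(cd "$demo_dir" && find . -name *.jpg -print | wc -l | tr -d ' ')
echo "unquoted pattern found $unquoted_count files"

rm -rf "$demo_dir"
```

The unquoted form only behaves as intended when the current directory happens to contain no matching file, which is why quoting the pattern is the safer habit.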
A file name database is a file that contains a list of file names including the full path where the files are located. When searching for files using a file name database, you do not scan the directory structure but just look up the file and path in the file name database, which is of course much quicker for large directory structures.
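To make the idea concrete, here is a small sketch of the "build once, look up many times" principle. It is an illustration only: a plain-text list stands in for the real database, which locate keeps in its own compact format, and the directory and file names are invented.

```shell
# Illustration only: a plain-text list stands in for the real database.
tree=$(mktemp -d)
db=$(mktemp)
mkdir -p "$tree/photos/2020"
touch "$tree/photos/2020/beach.jpg" "$tree/readme.txt"

# "Build" step: walk the tree once and record every full path.
find "$tree" -print > "$db"

# "Lookup" step: search the recorded list instead of the file system.
matches=$(grep '\.jpg$' "$db")
echo "$matches"

rm -rf "$tree" "$db"
```

The lookup never touches the directory tree, which is also why the list goes stale as soon as files are added or removed.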
So to search for all files with the extension jpg in your home directory and all its subfolders, you would enter at the Unix prompt:
$ locate "$HOME/*.jpg"
The drawback, however, is that in order to be able to use a file name database in your search, you need to first build one and then update it regularly.
Building a File Name Database
Before you can use the locate command at the Unix prompt, you first need to build a file name database, which is done by running the /usr/libexec/locate.updatedb command.
You can run the command straight away, but it is advisable to first look at the settings that are used to build the file name database.
Settings for Building the File Name Database
The /usr/libexec/locate.updatedb command takes the settings for building the file name database from the following variables:
- TMPDIR: This is the directory which is used for temporary files.
- FCODES: This variable holds the name and path of the file name database.
- SEARCHPATHS: This variable holds a list of paths to be searched.
- PRUNEPATHS: This variable holds a list of paths inside the paths of SEARCHPATHS to be excluded.
The values of these variables are determined as follows:
- The /usr/libexec/locate.updatedb command first checks for the environment variable LOCATE_CONFIG. If it is set to a file name, the variable settings will be taken from this file.
- In case the environment variable LOCATE_CONFIG is not set, the variable settings will be taken from the /etc/locate.rc file.
- In case the /etc/locate.rc file does not exist or contains no settings, defaults will be used that are hard coded in /usr/libexec/locate.updatedb.
Usually, the LOCATE_CONFIG environment variable is not set and the /etc/locate.rc file does not contain any variable settings either, so the defaults are used, which are:
TMPDIR="/tmp"
FCODES="/var/db/locate.database"
SEARCHPATHS="/"
PRUNEPATHS="/tmp /var/tmp"
So with these settings, the /usr/libexec/locate.updatedb command would search the complete directory structure starting from the root directory (/), excluding the directories /tmp and /var/tmp.
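The lookup order for the settings can be sketched in a few lines of shell. This is a simplified illustration of the rules described above, not a copy of the actual script; load_locate_settings is a hypothetical helper name.

```shell
# Sketch of the lookup order; load_locate_settings is a hypothetical helper.
load_locate_settings() {
    # 1. LOCATE_CONFIG wins if it is set, otherwise fall back to /etc/locate.rc.
    config="${LOCATE_CONFIG:-/etc/locate.rc}"
    # 2. Read the settings file if it exists and is readable.
    if [ -f "$config" ] && [ -r "$config" ]; then
        . "$config"
    fi
    # 3. Anything still unset falls back to the hard-coded defaults.
    : "${TMPDIR:=/tmp}"
    : "${FCODES:=/var/db/locate.database}"
    : "${SEARCHPATHS:=/}"
    : "${PRUNEPATHS:=/tmp /var/tmp}"
}

# Run in a subshell so the variables do not leak into the current session.
(
    load_locate_settings
    echo "database: $FCODES"
    echo "search:   $SEARCHPATHS"
    echo "prune:    $PRUNEPATHS"
)
```

Running it shows which values would be in effect for the current user before the database is built.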
At first glance, these settings look like a good starting point. There is a catch, however: only files that the user running the command can actually access are added to the file name database.
As a further twist, running it with sudo as super user will not give you the full picture either, due to the internal workings of locate.updatedb.
Internal Workings of locate.updatedb
The command locate.updatedb is in fact a Unix shell script that basically does the following:
- In case it is invoked by the super user (using sudo), it recursively calls itself as user nobody. Otherwise (which is also the case in the recursive call as user nobody), it proceeds directly to the next step.
- It uses the find command to search for all files in the directory tree(s) specified by the SEARCHPATHS variable (omitting the subtrees specified by the PRUNEPATHS variable) and passes the result to another Unix script, /usr/libexec/locate.mklocatedb, which writes the files with their full path to a temporary file name database.
- It copies the content of the temporary file name database to the name and location specified by the FCODES variable.
This means that if you run it as super user using sudo, you will end up with a file name database that only contains the names of files to which the user nobody has access.
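The flow above can be walked through with a small dry-run sketch. It is a simplification for illustration, not Apple's actual script: describe_updatedb_run is a hypothetical function that only prints what would happen, and the user id is passed in as an argument so that both branches can be tried safely.

```shell
# Dry-run sketch of the described flow; nothing is scanned or written.
describe_updatedb_run() {
    uid="$1"   # numeric user id of the invoking user
    if [ "$uid" -eq 0 ]; then
        echo "invoked as super user: re-run the script as user nobody"
        scan_user="nobody"
    else
        echo "invoked as a regular user: continue directly"
        scan_user="uid $uid"
    fi
    echo "find scans SEARCHPATHS (minus PRUNEPATHS) with the rights of $scan_user"
    echo "the temporary database is copied to FCODES"
}

describe_updatedb_run 0
describe_updatedb_run "$(id -u)"
```

Comparing the two runs makes the consequence visible: with sudo, the scan is carried out with the rights of nobody rather than those of the super user.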
The reason for this behavior is that Unix is a multi-user system and the file name database is accessible to every user. If the database were built with super user rights, users could query each other's directory structures and the names of the files therein, which they normally could not see.
A more sensible approach would therefore be to have individual file name databases for each user containing the directory structure and files of their respective home directories and a single central one for all other directories (excluding users' home directories).
Creation of a Central File Name Database
To create a central file name database excluding users' home directories as outlined in the previous section, edit the /etc/locate.rc file as per below:
$ sudo -e /etc/locate.rc
#
# /etc/locate.rc - command script for updatedb(8)
#
# $FreeBSD: src/usr.bin/locate/locate/locate.rc,v 1.9 2005/08/22 08:22:48 cperciva Exp $
#
# All commented values are the defaults
#
# temp directory
TMPDIR="/tmp"
# the actual database
#FCODES="/var/db/locate.database"
# directories to be put in the database
SEARCHPATHS="/"
# directories unwanted in output
PRUNEPATHS="/tmp /var/tmp /Users /Volumes"
# filesystems allowed. Beware: a non-listed filesystem will be pruned
# and if the SEARCHPATHS starts in such a filesystem locate will build
# an empty database.
#
# be careful if you add 'nfs'
FILESYSTEMS="hfs ufs apfs"
This will search the directory structure starting with the root directory (/), omitting /tmp, /var/tmp, /Users and /Volumes. As you might have noticed, the FCODES variable is left commented out. See the Points of Interest section below for the reasons behind it.
Once you have made the changes to /etc/locate.rc, you may start the creation of the file name database by running /usr/libexec/locate.updatedb as super user:
$ sudo /usr/libexec/locate.updatedb
If no output is printed, the command completed successfully and you will be able to use this file name database to locate files. You can check this by trying the following examples (it is important that the pattern starts with a / before the *, and the quotes keep the shell from expanding it):
$ locate "/*.txt"
$ locate "/*.jpg"
These should return a more or less lengthy output.
To check that user directories were not scanned, run the below command:
$ locate /Users
This should not return any files in user directories.
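Besides running sample searches, it can be handy to look at the database file itself. The following sketch assumes the default database location and the -S (statistics) option of BSD locate; check_locate_db is a hypothetical helper name.

```shell
# Hypothetical helper: report on a locate database (default: the central one).
check_locate_db() {
    db="${1:-/var/db/locate.database}"
    if [ ! -r "$db" ]; then
        echo "no readable database at $db"
        return 1
    fi
    # The modification time shows when the database was last rebuilt.
    ls -l "$db"
    # -S asks locate for statistics about the database instead of searching.
    if command -v locate >/dev/null 2>&1; then
        locate -d "$db" -S
    fi
}

check_locate_db || echo "build the database first"
```

A database that exists but has an old modification time is a hint that it needs to be rebuilt before its results can be trusted.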
Creation of Individual File Name Databases per User
To create individual file name databases for directories and files in a user's home directory, first copy the /etc/locate.rc file to /etc/locate.users.rc and then edit it as per below:
$ sudo cp /etc/locate.rc /etc/locate.users.rc
$ sudo -e /etc/locate.users.rc
#
# Configuration for user home directory search
#
# temp directory
TMPDIR="/tmp"
# the actual database
FCODES="$HOME/locate.user.database"
# directories to be put in the database
SEARCHPATHS="$HOME"
# directories unwanted in output
# PRUNEPATHS="/tmp /var/tmp /Users /Volumes"
# filesystems allowed. Beware: a non-listed filesystem will be pruned
# and if the SEARCHPATHS starts in such a filesystem locate will build
# an empty database.
#
# be careful if you add 'nfs'
FILESYSTEMS="hfs ufs apfs"
Once you have made the changes to /etc/locate.users.rc, you may start the creation by running /usr/libexec/locate.updatedb as per below:
$ export LOCATE_CONFIG="/etc/locate.users.rc";/usr/libexec/locate.updatedb
If no output is printed, the command completed successfully and you will be able to use this file name database to locate files. You can check this by trying the following examples (it is important that the pattern starts with a / before the *, and the quotes keep the shell from expanding it):
$ locate -d "$HOME/locate.user.database" "/*.txt"
$ locate -d "$HOME/locate.user.database" "/*.jpg"
These should return a more or less lengthy output with files from your home directory. The -d option tells locate to use the user's individual file name database.
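With two databases in place, typing the -d option each time becomes tedious. The sketch below wraps it in two hypothetical helper functions, locate_db_list and locate_all. It relies on BSD locate accepting a colon-separated list of databases for -d, and it assumes the database locations used in this article.

```shell
# Hypothetical helper: join the databases that are readable with ':'.
locate_db_list() {
    list=""
    for db in "$@"; do
        [ -r "$db" ] || continue
        list="${list:+$list:}$db"
    done
    echo "$list"
}

# Hypothetical wrapper: search the user database and the central one together.
locate_all() {
    dbs=$(locate_db_list "$HOME/locate.user.database" /var/db/locate.database)
    if [ -z "$dbs" ]; then
        echo "no file name database found" >&2
        return 1
    fi
    locate -d "$dbs" "$@"
}
```

After adding the functions to your shell startup file, a call such as locate_all "*.jpg" searches whichever of the two databases exist.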
Updating the File Name Databases
As files are continuously added, renamed or removed and the directory structure is also subject to change, you will need to update the file name databases regularly. To update them, follow the same steps as for building them as outlined above, either manually or through a launchd job such as /System/Library/LaunchDaemons/com.apple.locate.plist.
When manually updating the central file name database, first check that the value of LOCATE_CONFIG is not pointing to the configuration for the user file name database.
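One way to avoid a stale LOCATE_CONFIG altogether is to set it for a single command only instead of exporting it. The following sketch shows the idea; run_user_updatedb is a hypothetical helper, and its optional argument exists only so that the helper can be tried out with a harmless command.

```shell
# Hypothetical helper: run the update with LOCATE_CONFIG set for this one
# command only, so the variable does not linger in the current shell.
run_user_updatedb() {
    updatedb_cmd="${1:-/usr/libexec/locate.updatedb}"
    env LOCATE_CONFIG="/etc/locate.users.rc" "$updatedb_cmd"
}

# Before a manual rebuild of the central database, check for leftovers.
if [ -n "${LOCATE_CONFIG:-}" ]; then
    echo "LOCATE_CONFIG is set to $LOCATE_CONFIG; run 'unset LOCATE_CONFIG' first"
else
    echo "LOCATE_CONFIG is not set; /etc/locate.rc will be used"
fi
```

Calling run_user_updatedb rebuilds the user database, and a later sudo /usr/libexec/locate.updatedb still picks up /etc/locate.rc.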
Points of Interest
Because of the way the /usr/libexec/locate.updatedb script is implemented, it creates a temporary file name database in Step 1 (see section Internal Workings of locate.updatedb above) when invoked as super user. It passes this temporary file on to the recursive invocation as user nobody as the value of the variable FCODES, and the content of the other temporary file name database is copied to it in Step 3. In Step 2, however, the FCODES value from the configuration file (/etc/locate.rc or the file specified by the LOCATE_CONFIG variable) is loaded, if set, and overwrites that FCODES value.
This is very confusing, but the bottom line is this: if you set FCODES in the configuration file for locate.updatedb (even to its default value) and invoke the script as super user, the script aborts with the error message below:
/usr/libexec/locate.updatedb: line 97: /var/db/locate.database: Permission denied
Also, when invoked as super user, the script uses the hard-coded path /var/db/locate.database as the final file name database. So even without the permission denied error, the value from /etc/locate.rc would not be used as the name of the final file name database.
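The overwrite can be reproduced in isolation with a few lines of shell. This is a sketch of the mechanism and of one possible guard, using a stand-in configuration file and a made-up temporary file name; it does not reproduce the real script.

```shell
# Stand-in configuration file that sets FCODES, as /etc/locate.rc might.
conf=$(mktemp)
echo 'FCODES="/var/db/locate.database"' > "$conf"

# The super user pass hands a temporary file to the second pass via FCODES.
FCODES="/tmp/locate.handover.example"   # hypothetical temporary file name

# Problem: sourcing the configuration replaces the handed-over value.
overwritten=$( . "$conf"; echo "$FCODES" )
echo "after sourcing: $overwritten"

# One possible guard: remember the handed-over value and restore it afterwards.
preserved=$( handed_over="$FCODES"; . "$conf"; FCODES="$handed_over"; echo "$FCODES" )
echo "with guard:     $preserved"

rm -f "$conf"
```

Without the guard, user nobody ends up trying to write to /var/db/locate.database, which matches the permission denied error shown above.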
I have attached a script to illustrate what a fix for these issues could look like.
History
- 15th October, 2020: Initial version