See how easy it is to add full-text indexing and implement set operations in a fun little Windows app.
Introduction
My Media Search is a fun little Windows app for finding and enjoying your media files.
You tell it what directories to index like Pictures, Music, and Videos...
...and it indexes the files in those folders (and not in ones you don't want) to support fast searches...
...then you can search for silliness...
Once you get a list of search results, you can open the files, open their containing folders, and get all the detailed metadata that Windows maintains for each file:
Man, that's a lot of metadata!
My Media Search Explained
My Media Search is a Windows Forms application, using .NET Framework 4.7.2. I wanted to use .NET 5.0, but the only out-of-the-box code I knew of for getting file properties and thumbnails was from Windows APIs, namely the Microsoft.WindowsAPICodePack libraries.
You had to see this coming... My Media Search is powered by the metastrings database! I retooled metastrings for this application, and for its general coherence as software:
- Dropped support for MySQL. Using metastrings only makes sense where performance and scalability are not issues. This positions it to be useful for adding lightweight database support to applications, rather than having to serve as part of an unwieldy client-server database solution.
- With MySQL out of the way, I was able to lift the length limit on strings, as SQLite has no such limitation. This means you won't have to use the "long strings" API, which I was tempted to remove, but mscript uses it, so it lives on. For now.
- Cementing its role as a small, easy-to-use database, I added full-text indexing for all string values. Not long strings, just the Define / SELECT stuff. With full-text in place, metastrings was poised to implement My Media Search.
Code Overview
The lib project has the SearchInfo class that implements almost all non-UI functionality.
The cmd project is a proof of concept for SearchInfo. You can use it to index an arbitrary directory, update the index, and perform searches. Note that it indexes only the one directory you give it, erasing the index for any other directories. Just a proof of concept.
using System;
using System.Threading.Tasks;
using System.Collections.Generic;
using System.IO;

namespace fql
{
    class Program
    {
        [STAThread]
        static async Task<int> Main(string[] args)
        {
            if (args.Length < 1)
            {
                Console.WriteLine("Usage: <directory path>");
                return 0;
            }

            string dirPath = args[0].Trim();
            if (!Directory.Exists(dirPath))
            {
                Console.WriteLine("ERROR: Directory does not exist: {0}", dirPath);
                return 1;
            }

#if !DEBUG
            try
#endif
            {
                while (true)
                {
                    Console.WriteLine();
                    Console.WriteLine("Commands: reset, update, search, quit");
                    Console.WriteLine();
                    Console.Write("> ");

                    // ReadLine returns null at end of input; treat that as quit
                    string line = Console.ReadLine();
                    if (line == null)
                        break;
                    line = line.Trim();
                    if (string.IsNullOrEmpty(line))
                        continue;
                    Console.WriteLine();

                    if (line == "reset")
                    {
                        SearchInfo.Reset();
                        Console.WriteLine("DB reset");
                    }
                    else if (line == "update")
                    {
                        // Index the one directory, with no exclusions
                        var updateResult =
                            await SearchInfo.UpdateAsync
                            (
                                new List<string> { dirPath },
                                new List<string>(),
                                OnDirectoryUpdate
                            );
                        Console.WriteLine(
                            "DB updated: files added: {0} - removed: {1} - modified: {2} - indexed: {3}",
                            updateResult.filesAdded,
                            updateResult.filesRemoved,
                            updateResult.filesModified,
                            updateResult.indexSize);
                    }
                    else if (line.StartsWith("search "))
                    {
                        var results = await SearchInfo.SearchAsync
                            (line.Substring("search ".Length).Trim());
                        Console.WriteLine($"Search results: {results.Count}");
                        foreach (var result in results)
                            Console.WriteLine(result);
                    }
                    else if (line == "quit")
                    {
                        Console.WriteLine("Quitting...");
                        break;
                    }
                }
            }
#if !DEBUG
            catch (Exception exp)
            {
                Console.WriteLine("EXCEPTION: {0}", exp);
                return 1;
            }
#endif
            Console.WriteLine("All done.");
            return 0;
        }

        // Progress callback: print each update as indexing proceeds
        static void OnDirectoryUpdate(UpdateInfo update)
        {
            Console.WriteLine(update.ToString());
        }
    }
}
The app project is the top-level Windows Forms application. There is nothing too interesting there, just the usual Windows Forms stuff.
The SearchInfo Class
Get All Windows Metadata For a File
public static Dictionary<string, string> GetFileMetadata(string filePath)
{
    Dictionary<string, string> metadata = new Dictionary<string, string>();

    Shell32.Shell shell = new Shell32.Shell();
    Shell32.Folder objFolder = shell.NameSpace(Path.GetDirectoryName(filePath));

    // Passing null for the item returns the column header names
    List<string> headers = new List<string>();
    for (int i = 0; i < short.MaxValue; ++i)
    {
        string header = objFolder.GetDetailsOf(null, i);
        if (string.IsNullOrEmpty(header))
            break;
        headers.Add(header);
    }
    if (headers.Count == 0)
        return metadata;

    // Find the file in its folder and collect every non-blank detail
    foreach (Shell32.FolderItem2 item in objFolder.Items())
    {
        if (!filePath.Equals(item.Path, StringComparison.OrdinalIgnoreCase))
            continue;

        for (int i = 0; i < headers.Count; ++i)
        {
            string details = objFolder.GetDetailsOf(item, i);
            if (!string.IsNullOrWhiteSpace(details))
                metadata.Add(headers[i], details);
        }
        break; // only one item can match the path
    }
    return metadata;
}
That code is one of the reasons for using .NET Framework rather than .NET 5+. Without this function and the tiles view of search results, .NET 5+ would probably work.
The Search Index Algorithm
- Gather all file system file paths and last modified dates from the chosen directories
- Remove all file system file paths in the exclusion directories
- Gather all database file paths and last modified dates
- ProcessFiles: Walk all file paths, from both the file system and the database, determining which to add to or remove from the database
- Do database operations to update the index
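The first two steps can be sketched roughly as follows. This is a minimal illustration and not the code from SearchInfo: the class name FileGatherSketch and the helper GatherFiles are hypothetical, and it assumes Windows-style case-insensitive paths and last-modified times stored as UTC ticks.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical sketch; these names do not come from the article's code.
public static class FileGatherSketch
{
    // Maps each file under the included directories to its last-modified
    // time (UTC ticks), skipping anything under an excluded directory.
    public static Dictionary<string, long> GatherFiles(
        IEnumerable<string> includeDirs,
        IEnumerable<string> excludeDirs)
    {
        // End each exclusion with a separator so that excluding "Pics"
        // does not also exclude a sibling directory named "PicsOld".
        List<string> excludes = excludeDirs.Select(WithTrailingSeparator).ToList();

        // Assumption: paths compare case-insensitively, as on Windows.
        var result = new Dictionary<string, long>(StringComparer.OrdinalIgnoreCase);
        foreach (string dir in includeDirs)
        {
            // Note: this can throw UnauthorizedAccessException on protected
            // folders; a real indexer would need to handle that.
            string fullDir = Path.GetFullPath(dir);
            foreach (string path in
                Directory.EnumerateFiles(fullDir, "*", SearchOption.AllDirectories))
            {
                if (excludes.Any(ex =>
                        path.StartsWith(ex, StringComparison.OrdinalIgnoreCase)))
                    continue;
                result[path] = File.GetLastWriteTimeUtc(path).Ticks;
            }
        }
        return result;
    }

    private static string WithTrailingSeparator(string dir)
    {
        string full = Path.GetFullPath(dir);
        string sep = Path.DirectorySeparatorChar.ToString();
        return full.EndsWith(sep, StringComparison.Ordinal) ? full : full + sep;
    }
}
```

Calling GatherFiles with the Pictures folder as the only included directory and one of its subfolders as an exclusion would return every file under Pictures except those in that subfolder.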
Here's the implementation of the central ProcessFiles function:
private static void ProcessFiles(IEnumerable<string> filePaths, DirProcessInfo info)
{
    List<string> filesToAdd = new List<string>();
    List<object> filesToRemove = new List<object>();
    foreach (string filePath in filePaths)
    {
        bool inDb = info.filesLastModifiedInDb.ContainsKey(filePath);
        bool inFs = info.filesLastModifiedInFs.ContainsKey(filePath);

        // In the database but gone from the file system: remove
        if (inDb && !inFs)
        {
            ++info.filesRemoved;
            filesToRemove.Add(filePath);
            continue;
        }

        // On the file system but not yet in the database: add
        if (inFs && !inDb)
        {
            ++info.filesAdded;
            filesToAdd.Add(filePath);
            continue;
        }

        // In neither place: treated defensively as a removal
        if (!inDb && !inFs)
        {
            ++info.filesRemoved;
            filesToRemove.Add(filePath);
            continue;
        }

        // In both places, but changed since it was indexed: re-add
        if (info.filesLastModifiedInDb[filePath] < info.filesLastModifiedInFs[filePath])
        {
            ++info.filesModified;
            filesToAdd.Add(filePath);
            continue;
        }
    }

    info.toDelete = filesToRemove;

    info.toAdd = new List<Tuple<string, long, string>>(filesToAdd.Count);
    foreach (string filePath in filesToAdd)
    {
        // Turn the path below the user root into space-separated words
        string searchData =
            filePath.Substring(UserRoot.Length)
                .Replace(Path.DirectorySeparatorChar, ' ')
                .Replace('.', ' ');

        // Collapse runs of spaces down to single spaces
        while (searchData.Contains("  "))
            searchData = searchData.Replace("  ", " ");
        searchData = searchData.Trim();

        long lastModified = info.filesLastModifiedInFs[filePath];
        info.toAdd.Add
        (
            new Tuple<string, long, string>(filePath, lastModified, searchData)
        );
    }
}
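The add / remove / modify decision above is really a pair of set differences plus a filtered intersection. As a rough sketch of that view, here is the same classification written with LINQ set operators. IndexDiffSketch, IndexDiff, and Compute are hypothetical names, not part of the app, and this version uses the default case-sensitive string comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch; these names do not come from the article's code.
public static class IndexDiffSketch
{
    public sealed class IndexDiff
    {
        public List<string> Added = new List<string>();
        public List<string> Removed = new List<string>();
        public List<string> Modified = new List<string>();
    }

    public static IndexDiff Compute(
        IDictionary<string, long> lastModifiedInDb,
        IDictionary<string, long> lastModifiedInFs)
    {
        var diff = new IndexDiff();

        // DB minus FS: indexed files that no longer exist
        diff.Removed = lastModifiedInDb.Keys.Except(lastModifiedInFs.Keys).ToList();

        // FS minus DB: files that have not been indexed yet
        diff.Added = lastModifiedInFs.Keys.Except(lastModifiedInDb.Keys).ToList();

        // FS intersect DB, kept only where the file is newer than the index
        diff.Modified = lastModifiedInFs.Keys
            .Intersect(lastModifiedInDb.Keys)
            .Where(path => lastModifiedInDb[path] < lastModifiedInFs[path])
            .ToList();

        return diff;
    }
}
```

The single loop in ProcessFiles does the same work in one pass and updates its counters as it goes. The LINQ form trades that for readability.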
The code for computing the searchData string for full-text indexing drops the user root prefix, splits the path into its components, turns the dots in file names into spaces so that file extensions become separate words, eliminates double spaces, and trims the result.
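To make that transformation concrete, here is a small standalone sketch of the same steps. SearchDataSketch and Build are hypothetical names. Unlike the app's code, which uses UserRoot and Path.DirectorySeparatorChar, this version takes the root and the separator as parameters so that it behaves the same on any platform.

```csharp
using System;

// Hypothetical sketch; these names do not come from the article's code.
public static class SearchDataSketch
{
    public static string Build(string filePath, string userRoot, char separator)
    {
        // Drop the root prefix, then turn separators and dots into spaces
        string searchData = filePath.Substring(userRoot.Length)
            .Replace(separator, ' ')
            .Replace('.', ' ');

        // Collapse runs of spaces down to single spaces
        while (searchData.Contains("  "))
            searchData = searchData.Replace("  ", " ");

        return searchData.Trim();
    }
}
```

For example, Build(@"C:\Users\me\Pictures\cats\fluffy.jpg", @"C:\Users\me", '\\') returns "Pictures cats fluffy jpg", so a search for "cats" or "jpg" can match the file.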
Once ProcessFiles figures out what needs to be done, this code interacts with metastrings to do the deed:
using (var ctxt = msctxt.GetContext())
{
    update.Start("Cleaning search index...", dirProcInfo.toDelete.Count);
    updater?.Invoke(update);
    await ctxt.Cmd.DeleteAsync("files", dirProcInfo.toDelete);

    update.Start("Updating search index...", dirProcInfo.toAdd.Count);
    updater?.Invoke(update);
    Define define = new Define("files", null);
    foreach (var tuple in dirProcInfo.toAdd)
    {
        define.key = tuple.Item1;
        define.metadata["filelastmodified"] = tuple.Item2;
        define.metadata["searchdata"] = tuple.Item3;
        await ctxt.Cmd.DefineAsync(define);

        ++update.current;
        if ((update.current % 100) == 0)
            updater?.Invoke(update);
    }
}
Conclusion
So build the app and enjoy playing with it. I think you will find it useful for digging through your thousands of pictures and songs to find just what you're looking for.
Implementing this app was made easy by metastrings.
I hope you now have confidence adding full-text searching to your applications.
Enjoy!
History
- 29th August, 2021: Initial version