Introduction
My personal mailbox, with emails going back to the late 90s, is full of old attachments that bloat the PST file but aren't really needed. The PST file, with attachments, is now around 40 GB.
I decided to write a simple C# console app to extract them so that I can reduce the size of my PST file.
The application itself will perform a few simple tasks:
- Find the root folder in the Outlook Datastore
- Iterate recursively through the folder structure
- Iterate through each email message in each folder, looking for attachments
- When found, save each attachment in a folder structure on the hard disk, representing the Outlook folder structure
Prerequisites
First, create a C# console application in Visual Studio, targeting .NET Framework 4.5 or higher.
The application makes use of the Microsoft.Office.Interop.Outlook assembly, so you'll need to add this as a reference in your project.
The Outlook Primary Interop Assembly (PIA) Reference provides help for developing managed applications for Outlook 2013 and 2016. It extends the Outlook 2013 and 2016 Developer Reference from the COM environment to the managed environment, allowing you to interact with Outlook from a .NET application.
You also need to have Microsoft Outlook installed on your PC - otherwise the Interop assembly has nothing to talk to.
Learn more on MSDN.
Iterating through Outlook Accounts
Before we can go through each folder and email in Outlook, we need to find an actual account, and build the root folder from this.
The root folder is in the format \\foldername\, and the inbox is located one level below this, at \\foldername\Inbox\.
To do this, we simply iterate through the Outlook.Application.Session.Accounts collection.
Outlook.Application Application = new Outlook.Application();
Outlook.Accounts accounts = Application.Session.Accounts;
foreach (Outlook.Account account in accounts)
{
    Console.WriteLine(account.DisplayName);
}
From these, we can derive the root folder name.
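As an alternative to building the path from the display name, here is a minimal sketch that asks each account for its delivery store and reads the root folder from that. It assumes Outlook 2010 or later (where Account.DeliveryStore is available) and that the account delivers into the store you want to scan; RootFolderLister and PrintRootFolders are hypothetical names used only for illustration.

```csharp
using System;
using Outlook = Microsoft.Office.Interop.Outlook;

static class RootFolderLister
{
    // Prints the root folder path (for example \\foldername) for every account in the profile.
    public static void PrintRootFolders(Outlook.Application application)
    {
        foreach (Outlook.Account account in application.Session.Accounts)
        {
            // Assumption: the delivery store is the store you want to scan.
            Outlook.Store store = account.DeliveryStore;
            if (store == null)
            {
                Console.WriteLine(account.DisplayName + " has no delivery store");
                continue;
            }
            Outlook.MAPIFolder root = store.GetRootFolder();
            Console.WriteLine(account.DisplayName + " -> " + root.FolderPath);
        }
    }
}
```

Calling RootFolderLister.PrintRootFolders(new Outlook.Application()) should list one root folder per account, which is a quick way to check that the derived folder name matches what Outlook actually uses.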
Recursing Through Folders
Using the function below, we initially pass it the root folder. It then looks for any child (sub) folders and passes each one to itself recursively, following the folder structure until it reaches the end. Note that, as written, it only descends into folders whose path contains "Inbox".
static void EnumerateFolders(Outlook.Folder folder)
{
    Outlook.Folders childFolders = folder.Folders;
    if (childFolders.Count > 0)
    {
        foreach (Outlook.Folder childFolder in childFolders)
        {
            // Only follow the Inbox and anything beneath it.
            if (childFolder.FolderPath.Contains("Inbox"))
            {
                Console.WriteLine(childFolder.FolderPath);
                EnumerateFolders(childFolder);
            }
        }
    }
}
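If you would rather scan every mail folder (Sent Items, archive folders and so on) instead of only paths containing "Inbox", one option is to filter on the folder's default item type. This is only a sketch of that variation, not what the downloadable code does; FolderWalker and VisitMailFolders are hypothetical names.

```csharp
using System;
using Outlook = Microsoft.Office.Interop.Outlook;

static class FolderWalker
{
    // Calls visit for the given folder and every folder beneath it that holds mail items.
    public static void VisitMailFolders(Outlook.MAPIFolder folder, Action<Outlook.MAPIFolder> visit)
    {
        // Calendar, contact and task folders report a different default item type and are not visited.
        if (folder.DefaultItemType == Outlook.OlItemType.olMailItem)
        {
            visit(folder);
        }
        // Always recurse: a non-mail folder can still contain mail subfolders.
        foreach (Outlook.MAPIFolder child in folder.Folders)
        {
            VisitMailFolders(child, visit);
        }
    }
}
```

For example, FolderWalker.VisitMailFolders(rootFolder, f => Console.WriteLine(f.FolderPath)) would print every mail folder under rootFolder, where rootFolder is whichever root folder you found earlier.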
Iterating through Emails in a Folder and Listing Their Attachments
Using the function below, we initially pass it the current folder. It then iterates through the folder.Items collection, which contains the actual email messages in the Outlook folder.
Each email is returned as an item with an .Attachments.Count property, which indicates how many attachments the email message has.
Where this is not zero (!= 0), we simply list out each attachment in the email. From here, you can save the attachment, delete it, or otherwise process it however you wish.
static void IterateMessages(Outlook.Folder folder)
{
    var fi = folder.Items;
    if (fi != null)
    {
        foreach (Object item in fi)
        {
            // Not every item is an email (meeting requests, for example), so skip anything else.
            Outlook.MailItem mi = item as Outlook.MailItem;
            if (mi == null)
            {
                continue;
            }
            var attachments = mi.Attachments;
            if (attachments.Count != 0)
            {
                // Outlook collections are 1-based.
                for (int i = 1; i <= attachments.Count; i++)
                {
                    Console.WriteLine("Attachment: " + attachments[i].FileName);
                }
            }
        }
    }
}
Looking for Specific Types of Attachments
It's quite common for Outlook to store embedded images (such as logos in an email) and other files you wouldn't normally need as attachments, so I create an array of extension types that I'd like to extract, ignoring those that aren't useful to me.
By comparing the attachment filename to the array of extensions, I can then determine what to keep.
As this is only a basic string comparison, any file name containing one of the strings in the array will be identified. For example, both helloworld.doc (Office) and helloworld.docx (Office Open XML format from Office 2007 onwards) contain .doc so will both be identified. The comparison is also case-sensitive, which is why the full code lower-cases the file name first.
string[] extensionsArray = { ".pdf", ".doc",
    ".xls", ".ppt", ".vsd", ".zip",
    ".rar", ".txt", ".csv", ".proj" };
if (extensionsArray.Any(mi.Attachments[i].FileName.Contains))
{
    // The file name contains one of the wanted extensions; process the attachment here.
}
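The substring check is convenient, but it also matches names such as my.docs.png, because .doc appears in the middle of the name. If that matters to you, a stricter variation is to compare only the real extension. The sketch below assumes you list every extension explicitly (so .docx needs its own entry); AttachmentFilter and IsWanted are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class AttachmentFilter
{
    // Extensions to keep, compared case-insensitively so REPORT.PDF also matches.
    private static readonly HashSet<string> Wanted =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            ".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",
            ".vsd", ".zip", ".rar", ".txt", ".csv"
        };

    // Returns true only when the text after the last dot is one of the wanted extensions.
    public static bool IsWanted(string fileName)
    {
        if (string.IsNullOrEmpty(fileName))
        {
            return false;
        }
        return Wanted.Contains(Path.GetExtension(fileName));
    }
}
```

In the loop, the check would then read if (AttachmentFilter.IsWanted(mi.Attachments[i].FileName)). The trade-off is that you lose the "one entry covers .doc and .docx" shortcut in exchange for fewer false matches.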
Saving and Deleting the Attachments
Saving each attachment is remarkably easy, and the assembly provides a function to perform the save to the local disk. In the example below, pathToSaveFile is a full local file path, such as c:\temp\ followed by the attachment's file name.
mi.Attachments[i].SaveAsFile(pathToSaveFile);
Similarly, deleting attachments is as simple as invoking the .Delete function.
mi.Attachments[i].Delete();
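One thing to watch when deleting inside a loop: the Attachments collection re-indexes itself after each removal, so counting upwards can skip entries. A minimal sketch of a save-then-delete pass that counts downwards instead is shown here; AttachmentRemover, SaveAndRemoveAll and targetDirectory are hypothetical names, and the target directory is assumed to exist already.

```csharp
using System.IO;
using Outlook = Microsoft.Office.Interop.Outlook;

static class AttachmentRemover
{
    // Saves every attachment of a message into targetDirectory, then removes it from the message.
    public static void SaveAndRemoveAll(Outlook.MailItem mail, string targetDirectory)
    {
        Outlook.Attachments attachments = mail.Attachments;
        // Outlook collections are 1-based; counting down keeps the remaining indexes valid.
        for (int i = attachments.Count; i >= 1; i--)
        {
            Outlook.Attachment attachment = attachments[i];
            attachment.SaveAsFile(Path.Combine(targetDirectory, attachment.FileName));
            attachment.Delete();
        }
        // Persist the change to the message itself.
        mail.Save();
    }
}
```

I would try this on a copy of the mailbox first, since the deletion cannot be undone once the message has been saved.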
In the example code below, we save each attachment to a folder based on the structure:
(basepath)(accountname)(folderstructure)(sender)
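Folder names, sender addresses and attachment names can all contain characters that Windows does not allow in a path (Exchange-style sender addresses contain forward slashes, for instance). The sketch below shows one way to build the target path from the parts above while replacing such characters with an underscore. SavePathBuilder, Sanitize and Build are hypothetical names, and the character list is my own assumption rather than anything the Interop assembly provides.

```csharp
using System;
using System.IO;
using System.Linq;

static class SavePathBuilder
{
    // Characters that are not allowed in Windows file or directory names.
    private static readonly char[] Unsafe = { '\\', '/', ':', '*', '?', '"', '<', '>', '|' };

    // Replaces unsafe and control characters with an underscore.
    public static string Sanitize(string name)
    {
        if (string.IsNullOrWhiteSpace(name))
        {
            return "_";
        }
        return new string(name
            .Select(c => (Unsafe.Contains(c) || char.IsControl(c)) ? '_' : c)
            .ToArray());
    }

    // Builds (basepath)(accountname)(folderstructure)(sender)(filename) from an Outlook folder path.
    public static string Build(string basePath, string outlookFolderPath, string sender, string fileName)
    {
        string result = basePath;
        // An Outlook folder path looks like \\foldername\Inbox, so split it on backslashes.
        foreach (string part in outlookFolderPath.Split(new[] { '\\' }, StringSplitOptions.RemoveEmptyEntries))
        {
            result = Path.Combine(result, Sanitize(part));
        }
        result = Path.Combine(result, Sanitize(sender));
        return Path.Combine(result, Sanitize(fileName));
    }
}
```

A call such as SavePathBuilder.Build(basePath, folder.FolderPath, mi.Sender.Address, attachment.FileName) would then give a path that can be passed to Directory.CreateDirectory (for its parent directory) and SaveAsFile.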
Download
You can download the code to this project from GitHub, or check out the code below.
The Full Code
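using System;
using System.Linq;
using System.IO;
using Outlook = Microsoft.Office.Interop.Outlook;

namespace OutlookAttachmentExtractor
{
    class Program
    {
        // Root directory on disk under which attachments are saved.
        static string basePath = @"c:\temp\emails\";
        // Running total of attachment sizes in bytes; long, because a large mailbox can exceed int.MaxValue.
        static long totalfilesize = 0;

        static void Main(string[] args)
        {
            EnumerateAccounts();
        }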

        static void EnumerateFolders(Outlook.Folder folder)
        {
            Outlook.Folders childFolders = folder.Folders;
            if (childFolders.Count > 0)
            {
                foreach (Outlook.Folder childFolder in childFolders)
                {
                    if (childFolder.FolderPath.Contains("Inbox"))
                    {
                        Console.WriteLine(childFolder.FolderPath);
                        EnumerateFolders(childFolder);
                    }
                }
            }
            Console.WriteLine("Looking for items in " + folder.FolderPath);
            IterateMessages(folder);
        }

        static void IterateMessages(Outlook.Folder folder)
        {
            string[] extensionsArray = { ".pdf", ".doc", ".xls",
                ".ppt", ".vsd", ".zip", ".rar", ".txt", ".csv", ".proj" };
            var fi = folder.Items;
            if (fi == null)
            {
                return;
            }
            foreach (Object item in fi)
            {
                try
                {
                    // Folders can also hold meeting requests, reports and so on; skip anything that is not an email.
                    Outlook.MailItem mi = item as Outlook.MailItem;
                    if (mi == null || mi.Attachments.Count == 0)
                    {
                        continue;
                    }
                    // Attachments are saved under (basePath)(folder path)\(sender address).
                    string senderPath = basePath + folder.FolderPath + @"\" + mi.Sender.Address;
                    // Outlook collections are 1-based.
                    for (int i = 1; i <= mi.Attachments.Count; i++)
                    {
                        var attachment = mi.Attachments[i];
                        if (!extensionsArray.Any(attachment.FileName.ToLower().Contains))
                        {
                            continue;
                        }
                        // CreateDirectory also creates missing parent directories and does nothing if the path exists.
                        Directory.CreateDirectory(senderPath);
                        totalfilesize = totalfilesize + attachment.Size;
                        string target = senderPath + @"\" + attachment.FileName;
                        if (!File.Exists(target))
                        {
                            Console.WriteLine("Saving " + attachment.FileName);
                            attachment.SaveAsFile(target);
                        }
                        else
                        {
                            Console.WriteLine("Already saved " + attachment.FileName);
                        }
                    }
                }
                catch (Exception e)
                {
                    // Report the problem and carry on with the next item instead of silently abandoning the folder.
                    Console.WriteLine("Skipping item: " + e.Message);
                }
            }
        }

        static string EnumerateAccountEmailAddress(Outlook.Account account)
        {
            try
            {
                if (string.IsNullOrEmpty(account.SmtpAddress) || string.IsNullOrEmpty(account.UserName))
                {
                    // Fall back to the address entry of the account's current user.
                    Outlook.AddressEntry oAE = account.CurrentUser.AddressEntry as Outlook.AddressEntry;
                    if (oAE.Type == "EX")
                    {
                        // Exchange entries need to be resolved to their SMTP address.
                        Outlook.ExchangeUser oEU = oAE.GetExchangeUser() as Outlook.ExchangeUser;
                        return oEU.PrimarySmtpAddress;
                    }
                    else
                    {
                        return oAE.Address;
                    }
                }
                else
                {
                    return account.SmtpAddress;
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
                return "";
            }
        }

        static void EnumerateAccounts()
        {
            Console.Clear();
            Console.WriteLine("Outlook Attachment Extractor v0.1");
            Console.WriteLine("---------------------------------");
            Outlook.Application Application = new Outlook.Application();
            Outlook.Accounts accounts = Application.Session.Accounts;
            while (true)
            {
                // List the accounts, numbered from 1 to match Outlook's 1-based collections.
                int id = 1;
                foreach (Outlook.Account account in accounts)
                {
                    Console.WriteLine(id + ":" + EnumerateAccountEmailAddress(account));
                    id++;
                }
                Console.WriteLine("Q: Quit Application");
                // ReadLine returns null at end of input; treat that the same as quitting.
                string response = (Console.ReadLine() ?? "Q").Trim().ToUpper();
                if (response == "Q")
                {
                    Console.WriteLine("Quitting");
                    return;
                }
                if (response == "")
                {
                    continue;
                }
                // TryParse rejects non-numeric input instead of throwing a FormatException.
                int selected;
                if (Int32.TryParse(response, out selected) && selected >= 1 && selected < id)
                {
                    Outlook.Account chosen = accounts[selected];
                    Console.WriteLine("Processing: " + chosen.DisplayName);
                    Console.WriteLine("Processing: " + EnumerateAccountEmailAddress(chosen));
                    // The root folder path is \\ followed by the account's display name.
                    Outlook.Folder selectedFolder = GetFolder(@"\\" + chosen.DisplayName);
                    if (selectedFolder != null)
                    {
                        EnumerateFolders(selectedFolder);
                    }
                    Console.WriteLine("Finished Processing " + chosen.DisplayName);
                    Console.WriteLine("");
                }
                else
                {
                    Console.WriteLine("Invalid Account Selected");
                }
            }
        }

        static Outlook.Folder GetFolder(string folderPath)
        {
            Console.WriteLine("Looking for: " + folderPath);
            Outlook.Folder folder;
            string backslash = @"\";
            try
            {
                if (folderPath.StartsWith(@"\\"))
                {
                    folderPath = folderPath.Remove(0, 2);
                }
                String[] folders = folderPath.Split(backslash.ToCharArray());
                Outlook.Application Application = new Outlook.Application();
                folder = Application.Session.Folders[folders[0]] as Outlook.Folder;
                if (folder != null)
                {
                    for (int i = 1; i <= folders.GetUpperBound(0); i++)
                    {
                        Outlook.Folders subFolders = folder.Folders;
                        folder = subFolders[folders[i]] as Outlook.Folder;
                        if (folder == null)
                        {
                            return null;
                        }
                    }
                }
                return folder;
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
                return null;
            }
        }
    }
}
Testing
I've tested this code on mailboxes hosted with an on-premises Exchange 2013 environment, Office 365 and a POP3/IMAP mailbox as well - all functioning exactly the same.
Further Reading
The links below provide more information on how to use the Outlook Interop service.