Introduction
With dtSearch and the techniques in this article, your data searches become lightning fast: you can search terabytes of data with sub-second response time.
But first, two preliminary notes about this blog post. (1) The blog post describes source code data, but the same approach would apply to other data stored in the Microsoft Azure cloud: HTML, XML, MS Office documents -- even email data. (2) While the data in this blog post resides in the Microsoft Azure cloud, the indexes are on a local PC. A subsequent article will address data and indexes in the cloud.
Here is a workplan of our overall project:
Overall Workplan
In part one of this article we are going to go to the Azure portal and provision the storage account. Naturally, the assumption is that you have signed up for an Azure account. If you have not, it's relatively easy to sign up for a free trial, so you can see if it meets your needs before you commit your money.
Once you provision your storage account, access keys will be automatically generated. These access keys will be copied into our Visual Studio project, because they are the secret keys that give privileged access to your storage account, the place where we’re going to copy the source code to be indexed and later searched.
Part two of this article will show you where we can get the Visual Studio solution with the starter code. This solution will dramatically reduce the amount of work we actually have to do to implement this useful source code searching application. If you install the full edition of the dtSearch Engine, the starter project actually gets installed in your program files folder.
We will be using Visual Studio 2013 with the latest updates. We will also install the latest Azure Storage SDK binaries.
It's in part three where the real work starts. What we want to do here is build the capability to upload your source code into your storage account. There are various utilities that you can download to perform the task of uploading source code to your storage account, but it will be far more convenient if we can build this into our main searching application. Once we finish this retrofit and upgrade, we can then run the application to upload the source code, index it, and then move to part four of our work plan.
Part four will be fast and easy because we will be pretty much done with the difficult work. Part four is about testing and packaging our application. The index files that get generated could be copied to other client computers. That means we can copy the application along with the generated index files to any computer to perform lightning fast source code searches.
Part 1 - Provisioning at the Azure portal
Provisioning the storage account is actually quite simple. At the time of this writing the traditional Azure portal is the place to go. But after the first week of May 2015, Microsoft will release the new portal.
Once you log into the Azure portal, it's a simple matter of navigating to the STORAGE menu item and clicking NEW.
Provisioning a storage account at the Azure portal
A QUICK CREATE menu item will become visible. Click on that to continue.
At this point you are ready to provide the URL, location, and the replication mode. The URL you come up with needs to be globally unique. As you can see "mysourcecode" was not taken. I chose "East US" for my location, but you can choose from among the world’s data centers. A closer data center means lower latency. You can read about replication options here: http://blogs.msdn.com/b/windowsazurestorage/archive/2013/12/11/introducing-read-access-geo-replicated-storage-ra-grs-for-windows-azure-storage.aspx.
When you are done, click CREATE STORAGE ACCOUNT in the lower right corner. It should take less than five minutes to provision your storage account. It took less than a minute for me when I did it.
Creating your storage account
When the portal indicates that your storage account is ONLINE, you are ready to move forward. Click on the small arrow that's pointing right to drill into the details of this newly provisioned storage account.
The provisioned storage account
You are now ready to copy access keys to the clipboard. Click on MANAGE ACCESS KEYS.
Copying the Access Keys
Click on the icon of the red box to copy the PRIMARY ACCESS KEY into your clipboard and store it in a safe place along with the STORAGE ACCOUNT NAME. Both your STORAGE ACCOUNT NAME and your PRIMARY ACCESS KEY will be different from what you see here.
Copying the Storage Account Name and the Primary Access Key
Storage Account Name | mysourcecode |
Primary Access Key | CnQ6dUXdOQ81qSCFJhscuB3PCNM92o4bIuDoKG7mO7tJ1imxa5sMkzKtnghsG11EwKgxRaTW5g6fFKRcXZ8z6g== |
Part 2 - Locating the starter project
The starter project that ships with the dtSearch Engine can be found under the program files folder here:
- C:\Program Files (x86)\dtSearch Developer\examples\cs4\AzureBlobDemo\AzureBlobDemo.sln
The starter project provides an excellent starting point for us to begin our work. Be sure you are using Visual Studio 2013 with all the latest updates installed.
The project should open up seamlessly, but we want to be sure we have the latest Azure Storage binaries installed. We will right-click in Visual Studio's Solution Explorer and select Manage NuGet Packages.
Adding NuGet Packages
In the upper right search box, type in "Azure Storage." As you would expect, this brings up the Windows Azure Storage client library, which we are going to use to read from and write to the Windows Azure Storage account that we provisioned in Part 1.
Updating the project with latest Azure Storage SDK
In Visual Studio Solution Explorer you can expand the references node to validate that we have the storage client libraries installed.
Validating the Azure Storage Client Libraries
Part 3 - Adding the storage account connection information to app.config
Now is a good time to copy the storage account information into your app.config file. The app.config file provides a convenient location that is globally accessible to your application. It will be accessed at run time. It is not appropriate to ask users to continually provide the connection information every time they use the application.
Modifying App.Config
<?xml version="1.0"?>
<configuration>
  <startup>
    <supportedRuntime
      version = "v4.0"
      sku = ".NETFramework,Version=v4.0"/>
  </startup>
  <appSettings>
    <add
      key = "StorageAccountName"
      value = "mysourcecode"/>
    <add
      key = "AccessKey"
      value = "CnQ6dUXdOQ81qSCFJhscuB3PCNM92o4bIuDoKG7mO7tJ1imxa5sMkzKtnghsG11EwKgxRaTW5g6fFKRcXZ8z6g=="/>
  </appSettings>
</configuration>
Options for encryption
If you would like to encrypt this information, there are several options here:
Adding support to upload source code to your Azure Storage Account
Our next task is to enhance the starter project to enable source code uploads. Adding this capability directly into the application will dramatically improve usability. In this section, we will add a command button and then write some code.
Here's what the application looks like before our changes. This is MainForm.cs.
Before Adding a button to MainForm.cs
We will now add a third button as seen below. The name of the button is cmdAddCode and its Text property (the caption) reads "Add source code to Azure Storage". You will need to move the index and search buttons down a little bit to make room for this new third button.
From the designer, double-click the Add source code to Azure Storage button to open its Click event handler in the code editor.
After Adding a button to MainForm.cs
We will now add some code that will provide the ability to upload source code.
Adding Code-Behind
Next, add a reference to System.Configuration: right-click the References node in Solution Explorer and choose Add Reference. Be sure the check box next to System.Configuration is checked before clicking OK.
Adding a reference to System.configuration
Be sure that the top of MainForm.cs has the following new statements in place.
The necessary using statements
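Judging from the types the code in this section uses, the using directives at the top of MainForm.cs would likely include the following. The list is inferred from the code rather than taken from the starter project, so treat it as an assumption; the starter project may already contain some of these lines.

```csharp
using System;
using System.Collections.Generic;            // List<T>
using System.Configuration;                  // ConfigurationManager
using System.IO;                             // FileInfo, Directory, Path, FileMode
using System.Threading.Tasks;                // Parallel, ParallelOptions
using System.Windows.Forms;                  // FolderBrowserDialog, MessageBox
using Microsoft.WindowsAzure.Storage;        // CloudStorageAccount
using Microsoft.WindowsAzure.Storage.Blob;   // CloudBlobClient, CloudBlobContainer
```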
Modifying MainForm.cs
private void cmdAddCode_Click(object sender, EventArgs e)
{
    string windowTitle = this.Text;
    try
    {
        // Ask for the folder to upload; do nothing if the dialog is cancelled.
        string selectedFolder;
        using (FolderBrowserDialog fDialog = new FolderBrowserDialog())
        {
            if (fDialog.ShowDialog() != DialogResult.OK)
                return;
            selectedFolder = fDialog.SelectedPath;
        }

        // Build the connection string from the values stored in app.config.
        string storageAccountName = ConfigurationManager.AppSettings["StorageAccountName"];
        string accessKey = ConfigurationManager.AppSettings["AccessKey"];
        string connString = string.Format("DefaultEndpointsProtocol=https;AccountName={0};AccountKey={1}",
            storageAccountName, accessKey);
        var storageAccount = CloudStorageAccount.Parse(connString);
        CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

        // Collect every file under the selected folder, including sub-folders.
        List<FileInfo> filesToUpload = new List<FileInfo>();
        RecursiveFileUpload(selectedFolder, filesToUpload, "*.*");

        // Upload at most four files at a time.
        var fileUploadParallelism = new ParallelOptions() { MaxDegreeOfParallelism = 4 };
        string blobContainerName = "code";
        CloudBlobContainer container = blobClient.GetContainerReference(blobContainerName);
        container.CreateIfNotExists();
        Parallel.ForEach(filesToUpload, fileUploadParallelism, currentFileInfo =>
        {
            // Flatten the local path into a blob name: C:\src\a.cs becomes C:_src_a.cs.
            string cloudFileNamePath = currentFileInfo.FullName.Replace(@"\", @"_");
            if (cloudFileNamePath.StartsWith("/", StringComparison.Ordinal))
            {
                cloudFileNamePath = cloudFileNamePath.Substring(1);
            }
            try
            {
                var blobFileToUpload = container.GetBlockBlobReference(cloudFileNamePath);
                ShowTitle("Uploading..." + currentFileInfo.Name);
                // Skip files that already exist in the container.
                if (!blobFileToUpload.Exists())
                {
                    blobFileToUpload.UploadFromFile(currentFileInfo.FullName, FileMode.Open, null, null, null);
                }
            }
            catch (Exception exception)
            {
                MessageBox.Show("Issue with blob upload = " + exception.Message);
            }
        });
    }
    finally
    {
        // Restore the original window title.
        this.Text = windowTitle;
    }
}
delegate void StringParameterDelegate(string value);
public void ShowTitle(string value)
{
    if (InvokeRequired)
    {
        BeginInvoke(new StringParameterDelegate(ShowTitle), new object[] { value });
        return;
    }
    this.Text = value;
}
private List<FileInfo> RecursiveFileUpload(string sourceDir, List<FileInfo> filesToCopy, string search_type)
{
    if (!sourceDir.EndsWith(Path.DirectorySeparatorChar.ToString()))
    {
        sourceDir += Path.DirectorySeparatorChar;
    }
    // Recurse into the sub-folders first.
    try
    {
        foreach (string sDir in Directory.GetDirectories(sourceDir))
        {
            RecursiveFileUpload(sDir, filesToCopy, search_type);
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show("Issue with RecursiveFileUpload " + ex.Message);
    }
    // Then collect the files in this folder that match the search pattern.
    try
    {
        foreach (string sFile in Directory.GetFiles(sourceDir, search_type))
        {
            // Skip paths of 1,024 characters or more, because the path becomes
            // the blob name and a blob name cannot exceed 1,024 characters.
            if (sFile.Length >= 1024)
                continue;
            try
            {
                filesToCopy.Add(new FileInfo(sFile));
            }
            catch (System.IO.IOException ex)
            {
                MessageBox.Show("Skipping " + sFile + " because of " + ex.Message);
            }
        }
    }
    catch (System.Exception ex)
    {
        // Includes UnauthorizedAccessException for folders we cannot read.
        MessageBox.Show("Skipping " + sourceDir + " because of " + ex.Message);
    }
    return filesToCopy;
}
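To make the blob naming rule in cmdAddCode_Click easier to see, here is a minimal sketch that isolates it in a helper. The class name BlobNaming and the method ToBlobName are hypothetical and are not part of the starter project; the logic simply mirrors the path handling in the upload loop.

```csharp
using System;

// Hypothetical helper, not part of the dtSearch starter project.
public static class BlobNaming
{
    // Turns a local Windows path into the flat blob name used by the upload loop.
    public static string ToBlobName(string fullPath)
    {
        if (fullPath == null)
            throw new ArgumentNullException("fullPath");

        // Every backslash becomes an underscore, so no virtual folders are created.
        string name = fullPath.Replace('\\', '_');

        // Drop a leading forward slash, as the upload loop does.
        if (name.StartsWith("/", StringComparison.Ordinal))
            name = name.Substring(1);

        return name;
    }
}
```

For example, BlobNaming.ToBlobName(@"C:\src\app\Main.cs") returns C:_src_app_Main.cs. One consequence of this design is that two different paths can map to the same blob name: C:\a_b\c.cs and C:\a\b_c.cs both become C:_a_b_c.cs, and because the upload skips blobs that already exist, the second file would not be uploaded.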
Some of the code needs updating in the Rewind() method of the BLOBDATASOURCE.CS file.
public override bool Rewind()
{
    if (_isStorageFailed)
        return false;
    try
    {
        var storageAccount = CloudStorageAccount.Parse(_connectionString);
        _blobClient = storageAccount.CreateCloudBlobClient();
        _blobTable = new Dictionary<string, List<string>>();
        foreach (CloudBlobContainer container in _blobClient.ListContainers())
        {
            string containerName = container.Name;
            List<string> blobURIs = new List<string>();
            var blobs = container.ListBlobs();
            foreach (var blobItem in blobs)
            {
                blobURIs.Add(blobItem.Uri.ToString());
            }
            _blobTable.Add(containerName, blobURIs);
        }
        if (!ResetIterators())
        {
            _isStorageFailed = true;
            return false;
        }
        _isStorageFailed = false;
        return true;
    }
    catch (Exception ex)
    {
        _isStorageFailed = true;
        return false;
    }
}
We have also modified the constructor in AskConnectForm.cs so that it always fills in the connection string from app.config and the user doesn't have to type it in every time. Ideally, we could write some code to completely bypass the AskConnectForm form, but I'm trying to avoid too many modifications to keep this post straightforward.
public AskConnectForm()
{
    InitializeComponent();
    string storageAccountName = ConfigurationManager.AppSettings["StorageAccountName"];
    string accessKey = ConfigurationManager.AppSettings["AccessKey"];
    string connString = string.Format("DefaultEndpointsProtocol=https;AccountName={0};AccountKey={1}",
        storageAccountName, accessKey);
    this.ConnectString.Text = connString;
}
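The same connection string is assembled in both cmdAddCode_Click and the AskConnectForm constructor. One way to keep the two in sync, sketched here as an optional refactoring, is a small shared helper. The class name StorageSettings and the method BuildConnectionString are hypothetical and do not exist in the starter project.

```csharp
using System;

// Hypothetical helper, not part of the dtSearch starter project.
public static class StorageSettings
{
    // Composes the connection string in the format used elsewhere in this article.
    public static string BuildConnectionString(string accountName, string accessKey)
    {
        // Fail early with a clear message if app.config is missing a value.
        if (string.IsNullOrWhiteSpace(accountName))
            throw new ArgumentException("StorageAccountName is missing.", "accountName");
        if (string.IsNullOrWhiteSpace(accessKey))
            throw new ArgumentException("AccessKey is missing.", "accessKey");

        return string.Format(
            "DefaultEndpointsProtocol=https;AccountName={0};AccountKey={1}",
            accountName, accessKey);
    }
}
```

Both call sites could then pass in ConfigurationManager.AppSettings["StorageAccountName"] and ConfigurationManager.AppSettings["AccessKey"] and use the returned string.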
Part 4 - Testing
We are now ready to start testing the application that we just updated. One thing that might be of interest is to verify that we correctly updated our storage account with the source code. I ran the application once and uploaded source code to Azure Storage, as seen in the picture below.
You can download the Azure Storage Explorer for free at the following URL:
http://azurestorageexplorer.codeplex.com/
Once you've installed and configured Azure Storage Explorer, you can go and browse the containers for whatever source code you may have previously uploaded. It also allows you to delete the content should you want to do so.
Tools like Storage Explorer
Although we are adding source code, you can add pretty much any file, such as Word documents or PowerPoint presentations. dtSearch will automatically index many different types of documents.
By the way, the previous code uploads files in parallel rather than one at a time, and the developer can control the level of concurrency depending on network and system resources.
See the code snippet:
var fileUploadParallelism = new ParallelOptions() {MaxDegreeOfParallelism = 4};
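To see what this setting does in isolation, the following sketch runs some simulated work items through Parallel.ForEach and records how many were in flight at once. The class ParallelismDemo, the method MeasurePeakConcurrency, and the use of Thread.Sleep in place of a real upload are all assumptions made for illustration.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class, not part of the dtSearch starter project.
public static class ParallelismDemo
{
    // Returns the highest number of work items observed running at the same time.
    public static int MeasurePeakConcurrency(int itemCount, int maxDegreeOfParallelism)
    {
        int running = 0;
        int peak = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };

        Parallel.ForEach(Enumerable.Range(0, itemCount), options, item =>
        {
            int now = Interlocked.Increment(ref running);

            // Raise the recorded peak if this item pushed concurrency higher.
            int seen;
            while (now > (seen = Volatile.Read(ref peak)) &&
                   Interlocked.CompareExchange(ref peak, now, seen) != seen)
            {
            }

            Thread.Sleep(20); // stands in for one file upload
            Interlocked.Decrement(ref running);
        });

        return peak;
    }
}
```

Calling ParallelismDemo.MeasurePeakConcurrency(40, 4) should never return more than 4. MaxDegreeOfParallelism is an upper bound, so the observed peak can be lower on a machine with few available threads.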
Click the highlighted button to upload source code to your Azure Storage account.
Adding code
You can repeat this process for every folder that contains source code you wish to upload. All the files in the folder you pick, and in its sub-folders, are uploaded to Azure Storage.
Selecting a folder that contains source code
When the index is created it will need a location to store the index files.
Enter a valid location and then hit the Index button.
Entering information about the index and creating the index
We already entered the necessary code above to populate this dialog box with the appropriate connection string. You can just hit OK on this dialog box.
Entering the connection string and clicking OK
You will click on two buttons in this dialog box. The first button is Index an Azure storage account. The second button is Search.
Performing the indexing operation and then clicking Search
Entering the search term and hitting Search
Our work is complete. You can now get lightning-quick results when searching for keywords in the source code stored in your Azure Storage account.
If you decide to add more source code to the Azure Storage account, you will need to regenerate the indexes.
Viewing the search results
Conclusion
You can now search literally terabytes of source code and get instant search results. One of the core advantages here is that you don't have to store all the source code locally on your own laptop or desktop computer. All the source code can be securely stored in your Azure Storage account, available only to those who have the access keys.