Small Lucene.NET Demo App

Sacha Barber

21 Jun 2013

A small demo app that shows how to store/search using Lucene.NET

Demo project : Lucene_VS2012_Demo_App.zip

Introduction
Lucene .NET Basics
The Demo App
Lucene GUI
That's It

Introduction

I have just left a job and am about to start a new one, but just before I left, one of the other guys in my team was tasked with writing something using the Lucene search engine for .NET. We had to search across 300,000 or so objects, and it appeared to be pretty quick. We were running a search in response to each character the user typed, and no delay was noticeable at all, even though each request went through loads of Rx and loads of different application layers before finally hitting Lucene to search for results.

This piqued my interest a bit, and I decided to give Lucene a try and see if I could come up with a simple demo that I could share.

So that is what I did, and these are the results of that.

Lucene .NET Basics

Lucene.Net is a port of the Lucene search engine library, written in C# and targeted at .NET runtime users.

The general idea is that you build an Index of .NET objects that are stored within a specialized Lucene Document with searchable fields. You are then able to run queries against these stored Documents and rehydrate them back into .NET objects.

Building An Index (Document)

Index building is the phase where you take your .NET objects, create a Document for each one, and add certain fields (you do not have to store/add fields for all of the .NET object's properties; that is up to you to decide). You then save these Document(s) to a physical directory on disk, which will later be searched.
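
To make the idea of an index a little more concrete, here is a minimal sketch of the kind of structure a search library maintains: an inverted index that maps each term to the line numbers containing it. This is a toy illustration in plain C#, not how Lucene.NET is actually implemented, and ToyIndex and its members are hypothetical names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy inverted index (hypothetical, for illustration only): maps each
// whitespace-separated term to the set of line numbers that contain it.
public class ToyIndex
{
    private readonly Dictionary<string, SortedSet<int>> postings =
        new Dictionary<string, SortedSet<int>>();

    // "Index" one line of text under the given line number.
    public void Add(int lineNumber, string lineText)
    {
        // A null separator array means "split on any whitespace".
        string[] terms = lineText.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        foreach (string term in terms)
        {
            SortedSet<int> lines;
            if (!postings.TryGetValue(term, out lines))
            {
                lines = new SortedSet<int>();
                postings[term] = lines;
            }
            lines.Add(lineNumber);
        }
    }

    // Return the line numbers containing the exact term (case-sensitive),
    // in ascending order. An unknown term yields an empty list.
    public IList<int> Search(string term)
    {
        SortedSet<int> lines;
        return postings.TryGetValue(term, out lines)
            ? lines.ToList()
            : new List<int>();
    }
}
```

Looking up a term is then a single dictionary access rather than a scan over every line, which is the basic reason an index makes searching fast. A real engine adds analysis, scoring, and on-disk storage on top of this idea.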

Querying Data

Querying data is obviously one of the main reasons that you would want to use Lucene .NET, and it should come as no surprise that it has good querying facilities.

I think one of the nicer references for the query syntax that Lucene .NET uses can be found here:

http://www.lucenetutorial.com/lucene-query-syntax.html

Some simple examples might be:

title:foo	: Search for word "foo" in the title field.
title:"foo bar"	: Search for phrase "foo bar" in the title field.
title:"foo bar" AND body:"quick fox"	: Search for phrase "foo bar" in the title field AND the phrase "quick fox" in the body field.
(title:"foo bar" AND body:"quick fox") OR title:fox	: Search for either the phrase "foo bar" in the title field AND the phrase "quick fox" in the body field, or the word "fox" in the title field.
title:foo -title:bar	: Search for word "foo" and not "bar" in the title field.
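
One way to read the examples above is as set operations over the documents each term matches: AND is an intersection, OR is a union, and a leading minus removes documents. The sketch below shows only that reading, in plain C#. It ignores scoring entirely, and ToyBooleanQuery and its methods are hypothetical names, not part of Lucene.NET.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: treats each term's matches as a set of document ids
// and combines them the way the boolean operators combine query clauses.
public static class ToyBooleanQuery
{
    // a AND b: documents present in both result sets.
    public static List<int> And(IEnumerable<int> a, IEnumerable<int> b)
    {
        return a.Intersect(b).OrderBy(id => id).ToList();
    }

    // a OR b: documents present in either result set.
    public static List<int> Or(IEnumerable<int> a, IEnumerable<int> b)
    {
        return a.Union(b).OrderBy(id => id).ToList();
    }

    // a -b: documents matching a, with any that match b removed.
    public static List<int> Not(IEnumerable<int> a, IEnumerable<int> b)
    {
        return a.Except(b).OrderBy(id => id).ToList();
    }
}
```

For instance, if title:foo matches documents 1, 2 and 3 and title:bar matches documents 2 and 4, then `title:foo -title:bar` corresponds to `ToyBooleanQuery.Not(...)` and leaves documents 1 and 3.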

Types Of Analyzer

There are actually lots of different Analyzer types in Lucene.NET, such as (there are many more than this, these are just a few):

SimpleAnalyzer
StandardAnalyzer
StopAnalyzer
WhitespaceAnalyzer

Choosing the correct one will depend on what you are trying to achieve and what your requirements dictate.
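
To give a feel for why the choice matters, the sketch below imitates two tokenizing strategies in plain C#: splitting on whitespace only, and splitting at every non-letter while lowercasing. These are hand-written approximations meant to resemble what a whitespace-style and a simple letter-based analyzer do. They are not the Lucene.NET classes themselves, and ToyTokenizers is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-ins for two analysis strategies (not Lucene.NET code).
public static class ToyTokenizers
{
    // Splits on whitespace only: case and punctuation are preserved.
    public static List<string> Whitespace(string text)
    {
        return new List<string>(
            text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));
    }

    // Breaks at every non-letter character and lowercases each token.
    public static List<string> LowercaseLetters(string text)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsLetter(c))
            {
                current.Append(char.ToLowerInvariant(c));
            }
            else if (current.Length > 0)
            {
                // A non-letter ends the token being built.
                tokens.Add(current.ToString());
                current.Clear();
            }
        }
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
        }
        return tokens;
    }
}
```

Given the text `When I'm gone, remember.`, the whitespace version yields `When`, `I'm`, `gone,` and `remember.`, while the letter-based version yields `when`, `i`, `m`, `gone` and `remember`. With whitespace-only analysis, a search for `when` would therefore not match a line that starts with `When`, which is worth keeping in mind when picking an analyzer.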

The Demo App

This section will talk about the attached demo app, and should give you enough information to start building your own Lucene.NET powered search should you wish to use it in your own applications.

What Does The Demo App Do?

The demo app is pretty simple really. Here is what it does:

There is a static text file (in my case a poem) that is available to index
On startup the text file is indexed and added to the overall Lucene Index directory (which in my case is hardcoded to C:\Temp\LuceneIndex)
There is a UI (I used WPF, but that is irrelevant) which:
- Allows a user to enter a search key word that is used to search the indexed Lucene data
- Will show all the lines from the text file that was originally used to create the Lucene Index data
- Will show the matching lines in the poem when the user conducts a search.

I think the best bet is to see an example. When the UI first loads, it lists every line of the poem.

When we then type a search term in, say the word "when", only the matching lines are shown.

And that is all the demo does, but I think that is enough to demonstrate how Lucene works.

What Gets Stored And How

So what gets stored? That is pretty simple. Recall that we have a static text file (a poem). We start by reading that file, using the simple utility class shown below, into SampleDataFileRow objects that can be added to the Lucene index:

using System.Collections.Generic;
using System.IO;
using System.Reflection;

public class SampleDataFileReader : ISampleDataFileReader
{
    public IEnumerable<SampleDataFileRow> ReadAllRows()
    {
        // The sample file is expected in a "Lucene" folder next to the executing assembly
        string assemblyDirectory = Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location);
        string file = Path.Combine(assemblyDirectory, "Lucene", "SampleDataFile.txt");
        string[] lines = File.ReadAllLines(file);
        for (int i = 0; i < lines.Length; i++)
        {
            yield return new SampleDataFileRow
            {
                LineNumber = i + 1, // line numbers are 1-based
                LineText = lines[i]
            };
        }
    }
}

Where the SampleDataFileRow objects look like this:

public class SampleDataFileRow
{
    public int LineNumber { get; set; }
    public string LineText { get; set; }
    public float Score { get; set; }
}

And then from there we build the Lucene Index, which is done as follows:

public class LuceneService : ILuceneService
{
    // Note there are many different types of Analyzer that may be used with Lucene, the exact one you use
    // will depend on your requirements
    private Analyzer analyzer = new WhitespaceAnalyzer(); 
    private Directory luceneIndexDirectory;
    private IndexWriter writer;
    private string indexPath = @"c:\temp\LuceneIndex";

    public LuceneService()
    {
        InitialiseLucene();
    }

    private void InitialiseLucene()
    {
        if(System.IO.Directory.Exists(indexPath))
        {
            System.IO.Directory.Delete(indexPath,true);
        }

        luceneIndexDirectory = FSDirectory.GetDirectory(indexPath);
        writer = new IndexWriter(luceneIndexDirectory, analyzer, true);
    }

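    public void BuildIndex(IEnumerable<SampleDataFileRow> dataToIndex)
    {
        foreach (var sampleDataFileRow in dataToIndex)
        {
            Document doc = new Document();

            // Stored as a single un-analysed term so it can be read back as-is
            // (later Lucene.NET releases name this option NOT_ANALYZED)
            doc.Add(new Field("LineNumber",
                sampleDataFileRow.LineNumber.ToString(),
                Field.Store.YES,
                Field.Index.UN_TOKENIZED));

            // Run through the analyzer so individual words become searchable
            // (later Lucene.NET releases name this option ANALYZED)
            doc.Add(new Field("LineText",
                sampleDataFileRow.LineText,
                Field.Store.YES,
                Field.Index.TOKENIZED));

            writer.AddDocument(doc);
        }

        // Merge the index segments, then flush and release the writer and directory
        writer.Optimize();
        writer.Flush();
        writer.Close();
        luceneIndexDirectory.Close();
    }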


    ....
    ....
    ....
    ....
    ....
}

I think that code is fairly simple and easy to follow. We essentially just do this:

Create new Lucene index directory
Create a Lucene writer
Create a new Lucene Document for our source object
Add the fields to the Lucene Document
Write the Lucene Document to disk

One thing that may be of interest is that if you are dealing with vast quantities of data, you may want to create static Field fields and reuse them rather than creating new ones each time you rebuild the index. Obviously for this demo the Lucene index is only created once per application run, but in a production application you may rebuild the index every 5 minutes or so, in which case I would recommend reusing the Field objects by making static fields that get reused.

What Gets Searched And How

Searching the indexed data is really easy. All you need to do is something like this:

public class LuceneService : ILuceneService
{
    // Note there are many different types of Analyzer that may be used with Lucene, the exact one you use
    // will depend on your requirements
    private Analyzer analyzer = new WhitespaceAnalyzer(); 
    private Directory luceneIndexDirectory;
    private IndexWriter writer;
    private string indexPath = @"c:\temp\LuceneIndex";

    public LuceneService()
    {
        InitialiseLucene();
    }

    ....
    ....


    public IEnumerable<SampleDataFileRow> Search(string searchTerm)
    {
        IndexSearcher searcher = new IndexSearcher(luceneIndexDirectory);
        QueryParser parser = new QueryParser("LineText", analyzer);

        // Parse the user's text as a query against the "LineText" field
        Query query = parser.Parse(searchTerm);
        Hits hitsFound = searcher.Search(query);
        List<SampleDataFileRow> results = new List<SampleDataFileRow>();

        for (int i = 0; i < hitsFound.Length(); i++)
        {
            // Rehydrate each matching Document back into a SampleDataFileRow
            Document doc = hitsFound.Doc(i);
            results.Add(new SampleDataFileRow
            {
                LineNumber = int.Parse(doc.Get("LineNumber")),
                LineText = doc.Get("LineText"),
                Score = hitsFound.Score(i)
            });
        }

        // Highest-scoring matches first
        return results.OrderByDescending(x => x.Score).ToList();
    }
}

There is not much to that, to be honest, and I think the code explains all you need to know.
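
One thing to watch for: the Search method hands whatever the user typed straight to the query parser, so characters that have a meaning in the query syntax (such as `:`, `(`, `"` or `*`) can change the query or make parsing fail part-way through typing. The sketch below is a small hand-rolled escaper in plain C# that prefixes such characters with a backslash so they are treated as literal text. QueryText and its character list are assumptions made for this example. Lucene's own QueryParser also offers a static Escape helper, which is the better choice where your version provides it.

```csharp
using System.Text;

// Hypothetical helper (not part of Lucene.NET): escapes characters that
// carry special meaning in the query syntax so they are searched literally.
public static class QueryText
{
    // Assumed list of special characters; check it against your Lucene version.
    private const string SpecialChars = "\\+-!():^[]\"{}~*?|&";

    public static string Escape(string input)
    {
        var escaped = new StringBuilder(input.Length * 2);
        foreach (char c in input)
        {
            if (SpecialChars.IndexOf(c) >= 0)
            {
                escaped.Append('\\'); // prefix each special character with a backslash
            }
            escaped.Append(c);
        }
        return escaped.ToString();
    }
}
```

The trade-off is that escaping everything also disables the query syntax for the user, so fielded or boolean searches typed into the box would be treated as plain words. Whether that is what you want depends on who is typing into the search box.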

Lucene GUI

There is also a pretty cool GUI for examining your stored Lucene data called "Luke.NET", and it is freely available from CodePlex using the following link:

http://luke.codeplex.com/releases/view/82033

When you run this tool you will need to enter the path to the index directory for the Lucene index that was created. For this demo app that is:

C:\Temp\LuceneIndex

Once you enter that, you click "Ok" and you will be presented with a UI that allows you to examine all the indexed data that Lucene stored, and also run searches should you wish to.

It's a nice tool and worth a look.

That's It

Anyway, that is all I have to say for now. I do have a few articles done, but they just need writing up and I am struggling to find time of late. I'll get there when I get there, I guess. As always, if you enjoyed this, a vote/comment is most welcome.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
