Small Lucene.NET Demo App

21 Jun 2013
A small demo app that shows how to store/search using Lucene.NET

Demo project : Lucene_VS2012_Demo_App.zip

 

Introduction

I have just left a job and am about to start a new one, but just before I left, one of the other guys in my team was tasked with writing something using the Lucene search engine for .NET. We had to search across 300,000 or so objects, and it appeared to be pretty quick. We were doing it in response to a user typing a character, and no delay was noticeable at all, even though the request was going through loads of Rx and loads of different application layers before finally hitting Lucene to search for results.

This piqued my interest a bit, and I decided to give Lucene a try and see if I could come up with a simple demo that I could share.

So that is what I did, and this article is the result.

 

Lucene .NET Basics

Lucene.Net is a port of the Lucene search engine library, written in C# and  targeted at .NET runtime users.

The general idea is that you build an Index of .NET objects, each stored within a specialized Lucene Document with searchable fields. You are then able to run queries against these stored Documents and rehydrate them back into .NET objects.

Building An Index (Document)

Index building is the phase where you take your .NET objects, create a Document for each one and add certain fields to it (you do not have to store/add fields for all of the .NET object's properties, that is up to you to decide). You then save these Document(s) to a physical directory on disk, which will later be searched.
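To make the split between the index phase and the query phase a bit more concrete, here is a toy sketch in plain C#. It is only an illustration of the general idea of an inverted index (term to document ids), not how Lucene.NET is actually implemented, and the ToyIndex class and its members are hypothetical names made up for this example.

```csharp
using System;
using System.Collections.Generic;

// Toy inverted index (hypothetical, for illustration only):
// maps each term to the ids of the "documents" that contain it.
public class ToyIndex
{
    private readonly Dictionary<string, SortedSet<int>> postings =
        new Dictionary<string, SortedSet<int>>();

    // Index phase: split the text into terms and record which document each term came from.
    public void Add(int docId, string text)
    {
        foreach (var term in text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
        {
            SortedSet<int> ids;
            if (!postings.TryGetValue(term, out ids))
            {
                ids = new SortedSet<int>();
                postings[term] = ids;
            }
            ids.Add(docId);
        }
    }

    // Query phase: a dictionary lookup on the term, with no scan over the original text.
    public IEnumerable<int> Search(string term)
    {
        SortedSet<int> ids;
        return postings.TryGetValue(term, out ids) ? (IEnumerable<int>)ids : new int[0];
    }
}
```

The work is done up front when the index is built, so a later search is a lookup rather than a scan, which is roughly why searching a prepared index can feel instant even with a lot of data.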

Querying Data

Querying data is obviously one of the main reasons that you would want to use Lucene .NET, and it should come as no surprise that it has good querying facilities.

A nice reference for the query syntax that Lucene .NET uses can be found here:

http://www.lucenetutorial.com/lucene-query-syntax.html

Some simple examples might be:

title:foo  : Search for word "foo" in the title  field.
title:"foo bar" : Search for phrase "foo bar" in the title  field.
title:"foo bar" AND  body:"quick fox" : Search for phrase "foo bar" in the title field  AND the phrase "quick fox" in the body field.
(title:"foo bar" AND  body:"quick fox") OR title:fox : Search for either the phrase "foo bar" in the  title field AND the phrase "quick fox" in the body field, or the word  "fox" in the title field.
title:foo -title:bar : Search for word "foo" and not "bar" in the  title field.

 

Types Of Analyzer

There are actually lots of different Analyzer types in Lucene.NET, such as (there are many more than this, these are just a few):

  • SimpleAnalyzer
  • StandardAnalyzer
  • StopAnalyzer
  • WhitespaceAnalyzer

Choosing the correct one will depend on what you are trying to achieve and what your requirements dictate.
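To give a feel for why the choice matters, the sketch below mimics in plain C# (no Lucene.NET involved) roughly what two of these analyzers do to a line of text, as I understand them: WhitespaceAnalyzer only splits on whitespace and keeps case and punctuation, whereas SimpleAnalyzer breaks the text at non-letter characters and lower-cases the result. The AnalyzerSketch class and its method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Linq;

// Hypothetical helper that imitates analyzer behaviour; it does not call Lucene.NET.
public static class AnalyzerSketch
{
    // Roughly WhitespaceAnalyzer: split on whitespace, keep case and punctuation.
    // "When the day's done, rest." gives: When | the | day's | done, | rest.
    public static string[] WhitespaceTokens(string text)
    {
        return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }

    // Roughly SimpleAnalyzer: break at non-letters, then lower-case each token.
    // "When the day's done, rest." gives: when | the | day | s | done | rest
    public static string[] SimpleTokens(string text)
    {
        var chars = text
            .Select(c => char.IsLetter(c) ? char.ToLowerInvariant(c) : ' ')
            .ToArray();
        return new string(chars).Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

One practical consequence: with a whitespace-only analyzer the indexed terms keep their original case and attached punctuation, so a search for "when" would not be expected to match a line that starts with "When" or a word indexed as "done,".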

 

The Demo App

This section will talk about the attached demo app, and should give you  enough information to start building your own Lucene.NET powered search should  you wish to use it in your own applications.

What Does The Demo App Do?

The demo app is pretty simple really, here is what it does:

  • There is a static text file (in my case a poem) that is available to  index
  • On startup the text file is indexed and added to the overall Lucene  Index directory (which in my case is hardcoded to C:\Temp\LuceneIndex)
  • There is a UI (I used WPF, but that is irrelevant) which:
    • Allows a user to enter a search key word that is used to search the  indexed Lucene data
    • Will show all the lines from the text file that was originally used  to create the Lucene Index data
    • Will show the matching lines in the poem when the user conducts a  search.

I think the best bet is to see an example. So this is what the UI looks like  when it first loads:

Then we type a search term in, say the word "when", and we would see this:

And that is all the demo does, but I think that is enough to demonstrate how  Lucene works.

 

What Gets Stored And How

So what gets stored? Well, that is pretty simple. Recall that we have a static text file (a poem). We start by reading that file, using the simple utility class shown below, into SampleDataFileRow objects that can be added to the Lucene index:

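using System.Collections.Generic;
using System.IO;
using System.Reflection;

public class SampleDataFileReader : ISampleDataFileReader
{
    // Lazily yields one SampleDataFileRow per line of the sample text file.
    public IEnumerable<SampleDataFileRow> ReadAllRows()
    {
        // The sample file is expected in a "Lucene" folder next to the executing assembly.
        FileInfo assemblyFile = new FileInfo(Assembly.GetExecutingAssembly().Location);
        string file = Path.Combine(assemblyFile.Directory.FullName, "Lucene", "SampleDataFile.txt");
        string[] lines = File.ReadAllLines(file);
        for (int i = 0; i < lines.Length; i++)
        {
            yield return new SampleDataFileRow
            {
                LineNumber = i + 1, // line numbers are 1-based
                LineText = lines[i]
            };
        }
    }
}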

Where the SampleDataFileRow objects look like this:

public class SampleDataFileRow
{
    public int LineNumber { get; set; }
    public string LineText { get; set; }
    public float Score { get; set; }
}

And then from there we build the Lucene Index, which is done as follows:

public class LuceneService : ILuceneService
{
    // Note there are many different types of Analyzer that may be used with Lucene, the exact one you use
    // will depend on your requirements
    private Analyzer analyzer = new WhitespaceAnalyzer(); 
    private Directory luceneIndexDirectory;
    private IndexWriter writer;
    private string indexPath = @"c:\temp\LuceneIndex";

    public LuceneService()
    {
        InitialiseLucene();
    }

    private void InitialiseLucene()
    {
        if(System.IO.Directory.Exists(indexPath))
        {
            System.IO.Directory.Delete(indexPath,true);
        }

        luceneIndexDirectory = FSDirectory.GetDirectory(indexPath);
        writer = new IndexWriter(luceneIndexDirectory, analyzer, true);
    }

    public void BuildIndex(IEnumerable<SampleDataFileRow> dataToIndex)
    {
        foreach (var sampleDataFileRow in dataToIndex)
        {
            Document doc = new Document();

            // Stored, and indexed as a single exact term (not run through the analyzer)
            doc.Add(new Field("LineNumber",
                sampleDataFileRow.LineNumber.ToString(),
                Field.Store.YES,
                Field.Index.UN_TOKENIZED));

            // Stored, and run through the analyzer so individual words are searchable
            doc.Add(new Field("LineText",
                sampleDataFileRow.LineText,
                Field.Store.YES,
                Field.Index.TOKENIZED));

            writer.AddDocument(doc);
        }

        // Merge the index segments, write everything to disk, then release the writer
        writer.Optimize();
        writer.Flush();
        writer.Close();
        luceneIndexDirectory.Close();
    }


    ....
    ....
    ....
    ....
    ....
}

I think that code is fairly simple and easy to follow, we essentially just do  this:

  1. Create new Lucene index directory
  2. Create a Lucene writer
  3. Create a new Lucene Document for each source object
  4. Add the fields to the Lucene Document
  5. Write the Lucene Document to disk

One thing that may be of interest: if you are dealing with vast quantities of data, you may want to reuse Field objects rather than creating new ones each time you rebuild the index. Obviously for this demo the Lucene index is only created once per application run, but in a production application you may rebuild the index every 5 minutes or so, in which case I would recommend holding the Field objects in static fields and reusing them.
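As a rough sketch of what that reuse could look like, the snippet below creates the Document and its two Field objects once and only swaps the values for each row. It assumes the Lucene.NET version in use exposes Field.SetValue(string) (my understanding is that this arrived with the 2.3 line, so check your version), and the ReusableFieldIndexer class is a hypothetical name that is not part of the demo app. SampleDataFileRow is the class shown earlier in this article. Note that shared Field instances like these are not safe to use from several threads at once.

```csharp
using System.Collections.Generic;
using Lucene.Net.Documents;
using Lucene.Net.Index;

// Hypothetical helper, not part of the demo project.
public static class ReusableFieldIndexer
{
    public static void AddRows(IndexWriter writer, IEnumerable<SampleDataFileRow> rows)
    {
        // Create the Field and Document instances once, up front.
        var lineNumberField = new Field("LineNumber", "", Field.Store.YES, Field.Index.UN_TOKENIZED);
        var lineTextField = new Field("LineText", "", Field.Store.YES, Field.Index.TOKENIZED);

        var doc = new Document();
        doc.Add(lineNumberField);
        doc.Add(lineTextField);

        foreach (var row in rows)
        {
            // Assumption: Field.SetValue(string) is available in your Lucene.NET version.
            lineNumberField.SetValue(row.LineNumber.ToString());
            lineTextField.SetValue(row.LineText);

            // The same Document instance is added again with the new field values.
            writer.AddDocument(doc);
        }
    }
}
```

Whether this is worth doing depends on how much data you index and how often. For a small, one-off index like the one in this demo, the difference is unlikely to be noticeable.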

 

What Gets Searched And How

So in terms of searching the indexed data this is really easy and all you  need to do is something like this:

public class LuceneService : ILuceneService
{
    // Note there are many different types of Analyzer that may be used with Lucene, the exact one you use
    // will depend on your requirements
    private Analyzer analyzer = new WhitespaceAnalyzer(); 
    private Directory luceneIndexDirectory;
    private IndexWriter writer;
    private string indexPath = @"c:\temp\LuceneIndex";

    public LuceneService()
    {
        InitialiseLucene();
    }

    ....
    ....


    public IEnumerable<SampleDataFileRow> Search(string searchTerm)
    {
        IndexSearcher searcher = new IndexSearcher(luceneIndexDirectory);
        QueryParser parser = new QueryParser("LineText", analyzer);

        Query query = parser.Parse(searchTerm);
        Hits hitsFound = searcher.Search(query);
        List<SampleDataFileRow> results = new List<SampleDataFileRow>();
        SampleDataFileRow sampleDataFileRow = null;

        for (int i = 0; i < hitsFound.Length(); i++)
        {
            sampleDataFileRow = new SampleDataFileRow();
            Document doc = hitsFound.Doc(i);
            sampleDataFileRow.LineNumber = int.Parse(doc.Get("LineNumber"));
            sampleDataFileRow.LineText = doc.Get("LineText");
            float score = hitsFound.Score(i);
            sampleDataFileRow.Score = score;
            results.Add(sampleDataFileRow);
        }
           
        return results.OrderByDescending(x => x.Score).ToList();
    }
}

There is not much to that, to be honest, and I think the code explains all you need to know.

Lucene GUI

There is also a pretty cool GUI for examining your stored Lucene data, which is called "Luke.NET". It is freely available from CodePlex using the following link:

http://luke.codeplex.com/releases/view/82033

When you run this tool you will need to enter the path to the index directory  for the Lucene index that was created. For this demo app that is

C:\Temp\LuceneIndex

Once you enter that, you click "Ok" and you will be presented with a UI that allows you to examine all the indexed data that Lucene stored, and also run searches should you wish to.

It's a nice tool and worth a look.

 

That's It

Anyway, that is all I have to say for now. I do have a few articles done, but they just need writing up and I am struggling to find time of late. I'll get there when I get there, I guess. As always, if you enjoyed this, a vote/comment is most welcome.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
