Extending Cuyahoga FullText Indexing (Lucene.NET)

27 Mar 2008
In this article we extend the classes in the Cuyahoga.Core.Search namespace to provide a more generic full-text indexing service.

Introduction

Cuyahoga provides module developers with simple-to-use wrapper classes for full-text indexing with Lucene.NET. Although these wrapper classes are easy to use, they are not generic. As module developers, we have to convert our custom content (static or dynamic) to a SearchContent object before the wrapper classes can index it. Similarly, a search on the indexed content can only return a collection of SearchResult objects. We also have no control over which fields are stored or unstored, which fields are used as keywords, and which fields are left unindexed when the wrapper classes build the full-text index.

In this article I will show how we can make the full-text indexing capabilities of Cuyahoga.Core more generic by providing a separate extension assembly.

Background

I created two web sites using the Cuyahoga framework.

While developing these sites I needed to full-text index some dynamic content, that is, content persisted in the database. After exploring the wrapper classes in the Cuyahoga.Core.Search namespace I noticed the limitations of the present search architecture and decided to make these wrapper classes more generic by applying .NET Generics and Reflection.

Bo.Cuyahoga.Extensions.Search Namespace

To extend the functionality of the Cuyahoga.Core.Search namespace using .NET Generics and Reflection, I copied all of its files to my project and started refactoring. I appended Ex to class names in order to distinguish the extended wrapper classes from the original classes in the Cuyahoga.Core.Search namespace.

IndexBuilder<T> class

This is the refactored, generic version of the Cuyahoga.Core.Search.IndexBuilder class. It is a wrapper around the Lucene.NET index building functionality. Take a closer look at the BuildDocumentFromSearchContent and DeleteContent methods. These methods were modified so that we can use reflection to retrieve the fields of the content object.

private Document BuildDocumentFromSearchContent(T searchContent)
{
    Document doc = new Document();
    IList<SearchContentFieldInfo> fields = SearchGenericUtils.GetSearchContentFields(typeof(T), searchContent);
    for(int i = 0; i < fields.Count ; i++)
    {
        SearchContentFieldInfo fi = fields[i];
        switch (fi.FieldType)
        {
            case SearchContentFieldType.Text:
                doc.Add(Field.Text(fi.Name, fi.Value));
                break;
            case SearchContentFieldType.UnStored:
                doc.Add(Field.UnStored(fi.Name, fi.Value));
                break;
            case SearchContentFieldType.UnIndexed:
                doc.Add(Field.UnIndexed(fi.Name, fi.Value));
                break;
            case SearchContentFieldType.Keyword:
                doc.Add(Field.Keyword(fi.Name, fi.Value));
                break;
            default:
                break;
        }
    }
    return doc;
}


public void DeleteContent(T searchContent)
{
    if (this._rebuildIndex)
    {
        throw new InvalidOperationException("Cannot delete documents when rebuilding the index.");
    }
    else
    {
        this._indexWriter.Close();
        this._indexWriter = null;

        // Search content key field uniquely identifies a document in the index.
        SearchContentFieldInfo ki = SearchGenericUtils.GetSearchContentKeyFieldInfo(typeof(T), searchContent);
        if (String.IsNullOrEmpty(ki.Name))
            throw new Exception("SearchContentKey Field not specified on target class!");

        // Match on the key field's value, not its name, so only this document is deleted.
        Term term = new Term(ki.Name, ki.Value);
        IndexReader rdr = IndexReader.Open(this._indexDirectory);
        rdr.Delete(term);
        rdr.Close();
    }
}

IndexQuery<T> class

This is the refactored, generic version of the Cuyahoga.Core.Search.IndexQuery class. It is a wrapper around the Lucene.NET index search functionality. Find was modified so that we can create an instance of the content object and set its property values via reflection.

public SearchResultCollection<T> Find(string queryText, Hashtable keywordFilter, int pageIndex, int pageSize)
{
    long startTicks = DateTime.Now.Ticks;
    // Get the query fields via reflection
    string[] qryFields = SearchGenericUtils.GetSearchContentQueryFields(typeof(T));
    if (qryFields.Length == 0)
        throw new Exception("No query field specified on target class!");

    Query query = MultiFieldQueryParser.Parse(queryText, qryFields, new StandardAnalyzer());
    IndexSearcher searcher = new IndexSearcher(this._indexDirectory);
    Hits hits;
    if (keywordFilter != null && keywordFilter.Count > 0)
    {
        QueryFilter qf = BuildQueryFilterFromKeywordFilter(keywordFilter);
        hits = searcher.Search(query, qf);
    }
    else
    {
        hits = searcher.Search(query);
    }
    int start = pageIndex * pageSize;
    int end = (pageIndex + 1) * pageSize;
    if (hits.Length() <= end)
    {
        end = hits.Length();
    }
    SearchResultCollection<T> results = new SearchResultCollection<T>();
    results.TotalCount = hits.Length();
    results.PageIndex = pageIndex;

    // Get the fields that will be populated from the search result via reflection
    string[] resultFields = SearchGenericUtils.GetSearchContentResultFields(typeof(T));
    for (int i = start; i < end; i++)
    {
        // Create an instance of the target type with Activator and set its property values via reflection
        T instance = Activator.CreateInstance<T>();
        for (int j = 0; j < resultFields.Length; j++)
        {
            SearchGenericUtils.SetSearchResultField(instance, resultFields[j], hits.Doc(i).Get(resultFields[j]));
        }

        //If target type implements ISearchResultStat we set Boost and Score properties of the instance
        if (instance is ISearchResultStat)
        {
            SearchGenericUtils.SetSearchResultField(instance, "Boost", hits.Doc(i).GetBoost());
            SearchGenericUtils.SetSearchResultField(instance, "Score", hits.Score(i));
        }
        results.Add(instance);
    }

    searcher.Close();
    results.ExecutionTime = DateTime.Now.Ticks - startTicks;
    return results;
}
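
The paging arithmetic inside Find can be looked at in isolation. The following is a minimal sketch, not part of the extension assembly: PagingSketch and GetPageWindow are hypothetical names, and the argument checks are an addition that the listing above does not perform. It computes the same zero-based window of hit positions and clamps it to the number of hits.

```csharp
using System;

// Hypothetical helper, not part of Bo.Cuyahoga.Extensions.Search.
public static class PagingSketch
{
    // Computes the half-open range [start, end) of hit positions
    // that belong to a zero-based page.
    public static void GetPageWindow(int totalHits, int pageIndex, int pageSize,
                                     out int start, out int end)
    {
        if (totalHits < 0) throw new ArgumentOutOfRangeException("totalHits");
        if (pageIndex < 0) throw new ArgumentOutOfRangeException("pageIndex");
        if (pageSize <= 0) throw new ArgumentOutOfRangeException("pageSize");

        // A page past the last hit yields an empty window (start == end).
        start = Math.Min(pageIndex * pageSize, totalHits);
        end = Math.Min(start + pageSize, totalHits);
    }
}
```

With 25 hits and a page size of 10, page 2 covers positions 20 to 24, and any later page is empty.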

SearchContentFieldAttribute Class

.NET attributes are a neat way to program declaratively. We use SearchContentFieldAttribute to declare how our content is full-text indexed. A field in the full-text index maps directly to a property of our custom content type. For example, if we want to index a Person class, we use SearchContentFieldAttribute to specify which properties of the class are indexed, how they are indexed, and which properties are set from the result of a full-text search.

SearchContentFieldAttribute is used to declare:

  • Field Type: Needed because Lucene.NET distinguishes several kinds of field. For details see the Field class in the Lucene.Net.Documents namespace.
  • Key Field: The key field uniquely identifies our content in the index. Only one key field declaration is allowed. In the code listed in this article, GetSearchContentKeyFieldInfo throws an exception if a content class declares no key field or more than one.
  • Result Field: Content class properties with IsResultField=true are set automatically after an index search; other properties are left unset.
  • Query Field: Content class properties with IsQueryField=true are used in index searches. The search is performed only on these properties.

namespace Bo.Cuyahoga.Extensions.Search
{
    [AttributeUsage(AttributeTargets.Property, AllowMultiple = false, Inherited = true)]
    public class SearchContentFieldAttribute : Attribute
    {
        private SearchContentFieldType _fieldType;
        /// <summary>
        /// Default is Text
        /// </summary>
        public SearchContentFieldType FieldType
        {
            get { return _fieldType; }
            set { _fieldType = value; }
        }

        private bool _isResultField = true;
        /// <summary>
        /// Default is true
        /// </summary>
        public bool IsResultField
        {
            get { return _isResultField; }
            set { _isResultField = value; }
        }

        private bool _isQueryField = false;
        /// <summary>
        /// Default is false
        /// </summary>
        public bool IsQueryField
        {
            get { return _isQueryField; }
            set { _isQueryField = value; }
        }

        private bool _isKeyField = false;
        /// <summary>
        /// Default is false
        /// </summary>
        public bool IsKeyField
        {
            get { return _isKeyField; }
            set { _isKeyField = value; }
        }

        public SearchContentFieldAttribute(SearchContentFieldType fieldType)
        {
            _fieldType = fieldType;
        }

    }

    public enum SearchContentFieldType
    {
        Text,
        UnStored,
        UnIndexed,
        Keyword
    }
}
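
To show how such a declaration is read back at run time, here is a minimal, self-contained sketch. DemoFieldAttribute, DemoPerson and AttributeReaderSketch are hypothetical stand-ins invented for this example so that it compiles without the extension assembly; only the reflection calls (GetProperties and GetCustomAttributes) are the same ones the helper code in this article relies on.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical, cut-down stand-in for SearchContentFieldAttribute.
[AttributeUsage(AttributeTargets.Property, AllowMultiple = false, Inherited = true)]
public class DemoFieldAttribute : Attribute
{
    public bool IsQueryField { get; set; }
}

// Hypothetical content type used only by this sketch.
public class DemoPerson
{
    [DemoField(IsQueryField = true)]
    public string FullName { get; set; }

    [DemoField]
    public int Age { get; set; }

    // No attribute: this property is ignored by the reader below.
    public string Untagged { get; set; }
}

public static class AttributeReaderSketch
{
    // Returns the names of the public instance properties marked as query fields.
    public static List<string> GetQueryFieldNames(Type t)
    {
        List<string> names = new List<string>();
        foreach (PropertyInfo pi in t.GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            object[] atts = pi.GetCustomAttributes(typeof(DemoFieldAttribute), true);
            if (atts.Length > 0 && ((DemoFieldAttribute)atts[0]).IsQueryField)
                names.Add(pi.Name);
        }
        return names;
    }
}
```

Calling AttributeReaderSketch.GetQueryFieldNames(typeof(DemoPerson)) returns a list that contains only FullName.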

SearchResultCollection<T> Class

This is the generic version of the SearchResultCollection class found in the Cuyahoga.Core.Search namespace. There is nothing special to mention about this class.

ISearchResultStat interface

We use this interface to state that our custom content type holds query statistics, namely Boost and Score. If the custom content type implements this interface, the values of its Boost and Score properties are set after an index query.

public interface ISearchResultStat
{
    float Boost{get;set;}
    float Score{ get;set;}
}
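
A content type opts in simply by implementing the interface. The sketch below is self-contained: the interface is repeated so the example compiles on its own, and ArticleHit and StatSketch are hypothetical names. Note that it sets the statistics through a direct interface cast, which is a different technique from the one in Find. As far as the listings in this article show, Find goes through SetSearchResultField, which looks property names up in the attribute cache, so with that code path Boost and Score would presumably also need a SearchContentField attribute.

```csharp
using System;

// Repeated from the article so the sketch compiles on its own.
public interface ISearchResultStat
{
    float Boost { get; set; }
    float Score { get; set; }
}

// Hypothetical content type that opts in to query statistics.
public class ArticleHit : ISearchResultStat
{
    public string Title { get; set; }
    public float Boost { get; set; }
    public float Score { get; set; }
}

public static class StatSketch
{
    // Copies the statistics onto the instance if its type implements
    // ISearchResultStat; returns false and does nothing otherwise.
    public static bool TryApplyStats(object instance, float boost, float score)
    {
        ISearchResultStat stat = instance as ISearchResultStat;
        if (stat == null)
            return false;

        stat.Boost = boost;
        stat.Score = score;
        return true;
    }
}
```

The cast avoids a name lookup for two properties whose names are fixed by the interface anyway; whether that is preferable to the reflection path is a design choice.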

ReflectionHelper Singleton

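ReflectionHelper is designed to improve performance. It caches the attribute metadata of each content type, so the reflection lookup runs only once per type, and it reads and writes property values through dynamically emitted getter and setter delegates.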

    /// <summary>
    /// Singleton reflection helper
    /// </summary>
    public sealed class ReflectionHelper
    {
        #region Static Fields And Properties

        static ReflectionHelper _instance = null;
        static readonly object _padlock = new object();
        public static ReflectionHelper Instance
        {
            get
            {
                lock (_padlock)
                {
                    if (_instance == null)
                    {
                        _instance = new ReflectionHelper();
                    }
                    return _instance;
                }
            }
        }
        
        #endregion Static Fields And Properties

        #region Instance Fields And Properties
        
        private Dictionary<Type, List<SearchContentFieldInfo>> _cache;
        
        #endregion Instance Fields And Properties

        #region CTOR

        private ReflectionHelper()
        {
            _cache = new Dictionary<Type, List<SearchContentFieldInfo>>();
        }

        #endregion //CTOR

        #region Reflection Helper Methods

        public SearchContentFieldInfo[] GetKeyFields(Type t)
        {
            if (!_cache.ContainsKey(t))
                AddTypeToCache(t);


            List<SearchContentFieldInfo> keyFields = _cache[t].FindAll(
                delegate(SearchContentFieldInfo fi)
                {
                    return fi.IsKeyField == true;
                });
            return keyFields.ToArray();
        }

        public SearchContentFieldInfo[] GetKeyFields(Type t, object instance)
        {
            SearchContentFieldInfo[] fields = GetKeyFields(t);
            if (instance == null)
                return fields;

            GenericGetter getMethod;
            for (int i = 0; i < fields.Length; i++)
            {
                getMethod = CreateGetMethod(fields[i].PropertyInfo);
                object val = getMethod(instance);
                fields[i].Value = val == null ? String.Empty : val.ToString();
            }
            return fields;
        }

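        public IList<SearchContentFieldInfo> GetFields(Type t, object instance)
        {
            if (!_cache.ContainsKey(t))
                AddTypeToCache(t);

            // Copies the list only; the SearchContentFieldInfo objects are still shared with the cache.
            List<SearchContentFieldInfo> result = new List<SearchContentFieldInfo>(_cache[t]);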

            if (instance == null)
                return result;
            
            GenericGetter getMethod;
            for (int i = 0; i < result.Count; i++)
            {
                SearchContentFieldInfo fi = result[i];
                getMethod = CreateGetMethod(fi.PropertyInfo);
                object val = getMethod(instance);
                fi.Value = val == null ? String.Empty : val.ToString();
            }
            return result;
        }

        public string[] GetQueryFields(Type t)
        {
            if (!_cache.ContainsKey(t))
                AddTypeToCache(t);

            List<SearchContentFieldInfo> qryFields = _cache[t].FindAll(
                delegate(SearchContentFieldInfo fi)
                {
                    return fi.IsQueryField == true;
                });

            
            List<string> result = qryFields.ConvertAll<string>(
                delegate(SearchContentFieldInfo fi) 
                {
                    return fi.Name;
                });

            return result.ToArray();
        }

        public string[] GetResultFields(Type t)
        {
            if (!_cache.ContainsKey(t))
                AddTypeToCache(t);

            List<SearchContentFieldInfo> resultFields = _cache[t].FindAll(
                delegate(SearchContentFieldInfo fi)
                {
                    return fi.IsResultField == true;
                });


            List<string> result = _cache[t].FindAll(
                delegate(SearchContentFieldInfo fi)
                {
                    return fi.IsResultField;
                }).ConvertAll<string>(
                delegate(SearchContentFieldInfo fi)
                {
                    return fi.Name;
                });

            return result.ToArray();
        }

        public void SetSearchResultField(string fieldName, object instance, object value)
        {
            Type t = instance.GetType();
            if (!_cache.ContainsKey(t))
                AddTypeToCache(t);
            
            SearchContentFieldInfo field = _cache[t].Find(
            delegate(SearchContentFieldInfo fi)
            {
                return fi.Name == fieldName;
            });

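            // SearchContentFieldInfo is a class, so Find returns null when nothing matches.
            if (field == null)
                throw new Exception(String.Format("Field with name \"{0}\" not found on type \"{1}\"!", fieldName, t));

            GenericSetter setter = CreateSetMethod(field.PropertyInfo);
            if (setter == null)
                throw new Exception(String.Format("Property \"{0}\" on type \"{1}\" has no public setter!", fieldName, t));

            setter(instance, Convert.ChangeType(value, field.PropertyInfo.PropertyType));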
        }

        #endregion //Reflection Helper Methods

        #region Cache Operations
        private void AddTypeToCache(Type t)
        {
            if (_cache.ContainsKey(t))
                return;

            List<SearchContentFieldInfo> fields = new List<SearchContentFieldInfo>();

            PropertyInfo[] props = t.GetProperties(BindingFlags.Public | BindingFlags.Instance);
            for (int i = 0; i < props.Length; i++)
            {
                PropertyInfo pi = props[i];
                SearchContentFieldAttribute[] atts = (SearchContentFieldAttribute[])pi.GetCustomAttributes(typeof(SearchContentFieldAttribute), true);
                if (atts.Length > 0)
                {
                    SearchContentFieldInfo fi = new SearchContentFieldInfo();
                    fi.Name = pi.Name;
                    fi.FieldType = atts[0].FieldType;
                    fi.IsKeyField = atts[0].IsKeyField;
                    fi.IsResultField = atts[0].IsResultField;
                    fi.IsQueryField = atts[0].IsQueryField;
                    fi.PropertyInfo = pi;
                    fields.Add(fi);
                }
            }
            // Cache even an empty list so that later _cache[t] lookups do not throw KeyNotFoundException.
            _cache.Add(t, fields);
        }
        #endregion //Cache Operations

        #region Emit Getter/Setter

        /* Source for CreateSetMethod and CreateGetMethod taken from
         * http://jachman.wordpress.com/2006/08/22/2000-faster-using-dynamic-method-calls/
         */
        private GenericSetter CreateSetMethod(PropertyInfo propertyInfo)
        {
            /*
            * If there’s no setter return null
            */
            MethodInfo setMethod = propertyInfo.GetSetMethod();
            if (setMethod == null)
                return null;

            /*
            * Create the dynamic method
            */
            Type[] arguments = new Type[2];
            arguments[0] = arguments[1] = typeof(object);

            DynamicMethod setter = new DynamicMethod(
                String.Concat("_Set", propertyInfo.Name, "_"),
                typeof(void), arguments, propertyInfo.DeclaringType);
            ILGenerator generator = setter.GetILGenerator();
            generator.Emit(OpCodes.Ldarg_0);
            generator.Emit(OpCodes.Castclass, propertyInfo.DeclaringType);
            generator.Emit(OpCodes.Ldarg_1);

            if (propertyInfo.PropertyType.IsClass)
                generator.Emit(OpCodes.Castclass, propertyInfo.PropertyType);
            else
                generator.Emit(OpCodes.Unbox_Any, propertyInfo.PropertyType);

            generator.EmitCall(OpCodes.Callvirt, setMethod, null);
            generator.Emit(OpCodes.Ret);

            /*
            * Create the delegate and return it
            */
            return (GenericSetter)setter.CreateDelegate(typeof(GenericSetter));
        }

        /// <summary>
        /// Creates a dynamic getter for the property
        /// </summary>
        private static GenericGetter CreateGetMethod(PropertyInfo propertyInfo)
        {
            /*
            * If there’s no getter return null
            */
            MethodInfo getMethod = propertyInfo.GetGetMethod();
            if (getMethod == null)
                return null;

            /*
            * Create the dynamic method
            */
            Type[] arguments = new Type[1];
            arguments[0] = typeof(object);

            DynamicMethod getter = new DynamicMethod(
                String.Concat("_Get", propertyInfo.Name, "_"),
                typeof(object), arguments, propertyInfo.DeclaringType);
            ILGenerator generator = getter.GetILGenerator();
            generator.DeclareLocal(typeof(object));
            generator.Emit(OpCodes.Ldarg_0);
            generator.Emit(OpCodes.Castclass, propertyInfo.DeclaringType);
            generator.EmitCall(OpCodes.Callvirt, getMethod, null);

            if (!propertyInfo.PropertyType.IsClass)
                generator.Emit(OpCodes.Box, propertyInfo.PropertyType);

            generator.Emit(OpCodes.Ret);

            /*
            * Create the delegate and return it
            */
            return (GenericGetter)getter.CreateDelegate(typeof(GenericGetter));
        }
        #endregion //Emit Getter/Setter
    }

    public delegate void GenericSetter(object target, object value);
    public delegate object GenericGetter(object target);
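
In the listing above, CreateGetMethod and CreateSetMethod are called on every property access, so a new DynamicMethod is emitted each time. One way to avoid that is to cache the generated delegate per PropertyInfo. The sketch below illustrates only the caching idea and is not the article's implementation: GetterCacheSketch is a hypothetical name, and it builds the getter with expression trees (System.Linq.Expressions, available from .NET 3.5) instead of emitting IL by hand.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical helper, not part of Bo.Cuyahoga.Extensions.Search.
public static class GetterCacheSketch
{
    private static readonly Dictionary<PropertyInfo, Func<object, object>> _getters =
        new Dictionary<PropertyInfo, Func<object, object>>();
    private static readonly object _sync = new object();

    // Returns a compiled getter for the property, building it once and reusing it afterwards.
    public static Func<object, object> For(PropertyInfo property)
    {
        if (property == null) throw new ArgumentNullException("property");

        lock (_sync)
        {
            Func<object, object> getter;
            if (!_getters.TryGetValue(property, out getter))
            {
                // Builds: (object target) => (object)((DeclaringType)target).Property
                ParameterExpression target = Expression.Parameter(typeof(object), "target");
                Expression typedTarget = Expression.Convert(target, property.DeclaringType);
                Expression body = Expression.Convert(
                    Expression.Property(typedTarget, property), typeof(object));

                getter = Expression.Lambda<Func<object, object>>(body, target).Compile();
                _getters.Add(property, getter);
            }
            return getter;
        }
    }
}
```

For example, GetterCacheSketch.For(typeof(string).GetProperty("Length")) returns a delegate that yields 3 for the string "abc", and asking for the same property again returns the same delegate instance. The lock keeps the dictionary safe when several threads index or search at once.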

SearchGenericUtils Static Class

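SearchContentFieldInfo carries the metadata, and optionally the value, of one attributed property. SearchGenericUtils is a thin static facade over ReflectionHelper that validates its arguments before delegating.

    public class SearchContentFieldInfo
    {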
        public string Name;
        public string Value;
        public SearchContentFieldType FieldType;
        public bool IsResultField;
        public bool IsQueryField;
        public bool IsKeyField;
        public PropertyInfo PropertyInfo;
    }

    internal static class SearchGenericUtils
    {
        internal static SearchContentFieldInfo GetSearchContentKeyFieldInfo(Type t, object instance)
        {
            SearchContentFieldInfo[] keyFields =  ReflectionHelper.Instance.GetKeyFields(t,instance);
            
            if (keyFields.Length == 0)
                throw new Exception(String.Format("No key field defined for type {0}!", t));
            
            if(keyFields.Length > 1)
                throw new Exception(String.Format("Only one key field allowed for type {0}!", t));

            return keyFields[0];
        }

        internal static SearchContentFieldInfo GetSearchContentKeyFieldInfo(Type t)
        {
            return GetSearchContentKeyFieldInfo(t, null);
        }

        internal static IList<SearchContentFieldInfo> GetSearchContentFields(Type t, object instance)
        {
            if (instance == null)
                throw new Exception("Instance parameter is null!");

            return ReflectionHelper.Instance.GetFields(t, instance);
        }

        internal static string[] GetSearchContentQueryFields(Type t)
        {
            return ReflectionHelper.Instance.GetQueryFields(t);
        }

        internal static string[] GetSearchContentResultFields(Type t)
        {
            return ReflectionHelper.Instance.GetResultFields(t);
        }

        internal static void SetSearchResultField(object instance , string fieldName, object value)
        {

            if (instance == null)
                throw new Exception("Object instance parameter is null!");
            if (String.IsNullOrEmpty(fieldName))
                throw new Exception("Field name is empty!");

            ReflectionHelper.Instance.SetSearchResultField(fieldName, instance, value);    
        }


    }

Using The Sample Code

Our sample console application creates three PersonContent objects and full-text indexes them. Then we perform queries on the indexed content.

In the sample application the PersonContent class is our custom content type. The class looks like this:

public class PersonContent
{
    private string _id;

    [SearchContentField(SearchContentFieldType.Keyword,IsKeyField=true)]
    public string Id
    {
        get { return _id; }
        set { _id = value; }
    }

    private string _keyword;
    [SearchContentField(SearchContentFieldType.Keyword)]
    public string Keyword
    {
        get { return _keyword; }
        set { _keyword = value; }
    }

    private string _fullName;
    [SearchContentField(SearchContentFieldType.Text,IsQueryField=true)]
    public string FullName
    {
        get { return _fullName; }
        set { _fullName = value; }
    }

    private string _notes;
    [SearchContentField(SearchContentFieldType.UnStored,IsResultField=false,IsQueryField=true)]
    public string Notes
    {
        get { return _notes; }
        set { _notes = value; }
    }

    private int _age;
    [SearchContentField(SearchContentFieldType.UnIndexed)]
    public int Age
    {
        get { return _age; }
        set { _age = value; }
    }

    public override string ToString()
    {
        StringBuilder sb = new StringBuilder();
        sb.AppendLine("Id = " + _id);
        sb.AppendLine("FullName = " + _fullName);
        sb.AppendLine("Age = " + _age.ToString());
        
        return sb.ToString();
    }
}

NOTE: For details about the meanings of UnIndexed, UnStored, Text and Keyword, read the Lucene.NET documentation. SearchContentFieldType is a utility enumeration that lets us call the matching factory methods of the Field class found in the Lucene.Net.Documents namespace.
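
As a quick reference, the sketch below summarises how I understand the four string-valued Field factory methods of the older Lucene.NET releases that this article targets: whether the value is stored, indexed and tokenized. DemoFieldType, FieldBehavior and FieldBehaviorSketch are hypothetical names used only here, and the Lucene.NET documentation remains the authority, so treat the table as an assumption to verify against the version you use.

```csharp
using System;

// Hypothetical mirror of SearchContentFieldType, so the sketch compiles on its own.
public enum DemoFieldType { Text, UnStored, UnIndexed, Keyword }

// Describes what happens to a field value at indexing time.
public struct FieldBehavior
{
    public readonly bool Stored;     // value can be read back from a hit
    public readonly bool Indexed;    // value can be searched
    public readonly bool Tokenized;  // value is run through the analyzer

    public FieldBehavior(bool stored, bool indexed, bool tokenized)
    {
        Stored = stored;
        Indexed = indexed;
        Tokenized = tokenized;
    }
}

public static class FieldBehaviorSketch
{
    // Assumed behaviour of Field.Text, Field.UnStored, Field.UnIndexed and
    // Field.Keyword when called with a string value.
    public static FieldBehavior Describe(DemoFieldType type)
    {
        switch (type)
        {
            case DemoFieldType.Text:      return new FieldBehavior(true, true, true);
            case DemoFieldType.UnStored:  return new FieldBehavior(false, true, true);
            case DemoFieldType.UnIndexed: return new FieldBehavior(true, false, false);
            case DemoFieldType.Keyword:   return new FieldBehavior(true, true, false);
            default: throw new ArgumentOutOfRangeException("type");
        }
    }
}
```

Read against PersonContent, this explains the attribute choices: Notes is searchable but cannot be read back from a hit (hence IsResultField=false), Age can be read back but not searched, and Id is matched as a single exact term, which is what a key field needs.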

Tips on Merging Extended Search Classes to Cuyahoga Core

You do not need to merge these extensions directly into Cuyahoga.Core in order to use them.
But if you want Cuyahoga.Core to handle full-text indexing in a more generic way, you can replace the IndexQuery, IndexBuilder, SearchCollection, ISearchable, IndexEventHandler and IndexEventArgs classes with their counterparts found in the Bo.Cuyahoga.Extensions.Search namespace. Obviously, you will then have to refactor the other parts that use the old versions of the replaced classes. One final point to consider is that you will have to add attributes to the SearchContent and SearchResult classes.

Dependencies

In order to compile this code you need
  • Cuyahoga.Core project
  • Lucene.Net.dll
  • log4net.dll

History

  • 27 March 2008:
    • Bug in DeleteContent method resolved.
    • An exception is thrown when a getter or setter method is null in the GetKeyFields, GetFields and SetSearchResultField methods of the ReflectionHelper class.
  • 10 March 2008:
    • ReflectionHelper class added for performance improvement.
    • SearchContentFieldInfo methods modified to call appropriate ReflectionHelper methods.
    • SearchContentFieldInfo is not a struct anymore.
  • 07 March 2008: First version published.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.