Introduction
Cuyahoga provides module developers with simple-to-use wrapper classes for full text indexing with Lucene.NET. Although these wrapper classes are easy to use, they are not generic. As module developers, we have to convert our custom content (be it static or dynamic) to a SearchContent object before the wrapper classes can index it. A similar limitation applies to searching: a search on the indexed content can only return a collection of SearchResult objects. We also have no control over which fields are stored or unstored, which fields are used as keywords, and which fields are left unindexed when the wrapper objects build the full text index.
In this article I will try to show how we can make the Cuyahoga.Core full text indexing capabilities more generic by providing a separate extension assembly.
Background
I created two web sites using the Cuyahoga framework.
While developing these sites I needed to full text index some dynamic content, that is, content persisted in the database. After exploring the wrapper classes in the Cuyahoga.Core.Search namespace I noticed the limitations of the present search architecture and decided to make these wrapper classes more generic by applying .NET Generics and Reflection.
Bo.Cuyahoga.Extensions.Search Namespace
In order to extend Cuyahoga.Core.Search namespace functionality using .NET Generics and Reflection I copied all files to my project and started refactoring. I appended Ex to class names in order to distinguish extended wrapper classes from the original classes in Cuyahoga.Core.Search namespace.
IndexBuilder<T> class
Refactored generic version of the Cuyahoga.Core.Search.IndexBuilder class. This class is used as a wrapper around Lucene.NET index building functionality. Take a closer look at the BuildDocumentFromSearchContent and DeleteContent methods. These methods were modified so that we can use reflection to retrieve the fields of the content object.
private Document BuildDocumentFromSearchContent(T searchContent)
{
Document doc = new Document();
IList<SearchContentFieldInfo> fields = SearchGenericUtils.GetSearchContentFields(typeof(T), searchContent);
for(int i = 0; i < fields.Count ; i++)
{
SearchContentFieldInfo fi = fields[i];
switch (fi.FieldType)
{
case SearchContentFieldType.Text:
doc.Add(Field.Text(fi.Name, fi.Value));
break;
case SearchContentFieldType.UnStored:
doc.Add(Field.UnStored(fi.Name, fi.Value));
break;
case SearchContentFieldType.UnIndexed:
doc.Add(Field.UnIndexed(fi.Name, fi.Value));
break;
case SearchContentFieldType.Keyword:
doc.Add(Field.Keyword(fi.Name, fi.Value));
break;
default:
break;
}
}
return doc;
}
public void DeleteContent(T searchContent)
{
if (this._rebuildIndex)
{
throw new InvalidOperationException("Cannot delete documents when rebuilding the index.");
}
else
{
this._indexWriter.Close();
this._indexWriter = null;
SearchContentFieldInfo ki = SearchGenericUtils.GetSearchContentKeyFieldInfo(typeof(T), searchContent);
if (String.IsNullOrEmpty(ki.Name))
throw new Exception("SearchContentKey Field not specified on target class!");
// Match on the key field's value, not its name, so that only this content's document is deleted.
Term term = new Term(ki.Name, ki.Value);
IndexReader rdr = IndexReader.Open(this._indexDirectory);
rdr.Delete(term);
rdr.Close();
}
}
IndexQuery<T> class
Refactored generic version of the Cuyahoga.Core.Search.IndexQuery class. This class is used as a wrapper around Lucene.NET index search functionality. The Find method was modified so that we can create an instance of the content object and set its property values via reflection.
public SearchResultCollection<T> Find(string queryText, Hashtable keywordFilter, int pageIndex, int pageSize)
{
long startTicks = DateTime.Now.Ticks;
string[] qryFields = SearchGenericUtils.GetSearchContentQueryFields(typeof(T));
if (qryFields.Length == 0)
throw new Exception("No query field specified on target class!");
Query query = MultiFieldQueryParser.Parse(queryText, qryFields, new StandardAnalyzer());
IndexSearcher searcher = new IndexSearcher(this._indexDirectory);
Hits hits;
if (keywordFilter != null && keywordFilter.Count > 0)
{
QueryFilter qf = BuildQueryFilterFromKeywordFilter(keywordFilter);
hits = searcher.Search(query, qf);
}
else
{
hits = searcher.Search(query);
}
int start = pageIndex * pageSize;
int end = (pageIndex + 1) * pageSize;
if (hits.Length() <= end)
{
end = hits.Length();
}
SearchResultCollection<T> results = new SearchResultCollection<T>();
results.TotalCount = hits.Length();
results.PageIndex = pageIndex;
string[] resultFields = SearchGenericUtils.GetSearchContentResultFields(typeof(T));
for (int i = start; i < end; i++)
{
T instance = Activator.CreateInstance<T>();
for (int j = 0; j < resultFields.Length; j++)
{
SearchGenericUtils.SetSearchResultField(instance, resultFields[j], hits.Doc(i).Get(resultFields[j]));
}
if (instance is ISearchResultStat)
{
SearchGenericUtils.SetSearchResultField(instance, "Boost", hits.Doc(i).GetBoost());
SearchGenericUtils.SetSearchResultField(instance, "Score", hits.Score(i));
}
results.Add(instance);
}
searcher.Close();
results.ExecutionTime = DateTime.Now.Ticks - startTicks;
return results;
}
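The paging arithmetic inside Find can be looked at in isolation. The following is only a sketch: PagingSketch and GetWindow are hypothetical names that do not exist in the extension assembly, and the clamping of a page that lies beyond the last hit is an addition made here for clarity (in Find the result loop simply does not run in that case).

```csharp
using System;

// Hypothetical helper, not part of the article's code: it isolates the
// paging arithmetic that Find applies to the Lucene hit list.
public static class PagingSketch
{
    // Computes the half-open range [start, end) of hit positions
    // for a zero-based page index.
    public static void GetWindow(int pageIndex, int pageSize, int totalHits,
                                 out int start, out int end)
    {
        start = pageIndex * pageSize;
        // Never read past the last available hit.
        end = Math.Min((pageIndex + 1) * pageSize, totalHits);
        // A page beyond the last hit yields an empty range.
        if (start > end)
            start = end;
    }
}
```

For example, with 25 hits and a page size of 10, page index 2 covers hit positions 20 to 24.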
SearchContentFieldAttribute Class
.NET attributes are a very neat implementation of declarative programming. We use SearchContentFieldAttribute to declare how full text indexing is performed on our content. A field in our full text index maps directly to a property of our custom content type. For example, if we want to index a Person class, SearchContentFieldAttribute lets us specify which properties of this class will be indexed, how they are indexed, and which properties will be set from the result of a full text index search.
SearchContentFieldAttribute is used to declare:
- Field Type: This declaration is needed because Lucene.NET has different field declarations. For details see the Field class in the Lucene.Net.Documents namespace.
- Key Field: The key field is used to uniquely identify our content. Only one key field declaration is allowed. If you declare two properties as key fields on the content class, SearchGenericUtils.GetSearchContentKeyFieldInfo throws an exception.
- Result Field: Content class properties with IsResultField=true will be automatically set after an index search; other properties will not be set.
- Query Field: Content class properties with IsQueryField=true will be used in index searches. An index search will be performed only on these properties.
namespace Bo.Cuyahoga.Extensions.Search
{
[AttributeUsage(AttributeTargets.Property, AllowMultiple = false, Inherited = true)]
public class SearchContentFieldAttribute : Attribute
{
private SearchContentFieldType _fieldType;
public SearchContentFieldType FieldType
{
get { return _fieldType; }
set { _fieldType = value; }
}
private bool _isResultField = true;
public bool IsResultField
{
get { return _isResultField; }
set { _isResultField = value; }
}
private bool _isQueryField = false;
public bool IsQueryField
{
get { return _isQueryField; }
set { _isQueryField = value; }
}
private bool _isKeyField = false;
public bool IsKeyField
{
get { return _isKeyField; }
set { _isKeyField = value; }
}
public SearchContentFieldAttribute(SearchContentFieldType fieldType)
{
_fieldType = fieldType;
}
}
public enum SearchContentFieldType
{
Text,
UnStored,
UnIndexed,
Keyword
}
}
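To illustrate how such declarations are read back at run time, here is a minimal, self-contained sketch. DemoFieldAttribute, DemoContent and AttributeScanSketch are stand-in names invented for this example so that it compiles without the extension assembly; the real code uses SearchContentFieldAttribute and does considerably more.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Stand-in for SearchContentFieldAttribute, reduced to a single flag.
[AttributeUsage(AttributeTargets.Property, AllowMultiple = false, Inherited = true)]
public class DemoFieldAttribute : Attribute
{
    private bool _isQueryField = false;
    public bool IsQueryField
    {
        get { return _isQueryField; }
        set { _isQueryField = value; }
    }
}

// Hypothetical content type: one query field and one plain field.
public class DemoContent
{
    private string _title;
    private string _code;

    [DemoField(IsQueryField = true)]
    public string Title
    {
        get { return _title; }
        set { _title = value; }
    }

    [DemoField]
    public string Code
    {
        get { return _code; }
        set { _code = value; }
    }
}

public static class AttributeScanSketch
{
    // Collects the names of all public instance properties
    // that are declared as query fields.
    public static List<string> GetQueryFieldNames(Type t)
    {
        List<string> names = new List<string>();
        foreach (PropertyInfo pi in t.GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            object[] atts = pi.GetCustomAttributes(typeof(DemoFieldAttribute), true);
            if (atts.Length > 0 && ((DemoFieldAttribute)atts[0]).IsQueryField)
                names.Add(pi.Name);
        }
        return names;
    }
}
```

Calling AttributeScanSketch.GetQueryFieldNames(typeof(DemoContent)) returns a list containing only "Title", because Code is decorated but not flagged as a query field.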
SearchResultCollection<T> Class
This is the generic version of the SearchResultCollection class found in the Cuyahoga.Core.Search namespace. There is nothing special to mention about this class.
ISearchResultStat interface
We use this interface to specify that our custom content type will hold query statistics such as Boost and Score. If our custom content type implements this interface, the values of the Boost and Score properties are set after an index query.
public interface ISearchResultStat
{
float Boost { get; set; }
float Score { get; set; }
}
ReflectionHelper Singleton
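A reflection helper class designed to improve performance. It caches the attribute metadata of each content type and reads and writes property values through emitted dynamic methods.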
public sealed class ReflectionHelper
{
#region Static Fields And Properties
static ReflectionHelper _instance = null;
static readonly object _padlock = new object();
public static ReflectionHelper Instance
{
get
{
lock (_padlock)
{
if (_instance == null)
{
_instance = new ReflectionHelper();
}
return _instance;
}
}
}
#endregion Static Fields And Properties
#region Instance Fields And Properties
private Dictionary<Type, List<SearchContentFieldInfo>> _cache;
#endregion Instance Fields And Properties
#region CTOR
private ReflectionHelper()
{
_cache = new Dictionary<Type, List<SearchContentFieldInfo>>();
}
#endregion
#region Reflection Helper Methods
public SearchContentFieldInfo[] GetKeyFields(Type t)
{
if (!_cache.ContainsKey(t))
AddTypeToCache(t);
List<SearchContentFieldInfo> keyFields = _cache[t].FindAll(
delegate(SearchContentFieldInfo fi)
{
return fi.IsKeyField == true;
});
return keyFields.ToArray();
}
public SearchContentFieldInfo[] GetKeyFields(Type t, object instance)
{
SearchContentFieldInfo[] fields = GetKeyFields(t);
if (instance == null)
return fields;
GenericGetter getMethod;
for (int i = 0; i < fields.Length; i++)
{
getMethod = CreateGetMethod(fields[i].PropertyInfo);
object val = getMethod(instance);
fields[i].Value = val == null ? String.Empty : val.ToString();
}
return fields;
}
public IList<SearchContentFieldInfo> GetFields(Type t, object instance)
{
if (!_cache.ContainsKey(t))
AddTypeToCache(t);
List<SearchContentFieldInfo> result = new List<SearchContentFieldInfo>(_cache[t].ToArray());
if (instance == null)
return result;
GenericGetter getMethod;
for (int i = 0; i < result.Count; i++)
{
SearchContentFieldInfo fi = result[i];
getMethod = CreateGetMethod(fi.PropertyInfo);
object val = getMethod(instance);
fi.Value = val == null ? String.Empty : val.ToString();
}
return result;
}
public string[] GetQueryFields(Type t)
{
if (!_cache.ContainsKey(t))
AddTypeToCache(t);
List<SearchContentFieldInfo> qryFields = _cache[t].FindAll(
delegate(SearchContentFieldInfo fi)
{
return fi.IsQueryField == true;
});
List<string> result = qryFields.ConvertAll<string>(
delegate(SearchContentFieldInfo fi)
{
return fi.Name;
});
return result.ToArray();
}
public string[] GetResultFields(Type t)
{
if (!_cache.ContainsKey(t))
AddTypeToCache(t);
List<SearchContentFieldInfo> qryFields = _cache[t].FindAll(
delegate(SearchContentFieldInfo fi)
{
return fi.IsResultField == true;
});
List<string> result = qryFields.ConvertAll<string>(
delegate(SearchContentFieldInfo fi)
{
return fi.Name;
});
return result.ToArray();
}
public void SetSearchResultField(string fieldName, object instance, object value)
{
Type t = instance.GetType();
if (!_cache.ContainsKey(t))
AddTypeToCache(t);
SearchContentFieldInfo field = _cache[t].Find(
delegate(SearchContentFieldInfo fi)
{
return fi.Name == fieldName;
});
// SearchContentFieldInfo is a class, so Find returns null when no cached field matches.
if (field == null)
throw new Exception(String.Format("Field with name \"{0}\" not found on type \"{1}\"!", fieldName, t));
GenericSetter setter = CreateSetMethod(field.PropertyInfo);
setter(instance, Convert.ChangeType(value,field.PropertyInfo.PropertyType));
}
#endregion
#region Cache Operations
private void AddTypeToCache(Type t)
{
if (_cache.ContainsKey(t))
return;
List<SearchContentFieldInfo> fields = new List<SearchContentFieldInfo>();
PropertyInfo[] props = t.GetProperties(BindingFlags.Public | BindingFlags.Instance);
for (int i = 0; i < props.Length; i++)
{
PropertyInfo pi = props[i];
SearchContentFieldAttribute[] atts = (SearchContentFieldAttribute[])pi.GetCustomAttributes(typeof(SearchContentFieldAttribute), true);
if (atts.Length > 0)
{
SearchContentFieldInfo fi = new SearchContentFieldInfo();
fi.Name = pi.Name;
fi.FieldType = atts[0].FieldType;
fi.IsKeyField = atts[0].IsKeyField;
fi.IsResultField = atts[0].IsResultField;
fi.IsQueryField = atts[0].IsQueryField;
fi.PropertyInfo = pi;
fields.Add(fi);
}
}
if (fields.Count > 0)
_cache.Add(t, fields);
}
#endregion
#region Emit Getter/Setter
private GenericSetter CreateSetMethod(PropertyInfo propertyInfo)
{
MethodInfo setMethod = propertyInfo.GetSetMethod();
if (setMethod == null)
return null;
Type[] arguments = new Type[2];
arguments[0] = arguments[1] = typeof(object);
DynamicMethod setter = new DynamicMethod(
String.Concat("_Set", propertyInfo.Name, "_"),
typeof(void), arguments, propertyInfo.DeclaringType);
ILGenerator generator = setter.GetILGenerator();
generator.Emit(OpCodes.Ldarg_0);
generator.Emit(OpCodes.Castclass, propertyInfo.DeclaringType);
generator.Emit(OpCodes.Ldarg_1);
if (propertyInfo.PropertyType.IsClass)
generator.Emit(OpCodes.Castclass, propertyInfo.PropertyType);
else
generator.Emit(OpCodes.Unbox_Any, propertyInfo.PropertyType);
generator.EmitCall(OpCodes.Callvirt, setMethod, null);
generator.Emit(OpCodes.Ret);
return (GenericSetter)setter.CreateDelegate(typeof(GenericSetter));
}
private static GenericGetter CreateGetMethod(PropertyInfo propertyInfo)
{
MethodInfo getMethod = propertyInfo.GetGetMethod();
if (getMethod == null)
return null;
Type[] arguments = new Type[1];
arguments[0] = typeof(object);
DynamicMethod getter = new DynamicMethod(
String.Concat("_Get", propertyInfo.Name, "_"),
typeof(object), arguments, propertyInfo.DeclaringType);
ILGenerator generator = getter.GetILGenerator();
generator.DeclareLocal(typeof(object));
generator.Emit(OpCodes.Ldarg_0);
generator.Emit(OpCodes.Castclass, propertyInfo.DeclaringType);
generator.EmitCall(OpCodes.Callvirt, getMethod, null);
if (!propertyInfo.PropertyType.IsClass)
generator.Emit(OpCodes.Box, propertyInfo.PropertyType);
generator.Emit(OpCodes.Ret);
return (GenericGetter)getter.CreateDelegate(typeof(GenericGetter));
}
#endregion
}
public delegate void GenericSetter(object target, object value);
public delegate object GenericGetter(object target);
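The caching idea behind ReflectionHelper can be reduced to a few lines. The sketch below is not the article's class: PropertyCacheSketch and ScanCount are hypothetical names, it caches plain PropertyInfo arrays instead of SearchContentFieldInfo lists, and it takes a lock around every cache access, which is an assumption about how one might guard the dictionary when several threads index or search at once.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical, simplified cache: each type is reflected over only once.
public static class PropertyCacheSketch
{
    private static readonly Dictionary<Type, PropertyInfo[]> _cache =
        new Dictionary<Type, PropertyInfo[]>();
    private static readonly object _padlock = new object();
    private static int _scanCount = 0;

    // Number of times reflection was actually performed (for demonstration only).
    public static int ScanCount
    {
        get { lock (_padlock) { return _scanCount; } }
    }

    // Returns the public instance properties of a type,
    // reflecting only on the first request for that type.
    public static PropertyInfo[] GetProperties(Type t)
    {
        lock (_padlock)
        {
            PropertyInfo[] props;
            if (!_cache.TryGetValue(t, out props))
            {
                props = t.GetProperties(BindingFlags.Public | BindingFlags.Instance);
                _cache.Add(t, props);
                _scanCount++;
            }
            return props;
        }
    }
}
```

Requesting the same type twice returns the same array instance, and ScanCount stays at 1 for that type.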
SearchGenericUtils Static Class
public class SearchContentFieldInfo
{
public string Name;
public string Value;
public SearchContentFieldType FieldType;
public bool IsResultField;
public bool IsQueryField;
public bool IsKeyField;
public PropertyInfo PropertyInfo;
}
internal static class SearchGenericUtils
{
internal static SearchContentFieldInfo GetSearchContentKeyFieldInfo(Type t, object instance)
{
SearchContentFieldInfo[] keyFields = ReflectionHelper.Instance.GetKeyFields(t,instance);
if (keyFields.Length == 0)
throw new Exception(String.Format("No key field defined for type {0}!", t));
if(keyFields.Length > 1)
throw new Exception(String.Format("Only one key field allowed for type {0}!", t));
return keyFields[0];
}
internal static SearchContentFieldInfo GetSearchContentKeyFieldInfo(Type t)
{
return GetSearchContentKeyFieldInfo(t, null);
}
internal static IList<SearchContentFieldInfo> GetSearchContentFields(Type t, object instance)
{
if (instance == null)
throw new Exception("Instance parameter is null!");
return ReflectionHelper.Instance.GetFields(t, instance);
}
internal static string[] GetSearchContentQueryFields(Type t)
{
return ReflectionHelper.Instance.GetQueryFields(t);
}
internal static string[] GetSearchContentResultFields(Type t)
{
return ReflectionHelper.Instance.GetResultFields(t);
}
internal static void SetSearchResultField(object instance , string fieldName, object value)
{
if (instance == null)
throw new Exception("Object instance parameter is null!");
if (String.IsNullOrEmpty(fieldName))
throw new Exception("Field name is empty!");
ReflectionHelper.Instance.SetSearchResultField(fieldName, instance, value);
}
}
Using The Sample Code
Our sample console application simply creates three PersonContent objects and full text indexes these objects. Then we perform queries on the indexed content.
In the sample application the PersonContent class is used as our custom content type. The class looks like this:
public class PersonContent
{
private string _id;
[SearchContentField(SearchContentFieldType.Keyword,IsKeyField=true)]
public string Id
{
get { return _id; }
set { _id = value; }
}
private string _keyword;
[SearchContentField(SearchContentFieldType.Keyword)]
public string Keyword
{
get { return _keyword; }
set { _keyword = value; }
}
private string _fullName;
[SearchContentField(SearchContentFieldType.Text,IsQueryField=true)]
public string FullName
{
get { return _fullName; }
set { _fullName = value; }
}
private string _notes;
[SearchContentField(SearchContentFieldType.UnStored,IsResultField=false,IsQueryField=true)]
public string Notes
{
get { return _notes; }
set { _notes = value; }
}
private int _age;
[SearchContentField(SearchContentFieldType.UnIndexed)]
public int Age
{
get { return _age; }
set { _age = value; }
}
public override string ToString()
{
StringBuilder sb = new StringBuilder();
sb.AppendLine("Id = " + _id);
sb.AppendLine("FullName = " + _fullName);
sb.AppendLine("Age = " + _age.ToString());
return sb.ToString();
}
}
NOTE: For details about the meanings of UnIndexed, UnStored, Text and Keyword, read the Lucene.NET documentation. SearchContentFieldType is a utility enumeration that lets us call the appropriate methods of the Field class found in the Lucene.Net.Documents namespace.
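As a quick reminder, the four field types differ in whether the value is stored, indexed and tokenized. The sketch below summarizes this for the string-valued Field factory methods used in IndexBuilder<T>, as they behave in the Lucene.NET versions of that time. DemoFieldType and FieldTypeSketch are hypothetical names that mirror SearchContentFieldType so the example compiles on its own; check the Lucene.NET documentation for the version you use.

```csharp
using System;

// Mirrors the article's SearchContentFieldType so the sketch is self-contained.
public enum DemoFieldType
{
    Text,
    UnStored,
    UnIndexed,
    Keyword
}

public static class FieldTypeSketch
{
    // Reports how a string value of the given field type is treated by the index.
    public static void Describe(DemoFieldType type,
        out bool stored, out bool indexed, out bool tokenized)
    {
        switch (type)
        {
            case DemoFieldType.Text:      // searchable text that is also returned with hits
                stored = true; indexed = true; tokenized = true;
                break;
            case DemoFieldType.UnStored:  // searchable text that is not returned with hits
                stored = false; indexed = true; tokenized = true;
                break;
            case DemoFieldType.UnIndexed: // returned with hits but not searchable
                stored = true; indexed = false; tokenized = false;
                break;
            case DemoFieldType.Keyword:   // searchable as one exact, unanalyzed term
                stored = true; indexed = true; tokenized = false;
                break;
            default:
                throw new ArgumentOutOfRangeException("type");
        }
    }
}
```

This is why PersonContent marks Notes as UnStored with IsResultField=false: an unstored value cannot be read back from a hit, so there is nothing to set on the result object.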
Tips on Merging Extended Search Classes to Cuyahoga Core
You do not need to merge these extensions directly to Cuyahoga.Core in order to use them.
But if you want Cuyahoga.Core to handle full text indexing in a more generic way, you can simply replace the IndexQuery, IndexBuilder, SearchCollection, ISearchable, IndexEventHandler and IndexEventArgs classes with their counterparts found in the Bo.Cuyahoga.Extensions.Search namespace. Obviously, you will then have to refactor the other parts that use the old versions of the replaced classes. One final issue to consider is that you will have to add attributes to the SearchContent and SearchResult classes.
Dependencies
In order to compile this code you need
- Cuyahoga.Core project
- Lucene.Net.dll
- log4net.dll
History
- 27 March 2008:
  - Bug in DeleteContent method resolved.
  - Exception is thrown in case of null getter or setter methods in ReflectionHelper class GetKeyFields, GetFields and SetSearchResultField methods.
- 10 March 2008:
  - ReflectionHelper class added for performance improvement.
  - SearchContentFieldInfo methods modified to call appropriate ReflectionHelper methods.
  - SearchContentFieldInfo is not a struct anymore.
- 07 March 2008: First version published.