Introduction
This project started from a puzzle : There is a common English word that is nine letters long. Each time you remove a letter from it, it still remains an English word - from nine letters right down to a single letter. What is the original word, and what are the words that it becomes after removing one letter at a time? I wrote DicoLib to find the word, but since I stupidely translated "nine" by "8" in the code, I didn't find THE solution, but some others listed
here DicoLib uses an original data structure that optimises the search of words which are "close" to a given one or contain the same letters. It is therefore suitable for spell checking, although this application hasn't (yet) been developed.
Data Structure
DicoLib's indexes words:
- by their length
- and by a bitset of 26 bits which describe which letters are present in the word
Actually, DicoLib stores a list of anagrams for each length/bitset pair. Finding anagrams of a given word therefore takes 0 time!
Finding words that are made of specified letters for Scrabble, or of a given length and contianing given letters is extremely fast as it requires mainly bitset operations and table lookup.
The whole data structure used in DicoLib is depicted in the schema below:
Source code
DicoLib is available on SourceForge You're welcome to contribute to the development of this project towards a very efficient spell-checker or any other application you may think of.