Figure 1: The TagCloud Control
Introduction
I've often seen tag clouds on websites, but I had the need to use them in Windows Forms. So, I developed a control which you can add easily to your form. It will help you to visualize strings of a text or a collection, depending on their importance. Beyond this, you can also click on the items to use them for data entry. Every click increases the item's importance, and may change the view of the items in the cloud, as they are all seen in connection with all the other items in the cloud.
Background
The strings (I also call them 'items') are stored in an XML file. The file design is very simple: every item is stored in a Tag element, which has three attributes: Text, Occurrence, and Last. Example:
<Tag Text="ABC of programming" Occurrence="5" Last="633923147566312334" />
Text contains the item's text, Occurrence stores how often the item has been used (or the number of times the item appears in a text), and Last contains the time of last access (in DateTime.Ticks). I've implemented access to a default XML file, which is called 'Tags.xml' and is located in the program directory. Of course, you can load and save any user-created tag file.
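As a minimal sketch of how one such element could be read, the following uses LINQ to XML. The class and method names (TagXmlSketch, ParseTag) are made up for this illustration and are not part of the TagCloud project; only the attribute names come from the example above.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

public static class TagXmlSketch
{
    // Parses one <Tag .../> element with the attributes Text, Occurrence, and Last.
    // Hypothetical helper; the real XMLHandling class may work differently.
    public static (string Text, double Occurrence, DateTime Last) ParseTag(string xml)
    {
        XElement tag = XElement.Parse(xml);
        string text = (string)tag.Attribute("Text");
        double occurrence = double.Parse((string)tag.Attribute("Occurrence"),
                                         CultureInfo.InvariantCulture);
        // Last holds DateTime.Ticks, so it can be turned back into a DateTime directly.
        long ticks = long.Parse((string)tag.Attribute("Last"), CultureInfo.InvariantCulture);
        return (text, occurrence, new DateTime(ticks));
    }
}
```

Calling ParseTag with the example line above yields the text, the occurrence as a number, and the last access time as a DateTime.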
To demonstrate the control's use, I've written a wrapper which shows you most of the control's abilities. Please be aware that the wrapper is not a masterpiece in GUI programming. Its only purpose is to show some effects in connection with the tag cloud.
In the screenshot (figure 2), the button 'Read Default Tags XML' has been pressed, with the effect that the contents of the default file 'Tags.xml' are shown in the tag cloud. On the right, you can see how many items can be shown in the cloud ('How many items shown in cloud') and how many items exist in total ('How many items at all'). In our example, 34 items can be displayed. The number of items seen in the tag cloud depends on the control's size (you can change it in steps of '+10' in the wrapper) and on the complex interaction of the item weights, meaning their occurrence.
The item weights are visualized by five different font styles and colors. You can change them by setting the control's property (described later). The wrapper demonstrates how to change the output design of the least important items by 'Set Output design 1', and those in the midrange by 'Set Output design 3'. You can reset to the default using 'Set Standard Output designs'.
Figure 2: The TagCloud Control, Embedded in TagCloudWrapper
Besides this, you can also change the control's background color (the default is Color.Azure) and the look of the tags. Setting the property 'Underline on' underlines an item on mouse hover, and setting 'Frame on' shows frames around the items (this is also the catch area when clicking on an item).
Figure 3: The TagCloud Control with Frame On
Clicking on an item raises an event. You can see this in the wrapper in the 'Tags Information' area: the clicked text is shown in the textbox 'Text' and the occurrence in the textbox 'Occurrence'. Every click on an item increases its weight by 1, and depending on the other items, it will change its output design in the cloud (or not!). Which output design an item belongs to is calculated statistically from the mean of all items and their standard deviation. You can also find out the 'Most' and 'Least' clicked items as well as the 'Youngest' and 'Oldest' clicked ones.
In the area 'Some Tag Manipulations', you can study how items are added (please study the source code to see what really happens there), and you can 'Clear all items' in the cloud. With 'Add this item', you can add an item to the cloud. It must be entered in the textbox to the left of the button. When adding a new item, its weight (occurrence) is set to the average value of all the items, so that it really appears. If it started with weight 1, it might not be displayed at all, as other items would dominate.
To manipulate the cloud control's contents, you can open a context menu by clicking the right mouse button over an item. You can add a new item, remove an item, or change an item text.
Figure 4: Context Menu in TagCloud Control
Another way to show tags in the cloud is to open a text file or an HTML file. This is demonstrated in the area 'Read Tags from text and HTML files'. The three buttons there demonstrate the opening of a text or HTML file, the evaluation of the important tags inside, and the display of the tags in the cloud control. As many words in English (or German) are filler words, I've added two files, ExcludeList-en.txt and ExcludeList-de.txt, which contain words that are ignored when building the list of words for the cloud. These are words like: I, you, are, if, them, here, about, never... The list is read depending on the expected language. In my example, it's English and German (no French, Spanish, Italian... sorry). I know that the lists are not complete, so feel free to add or remove words as you like, or even design an exclude list for a language you need. Another exclude list is used when trying to show an HTML file: ExcludeList-html.txt. It contains some of the syntax of HTML and web design (but is not complete either).
Furthermore, the cloud control has drag and drop functionality. You can take any text or HTML file and drop it onto the control. Please avoid very large files. I've implemented some checks to prevent wrong file formats from being loaded, so you may get a message like this:
Figure 5: Unsuccessful Drag and Drop
Below, you can see an English text file (example: majorca_en.txt) which has been dragged and dropped over the cloud control. The height and width have been extended manually to show more items in the control.
Figure 6: Successful Drag and Drop with File majorca_en.txt
Using the Code
To use the cloud control, you have to add 'TagCloud.dll' to your references. You can see it best in the TagCloudWrapper project, which is part of the solution.
In the following sections, I'll try to describe the classes of the 'TagCloud' project.
TXTHandling
This class is responsible for reading text or HTML files. The words in the files are separated by delimiters like ', . : " ;'. In the next step, all the words are examined to see whether they are 'wanted'. Wanted means that they are not part of the appropriate exclude list.
But let's start from the beginning. First, a file is checked to see whether it is a text file, an HTML file, or neither. A rudimentary algorithm checks if it's English or German text, and if it's text or HTML:
void GetFileTypeAndLanguage(string filename,
out bool ishtml,
out TagCloudControl.TextLanguage language)
Depending on the result, either (ishtml == false):
bool ReadTextFile(out SortedList<string, TagCloudControl.StringItem> sc,
string filename,
TagCloud.TagCloudControl.TextLanguage language)
or (ishtml == true):
bool ReadHTMLFile(out SortedList<string, TagCloudControl.StringItem> sc,
string filename,
TagCloud.TagCloudControl.TextLanguage language)
is called. Furthermore, the appropriate exclude list is applied to the words, depending on the language (English or German). In the case of an HTML file, the HTML exclude list is also applied to filter out some HTML-specific words. The result is a list of 'wanted' words, which are stored in the SortedList sc. Parts of this list will then be displayed in the cloud control.
One short comment on the exclude lists: as mentioned above, they are called: ExcludeList-en.txt, ExcludeList-de.txt, and ExcludeList-html.txt.
They are normal text files, which you can change on your own.
The syntax is:
Exclude:
word1, word2, word3, word4,..
wordxxx,wordyyy, wordzzz,...
...
Include:
word1, word2
Example part of 'ExcludeList-en.txt':
Exclude:
a,able,about,above,ah,ain't,alas,am,an,any,anybody,
anyone,anywhere,after,again,against,ago,all,also,
although,among,and,are,around,as,at,away,
...
Include:
May,US,
You can see that there is also an 'Include' part. Its purpose is to include words which would normally be excluded. The words in the 'Exclude' part should be written in lower case and separated by commas. As the words 'may' and 'May' have different meanings, you should list in the 'Include' part all those words which should not be excluded when they are written exactly the way you define them there (meaning: write them case sensitive). Please feel free to add further words or delete words. I'm not a native English speaker, so I think there are lots of improvements you can make.
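The following is a minimal sketch of how such a list could be parsed and applied. The class ExcludeListSketch and its method IsWanted are hypothetical names for this illustration, not the implementation in TXTHandling. It also assumes that an exact, case-sensitive match in the 'Include' part takes precedence over the 'Exclude' part, which is my reading of the description above.

```csharp
using System;
using System.Collections.Generic;

public class ExcludeListSketch
{
    private readonly HashSet<string> exclude = new HashSet<string>(StringComparer.Ordinal);
    private readonly HashSet<string> include = new HashSet<string>(StringComparer.Ordinal);

    // Reads the lines of a list in the "Exclude:" / "Include:" layout shown above.
    public ExcludeListSketch(IEnumerable<string> lines)
    {
        HashSet<string> current = null;
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.Equals("Exclude:", StringComparison.OrdinalIgnoreCase)) { current = exclude; continue; }
            if (line.Equals("Include:", StringComparison.OrdinalIgnoreCase)) { current = include; continue; }
            if (current == null) continue; // ignore anything before the first section header

            foreach (string part in line.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
            {
                string word = part.Trim();
                if (word.Length == 0) continue;
                // Exclude entries are stored in lower case; Include entries keep their casing.
                current.Add(current == exclude ? word.ToLowerInvariant() : word);
            }
        }
    }

    // Assumed rule: an exact Include match wins; otherwise the lower-cased word
    // must not appear in the Exclude part.
    public bool IsWanted(string word)
    {
        if (include.Contains(word)) return true;
        return !exclude.Contains(word.ToLowerInvariant());
    }
}
```

With a list that excludes 'may' and includes 'May', the month name written as 'May' is kept, while the verb 'may' is filtered out.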
Some methods of the class 'TXTHandling' which deal with word processing are:
int FillExcludeIncludeList(
TagCloud.TagCloudControl.TextLanguage language,
bool clear)
bool WantedWord(string newword)
bool WantedHTMLWord(string newword)
bool IsRomanNumber(string number)
bool ContainsDigit(string word)
bool IsTimeSpan(string timespan)
XMLHandling
The class 'XMLHandling' contains the methods for reading and writing a tag file, which contains the tags (items) of a cloud. The methods 'ReadTagFile' and 'WriteTagFile' are overloaded. They give access to a specific file or to the default tag file 'Tags.xml'. The default filename is obtained through the method 'GetDefaultFileName'.
FileFormats
The class 'FileFormats' contains some methods to check file formats. Some of the following formats are examined:
- Graphic file formats
- Audio file formats
- Video file formats
- Compressed file formats
- Program and development file formats
- Document file formats
- System file formats
- Database file formats
The list is not exhaustive - feel free to add further file formats there. The files are recognized in a simple way by reading their magic numbers. These are sequences of bytes, most of them at the beginning of the file, which allow you to get a general idea of which kind of file it could be. In this solution, the methods are used to avoid 'wrong' files being dragged and dropped over the cloud control.
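To illustrate the idea of a magic number check, here is a minimal sketch. The class MagicNumberSketch and its methods are hypothetical and not taken from 'FileFormats'; the table holds only a handful of widely documented signatures located at the start of the file.

```csharp
using System;
using System.IO;

public static class MagicNumberSketch
{
    // A few well-known signatures at offset 0; illustrative, not exhaustive.
    private static readonly (string Name, byte[] Signature)[] Known =
    {
        ("PNG",  new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }),
        ("JPEG", new byte[] { 0xFF, 0xD8, 0xFF }),
        ("GIF",  new byte[] { 0x47, 0x49, 0x46, 0x38 }), // "GIF8"
        ("PDF",  new byte[] { 0x25, 0x50, 0x44, 0x46 }), // "%PDF"
        ("ZIP",  new byte[] { 0x50, 0x4B, 0x03, 0x04 }), // "PK" 03 04
    };

    // Returns the name of the first matching signature, or null if none matches.
    public static string Detect(byte[] header)
    {
        foreach (var (name, signature) in Known)
        {
            if (header.Length < signature.Length) continue;
            bool match = true;
            for (int i = 0; i < signature.Length && match; i++)
                match = header[i] == signature[i];
            if (match) return name;
        }
        return null;
    }

    // Reads only the first bytes of a file and checks them against the table.
    public static string DetectFile(string path)
    {
        byte[] buffer = new byte[8];
        using (FileStream fs = File.OpenRead(path))
        {
            int read = fs.Read(buffer, 0, buffer.Length);
            Array.Resize(ref buffer, read);
        }
        return Detect(buffer);
    }
}
```

A drop handler could reject a file whenever Detect returns a non-null name, since a plain text or HTML file would not normally start with one of these byte sequences.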
Statistics
The class 'Statistics' is responsible for delivering the mean value of a collection of values:
double Mean(IEnumerable<double> values)
and the standard deviation:
double StandardDeviation(IEnumerable<double> values, out double mean)
Both values are needed to judge an item in the context of the other items, in order to decide whether it will be shown in the cloud control and which weight it has there. Depending on its weight, one of 5 output designs (font, color) is assigned to it.
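The two calculations can be sketched as follows. The method signatures mirror the ones shown above, but the bodies are my own. In particular, dividing by n (population standard deviation) and the thresholds in DesignFor are assumptions made for this illustration; the article does not state how the control maps the statistics to the five designs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StatisticsSketch
{
    public static double Mean(IEnumerable<double> values)
    {
        double sum = 0;
        int n = 0;
        foreach (double v in values) { sum += v; n++; }
        return n == 0 ? 0.0 : sum / n;
    }

    // Population standard deviation (divides by n); this choice is an assumption.
    public static double StandardDeviation(IEnumerable<double> values, out double mean)
    {
        List<double> list = values.ToList();
        mean = Mean(list);
        if (list.Count == 0) return 0.0;
        double m = mean; // out parameters cannot be captured by a lambda
        double sumSquares = list.Sum(v => (v - m) * (v - m));
        return Math.Sqrt(sumSquares / list.Count);
    }

    // Hypothetical mapping to design 1 (least important) .. 5 (most important),
    // based on how many standard deviations an item lies from the mean.
    public static int DesignFor(double occurrence, double mean, double sd)
    {
        if (sd == 0) return 3;
        double z = (occurrence - mean) / sd;
        if (z < -1.0) return 1;
        if (z < -0.5) return 2;
        if (z <= 0.5) return 3;
        if (z <= 1.0) return 4;
        return 5;
    }
}
```

Because the designs depend on the mean and the spread of all items, a single click can move other items into a different design even though their own occurrence did not change.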
AddItem and RenameItem
These classes are simple dialog boxes which are shown when the corresponding entry in the context menu is clicked. 'AddItem' allows you to add a new item to the cloud; 'RenameItem' lets you change an existing item's name.
TagCloudControl
This class is the central class for cloud control management and display. It contains a lot of properties and methods to steer the cloud control. In the following section, I will describe some of them. The best way to understand the functionality is to study the method and property calls in TagCloudWrapper.
Let's start with the properties:
public Color ControlBackColor
public bool ControlTextFrame
public bool ControlTextUnderline
public int ControlHeight
public int ControlWidth
public string MostClickedItem
public string LeastClickedItem
public string OldestClickedItem
public string YoungestClickedItem
public int ItemsCount
public int ShownItemsCount
The only event which is fired is on the mouse click on an item:
[Description("Delegates a click onto the user control to the wrapper")]
public delegate void OnUserControlClick(string Text, double Occurrence);
public event OnUserControlClick clickHandler;
The event returns the text and the occurrence of the item in the cloud.
Public Methods for Output Design Management
public void SetAllDesigns(bool update)
public bool SetDesign(int number, string font, float size, Color color)
public bool SetDesign(int number, OutputDesign od)
Public Methods for String Management
public void AddItem(string text)
public void AddItem(string text, double occ)
public void AddItem(string text, double occ, long last)
public void AddItem(SortedList<string, StringItem> sc)
public void AddItem(string text, bool regardmean)
public void RemoveItem(string text)
public void ClearAllItems()
public SortedList<string, StringItem> GetAllItems()
Private Methods to Build the Cloud
These methods are all private, as they are only called when:
- strings are added to or removed from the string collection
- the weights of the items change
- the output design of the items changes
- the design of the cloud changes
The methods are responsible for building the cloud's contents and showing the items:
private void SetCloud()
private void CopyStringsToCloud()
private int BuildCloud(ref StringItem si)
The algorithm works as follows: the items are listed in alphabetical order. According to their weight, they get an output design (one of 5). Then it is calculated whether the item's text dimensions fit into the cloud. If the dimensions exceed the right border, a new line is started. The height of a line is given by the maximum height of the items in that line. You can best understand it when you have a look at figure 3, where the text borders are shown (frame on). The new line is filled in the same way, and this is repeated until the lower border of the cloud control is reached. If the list of items has not yet reached its end, the 'weakest' item is removed from the list of items which should be displayed. The 'weakest' item is the one which has the lowest occurrence and which has not been called for the longest time span.
Now, the filling of the cloud control begins in a new cycle. As an item has been removed, the weights of the other items may have changed. So the output designs may have changed due to new statistical calculations, and accordingly, the text dimensions may have changed too. The whole thing is a recursive process targeted at displaying as many of the 'important' items as possible. In this case, 'important' means having the highest occurrence and being recently called. For details, please study the code. Maybe it can be optimized, but on my machine, I have no performance problems.
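The fill-and-remove cycle described above can be sketched roughly like this. It is not the code of SetCloud, CopyStringsToCloud, or BuildCloud: the types CloudItem and CloudLayoutSketch, the measure callback (which stands in for measuring text in a given design), and the design thresholds are all assumptions made for this illustration, and the sketch uses a loop where the description speaks of a recursive process.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class CloudItem
{
    public string Text;
    public double Occurrence;
    public long Last; // ticks of the last access
}

public static class CloudLayoutSketch
{
    // Hypothetical design assignment: 1 (least important) .. 5 (most important).
    static int DesignFor(CloudItem item, double mean, double sd)
    {
        if (sd == 0) return 3;
        double z = (item.Occurrence - mean) / sd;
        return z < -1 ? 1 : z < -0.5 ? 2 : z <= 0.5 ? 3 : z <= 1 ? 4 : 5;
    }

    // Returns true if all items fit when they are laid out line by line.
    static bool Fits(List<CloudItem> items, int width, int height,
                     Func<string, int, (int W, int H)> measure)
    {
        double mean = items.Average(i => i.Occurrence);
        double sd = Math.Sqrt(items.Average(i => Math.Pow(i.Occurrence - mean, 2)));
        int x = 0, y = 0, lineHeight = 0;
        foreach (CloudItem item in items)
        {
            var (w, h) = measure(item.Text, DesignFor(item, mean, sd));
            if (x > 0 && x + w > width) // right border exceeded: start a new line
            {
                y += lineHeight;
                x = 0;
                lineHeight = 0;
            }
            x += w;
            lineHeight = Math.Max(lineHeight, h); // tallest item defines the line height
            if (y + lineHeight > height) return false; // lower border exceeded
        }
        return true;
    }

    // Removes the weakest item and starts a new cycle until everything fits.
    public static List<CloudItem> SelectShown(IEnumerable<CloudItem> all, int width, int height,
                                              Func<string, int, (int W, int H)> measure)
    {
        List<CloudItem> shown = all.OrderBy(i => i.Text, StringComparer.OrdinalIgnoreCase).ToList();
        while (shown.Count > 0 && !Fits(shown, width, height, measure))
        {
            // Weakest: lowest occurrence, ties broken by the oldest access time.
            CloudItem weakest = shown.OrderBy(i => i.Occurrence).ThenBy(i => i.Last).First();
            shown.Remove(weakest);
        }
        return shown;
    }
}
```

Note that the mean and standard deviation are recomputed in every cycle, so removing one item can change the designs, and therefore the sizes, of the remaining ones.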
Points of Interest
I've planned to use the cloud control in two future applications.
On one hand, I want to write a program to apply some basic modifications to images taken by a digital camera. One point will be to rename the files, and I think I will use the tag cloud to get keywords for half-automatic file renaming. I plan to build a set of keywords from our last holidays, just as suggested in the example in figure 6.
On the other hand, I also plan to write a program which helps you learn by asking you questions, which you have to answer correctly. The questions are put randomly, but I could imagine it would be a good thing to additionally offer a tag cloud containing theme-based keywords you can click onto.
I hope you have many more good ideas - let me know!
History
- Initial version 1.0: December 2009.