Introduction
Data entry often forces the end user through a tedious, time-consuming input process. This article applies the well-known association rule technique [AGRAWAL93] to suggest frequently used values based on what the user has entered so far. It also presents an implementation of the technique in an n-tier scenario and closes with a healthcare drug prescription case study.
Background
We introduce the idea behind an association rule through an example and refer to [AGRAWAL93] for a formal treatment.
Consider the following table T:
Attribute1 | Attribute2 | Attribute3 |
a | b | c |
a | b | c |
a | b | d |
a | d | d |
a | f | d |
We say that the rule Attribute1[a], Attribute2[b] → Attribute3[c] is satisfied in T with support = 2/5 and confidence = 2/3. The support of a rule is the fraction of transactions in T that satisfy the union of the items in its antecedent and consequent: here, two of the five rows contain a, b, and c together. A rule has confidence factor c, with 0 ≤ c ≤ 1, if a fraction c of the transactions in T that satisfy its antecedent also satisfy its consequent: here, two of the three rows with Attribute1 = a and Attribute2 = b also have Attribute3 = c. In plain English: "if Attribute1 = a and Attribute2 = b then, with a certain probability, Attribute3 = c". Several algorithms have been proposed for mining association rules, such as Apriori [AGRAWAL94].
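The two measures can be checked directly against table T. The following C# sketch is only an illustration: the RuleMetrics class and its Support and Confidence helpers are names assumed here, not part of the article's code, and each row is modeled as a dictionary from attribute name to value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace Sketch
{
    // Hypothetical helper, not part of the article's code base.
    public static class RuleMetrics
    {
        // Table T from the example: one dictionary per row.
        public static readonly List<Dictionary<string, string>> T =
            new List<Dictionary<string, string>>
            {
                Row("a", "b", "c"),
                Row("a", "b", "c"),
                Row("a", "b", "d"),
                Row("a", "d", "d"),
                Row("a", "f", "d"),
            };

        static Dictionary<string, string> Row(string a1, string a2, string a3)
        {
            return new Dictionary<string, string>
            {
                { "Attribute1", a1 }, { "Attribute2", a2 }, { "Attribute3", a3 }
            };
        }

        // True when the row contains every (attribute, value) pair in items.
        static bool Satisfies(Dictionary<string, string> row,
                              Dictionary<string, string> items)
        {
            return items.All(i => row.TryGetValue(i.Key, out var v) && v == i.Value);
        }

        // Fraction of rows that satisfy antecedent and consequent together.
        public static double Support(List<Dictionary<string, string>> table,
                                     Dictionary<string, string> antecedent,
                                     Dictionary<string, string> consequent)
        {
            int both = table.Count(r => Satisfies(r, antecedent) && Satisfies(r, consequent));
            return (double)both / table.Count;
        }

        // Among the rows that satisfy the antecedent, the fraction that
        // also satisfy the consequent (0 when no row matches the antecedent).
        public static double Confidence(List<Dictionary<string, string>> table,
                                        Dictionary<string, string> antecedent,
                                        Dictionary<string, string> consequent)
        {
            int ante = table.Count(r => Satisfies(r, antecedent));
            int both = table.Count(r => Satisfies(r, antecedent) && Satisfies(r, consequent));
            return ante == 0 ? 0.0 : (double)both / ante;
        }
    }
}
```

Calling RuleMetrics.Support and RuleMetrics.Confidence on RuleMetrics.T with the antecedent {Attribute1 = a, Attribute2 = b} and the consequent {Attribute3 = c} evaluates the rule discussed above.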
Using the code
Server side
The server side hosts the service contract declaration (IService), its implementation (Service), a set of lightweight objects (DTO) serialized during client-server communication, and the persistence layer that queries the set of mined rules (RuleSet).
ServiceLayer.IService: Defines the service contract.
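namespace ServiceLayer
{
    public interface IService
    {
        // Returns the mined consequents for the given antecedents,
        // restricted to the attributes listed in consequents.
        List<DTO.AttributeDTO> minedConsequents(
            List<DTO.AttributeDTO> antecedents, List<DTO.AttributeDTO> consequents);
    }
}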
ServiceLayer.Service: The implementation of the service call.
namespace ServiceLayer
{
    public class Service : IService
    {
        public List<DTO.AttributeDTO> minedConsequents(
            List<DTO.AttributeDTO> antecedents, List<DTO.AttributeDTO> consequents)
        {
            // Convert the DTOs to persistence-layer attributes, query the rules,
            // then convert the result back (conversion helpers not shown).
            var ants = Dtos_to_Attributes(antecedents);
            var cons = Dtos_to_Attributes(consequents);
            var tmp = PersistenceLayer.RuleDAO.getConsequents(ants, cons);
            return Attributes_to_Dtos(tmp);
        }
    }
}
DTO.AttributeDTO: The data transfer object that carries mined rule information to and from the client.
namespace DTO
{
    public class AttributeDTO
    {
        public string AttributeName { get; private set; }
        public string AttributeValue { get; set; }

        public AttributeDTO(string name = null, string value = null)
        {
            AttributeName = name;
            AttributeValue = value;
        }
    }
}
PersistenceLayer.RuleDAO: The data access object that queries RuleSet. In brief, the call queries RuleSet and returns the consequents of all rules with the specified antecedent.
namespace PersistenceLayer
{
    public class RuleDAO
    {
        public static List<Entities.Attribute> getConsequents(
            List<Entities.Attribute> antecedents, List<Entities.Attribute> consequents = null)
        {
            ...
        }
    }
}
Storage
No particular assumptions are made about the production database model. A relational database is the usual choice, but other approaches, such as object-relational or the more recent NoSQL models, could also be considered.
Warehouse: the ETL phase. Transformations produce a suitable input (Datamart) for association rule mining (Apriori). As with any ETL process, decisions about how often and at what time of day to run, which input dataset to consider, and any other strategy depend on the scenario at hand.
RuleSet.xml: the output of the Apriori runs. RuleSet can be stored in any format accepted by the data access layer. A custom XML representation is chosen here, but more common formats such as XRFF or ARFF can be used.
<?xml version="1.0" encoding="utf-8" ?>
<rules>
<rule support="2" confidence="60">
<antecedent attributeName = "attribute1" value ="10"/>
<antecedent attributeName = "attribute2" value ="20"/>
<consequent attributeName = "attribute3" value ="30"/>
</rule>
...
</rules>
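A minimal sketch of how a data access layer might read this format with LINQ to XML. The element and attribute names come from the sample above; the RuleSetReader class, its GetConsequents signature, and the use of (name, value) tuples in place of Entities.Attribute are assumptions made for illustration, since the article does not show the body of RuleDAO.getConsequents. A rule matches only when its antecedent equals the given set exactly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

namespace Sketch
{
    // Hypothetical reader, not the article's RuleDAO.
    public static class RuleSetReader
    {
        // Turns an <antecedent> or <consequent> element into a (name, value) pair.
        static (string Name, string Value) ToItem(XElement e)
        {
            return ((string)e.Attribute("attributeName"), (string)e.Attribute("value"));
        }

        // Returns the consequents of every rule whose antecedent is exactly
        // the given set of (name, value) pairs.
        public static List<(string Name, string Value)> GetConsequents(
            string ruleSetXml, IEnumerable<(string Name, string Value)> antecedents)
        {
            var wanted = new HashSet<(string Name, string Value)>(antecedents);
            var doc = XDocument.Parse(ruleSetXml);
            var result = new List<(string Name, string Value)>();

            foreach (var rule in doc.Root.Elements("rule"))
            {
                var ruleAntecedent = rule.Elements("antecedent").Select(ToItem);
                if (!wanted.SetEquals(ruleAntecedent))
                    continue; // antecedent differs from the user input

                result.AddRange(rule.Elements("consequent").Select(ToItem));
            }
            return result;
        }
    }
}
```

With the sample rule above, passing the pairs (attribute1, 10) and (attribute2, 20) yields the single consequent (attribute3, 30), while passing only (attribute1, 10) yields nothing because the match is exact.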
Client side
GUI.Form1.gui_KeyPress: This event is fired whenever the user performs an input action such as a key press.
GUI.Form1._bindingGUIAttributes: This in-memory data structure binds each GUI component to its associated attribute.
GUI.Form1.setMinedAttributes(): This method is called whenever a gui_KeyPress event is fired. Given the current user input (the antecedent), it queries GUI.Proxy for the corresponding consequents (the fields not yet filled in) and refreshes the associated fields accordingly.
namespace GUI
{
    public partial class Form1 : Form
    {
        // Maps each GUI control name to the attribute it edits.
        Dictionary<string, BindedValue> _bindingGUIAttributes;

        public Form1()
        {
            ...
            _bindingGUIAttributes = new Dictionary<string, BindedValue>();
            _bindingGUIAttributes.Add(textBox1.Name,
                new BindedValue(new DTO.AttributeDTO("Drug", null), true));
            _bindingGUIAttributes.Add(textBox2.Name,
                new BindedValue(new DTO.AttributeDTO("Route", null), true));
            _bindingGUIAttributes.Add(textBox3.Name,
                new BindedValue(new DTO.AttributeDTO("Form", null), true));
            _bindingGUIAttributes.Add(textBox4.Name,
                new BindedValue(new DTO.AttributeDTO("Dose", null), true));
        }

        private void setMinedAttributes()
        {
            List<DTO.AttributeDTO> antecedents = this.getNonMineableAttributes();
            List<DTO.AttributeDTO> consequents = this.getMineableAttributes();
            var minedConsequents = new GUI.Proxy().minedConsequents(antecedents, consequents);
            ...
            foreach (var cons in consequents)
            {
                ...
                foreach (var mined in minedConsequents)
                {
                    ...
                    updateAttributeValue(cons, mined.AttributeValue);
                }
            }
        }

        private void gui_KeyPress(object sender, KeyPressEventArgs e)
        {
            ...
            setMinedAttributes();
        }

        // Pairs an attribute with a flag telling whether it can be mined.
        class BindedValue
        {
            public DTO.AttributeDTO attribute;
            public bool isMineable;

            public BindedValue(DTO.AttributeDTO attr, bool mineable)
            {
                attribute = attr;
                this.isMineable = mineable;
            }
        }
    }
}
GUI.Proxy: Manages the client-side service invocation mechanism, sharing the IService contract with the server-side ServiceLayer.
namespace GUI
{
    class Proxy : ServiceLayer.IService
    {
        public List<DTO.AttributeDTO> minedConsequents(
            List<DTO.AttributeDTO> antecedents, List<DTO.AttributeDTO> consequents)
        {
            var service = new ServiceLayer.Service();
            return service.minedConsequents(antecedents, consequents);
        }
    }
}
Case study
Consider a data entry task in which a doctor uses GUIMiner to submit a drug prescription described by a drug, a route, a form, and a dose.
Thanks to the ETL procedures and to Apriori runs suitably parameterized with support and confidence, the following RuleSet has been mined:
[Drug]en → [Route]oral
[Drug]en, [Route]oral, [Form]t → [Dose]3
The doctor opens GUIMiner and starts the new drug prescription described above by typing the first character of the drug's name.
Behind the scenes, the gui_KeyPress event is fired and the system checks whether the current input appears as the antecedent of a rule in RuleSet. At this point no match is found, so no suggestion is returned to GUIMiner. The doctor then types the second character of the drug's name. Again, the gui_KeyPress event is fired and the system checks for rules. This time the rule [Drug]en → [Route]oral is found and the consequent [Route]oral is returned, encapsulated in an AttributeDTO instance. GUIMiner automatically populates the corresponding field, possibly marking it graphically as "a suggested value".
Because "oral" is the right route for the prescription, the doctor does not need to type it and carries on filling in the form. The system checks whether the current input ([Drug]en, [Route]oral, [Form]t) is the antecedent of some rule. Because the rule [Drug]en, [Route]oral, [Form]t → [Dose]3 is found, the corresponding consequent is returned and GUIMiner automatically populates the Dose field.
Finally, the doctor replaces this last mined value with the desired one, Dose = 2, and submits the prescription.
Conclusions
This article gives only a kick-start; in the real world, details make the difference. To give an idea of the issues you may come across, here are a few assorted considerations.
If the service invocation is a bottleneck, client-side caching may be the right approach. In the solution above, a remote invocation is performed on every key press. To avoid those remote calls, a client-side cache could be implemented behind GUI.Proxy. Different caching policies can be adopted, based for example on expiration time, on downloading only the subset of rules that involve the attributes shown in the GUI, or on rule-cumulative strategies.
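As one possible shape for such a cache, the sketch below wraps the remote call in a dictionary with an expiration time. Everything in it is an assumption made for illustration: the CachingProxy class, the delegate standing in for the remote service, the injectable clock, and the local AttributeDTO stand-in that mirrors DTO.AttributeDTO so the example compiles on its own. The cache key is a simple concatenation and assumes attribute names and values do not contain the separator characters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace Sketch
{
    // Minimal stand-in for DTO.AttributeDTO.
    public class AttributeDTO
    {
        public string AttributeName { get; private set; }
        public string AttributeValue { get; set; }

        public AttributeDTO(string name = null, string value = null)
        {
            AttributeName = name;
            AttributeValue = value;
        }
    }

    // Hypothetical cache placed in front of the remote invocation.
    public class CachingProxy
    {
        private readonly Func<List<AttributeDTO>, List<AttributeDTO>, List<AttributeDTO>> _remote;
        private readonly TimeSpan _timeToLive;
        private readonly Func<DateTime> _now;
        private readonly Dictionary<string, (DateTime Expires, List<AttributeDTO> Value)> _cache =
            new Dictionary<string, (DateTime Expires, List<AttributeDTO> Value)>();

        public CachingProxy(
            Func<List<AttributeDTO>, List<AttributeDTO>, List<AttributeDTO>> remote,
            TimeSpan timeToLive,
            Func<DateTime> now = null)
        {
            _remote = remote;
            _timeToLive = timeToLive;
            _now = now ?? (() => DateTime.UtcNow);
        }

        public List<AttributeDTO> minedConsequents(
            List<AttributeDTO> antecedents, List<AttributeDTO> consequents)
        {
            string key = KeyOf(antecedents) + "=>" + KeyOf(consequents);

            // Serve from the cache while the entry has not expired.
            if (_cache.TryGetValue(key, out var entry) && entry.Expires > _now())
                return entry.Value;

            // Otherwise perform the remote call and remember the answer.
            var fresh = _remote(antecedents, consequents);
            _cache[key] = (_now() + _timeToLive, fresh);
            return fresh;
        }

        // Builds an order-independent key from a list of attributes.
        private static string KeyOf(List<AttributeDTO> attributes)
        {
            if (attributes == null) return "";
            return string.Join("|", attributes
                .Select(a => a.AttributeName + "=" + a.AttributeValue)
                .OrderBy(s => s, StringComparer.Ordinal));
        }
    }
}
```

Repeating a request with the same antecedents and consequents within the time-to-live is then answered locally; only the first request, or a request made after expiry, reaches the service.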
Another interesting aspect is graphical feedback that tells the user whether a field holds explicit user input or a mined value. In the solution above, a different border style is rendered to show that a field is mined. At the next key press the border style reverts to the default.
Bad news and good news: Apriori runs can be expensive but, at the same time, a RuleSet mined from a large amount of data changes only after many database insert/delete/update operations. You should therefore find a good trade-off between keeping the RuleSet up to date and saving computational resources for other business tasks.
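Using syntactic constraints: the call getConsequents(List<Entities.Attribute> antecedents, List<Entities.Attribute> consequents = null) considers {X→Ij ∈ RuleSet | ants = X and Ij ∈ cons}, where ants and cons stand for its two arguments. Other queries, such as {X→Ij ∈ RuleSet | ants ⊆ X and Ij ∈ cons}, could also work, provided you keep in mind that enlarging an antecedent can only lower support (support(AB→C) ≤ support(A→C)), whereas confidence can move in either direction, so confidence(AB→C) says little about confidence(A→C).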
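The two set-builder queries can be sketched over an in-memory rule list. The Rule and RuleQueries types below are hypothetical and not part of the article's code; as a further assumption, Ij ∈ cons is checked by attribute name only, because the fields still to be filled carry no value yet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace Sketch
{
    // Hypothetical in-memory rule: antecedent X and a single consequent item Ij.
    public class Rule
    {
        public HashSet<(string Name, string Value)> Antecedent { get; }
        public (string Name, string Value) Consequent { get; }

        public Rule(IEnumerable<(string Name, string Value)> antecedent,
                    (string Name, string Value) consequent)
        {
            Antecedent = new HashSet<(string Name, string Value)>(antecedent);
            Consequent = consequent;
        }
    }

    public static class RuleQueries
    {
        // { X -> Ij in RuleSet | ants = X and Ij in cons }
        public static List<Rule> ExactMatch(
            IEnumerable<Rule> ruleSet,
            IEnumerable<(string Name, string Value)> ants,
            ICollection<string> cons)
        {
            return ruleSet
                .Where(r => r.Antecedent.SetEquals(ants) && cons.Contains(r.Consequent.Name))
                .ToList();
        }

        // { X -> Ij in RuleSet | ants is a subset of X and Ij in cons }
        public static List<Rule> SupersetMatch(
            IEnumerable<Rule> ruleSet,
            IEnumerable<(string Name, string Value)> ants,
            ICollection<string> cons)
        {
            return ruleSet
                .Where(r => r.Antecedent.IsSupersetOf(ants) && cons.Contains(r.Consequent.Name))
                .ToList();
        }
    }
}
```

With the two rules of the case study and the input [Drug]en, the exact query returns only the Route rule, while the superset query also returns the Dose rule, whose antecedent contains attributes the user has not entered yet.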
Bibliography
- AGRAWAL93: Rakesh Agrawal; Tomasz Imielinski; Arun Swami, Mining Association Rules between Sets of Items in Large Databases, 1993.
- AGRAWAL94: Rakesh Agrawal; Ramakrishnan Srikant, Fast Algorithms for Mining Association Rules, 1994.