Resolving Rule Conflicts with a Log-Linear Sum
Basic Idea
The log-linear formula is adopted from logistic regression. Let R = {R_{1}, ..., R_{n}} be the set of induced unordered rules and let x be an example we would like to classify. The log-linear model for rule classification in two-class domains (c_{0}, c_{1}) is a weighted sum with a real-valued weight vector w:

f(x) = w_{0} + w_{1}×R_{1}(x) + ... + w_{n}×R_{n}(x),    (5)
where the weights w_{1}, ..., w_{n} given to the rules are all nonnegative reals. The terms R_{i}(x) are defined as:
R_{i}(x) = 0 if the conditions of R_{i} are false for example x;
R_{i}(x) = +1 if R_{i} predicts class c_{1};
R_{i}(x) = -1 if R_{i} predicts class c_{0}.    (6)
The predicted probability of class c_{1} for example x is computed from f(x) through the logistic (inverse logit) link function:

P(c_{1}|x) = 1 / (1 + e^{-f(x)})    (7)
Although this particular model can only be used in domains with two classes, it can be extended to a multi-class version in the same way as basic logistic regression is extended to multinomial logistic regression.
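The two-class model in equations (5)-(7) can be sketched in a few lines of Python. The rule representation, the weights, and the example below are hypothetical illustrations chosen for this sketch, not Orange's actual data structures:

```python
import math

# Sketch of the log-linear rule model, equations (5)-(7).
# Each rule is assumed (for illustration only) to be a pair
# (condition, predicted_class), where condition is a predicate on the
# example x and predicted_class is 0 (for c0) or 1 (for c1).

def rule_term(rule, x):
    """R_i(x): 0 if the rule does not fire, +1 if it predicts c1, -1 if c0."""
    condition, predicted_class = rule
    if not condition(x):
        return 0
    return 1 if predicted_class == 1 else -1

def predict_proba(rules, weights, w0, x):
    """P(c1|x) = 1 / (1 + exp(-f(x))), where f(x) is the weighted sum (5)."""
    f = w0 + sum(w * rule_term(r, x) for w, r in zip(weights, rules))
    return 1.0 / (1.0 + math.exp(-f))

# Two made-up rules over a dict-valued example:
rules = [
    (lambda x: x["age"] > 31, 1),        # fires -> evidence for c1
    (lambda x: x["tobacco"] < 0.48, 0),  # fires -> evidence for c0
]
weights = [1.2, 0.7]   # hypothetical nonnegative rule weights
w0 = -0.5              # hypothetical intercept

# Only the first rule fires here, so f(x) = -0.5 + 1.2 = 0.7:
print(predict_proba(rules, weights, w0, {"age": 40, "tobacco": 1.0}))  # ≈ 0.668
```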
A log-linear sum is already implicitly used when classifying from rules in the naive Bayesian way. However, naive Bayes assigns a weight to each rule independently of the other rules, which can be a problem if the rules are correlated. And they usually are! I propose a different approach: I require that the log-linear model gives predictions consistent with the predictions of the individual rules, which automatically ensures that correlated rules will not bias the model.
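To see why independently weighted, correlated rules bias the prediction, consider the extreme case of the same rule being counted twice. The weight below is made up purely for illustration; the sketch only demonstrates the bias, not the proposed consistency constraint:

```python
import math

def sigmoid(f):
    """Inverse logit, as in equation (7)."""
    return 1.0 / (1.0 + math.exp(-f))

# Suppose a single rule for c1 carries log-odds weight 1.0 when it fires
# (a hypothetical value chosen for this illustration):
w = 1.0
single = sigmoid(w)       # prediction with one rule, ≈ 0.731

# If a perfectly correlated copy of the rule is induced and weighted
# independently (as naive Bayes would), the same evidence counts twice:
doubled = sigmoid(2 * w)  # ≈ 0.881, although no new evidence was added

print(single, doubled)
```

A model constrained to agree with the predictions of the individual rules would instead keep the combined prediction at the single-rule value.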
Implementation in Orange
This classification method is implemented within the Orange data-mining software, which you can download from http://www.ailab.si/orange/downloads.asp. I suggest taking the latest snapshot, as Orange is getting better every day.
The class orngCN2.CN2EVCUnorderedLearner combines EVC evaluation of rules, probabilistic covering, and LCR classification. The learnRules.py script illustrates how rules can be learned and printed along with their coefficients. The learnRulesParams.py script describes all the parameters accepted by CN2EVCUnorderedLearner and demonstrates how they can be changed. For the above script, you will need the well-known Titanic dataset.
Visualisation
The obtained rules can be visualised with the standard widget for presenting rules in Orange; see rulesWidget.py for an example of how to do it. Moreover, the log-linear formula also allows a nice visualisation with nomograms, which are commonly used for logistic-regression and linear models alike. The nomogram.py script shows how rules can be submitted to the nomogram widget. You will need the SAHeart dataset. The obtained nomogram is shown in the picture below left, while the rules in the widget are in the image to the right. In the nomogram, each horizontal line corresponds to one rule. The conditions of the rule are written as text on the left. The value yes on the line denotes the rule's contribution if it triggers for the example; otherwise its contribution is zero. The nomogram nicely shows how relevant a rule is with respect to the other rules; for example, the rule age>31 and type>68 has a much lower accuracy than tobacco<0.480, yet it has a considerably larger weight, because the examples covered by the "tobacco" rule are already explained by other rules.
 