perClass Documentation version 5.4 (7-Dec-2018)

kb18: How to protect a trained discriminant against outliers?

Published on: 4-oct-2010

perClass version used: 2.2.4 (5-oct-2010)

(Please note that starting with perClass 3.x (June 2011), internal parameters of pipelines are not directly accessible)

Problem: How can I protect a multi-class discriminant against accepting outliers?

Solution: Add a rejection threshold to the discriminant operating point.

Classifiers we train are often executed in environments where new types of measurements appear that were not considered during classifier design. For example, in a fruit sorting problem our classifier distinguishing several types of fruit may also encounter stones, leaves or dirt on the conveyor belt. Accepting stones or dirt as one of the fruit classes results in high sorting error.

In this tutorial, we discuss how to protect a trained multi-class discriminant from accepting such outliers.

The approach we take in this example is adding a reject option to a trained discriminant. This method does not use any outlier examples during training. Note, however, that some outlier examples are still needed for the sake of evaluation.

# 18.1. Fruit data set example ↩

Our data set contains three classes, namely the apple and banana fruit classes and some stones we have observed. Our goal is to discriminate apples from bananas while protecting the decisions against any potential outliers.

``````>> a
'Fruit set' 260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60)

>> sdscatter(a)
``````

Let us first split our data set into training and test subsets. As mentioned before, we will not use the stone class during training, only in the testing phase. Using the `randsubset` method, we may randomly sample only some of the classes:

``````>> [tr,ts]=randsubset(a,[50 50 0])
'Fruit set' 100 by 2 sddata, 2 classes: 'apple'(50) 'banana'(50)
'Fruit set' 160 by 2 sddata, 3 classes: 'apple'(50) 'banana'(50) 'stone'(60)
``````
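Outside perClass, this per-class sampling can be sketched in plain Python. The `randsubset` function below is a hypothetical stand-in operating on a label list, not the perClass implementation:

```python
import random

def randsubset(labels, per_class_counts, seed=0):
    """Draw a fixed number of samples per class for training; everything
    else (including classes drawn with count 0) goes to the test set."""
    rng = random.Random(seed)  # reproducible split
    classes = sorted(set(labels))
    train_idx, test_idx = [], []
    for cls, count in zip(classes, per_class_counts):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        train_idx += idx[:count]
        test_idx += idx[count:]
    return sorted(train_idx), sorted(test_idx)

# toy labels: 4 apples, 4 bananas, 2 stones
labels = ['apple'] * 4 + ['banana'] * 4 + ['stone'] * 2
# keep 2 apples and 2 bananas for training, no stones (as in [50 50 0] above)
tr_idx, ts_idx = randsubset(labels, [2, 2, 0])
```

Because the stone count is zero, all stone samples end up in the test set, mirroring the `[50 50 0]` split above.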

# 18.2. Training a discriminant ↩

We will now train a model of interest on the two-class fruit problem `tr`. In this example, we use the Parzen classifier:

``````>> p=sdparzen(tr)
...Parzen pipeline         2x2  2 classes, 100 prototypes (sdp_parzen)
``````

To provide decisions, we need to explicitly add a desired operating point using `sddecide`. We will use the default setting with equal weights for the class outputs:

``````>> pd=sddecide(p)
sequential pipeline     2x1 'Parzen+Decision'
1  Parzen                  2x2  2 classes, 100 prototypes (sdp_parzen)
2  Decision                2x1  weighting, 2 classes, 1 ops at op 1 (sdp_decide)
``````
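Conceptually, the Parzen classifier keeps the training samples as prototypes, estimates one kernel density per class, and the decision step assigns the class with the highest weighted output. A minimal one-dimensional sketch in Python (illustrative only, not the perClass implementation; the feature values and bandwidth are made up):

```python
import math

def parzen_density(x, prototypes, h=0.5):
    """Parzen estimate: average of Gaussian kernels centred on the prototypes."""
    k = sum(math.exp(-((x - p) / h) ** 2 / 2) for p in prototypes)
    return k / (len(prototypes) * h * math.sqrt(2 * math.pi))

# toy 1-D features: apples near 1, bananas near 5
prototypes = {'apple': [0.8, 1.0, 1.2], 'banana': [4.8, 5.0, 5.2]}
weights = {'apple': 0.5, 'banana': 0.5}  # equal class weights, as with the default decision

def decide(x):
    """Assign the class with the highest weighted density output."""
    outputs = {c: weights[c] * parzen_density(x, protos)
               for c, protos in prototypes.items()}
    return max(outputs, key=outputs.get)

# note: even a sample far from both classes is forced into one of them,
# which is exactly the outlier problem this tutorial addresses
```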

We visualize the decisions of our two-class discriminant `pd` on the test set `ts`:

``````>> sdscatter(ts,pd)
``````

We may observe that the existing stones (green markers) are assigned to one of the two fruit classes.

``````>> sdconfmat(ts.lab,ts*pd)

ans =

True      | Decisions
Labels    |  apple banana  | Totals
-------------------------------------
apple     |    49      1   |    50
banana    |     0     50   |    50
stone     |     3     57   |    60
-------------------------------------
Totals    |    52    108   |   160
``````
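The bookkeeping behind such a confusion matrix is simple to replicate; a minimal Python sketch (a hypothetical helper, not perClass's `sdconfmat`):

```python
def confmat(true_labels, decisions):
    """Nested dict of counts: rows are true labels, columns are decisions."""
    rows = sorted(set(true_labels))
    cols = sorted(set(decisions))
    table = {t: {d: 0 for d in cols} for t in rows}
    for t, d in zip(true_labels, decisions):
        table[t][d] += 1
    return table

# toy example: stones are forced into the fruit classes, as above
true = ['apple', 'apple', 'banana', 'stone', 'stone']
dec  = ['apple', 'banana', 'banana', 'apple', 'banana']
cm = confmat(true, dec)
```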

# 18.3. Adding reject option to the discriminant ↩

Let us now add the reject option to the operating point in `pd` using the `sdreject` command. `sdreject` adds a threshold on the maximum weighted output of the discriminant in `pd`. The threshold value is selected so that a specified percentage of the training data is rejected.

``````>> pr=sdreject(pd,tr)
Weight-based operating point,2 classes,[0.50,0.50]
sequential pipeline     2x1 'Parzen+Decision'
1  Parzen                  2x2  2 classes, 100 prototypes (sdp_parzen)
2  Decision                2x1  weight+reject, 3 decisions, ROC 1 ops at op 1 (sdp_decide)
``````

The resulting pipeline `pr` returns three decisions:

``````>> pr.list
sdlist (3 entries)
ind name
1 apple
2 banana
3 reject
``````

As we may see on the training set, 1% of the samples are rejected by default:

``````>> sdconfmat(tr.lab,tr*pr)

ans =

True      | Decisions
Labels    |  apple banana reject  | Totals
--------------------------------------------
apple     |    48      1      1   |    50
banana    |     2     48      0   |    50
--------------------------------------------
Totals    |    50     49      1   |   100
``````

When executed on the test set, our new classifier with the reject option `pr` rejects most of the stone samples:

``````>> sdconfmat(ts.lab,ts*pr)

ans =

True      | Decisions
Labels    |  apple banana reject  | Totals
--------------------------------------------
apple     |    46      1      3   |    50
banana    |     0     49      1   |    50
stone     |     0      9     51   |    60
--------------------------------------------
Totals    |    46     59     55   |   160
``````
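The mechanism behind `sdreject` can be sketched as follows: compute the maximum output for each training sample, place the threshold at the quantile matching the desired reject fraction, and reject any new sample whose maximum output falls below it. An illustrative Python version (not perClass code; the Parzen model and data are toy stand-ins):

```python
import math

def parzen_density(x, prototypes, h=0.5):
    """Average of Gaussian kernels centred on the prototypes."""
    k = sum(math.exp(-((x - p) / h) ** 2 / 2) for p in prototypes)
    return k / (len(prototypes) * h * math.sqrt(2 * math.pi))

prototypes = {'apple': [0.8, 1.0, 1.2], 'banana': [4.8, 5.0, 5.2]}

def max_output(x):
    return max(parzen_density(x, p) for p in prototypes.values())

def fit_reject_threshold(train_samples, reject_fraction=0.01):
    """Pick the threshold so that `reject_fraction` of the training
    samples fall strictly below it."""
    outs = sorted(max_output(x) for x in train_samples)
    k = int(len(outs) * reject_fraction)
    return outs[k]

def classify(x, threshold):
    """Assign the highest-density class, or reject if its output is too low."""
    outputs = {c: parzen_density(x, p) for c, p in prototypes.items()}
    best = max(outputs, key=outputs.get)
    return best if outputs[best] >= threshold else 'reject'

train = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
thr = fit_reject_threshold(train)
```

A sample near a class keeps its fruit label, while a far-away sample falls below the density threshold and is rejected.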

Finally, we visualize the decisions of the classifier with reject option on the test set:

``````>> sdscatter(ts,pr)
``````

# 18.4. Building reject curve ↩

Instead of fixing the rejection fraction manually, we may build an entire reject curve relating multiple rejection fractions to performance. This is achieved using the `sdroc` command with the `'reject'` option.

Similarly to standard ROC analysis, we first need to estimate the soft outputs of our trained model:

``````>> out=tr*p
'Fruit set' 100 by 2 sddata, 2 classes: 'apple'(50) 'banana'(50)
``````

Now we invoke the `sdroc` command with the `'reject'` option:

``````>> r=sdroc(out,'reject')
ROC (1001 wr-based op.points, 3 measures), curop: 1
est: 1:frac(reject)=0.00, 2:TPr(apple)=0.98, 3:TPr(banana)=0.96
``````

By default, the fraction of rejected samples and the per-class true positive rates (recalls) are estimated.

To visualize the interactive ROC and scatter plots, use the `sdscatter` command:

``````>> sdscatter(ts,p*r,'roc',r)
``````

Note that we may visualize the test set containing the additional stone examples. Moving the mouse over the ROC plot, we may investigate how the classifier boundary changes with the rejection threshold used.
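Under the hood, such a reject curve can be approximated by sweeping the threshold over the sorted training outputs and recording, for each candidate value, the reject fraction and the per-class recall. A simplified Python sketch (toy model and data, not perClass's `sdroc`):

```python
import math

def parzen_density(x, prototypes, h=0.5):
    """Average of Gaussian kernels centred on the prototypes."""
    k = sum(math.exp(-((x - p) / h) ** 2 / 2) for p in prototypes)
    return k / (len(prototypes) * h * math.sqrt(2 * math.pi))

prototypes = {'apple': [0.8, 1.0, 1.2], 'banana': [4.8, 5.0, 5.2]}
data = [(0.8, 'apple'), (1.0, 'apple'), (1.2, 'apple'),
        (4.8, 'banana'), (5.0, 'banana'), (5.2, 'banana')]

def classify(x, threshold):
    outputs = {c: parzen_density(x, p) for c, p in prototypes.items()}
    best = max(outputs, key=outputs.get)
    return best if outputs[best] >= threshold else 'reject'

def reject_curve(data):
    """List of (threshold, reject fraction, per-class recall) tuples."""
    thresholds = sorted(max(parzen_density(x, p) for p in prototypes.values())
                        for x, _ in data)
    curve = []
    for t in thresholds:
        decisions = [classify(x, t) for x, _ in data]
        frac = decisions.count('reject') / len(data)
        recall = {c: sum(d == c for (_, lab), d in zip(data, decisions) if lab == c)
                     / sum(lab == c for _, lab in data)
                  for c in prototypes}
        curve.append((t, frac, recall))
    return curve

curve = reject_curve(data)
```

The lowest threshold rejects nothing; raising it trades recall on the target classes for a growing reject fraction, which is exactly the trade-off the interactive ROC plot lets you explore.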

# 18.5. What discriminant models can be used for outlier rejection? ↩

Not all statistical models may be used for outlier rejection. Only the models that output probability density or distance can reject outliers. If the discriminant outputs are normalized over a set of classes, the domain information is lost and cannot be recovered. Adding a reject option will result in rejection close to the decision boundary (area of low confidence), not outlier rejection.
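A small numeric illustration of why normalization loses the domain information (plain Python with made-up Gaussian class models): for a sample far from both classes, the raw densities are vanishingly small, yet their normalized ratio can still look perfectly confident.

```python
import math

def gauss(x, mu, sigma=1.0):
    """1-D Gaussian density."""
    return math.exp(-((x - mu) / sigma) ** 2 / 2) / (sigma * math.sqrt(2 * math.pi))

x_outlier = 15.0                  # far from both class means below
p_apple = gauss(x_outlier, 0.0)   # astronomically small density
p_banana = gauss(x_outlier, 5.0)  # tiny as well, but relatively much larger

# after normalization only the ratio survives: the outlier looks like a
# highly confident 'banana', so a threshold on the posterior rejects nothing,
# while a threshold on the raw density p_banana catches it immediately
posterior_banana = p_banana / (p_apple + p_banana)
```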

As an illustration, we may visualize the decisions of a classifier built on top of Parzen with outputs normalized to sum to one (a posteriori probabilities):

``````>> pm=sdnorm(p)
sequential pipeline     2x2 'Parzen+Output normalization'
1  Parzen                  2x2  2 classes, 100 prototypes (sdp_parzen)
2  Output normalization    2x2  (sdp_norm)
>> pr2=sdreject(pm,tr)
sequential pipeline     2x1 'Parzen+Output normalization+Decision'
1  Parzen                  2x2  2 classes, 100 prototypes (sdp_parzen)
2  Output normalization    2x2  (sdp_norm)
3  Decision                2x1  weight+reject, 3 decisions, 1 ops at op 1 (sdp_decide)
>> sdscatter(ts,pr2)
``````

PRSD Studio models appropriate for outlier detection:

• `sdgauss`
• `sdmixture`
• `sdparzen`
• `sdknn` - using the default distance output ("kappa" method). Note, however, that the class-fraction `'classfrac'` method yields normalized outputs and cannot be used for outlier rejection.
• `sdkmeans`
• `sdkcentres`