Keywords: image data, PCA, detectors, classifier execution
24.1. Introduction
In this example, we discuss how to separate one of the categories from all others. Say, we wish to distinguish digit 3 from the rest.
We will use the handwritten digit data set from our previous example.
We split the data set with 10 classes of 16x16 digit images into training and test subsets...
>> a
'Digits' 2000 by 256 sddata, 10 classes: [200 200 200 200 200 200 200 200 200 200]
>> [tr,ts]=randsubset(a,0.5)
'Digits' 1000 by 256 sddata, 10 classes: [100 100 100 100 100 100 100 100 100 100]
'Digits' 1000 by 256 sddata, 10 classes: [100 100 100 100 100 100 100 100 100 100]
... and project the training set into a 40D subspace with PCA:
>> p=sdpca(tr,40)
PCA pipeline 256x40 73% of variance (sdp_affine)
>> tr2=tr*p
'Digits' 1000 by 40 sddata, 10 classes: [100 100 100 100 100 100 100 100 100 100]
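perClass's sdpca is not available outside Matlab, but the projection step itself is plain linear algebra. As a rough illustration, here is a NumPy sketch of a 256D-to-40D PCA on synthetic stand-in data (the 73% variance figure above comes from the real digit data, not from this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the 1000x256 training set 'tr' (synthetic, not the real digits).
X = rng.normal(size=(1000, 256))

# Center the data and take the top 40 principal directions via SVD.
mu = X.mean(axis=0)
Xc = X - mu
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:40].T                      # 256x40 projection matrix
X40 = Xc @ W                       # 1000x40 projected training set

# Fraction of total variance retained by the 40 components.
retained = (s[:40] ** 2).sum() / (s ** 2).sum()
print(X40.shape, float(retained))
```

On the real digit data the retained fraction is the 73% that perClass reports; on this synthetic cloud it will differ.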
24.2. Building a simple classifier for digit 3
As we are interested in digit 3 versus others, we will relabel our training set using sdrelab:
>> tr3=sdrelab(tr2,{'~3','others'})
1: 0 -> others
2: 1 -> others
3: 2 -> others
4: 3 -> 3
5: 4 -> others
6: 5 -> others
7: 6 -> others
8: 7 -> others
9: 8 -> others
10: 9 -> others
'Digits' 1000 by 40 sddata, 2 classes: '3'(100) 'others'(900)
We instruct sdrelab to turn all classes that "are not 3" (~3) into a class called "others".
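Outside perClass, the same relabeling is a one-liner. A minimal NumPy sketch on synthetic labels (mimicking the 100-per-class training set, not the real data):

```python
import numpy as np

# Stand-in labels 0..9, 100 samples per class, like the 'Digits' training set.
labels = np.repeat(np.arange(10), 100)

# Relabel: digit 3 keeps its own class, everything else becomes 'others'
# (the same mapping sdrelab prints above).
binary = np.where(labels == 3, '3', 'others')

counts = {c: int((binary == c).sum()) for c in ('3', 'others')}
print(counts)   # {'3': 100, 'others': 900}
```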
The easiest approach is to use the simple linear discriminant, as in the previous example:
>> p2=sdfisher(tr3)
sequential pipeline 40x2 'Fisher linear discriminant'
1 LDA 40x1 (sdp_affine)
2 Gauss eq.cov. 1x2 2 classes, 2 components (sdp_normal)
3 Output normalization 2x2 (sdp_norm)
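For readers without perClass, the LDA 40x1 step above can be sketched in NumPy. This computes the classical Fisher direction w = Sw^-1 (m1 - m2) on synthetic two-class data; the class means and sizes here are made up, chosen only to mimic the 100-vs-900 split:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two synthetic 40-D classes standing in for '3' (100 samples) and 'others' (900).
X3 = rng.normal(loc=1.0, size=(100, 40))
Xo = rng.normal(loc=0.0, size=(900, 40))

# Fisher linear discriminant: w = Sw^{-1} (m3 - mo),
# where Sw is the pooled within-class scatter matrix.
m3, mo = X3.mean(axis=0), Xo.mean(axis=0)
Sw = (np.cov(X3, rowvar=False) * (len(X3) - 1)
      + np.cov(Xo, rowvar=False) * (len(Xo) - 1))
w = np.linalg.solve(Sw, m3 - mo)   # the 40x1 direction, like the LDA step above

# Project both classes onto w; the projected class means should separate.
p3, po = X3 @ w, Xo @ w
print(p3.mean() > po.mean())       # '3' projects higher than 'others'
```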
We will compose the complete pipeline from 256D pixel space to decisions:
>> pall=sddecide(p*p2)
sequential pipeline 256x1 'PCA+LDA+Gauss eq.cov.+Output normalization+Decision'
1 PCA 256x40 73% of variance (sdp_affine)
2 LDA 40x1 (sdp_affine)
3 Gauss eq.cov. 1x2 2 classes, 2 components (sdp_normal)
4 Output normalization 2x2 (sdp_norm)
5 Decision 2x1 weighting, 2 classes, 1 ops at op 1 (sdp_decide)
We can quickly test it on our unseen test set:
>> sdconfmat(ts.lab,ts*pall)
ans =
True | Decisions
Labels | 3 others | Totals
---------------------------------------
0 | 0 100 | 100
1 | 2 98 | 100
2 | 1 99 | 100
3 | 82 18 | 100
4 | 0 100 | 100
5 | 13 87 | 100
6 | 1 99 | 100
7 | 0 100 | 100
8 | 4 96 | 100
9 | 0 100 | 100
---------------------------------------
Totals | 103 897 | 1000
It looks quite good. Perhaps we should put more effort into digit 5, which is sometimes labeled as digit 3, but overall the result looks fine.
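The sdconfmat call above simply counts each (true label, decision) pair. A minimal Python equivalent, assuming nothing beyond NumPy (the labels here are a tiny made-up example, not the test set):

```python
import numpy as np

def confmat(true_labels, decisions, row_classes, col_classes):
    """Count each (true label, decision) pair, like sdconfmat above:
    rows are true classes, columns are decision classes."""
    ri = {c: i for i, c in enumerate(row_classes)}
    ci = {c: i for i, c in enumerate(col_classes)}
    m = np.zeros((len(row_classes), len(col_classes)), dtype=int)
    for t, d in zip(true_labels, decisions):
        m[ri[t], ci[d]] += 1
    return m

# Tiny example: true digit labels vs. binary decisions.
true = ['0', '3', '3', '5']
dec  = ['others', '3', '3', '3']
m = confmat(true, dec, ['0', '3', '5'], ['3', 'others'])
print(m)   # rows: 0,3,5; columns: 3,others
```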
24.3. Testing the classifier outside Matlab
Let us now test this simple solution in our live digit-drawing demo:
>> pall=setname(pall,'PCA+Fisher discriminant')
sequential pipeline 256x1 'PCA+Fisher discriminant'
1 PCA 256x40 73% of variance (sdp_affine)
2 LDA 40x1 (sdp_affine)
3 Gauss eq.cov. 1x2 2 classes, 2 components (sdp_normal)
4 Output normalization 2x2 (sdp_norm)
5 Decision 2x1 weighting, 2 classes, 1 ops at op 1 (sdp_decide)
>> sdexport(pall,'3det_1.ppl')
Exporting pipeline..ok
This pipeline requires perClass runtime version 3.0 (18-jun-2011) or higher.
We now move out of Matlab to our DigitsDemo.exe, load the 3det_1.ppl pipeline and draw digits. It works quite well, recognizing digit 3 reliably...
... and also rejecting other digits...
... but sometimes failing with digit 5:
One interesting thing we may notice, however, is that this classifier also accepts strange non-digit characters as digit 3:
Why is that? Why doesn't our classifier say this strange scribble is "others"?
24.4. Why can't we discard strange non-digits?
Actually, we just observed a very common issue with many classification systems. We trained our classifier on digit images but now present it with a clear non-digit. But you cannot provide training examples for every non-digit character out there, can you?
We cannot, but we may build a smarter classifier. Let us visualize our Fisher discriminant on a simpler 3 versus 8 problem in a 2D feature space. We will see a line separating the entire space into two half-planes:
Note that any observation on the left side will be labeled as "digit 3"! There is no guarantee that strange non-digits, alphabetic characters or outliers will not project far away, yet still on the left side of the digit-3 cluster.
What we hoped for was that our 3-versus-others classifier would protect all 3's from other concepts. But that, as we can see, does not really happen with our Fisher classifier. Similarly, a number of frequently used machine learning methods, such as support vector machines, neural networks or decision trees, are technically discriminants, not detectors.
What we need is a true detector: i.e. a model that describes a domain in the feature space, enclosing it from all directions. Here is an example of a detector in 2D space:
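To make the contrast concrete: a discriminant accepts everything on one side of a line, whereas a detector accepts only a bounded region. Here is a minimal 2D sketch of such a detector, a Gaussian model with a Mahalanobis-distance threshold, on synthetic data (the threshold value 3.0 is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
# Target class as a 2-D Gaussian cloud (synthetic stand-in for digit 3).
X = rng.normal(loc=[0.0, 0.0], size=(500, 2))

mu = X.mean(axis=0)
C = np.cov(X, rowvar=False)
Ci = np.linalg.inv(C)

def accept(x, threshold=3.0):
    """A detector: accept only points within a Mahalanobis distance
    of the target mean; everything outside, in ANY direction, is rejected."""
    d = x - mu
    return float(np.sqrt(d @ Ci @ d)) < threshold

print(accept(np.array([0.5, -0.5])))   # near the cluster: accepted
print(accept(np.array([10.0, 0.0])))   # far away: rejected, unlike a discriminant
```

A linear discriminant trained on the same cluster would happily accept the point (10, 0) if it fell on the "target" side of the line; the enclosed region is what makes this a detector.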
24.5. Building a true detector
Building a detector is a two-step process. First, we train a suitable model only on our target class. Second, we adjust a threshold on the model output, deciding what is accepted and what is rejected. The sddetector function in perClass lets us do this in one command. We provide a training set, the name of the target class and a model:
>> pd=sddetector(tr2,'3',sdgauss)
1: 0 -> non-3
2: 1 -> non-3
3: 2 -> non-3
4: 3 -> 3
5: 4 -> non-3
6: 5 -> non-3
7: 6 -> non-3
8: 7 -> non-3
9: 8 -> non-3
10: 9 -> non-3
sequential pipeline 40x1 'Gaussian model+Decision'
1 Gaussian model 40x1 one class, 1 component (sdp_normal)
2 Decision 1x1 thresholding ROC on 3 at op 159 (sdp_decide)
Note that all classes other than '3' form a non-target class, called 'non-3' by default. We can put together a complete classifier starting from the 256D pixel space:
>> pall2=p*pd
sequential pipeline 256x1 'PCA+Gaussian model+Decision'
1 PCA 256x40 73% of variance (sdp_affine)
2 Gaussian model 40x1 one class, 1 component (sdp_normal)
3 Decision 1x1 thresholding ROC on 3 at op 159 (sdp_decide)
As you can see, we do not add sddecide, because the detector already includes a decision step in the pipeline.
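The two detector-building steps, fitting a model on the target class only and then thresholding its output, can be sketched outside perClass as follows. This uses a diagonal Gaussian and a simple quantile threshold on synthetic data; perClass instead selects the threshold from an internally estimated ROC. All names and the 95% figure here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic stand-ins in 40-D: target '3' and non-target samples.
X3 = rng.normal(loc=0.8, size=(100, 40))
Xn = rng.normal(loc=0.0, size=(900, 40))

# Step 1: fit a Gaussian model on the TARGET class only.
mu = X3.mean(axis=0)
var = X3.var(axis=0) + 1e-6        # diagonal covariance for simplicity

def log_density(X):
    """Log of a diagonal Gaussian density (constant terms dropped)."""
    return -0.5 * (((X - mu) ** 2) / var + np.log(var)).sum(axis=1)

# Step 2: set a threshold on the model output. Here we keep 95% of the
# target training samples; anything below the threshold is rejected.
thr = np.quantile(log_density(X3), 0.05)

decisions = np.where(log_density(Xn) >= thr, '3', 'non-3')
print((decisions == 'non-3').mean())   # most non-targets are rejected
```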
Executing the detector on our test set shows good performance:
>> sdconfmat(ts.lab,ts*pall2)
ans =
True | Decisions
Labels | 3 non-3 | Totals
---------------------------------------
0 | 2 98 | 100
1 | 0 100 | 100
2 | 3 97 | 100
3 | 82 18 | 100
4 | 1 99 | 100
5 | 9 91 | 100
6 | 0 100 | 100
7 | 0 100 | 100
8 | 8 92 | 100
9 | 4 96 | 100
---------------------------------------
Totals | 109 891 | 1000
Exporting it for execution in the digit demo:
>> pall2=setname(pall2,'PCA+Gaussian detector');
>> sdexport(pall2,'/Users/pavel/shared/digits/DigitsDemo/3det_2.ppl')
Exporting pipeline..ok
This pipeline requires perClass runtime version 3.0 (18-jun-2011) or higher.
Our detector can recognize a genuine digit 3 just as the Fisher discriminant did:
But it can also reject unseen distorted examples and outliers better than discriminants:
24.6. Tuning the detector
An extra advantage of a detector is that it may be easily tuned not to lose examples of a specific class. In order to adjust the detector threshold, sddetector internally estimates an ROC characteristic, which relates different thresholds to performance. This curve remains stored in the pipeline, so we may adjust the performance at any time.
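Conceptually, each operating point is just one candidate threshold on the detector output, with its own pair of error rates. A NumPy sketch of such an ROC, and of picking a point that limits false acceptance (all scores here are synthetic; the 1% limit is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
# Detector scores: higher means 'more like digit 3' (synthetic stand-ins).
scores_target = rng.normal(loc=2.0, size=100)
scores_other  = rng.normal(loc=0.0, size=900)

# Every candidate threshold is one operating point; compute both error rates.
thresholds = np.sort(np.concatenate([scores_target, scores_other]))
fn = np.array([(scores_target < t).mean() for t in thresholds])  # lost 3's
fp = np.array([(scores_other >= t).mean() for t in thresholds])  # accepted non-3's

# Default-style choice: minimize the mean of the two error rates.
best = int(np.argmin((fn + fp) / 2))

# Tuned choice, as in the text: among points with at most 1% false
# acceptance, lose as few 3's as possible.
ok = np.where(fp <= 0.01)[0]
tuned = int(ok[np.argmin(fn[ok])])
print(fp[tuned], fn[tuned], fn[best])
```

As in the perClass ROC plot, moving the operating point trades lost 3's against falsely accepted non-3's; the tuned point accepts fewer 3's than the default one, but keeps false acceptance below the chosen limit.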
We can visualize the available detector settings (so called operating points) in their relation to performance using:
>> sddrawroc(pall2)
The plot shows the error on each of the two classes in our problem. Each blue dot represents one possible threshold. The red marker shows the point selected by default (minimizing the mean error rate).
We may change the operating point by selecting, for example, one that minimizes false acceptance of other-than-digit-3 images. In order to save the new setting back to our classifier, we press 's' (for save) and fill in the classifier name (pall2 in our example):
>> Setting the operating point 183 in sdppl object pall2
sequential pipeline 256x1 'PCA+Gaussian model+Decision'
1 PCA 256x40 73% of variance (sdp_affine)
2 Gaussian model 40x1 one class, 1 component (sdp_normal)
3 Decision 1x1 thresholding ROC on 3 at op 183 (sdp_decide)
The digit detector may now be applied to new data. It will accept fewer 3's but also minimize false positives.
>> sdconfmat(ts.lab,ts*pall2)
ans =
True | Decisions
Labels | 3 non-3 | Totals
---------------------------------------
0 | 0 100 | 100
1 | 0 100 | 100
2 | 1 99 | 100
3 | 71 29 | 100
4 | 1 99 | 100
5 | 3 97 | 100
6 | 0 100 | 100
7 | 0 100 | 100
8 | 0 100 | 100
9 | 2 98 | 100
---------------------------------------
Totals | 78 922 | 1000
This concludes our example.