Problem: How to speed up execution of classifiers using PCA/LDA extraction
Solution: We can join affine and scaling pipeline steps with the + operator.
32.1. Introduction
A common pattern in many pattern-recognition systems is to use Principal Component Analysis (PCA) to reduce dimensionality and Linear Discriminant Analysis (LDA) to derive a low-dimensional subspace separating the classes.
Both methods are often used together because LDA alone, applied to high-dimensional data with few examples, may easily over-fit, losing generalization on unseen data.
In this article, we demonstrate a quick way to speed up classifier execution by joining pipeline steps with the + operator.
32.2. Example of hyperspectral classification
As an example, we take classification of defects in French fries. Each data sample represents a pixel of a hyperspectral image with 103 narrow spectral bands. Our data set comprises almost 38 thousand examples labeled in five images. We have four classes of interest, namely healthy potato flesh, residual potato skin ("peel"), and two types of defects: rot and greening:
>> tr
37967 by 103 sddata, 4 classes: 'rot'(10066) 'green'(3062) 'peel'(12025) 'flesh'(12814)
>> tr.image
sdlab with 37967 entries, 5 groups:
'agria001'(4958) 'agria003'(5501) 'agria004'(9452) 'agria005'(7734) 'agria006'(10322)
We will use a combination of PCA followed by LDA dimensionality reduction. The PCA reduces the dimensionality from the input 103 bands to 20, preserving all available variance:
>> p=sdpca(tr,20)
PCA pipeline 103x20 100% of variance
This is common in hyperspectral data sets because of the high level of correlation between adjacent bands.
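Outside perClass, the same effect is easy to reproduce. The following is a minimal numpy sketch, using synthetic made-up "spectra" as a stand-in for the real data, showing why strongly correlated bands let 20 components keep essentially all variance:

```python
import numpy as np

# Hypothetical stand-in for the hyperspectral data: 1000 "spectra" with 103
# strongly correlated bands (each band is a mixture of a few latent factors,
# mimicking adjacent-band correlation).
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 5))              # 5 underlying factors
mixing = rng.normal(size=(5, 103))
X = latent @ mixing + 0.01 * rng.normal(size=(1000, 103))

# PCA via SVD of the mean-centered data
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
var = s**2 / np.sum(s**2)

# With correlated bands, 20 components capture essentially all variance
print(f"variance kept by 20 components: {var[:20].sum():.4f}")
```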
Our second feature-extraction step employs class information and derives a three-dimensional subspace best separating our classes of interest. Note that the output dimensionality is determined as the number of classes minus one.
>> p2=sdlda(tr*p)
LDA pipeline 20x3
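The "classes minus one" limit follows from the rank of the between-class scatter matrix that LDA maximizes against. A small numpy check with made-up class means (not the perClass implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
c, d = 4, 20                       # 4 classes in a 20-D feature space
means = rng.normal(size=(c, d))    # hypothetical per-class means
counts = np.array([50, 60, 70, 80])
overall = np.average(means, axis=0, weights=counts)

# Between-class scatter: weighted outer products of mean differences
Sb = sum(n * np.outer(m - overall, m - overall)
         for n, m in zip(counts, means))

# The c rank-1 terms are linearly dependent (weighted differences sum to
# zero), so rank(Sb) <= c-1; hence LDA yields at most c-1 = 3 dimensions
print(np.linalg.matrix_rank(Sb))   # → 3
```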
Finally, we train a multi-layer perceptron classifier with 10 units in the resulting subspace:
>> p3=sdneural(tr*p*p2,'units',10)
sequential pipeline 3x1 'Scaling+Neural network'
1 Scaling 3x3 standardization
2 MLP neural network 3x4 10 units
3 Decision 4x1 weighting, 4 classes
32.3. Execution speed of the entire classifier
The complete chain of processing is expressed by the pipeline pall, first applying PCA, then LDA and finally the neural network:
>> pall=p*p2*p3
sequential pipeline 103x1 'PCA+LDA+Scaling+Neural network'
1 PCA 103x20 100% of variance
2 LDA 20x3
3 Scaling 3x3 standardization
4 MLP neural network 3x4 10 units
5 Decision 4x1 weighting, 4 classes
We will measure the classification speed on a realistic example, executing on a new hyperspectral cube:
>> load cube3
im3 544x318x103 142545408 double
>> im3=sdimage(im3,'sddata')
172992 by 103 sddata, class: 'unknown'
As discussed in knowledge base article 31, we measure execution speed with the sdexe command, which times the tight internal loop relevant in real-time applications:
>> [out,t]=sdexe(pall,im3);
>> [out,t]=sdexe(pall,im3);
>> t
t =
0.3550
The result is 355 ms needed to process 173 000 spectral measurements.
32.4. Joining pipeline steps
When we look closer at the pipeline pall, we can see that the first two steps are affine projections of the input data:
>> pall
sequential pipeline 103x1 'PCA+LDA+Scaling+Neural network'
1 PCA 103x20 100% of variance
2 LDA 20x3
3 Scaling 3x3 standardization
4 MLP neural network 3x4 10 units
5 Decision 4x1 weighting, 4 classes
We may join them into a single affine projection with the + operator:
>> pall2=pall(1) + pall(2:end)
sequential pipeline 103x1 'PCA+LDA+Scaling+MLP neural network+Decision'
1 PCA+LDA 103x3
2 Scaling 3x3 standardization
3 MLP neural network 3x4 10 units
4 Decision 4x1 weighting, 4 classes
This reduces the number of multiplication and addition operations needed for each new spectrum. As a result, we observe a significant speedup:
>> [out,t]=sdexe(pall2,im3);
>> [out,t]=sdexe(pall2,im3);
>> t
t =
0.1609
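Joining two affine steps is nothing more than function composition: W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2). A minimal numpy sketch of the idea (random matrices here are stand-ins for the trained PCA and LDA projections, not the perClass internals):

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical stand-ins for the two affine steps:
# PCA (103 -> 20) and LDA (20 -> 3)
W1, b1 = rng.normal(size=(20, 103)), rng.normal(size=20)
W2, b2 = rng.normal(size=(3, 20)),  rng.normal(size=3)

# Joining the steps is affine composition:
#   W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
W, b = W2 @ W1, W2 @ b1 + b2

x = rng.normal(size=103)                 # one "spectrum"
y_chain  = W2 @ (W1 @ x + b1) + b2       # two separate steps
y_joined = W @ x + b                     # single joined step

print(np.allclose(y_chain, y_joined))    # → True

# Per-sample multiplications drop from 103*20 + 20*3 = 2120 to 103*3 = 309
```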
32.5. Supported pipeline steps
perClass supports the + operator for the following pipeline steps:
- scaling
- affine
- neural network (MLP first stage is affine)
- multiple feature selection steps
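The reason scaling can participate is that per-feature standardization is itself an affine map: (x - mu) / sigma = diag(1/sigma) x - mu/sigma. A small numpy check with made-up values:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical standardization parameters for a 3-D subspace
mu, sigma = rng.normal(size=3), rng.uniform(0.5, 2.0, size=3)

# Standardization (x - mu) / sigma rewritten as an affine map:
#   diag(1/sigma) @ x + (-mu/sigma)
S, c = np.diag(1.0 / sigma), -mu / sigma

x = rng.normal(size=3)
print(np.allclose((x - mu) / sigma, S @ x + c))   # → True
```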
In our example, we may attempt to go even further. First, we join PCA, LDA and scaling into one pipeline action:
>> pall3=pall(1)+pall(2)+pall(3:end)
sequential pipeline 103x1 'PCA+LDA+Scaling+MLP neural network+Decision'
1 PCA+LDA+Scaling 103x3
2 MLP neural network 3x4 10 units
3 Decision 4x1 weighting, 4 classes
>> [out,t]=sdexe(pall3,im3);
>> [out,t]=sdexe(pall3,im3);
>> t
t =
0.1575
Here, we still observe a small speed improvement.
Finally, we may also attempt to join PCA, LDA, scaling and the neural network model together:
>> pall4=pall(1) + pall(2) + pall(3) + pall(4:end)
sequential pipeline 103x1 'PCA+LDA+Scaling+MLP neural network+Decision'
1 PCA+LDA+Scaling+MLP neural network 103x4 10 units
2 Decision 4x1 weighting, 4 classes
>> [out,t]=sdexe(pall4,im3);
>> [out,t]=sdexe(pall4,im3);
>> t
t =
0.2990
The last option does not increase speed because we are now executing the neural network model directly on the input 103-dimensional space instead of the 3-dimensional subspace used in the fastest model, pall3.
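A back-of-the-envelope count of per-spectrum multiplications explains the timings, assuming the cost is dominated by the dense projections up to the 10-unit first layer of the MLP:

```python
# Per-spectrum multiplication counts up to the MLP first layer
# (a rough cost model, ignoring biases and nonlinearities)
d_in, d_sub, units = 103, 3, 10   # input bands, joined subspace, MLP units

# pall3: joined 103x3 affine step, then 3x10 first layer
separate = d_in * d_sub + d_sub * units   # 103*3 + 3*10 = 339
# pall4: affine step folded into the first layer -> a 103x10 weight matrix
joined   = d_in * units                   # 103*10 = 1030

print(separate, joined)   # → 339 1030
```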
32.6. Validating classifier outputs
As with any speed optimization, it is important to validate that the final classifier outputs are correct. In our example, we are most interested in classifier decisions. Let us compare the original and the fastest classifier.
We prepare the decision objects by applying both classifiers to the entire image data set:
>> dec1=im3*pall
sdlab with 172992 entries, 4 groups: 'rot'(8392) 'green'(4246) 'peel'(124343) 'flesh'(36011)
>> dec3=im3*pall3
sdlab with 172992 entries, 4 groups: 'rot'(8392) 'green'(4246) 'peel'(124343) 'flesh'(36011)
The fastest way is to simply compare the decision objects:
>> any(dec1~=dec3)
ans =
0
We can see that there is no sample in the test image where the two classifiers yield different decisions.
An alternative comparison is to display the confusion matrix:
>> sdconfmat(dec1,dec3)
{Warning: The labels passed as first parameter were created by executing a
classifier. Ground-truth labels should be the first input and decisions the
second.}
ans =
True | Decisions
Labels | rot green peel flesh | Totals
--------------------------------------------------------------------
rot | 8392 0 0 0 | 8392
green | 0 4246 0 0 | 4246
peel | 0 0 124343 0 | 124343
flesh | 0 0 0 36011 | 36011
--------------------------------------------------------------------
Totals | 8392 4246 124343 36011 | 172992
Again, we can see that all examples fall on the diagonal, i.e. there are no discrepancies.
Note the warning generated: typical use of a confusion matrix is to compare ground truth to classifier decisions. perClass performs a safety check to catch the case where the user provides the sdconfmat inputs the other way round. In our example, we may safely ignore this message.
The second form of validation is comparing soft classifier outputs:
% recall that unary minus removes the decision step, if present
>> out1=im3 * -pall;
>> out3=im3 * -pall3;
Soft outputs on the first three pixels:
>> +out1(1:3)
ans =
0.1076 0.0963 0.8893 0.0911
0.0710 0.0926 0.8827 0.0944
0.1051 0.1083 0.8845 0.0882
>> +out3(1:3)
ans =
0.1076 0.0963 0.8893 0.0911
0.0710 0.0926 0.8827 0.0944
0.1051 0.1083 0.8845 0.0882
The values look identical, but a closer look reveals small differences between the two methods:
>> +out1(1:3) - +out3(1:3)
ans =
1.0e-15 *
0.8882 -0.2220 0 0.2220
0.4441 0 0 0.2220
-0.2220 0.4441 -0.4441 0
Note that this is an inevitable consequence of using imprecise floating-point representation (for discussion, see e.g. http://floating-point-gui.de). Even changing the order of summation on identical inputs can lead to small differences.
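A minimal Python illustration of this non-associativity:

```python
# Floating-point addition is not associative: changing the summation order
# on identical inputs changes the last bits of the result
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)        # → False
print(abs(a - b))    # a difference on the order of 1e-16
```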
From the practical viewpoint, we should always validate that our classifier provides correct decisions on a representative set of test data.
32.7. Summary
We have seen how joining pipeline steps with the + operator may yield a 2.25-times faster classifier in a realistic high-throughput hyperspectral classification problem.