When designing a robust classification algorithm, we need to understand its performance and robustness in detail. In perClass Mira, we can use the Confusion matrix tool to understand and fine-tune pixel classification performance. 


In this section, we use the potato virus data set to illustrate the Confusion matrix tool and its use in understanding and fine-tuning performance. Once a pixel classifier is trained, the confusion matrix on the training set is always available in the Training set tab of the Confusion matrix panel.


The confusion matrix gives a detailed report on classifier performance. The rows correspond to the labeled examples in the training set (the ground truth). The columns capture the classifier decisions on these examples.
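
To make the row and column convention concrete, the following minimal Python sketch counts how often each ground-truth class receives each decision. It is not perClass Mira code: the function name `confusion_matrix` and the toy labels are assumptions made purely for illustration.

```python
def confusion_matrix(true_labels, decisions, classes):
    """Count how often each true class (row) receives each decision (column)."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, decision in zip(true_labels, decisions):
        matrix[index[truth]][index[decision]] += 1
    return matrix


# Hypothetical toy labels, not taken from the potato virus data set.
classes = ["background", "leaves", "virus"]
true_labels = ["background", "background", "leaves", "leaves", "leaves", "virus", "virus"]
decisions = ["background", "background", "leaves", "virus", "leaves", "virus", "leaves"]

counts = confusion_matrix(true_labels, decisions, classes)
for name, row in zip(classes, counts):
    print(f"{name:>10}: {row}")
```

Each printed row belongs to one ground-truth class, and each position within the row belongs to one classifier decision.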


Ideally, every labeled example is assigned to its true category, so the confusion matrix shows only diagonal elements (displayed in green). In practice, some examples are misclassified. These show up off the diagonal and are displayed in red.


By default, the confusion matrix is normalized by row. This means that the on-diagonal entries represent per-class accuracies and the off-diagonal entries represent error rates. The sum of the off-diagonal errors in a row, i.e. the class error, is displayed in the right-most column. It reflects what percentage of the ground-truth pixels of that class is misclassified.
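
The row normalization and the class error can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `normalize_rows` and `class_errors` and the pixel counts are hypothetical and do not come from perClass Mira or from the potato virus data set.

```python
def normalize_rows(counts):
    """Divide each row by its total so that every row sums to 1."""
    normalized = []
    for row in counts:
        total = sum(row)
        # A class without labeled examples gets an all-zero row.
        normalized.append([value / total if total else 0.0 for value in row])
    return normalized


def class_errors(rates):
    """Per-class error: sum of the off-diagonal entries in each normalized row."""
    return [
        sum(value for j, value in enumerate(row) if j != i)
        for i, row in enumerate(rates)
    ]


# Hypothetical pixel counts (rows: ground truth, columns: decisions).
counts = [
    [95, 5, 0],
    [4, 90, 6],
    [0, 2, 18],
]

rates = normalize_rows(counts)
errors = class_errors(rates)
for row, error in zip(rates, errors):
    print([f"{value:.2f}" for value in row], f"class error = {error:.2f}")
```

In each normalized row, the diagonal entry and the class error add up to one.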


Similarly, the per-class purity is displayed in the bottom row. It denotes the fraction of pixels assigned to each decision that actually belong to that class. This allows us to quickly judge whether a specific classifier decision is trustworthy.
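
Purity looks at the columns of the count matrix rather than the rows. The sketch below assumes purity is the diagonal count divided by the column total, which matches the description above. The function name `purities` and the counts are hypothetical.

```python
def purities(counts):
    """Per-decision purity: diagonal count divided by the column total."""
    size = len(counts)
    result = []
    for j in range(size):
        column_total = sum(counts[i][j] for i in range(size))
        # A decision that is never made gets purity 0.0 by convention here.
        result.append(counts[j][j] / column_total if column_total else 0.0)
    return result


# Hypothetical pixel counts (rows: ground truth, columns: decisions).
counts = [
    [95, 5, 0],
    [4, 90, 6],
    [0, 2, 18],
]

purity = purities(counts)
accuracy = [counts[i][i] / sum(counts[i]) for i in range(len(counts))]
for j, (p, a) in enumerate(zip(purity, accuracy)):
    print(f"class {j}: accuracy = {a:.2f}, purity = {p:.2f}")
```

In this toy example, the last class is recognized with 90% accuracy, yet only 75% of the pixels assigned to it are correct, because a few pixels of a much larger class leak into its column.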


For example, when our classifier decides "leaves", it is correct in 95% of cases. However, when it decides "virus", it is correct for only 73% of the labeled pixels that receive this decision in our training set.


Finally, the bottom-right corner provides one summary performance indicator: the mean error over classes, i.e. the average of the per-class error rates.
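
As a minimal sketch of this indicator, the mean error over classes can be computed from a count matrix as shown below. The function names and counts are hypothetical. The overall pixel error is included only as a contrast: it weights every pixel equally, whereas the mean error over classes weights every class equally, regardless of how many pixels it contains.

```python
def mean_class_error(counts):
    """Average of the per-class error rates (each class weighted equally)."""
    errors = []
    for i, row in enumerate(counts):
        total = sum(row)
        # A class without labeled examples contributes an error of 0.0 here.
        errors.append((total - row[i]) / total if total else 0.0)
    return sum(errors) / len(errors)


def overall_error(counts):
    """Fraction of all labeled pixels that are misclassified."""
    total = sum(sum(row) for row in counts)
    correct = sum(counts[i][i] for i in range(len(counts)))
    return (total - correct) / total


# Hypothetical pixel counts (rows: ground truth, columns: decisions).
counts = [
    [95, 5, 0],
    [4, 90, 6],
    [0, 2, 18],
]

mean_error = mean_class_error(counts)
pixel_error = overall_error(counts)
print(f"mean error over classes: {mean_error:.3f}")
print(f"overall pixel error:     {pixel_error:.3f}")
```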


The value of the confusion matrix lies in providing a detailed understanding of classifier behavior. While a 2.8% mean error does not seem too high, the confusion matrix quickly reveals that 7% of the healthy "leaves" pixels are misclassified as "virus" (they are false positives).



Instead of a normalized matrix, we may wish to display the absolute pixel counts in each field. This is possible by disabling the normalization in the context menu.



The unnormalized confusion matrix may reveal that some of our classes are under-sampled. For example, while our "virus" class contains only 165 labeled pixels, the "background" class contains almost four thousand.
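
The number of labeled pixels per class is simply the row total of the unnormalized matrix. The following sketch, with a hypothetical helper `class_support` and made-up counts, shows how such an imbalance can be read off.

```python
def class_support(counts, classes):
    """Number of labeled pixels per class: the row totals of the count matrix."""
    return {name: sum(row) for name, row in zip(classes, counts)}


# Hypothetical pixel counts (rows: ground truth, columns: decisions).
classes = ["background", "leaves", "virus"]
counts = [
    [400, 8, 2],
    [6, 180, 14],
    [1, 3, 16],
]

support = class_support(counts, classes)
largest = max(support.values())
for name, total in support.items():
    # Size of each class relative to the largest class.
    print(f"{name:>10}: {total:4d} pixels ({total / largest:.0%} of largest class)")
```

A class whose row total is a small fraction of the largest one is a candidate for additional labeling.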