perClass Documentation version 5.4 (7-Dec-2018)

Glossary of pattern recognition terms

• Bayes classifier - by definition, the classifier delivering the smallest possible error for a given classification problem and data representation. The concept of a Bayes classifier is important for theoretical understanding but does not appear in practice, where we cannot (for non-trivial cases) decide whether a given solution is optimal. Note that a Bayes classifier is always defined with respect to a data representation - different feature sets may lead to different Bayes classifiers/errors.

• Cost-sensitive optimization - process of selecting an operating point of a trained classifier, given a labeled test set, by minimizing the classifier loss function defined by a cost specification.

• Classifier with constraints - specification of an acceptable performance region for a classifier, defining a subset of operating points. For example, a medical recognition system may require the error on the cancer class to be smaller than 5%.

• Class - a collection of data samples defined by a human (designer or user) as sharing a specific property.

• Classifier - an algorithm trained from a set of labeled data samples (each sample is associated with a class). When presented with an unseen example, the classifier returns a crisp decision (the estimated class). Internally, the classifiers are usually defined in two steps. The first is a statistical model producing a soft output (confidence that a sample belongs to a given class) and the second is a decision function converting the soft output into a crisp decision.

• Cluster - a group of data samples defined by an automatic procedure based on a selected notion of sample similarity. Clusters do not necessarily bear any meaning. The meaning is attached to them by a human designer (not by an unsupervised algorithm).

• Confusion matrix - a matrix relating the class labels in a test set to the classifier decisions. Usually a square matrix, with identical ordering of classes and decisions so that its diagonal represents correctly assigned samples and the off-diagonal elements represent the errors. Rectangular confusion matrices often arise in practice, for example, in a detector/discriminant setup where an additional outlier reject decision is delivered.

• Deployment - execution of a trained pattern recognition system in a production environment, i.e. on previously unseen examples (e.g. in an industrial sorting process).

• Detector - a classifier distinguishing one specific "target" class from everything else (e.g. a face detector). A detector should protect the target class even from previously unseen types of objects not represented in the training set. Detection is performed by thresholding the soft output on the target class.

• Discriminant - a classifier separating two or more classes of objects. Discriminants split the feature space into open sub-spaces, which means that any data sample arbitrarily far from the original training data (e.g. an outlier or an example of a new class) is assigned to one of the trained classes. In order to reject examples unseen in training, a discriminant may be coupled with a detector.

• Feature - a characteristic measured on all objects or phenomena to be processed by the pattern recognition system. Individual features are considered as dimensions of a feature space containing feature vectors (data samples).

• Hierarchical classifier - a tree of classifiers specifying the execution logic based on classifier decisions. Example: Samples labeled as fruit by a fruit/non-fruit classifier are further processed by an apple/pear discriminant. Hierarchical classifiers may allow the designer to improve accuracy or execution speed. Accuracy may be improved by decomposing complex classification problems into simpler sub-problems that can be tackled using simpler models. Speed can be increased by handling easy samples first using cheap features and fast classifiers, so that expensive features and complex models are applied only when needed.

• One-class classifier - synonym for a detector, often used in the context of imbalanced problems (one class contains far fewer samples than the others) or badly sampled problems (a class is not well represented in the training data, e.g. a machine fault). Note that although a detector can be constructed using the target data only, by setting its threshold to reject a fraction of the training objects, the evaluation of such a system is impossible if non-target examples are not available.

• Operating point - all information needed to convert a soft output into a crisp decision. This includes the threshold or per-class weights, the knowledge of output polarity (similarity or distance), the symbols for all decisions (target and non-target) or the correct order of classes.

• Meta-data - extra information connected to a data sample providing details about its relation to other samples. For example, all pixels (samples) extracted from a patient scan share the patient id as meta-data. This allows the designer to construct a test set containing pixels originating from patients unused in training.

• Rejection - instead of providing a decision for every data sample, the classifier may reject some examples. There are two types of rejection. The first refuses to label samples that are too distant from all the training data (outliers). The second rejects examples that are too close to the decision boundary (having a high probability of being misclassified). Such samples may be handled, for example, by a different machine or by a human expert.

• ROC analysis - process of defining possible operating points for a trained classifier based on a labeled test set. At each of the operating points, a set of performance measures is computed.

• Soft output - the real-valued output of a classifier estimating the confidence that a data sample belongs to each of the classes.

• System design - construction of a pattern recognition system capable of making decisions on new, previously unseen data. The system design process provides an algorithm prototype and an estimate of its performance.
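
Several of the terms above (soft output, operating point, confusion matrix, ROC analysis and cost-sensitive optimization) fit together in one small workflow. The sketch below illustrates it in plain Python rather than with the perClass API. The soft outputs, labels, cost values and all function names are hypothetical and chosen only for illustration. Each distinct soft output value is tried as a candidate threshold, a confusion matrix is computed at each candidate, and the threshold with the lowest total cost is selected.

```python
# Hypothetical toy test set: soft output (confidence for the "target" class)
# and the true class label of each sample.
soft = [0.95, 0.80, 0.70, 0.55, 0.40, 0.30, 0.20, 0.05]
labels = ["target", "target", "non-target", "target",
          "non-target", "target", "non-target", "non-target"]
classes = ["target", "non-target"]

# Assumed cost specification (rows: true class, columns: decision).
# Missing a target costs 5, a false alarm costs 1, correct decisions cost 0.
costs = [[0, 5],
         [1, 0]]


def decide(soft_outputs, threshold):
    """Apply an operating point (here a single threshold on a similarity
    output) to convert soft outputs into crisp decisions."""
    return ["target" if s >= threshold else "non-target" for s in soft_outputs]


def confusion_matrix(true_labels, decisions, class_order):
    """Count samples per (true class, decision) pair.
    Rows and columns follow the same class order, so the diagonal
    holds the correctly assigned samples."""
    index = {c: i for i, c in enumerate(class_order)}
    cm = [[0] * len(class_order) for _ in class_order]
    for t, d in zip(true_labels, decisions):
        cm[index[t]][index[d]] += 1
    return cm


def total_cost(cm, cost_matrix):
    """Loss of an operating point: confusion counts weighted by their costs."""
    return sum(cm[i][j] * cost_matrix[i][j]
               for i in range(len(cm)) for j in range(len(cm[i])))


# ROC-style sweep: every distinct soft output value is a candidate threshold.
results = []
for t in sorted(set(soft)):
    cm = confusion_matrix(labels, decide(soft, t), classes)
    results.append((total_cost(cm, costs), t))

# Cost-sensitive optimization: keep the operating point with the lowest loss.
best_loss, best_threshold = min(results)
best_cm = confusion_matrix(labels, decide(soft, best_threshold), classes)

print("selected threshold:", best_threshold)
print("loss at that threshold:", best_loss)
print("confusion matrix:", best_cm)
```

Because missing a target is assumed to be five times as costly as a false alarm, the sweep settles on a low threshold that accepts every target sample at the price of some false alarms. A different cost specification would generally select a different operating point for the same trained classifier.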