SDSTACKGEN Stacked generalization set of classifier outputs

    OUT=SDSTACKGEN(ALG,DATA,...)
    [OUT,PBASE]=SDSTACKGEN(ALG,DATA,...)

INPUT
    ALG     Untrained pipeline
    DATA    Data set

OPTIONS
    FOLDS   Number of cross-validation folds (opt, default: 10)
    SEED    Random seed for cross-validation initialization

OUTPUT
    OUT     Data set with unbiased soft outputs
    PBASE   Pipeline of all base classifiers fused by a mean combiner

DESCRIPTION
SDSTACKGEN performs stacked generalization for a given untrained
pipeline or algorithm ALG. Stacked generalization produces a data set
of the same size as DATA containing unbiased soft outputs of ALG. It
may be used to construct second-stage training data for trained
combiners or for ROC variance estimation.

SDSTACKGEN uses rotation-based stratified cross-validation. In each
fold, ALG is trained on the fold training set, and its soft outputs on
the fold test set are stored. Finally, all per-fold outputs are
collected in the output set OUT. The order of samples in OUT and DATA
is identical.

As an optional second output, SDSTACKGEN returns a pipeline PBASE with
all trained per-fold classifiers fused by a mean combiner. Using PBASE
as the base classifier in the final trained combiner system has been
observed to provide higher robustness and performance.

REFERENCE
P. Paclik, T.C.W. Landgrebe, D.M.J. Tax, R.P.W. Duin, On deriving the
second-stage training set for trainable combiners, in Proc. of MCS 2005,
Monterey, CA, USA, June 2005
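
ILLUSTRATIVE SKETCH
The procedure described above can be sketched in a few lines of plain Python. This is not the toolbox implementation, only a minimal illustration under stated assumptions: the names stratified_folds, train_nearest_mean and stack_gen are hypothetical, and a simple nearest-mean classifier stands in for the untrained pipeline ALG. The sketch shows the two outputs: out-of-fold soft outputs in the original sample order, and a mean combiner over the per-fold models.

```python
import math
import random


def stratified_folds(labels, folds, seed=None):
    """Assign a fold index to each sample, rotating through folds per class."""
    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            fold_of[i] = pos % folds
    return fold_of


def train_nearest_mean(X, y):
    """Hypothetical stand-in for ALG: nearest-mean classifier, soft outputs."""
    classes = sorted(set(y))
    means = []
    for cls in classes:
        rows = [x for x, lab in zip(X, y) if lab == cls]
        means.append([sum(col) / len(rows) for col in zip(*rows)])

    def predict(x):
        # Softmax over negative squared distances to the class means.
        d = [sum((a - b) ** 2 for a, b in zip(x, m)) for m in means]
        w = [math.exp(min(d) - v) for v in d]
        total = sum(w)
        return [v / total for v in w]

    return predict


def stack_gen(train, X, y, folds=10, seed=None):
    """Return out-of-fold soft outputs and a mean combiner of fold models."""
    fold_of = stratified_folds(y, folds, seed)
    out = [None] * len(X)
    models = []
    for k in range(folds):
        tr = [i for i in range(len(X)) if fold_of[i] != k]
        te = [i for i in range(len(X)) if fold_of[i] == k]
        model = train([X[i] for i in tr], [y[i] for i in tr])
        models.append(model)
        for i in te:
            # Each sample is scored only by a model that never saw it.
            out[i] = model(X[i])

    def pbase(x):
        # Mean combiner: average the soft outputs of all per-fold models.
        outs = [m(x) for m in models]
        return [sum(col) / len(outs) for col in zip(*outs)]

    return out, pbase


# Toy two-class problem: 30 samples per class in two dimensions.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(30)]
X += [[rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)] for _ in range(30)]
y = [0] * 30 + [1] * 30

out, pbase = stack_gen(train_nearest_mean, X, y, folds=10, seed=42)
```

Here out[i] corresponds to X[i], mirroring the statement that OUT and DATA share the same sample order. The sketch assumes every class has at least two samples, so that each fold training set still contains all classes.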