Predictive modelling of multivariate data in which both the covariates and the responses are high-dimensional is an increasingly common task in data mining applications. Partial Least Squares (PLS) regression is often a useful model in these situations because it performs dimensionality reduction by assuming that a small number of latent factors explain the linear dependence between input and output. In practice, the number of latent factors to retain, which controls both the complexity of the model and its predictive ability, has to be selected carefully. Typically this is done by cross-validating a performance measure, such as the prediction error. Although cross-validation works well in many practical settings, it can be computationally expensive. Various extensions to PLS have also been proposed for regularising the PLS solution and performing simultaneous dimensionality reduction and variable selection, but these introduce additional complexity parameters that must also be tuned by cross-validation. In this paper we derive a predicted sum of squares (PRESS) statistic for two-block PLS as a computationally efficient alternative to leave-one-out cross-validation (LOOCV). We show that the PRESS is nearly identical to LOOCV but requires the computational expense of only a single PLS model fit. Examples of using the PRESS to select the number of latent factors and the regularisation parameters are provided.
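To illustrate why a PRESS statistic can stand in for LOOCV at the cost of a single fit, the sketch below uses the classical ordinary least squares (OLS) case, where the leave-one-out residual equals the ordinary residual divided by one minus its leverage. This is not the two-block PLS derivation of the paper; it is the standard OLS identity, shown only as an analogy. The function names `loocv_sse` and `press_ols` and the simulated data are assumptions made for this illustration.

```python
import numpy as np


def loocv_sse(X, y):
    """Naive LOOCV: refit OLS n times and sum the squared held-out errors."""
    n = X.shape[0]
    sse = 0.0
    for i in range(n):
        keep = np.arange(n) != i  # boolean mask dropping observation i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        sse += float((y[i] - X[i] @ beta) ** 2)
    return sse


def press_ols(X, y):
    """PRESS from one fit: sum of (e_i / (1 - h_ii))^2, h_ii the leverage."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    # For full-column-rank X, the hat matrix is Q Q^T with Q from the
    # reduced QR factorisation, so its diagonal is the row sums of Q**2.
    Q, _ = np.linalg.qr(X)
    leverage = np.sum(Q ** 2, axis=1)
    return float(np.sum((residuals / (1.0 - leverage)) ** 2))


# Simulated data (illustrative assumption, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=40)

loo = loocv_sse(X, y)      # n separate model fits
press = press_ols(X, y)    # a single model fit
print(loo, press)
```

In the OLS setting the two quantities agree up to floating-point error. For PLS the latent factors depend on the responses, so an analogous shortcut is an approximation, which is consistent with the abstract describing the PRESS as nearly identical to LOOCV rather than exactly equal.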