Statistical shape modelling is a technique in which the variation of shape across a population is modelled by principal component analysis (PCA) on a set of sample shape vectors. The number of principal modes retained in the model (the PCA dimension) is often determined by simple rules, for example choosing those that cover a given percentage of the total variance. We show that this rule is highly dependent on sample size. The principal modes retained should ideally correspond to genuine anatomical variation. In this paper, we propose a mathematical framework for analysing the sources of PCA model error. The optimum PCA dimension is a trade-off between discarding structural variation (under-modelling) and including noise (over-modelling). We then propose a stopping rule that identifies the noise-dominated modes using a t-test of the bootstrap stability between the real data and artificial Gaussian noise. We retain those modes that are not dominated by noise. We show that our method determines the correct PCA dimension for synthetic data, where conventional rules fail. Comparisons between our rule and conventional rules are also performed on a series of real datasets. Our framework provides a foundation for analysing rules that are used to determine the number of modes to retain and also allows the study of PCA sample sufficiency.
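To make the conventional percentage-of-variance rule concrete, the following is a minimal sketch in Python with NumPy. It illustrates only that conventional rule, not the stopping rule proposed in the paper. The function names `modes_for_variance` and `synthetic_shapes`, the 95% threshold, and the synthetic data settings (50 dimensions, 3 structural modes, noise standard deviation 0.5) are all assumptions chosen for illustration.

```python
import numpy as np


def modes_for_variance(X, fraction=0.95):
    """Smallest number of PCA modes whose eigenvalues cover `fraction`
    of the total sample variance (the conventional rule).

    X has one sample shape vector per row.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # centre on the mean shape
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values, descending
    eigvals = s**2 / (X.shape[0] - 1)          # sample covariance eigenvalues
    cum = np.cumsum(eigvals) / eigvals.sum()   # cumulative variance fraction
    k = int(np.searchsorted(cum, fraction)) + 1
    return min(k, len(eigvals))                # guard against rounding at 1.0


def synthetic_shapes(n_samples, dim=50, n_structural=3, noise_sd=0.5, seed=0):
    """Hypothetical test data: a few structural modes plus isotropic
    Gaussian noise. All settings here are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    # Orthonormal directions standing in for the structural modes.
    basis, _ = np.linalg.qr(rng.standard_normal((dim, n_structural)))
    scales = np.linspace(5.0, 2.0, n_structural)
    coeffs = rng.standard_normal((n_samples, n_structural)) * scales
    noise = noise_sd * rng.standard_normal((n_samples, dim))
    return coeffs @ basis.T + noise


if __name__ == "__main__":
    # Apply the same rule to the same generating process at several sample sizes.
    for n in (10, 20, 50, 200):
        print(n, modes_for_variance(synthetic_shapes(n)))
```

Running the script prints the number of modes the rule retains at each sample size, which gives a quick way to inspect how the retained dimension depends on the number of samples for a fixed generating process.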