This paper proposes a structural classification-based correlation and applies it to principal component analysis (PCA) for high-dimension, low-sample-size (HDLSS) data. The structural classification-based correlation consists of two kinds of correlation: the correlation of objects over variables and the correlation of the classification structures of objects over clusters. This correlation can therefore measure not only the similarity of objects but also the similarity of their classification structures. We apply this correlation to PCA for HDLSS data, in which the number of variables is much larger than the number of objects. Ordinary PCA is based on the eigenvalues of the covariance matrix of the variables, and it is known that these eigenvalues cannot be obtained correctly for HDLSS data; consequently, applying ordinary PCA to HDLSS data does not give a correct result. To solve this problem, we use the proposed structural classification-based correlation with respect to the variables. Because this correlation includes the correlation of classification structures, we can solve this problem and obtain a similarity relationship among the objects in a lower-dimensional space spanned by the obtained principal components. Several numerical examples show the effectiveness of the proposed PCA using the structural classification-based correlation for HDLSS data.
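The difficulty with ordinary PCA on HDLSS data can be illustrated with a small numerical sketch. This is not the proposed method, since the abstract does not define the structural classification-based correlation; it only shows the rank limitation of the sample covariance matrix. The function name `covariance_spectrum`, the sizes `n_objects` and `n_variables`, and the synthetic Gaussian data are all assumptions made for illustration.

```python
import numpy as np

def covariance_spectrum(X):
    """Eigenvalues (descending) of the sample covariance matrix of the variables.

    X has shape (n_objects, n_variables).
    """
    Xc = X - X.mean(axis=0)               # center each variable (column)
    S = Xc.T @ Xc / (X.shape[0] - 1)      # p x p sample covariance matrix
    eigvals = np.linalg.eigvalsh(S)       # S is symmetric; returned in ascending order
    return eigvals[::-1]

# Hypothetical HDLSS setting: far more variables than objects.
rng = np.random.default_rng(0)
n_objects, n_variables = 10, 200
X = rng.standard_normal((n_objects, n_variables))  # true covariance is the identity

eigvals = covariance_spectrum(X)

# Count eigenvalues that are numerically non-zero (relative tolerance).
tol = 1e-10 * eigvals[0]
n_nonzero = int(np.sum(eigvals > tol))

print("number of variables:", n_variables)
print("numerically non-zero eigenvalues:", n_nonzero)
print("largest sample eigenvalue:", eigvals[0])
```

Every true eigenvalue equals 1 here, yet the centered data matrix has rank at most `n_objects - 1`, so the sample covariance matrix has at most that many non-zero eigenvalues, and the leading ones are strongly inflated. This is one way to see why the eigenvalues used by ordinary PCA are unreliable when the number of variables greatly exceeds the number of objects.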