We are given an m × n matrix D whose entries are pairwise dissimilarities between m row objects Or and n column objects Oc, which together (as a union) comprise a set O of N = m + n objects. This paper develops a new visual approach that applies to four different cluster assessment problems associated with O. The problems are the assessment of cluster tendency: P1) amongst the row objects Or; P2) amongst the column objects Oc; P3) amongst the union of the row and column objects Or ∪ Oc; and P4) amongst those clusters in the union of the row and column objects that contain at least one object of each type (co-clusters). The basis of the method is to regard D as a block of known values within a larger, unknown N × N dissimilarity matrix, and then to impute the missing values from D. This results in estimates for three square matrices (Dr, Dc, Dr∪c) that can be visually assessed for clustering tendency using the earlier VAT or sVAT algorithms. The output from the assessment of Dr∪c ultimately leads to a rectangular coVAT image that exhibits clustering tendencies in D. Five examples are given to illustrate the new method. Two points are important: i) because VAT is scalable by sVAT to data sets of arbitrary size, and because coVAT depends explicitly (and only) on VAT, this new approach is immediately scalable to, say, the scoVAT model, which works without alteration even for very large (unloadable) data sets; and ii) VAT, sVAT and coVAT are autonomous, parameter-free models: no "hidden values" are needed to make them work.
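The pipeline described above (impute, assemble the N × N matrix, reorder with VAT, read off a rectangular image) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the function names (vat_order, impute_square, covat_orders) are hypothetical, the imputation rule (Euclidean distance between rows or columns of D, rescaled to the mean of D) is an assumed stand-in because the abstract does not specify one, and the way the row and column orders are read off the Dr∪c ordering is likewise an assumption. The VAT ordering itself follows the usual Prim-like scheme: start at an endpoint of the largest dissimilarity, then repeatedly add the unselected object nearest to any selected one.

```python
import numpy as np


def vat_order(R):
    """Return a VAT-style ordering of a square dissimilarity matrix R."""
    R = np.asarray(R, dtype=float)
    n = R.shape[0]
    # Start from one endpoint of the largest dissimilarity.
    i, _ = np.unravel_index(np.argmax(R), R.shape)
    order = [int(i)]
    remaining = [k for k in range(n) if k != i]
    while remaining:
        # Add the unselected object closest to any already selected object.
        sub = R[np.ix_(order, remaining)]
        _, q = np.unravel_index(np.argmin(sub), sub.shape)
        order.append(remaining.pop(int(q)))
    return order


def _match_mean(S, target):
    """Rescale S so its off-diagonal mean equals target (assumed heuristic)."""
    off = S[~np.eye(S.shape[0], dtype=bool)]
    mean = off.mean() if off.size else 0.0
    return S * (target / mean) if mean > 0 else S


def impute_square(D):
    """Estimate Dr, Dc and the N x N matrix from a rectangular D.

    ASSUMPTION: missing row-row and column-column dissimilarities are
    imputed as Euclidean distances between rows / columns of D.
    """
    D = np.asarray(D, dtype=float)
    Dr = np.linalg.norm(D[:, None, :] - D[None, :, :], axis=2)      # m x m
    Dc = np.linalg.norm(D.T[:, None, :] - D.T[None, :, :], axis=2)  # n x n
    Dr = _match_mean(Dr, D.mean())
    Dc = _match_mean(Dc, D.mean())
    Druc = np.block([[Dr, D], [D.T, Dc]])                           # N x N
    return Dr, Dc, Druc


def covat_orders(D):
    """Reorder D using the VAT ordering of the assembled N x N matrix."""
    D = np.asarray(D, dtype=float)
    m = D.shape[0]
    _, _, Druc = impute_square(D)
    order = vat_order(Druc)
    # ASSUMPTION: rows and columns keep their relative order in `order`.
    row_order = [k for k in order if k < m]
    col_order = [k - m for k in order if k >= m]
    return row_order, col_order, D[np.ix_(row_order, col_order)]
```

Displaying the reordered matrix returned by covat_orders as a grayscale image would give a rectangular picture in which dark blocks suggest co-clusters; applying vat_order to Dr or Dc alone would correspond to problems P1 and P2.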
Date of Publication: Oct. 2007