In statistical modeling, there are various techniques for building models from training data. Quantitatively comparing these techniques requires a method for evaluating how well the model probability density function (pdf) fits the training data. One graph-based measure that has been used for this purpose is specificity. We consider the large-number limit of specificity and derive expressions showing that it can be regarded as an estimator of the divergence between the unknown pdf from which the training data were drawn and the model pdf built from those data. Experiments on artificial data show that these limiting large-number relations give good quantitative and qualitative predictions of the behavior of the measured specificity, even for small numbers of training examples and in some extreme cases. We also demonstrate that specificity can provide a more sensitive measure of the differences between modeling methods than some previous graph-based techniques. Key points are illustrated using real data sets. We thus establish a proper theoretical basis for the previously ad hoc concept of specificity and obtain useful insights into its application in the analysis of real data.
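The abstract does not state how specificity is computed. As a rough illustration only, the sketch below assumes the nearest-neighbour form commonly used when evaluating statistical models: draw samples from the model pdf, link each one to its closest training example, and average those link lengths. The Euclidean metric, the function name `specificity`, and the toy Gaussian data are all assumptions made for this example, not the paper's own estimator or experiments.

```python
import math
import random
from typing import Sequence

Point = Sequence[float]


def specificity(model_samples: Sequence[Point], training_data: Sequence[Point]) -> float:
    """Mean Euclidean distance from each model-generated sample to its nearest training example.

    Assumes the common nearest-neighbour definition of specificity; lower values
    mean the model generates examples that lie closer to the training set.
    """
    if not model_samples or not training_data:
        raise ValueError("both sample sets must be non-empty")
    total = 0.0
    for sample in model_samples:
        # Length of the nearest-neighbour edge from this model sample to the training set.
        total += min(math.dist(sample, example) for example in training_data)
    return total / len(model_samples)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 1-D training set drawn from N(0, 1).
    training = [[rng.gauss(0.0, 1.0)] for _ in range(50)]
    # Samples from a well-matched model and from an over-broad model.
    matched = [[rng.gauss(0.0, 1.0)] for _ in range(1000)]
    broad = [[rng.gauss(0.0, 5.0)] for _ in range(1000)]
    print("matched model:", specificity(matched, training))
    print("broad model:  ", specificity(broad, training))
```

Under this assumed definition, a model whose samples stray away from the training data, such as the over-broad model above, should receive a larger specificity value. Relating such values to a divergence between pdfs, as the abstract describes, is beyond what this sketch attempts.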