Guo and Nixon proposed a feature selection method based on maximizing I(x;Y), the multidimensional mutual information between feature vector x and class variable Y. Because computing I(x;Y) can be difficult in practice, they used an approximation of I(x;Y) as the criterion for feature selection. We show that Guo and Nixon's criterion originates from approximating the joint probability distributions in I(x;Y) by second-order product distributions. We remark on the limitations of the approximation and discuss computationally attractive alternatives for computing I(x;Y).
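For reference, the quantity I(x;Y) can be written out using the standard definition of mutual information. The form below is a sketch that assumes discrete features; the symbols p (probability mass function) and n (number of features) are notation introduced here for illustration and do not appear in the abstract.

```latex
% Standard definition of mutual information between a discrete
% feature vector x = (x_1, ..., x_n) and a class variable Y.
% p and n are illustrative notation, not taken from the abstract.
\[
  I(\mathbf{x};Y)
  = \sum_{\mathbf{x}} \sum_{y}
    p(\mathbf{x},y)\,
    \log \frac{p(\mathbf{x},y)}{p(\mathbf{x})\,p(y)},
  \qquad \mathbf{x} = (x_1,\dots,x_n).
\]
```

Under this assumption, the sum runs over every joint configuration of the n features, so the number of terms grows exponentially with n. This is one reason a direct computation of I(x;Y) can be difficult in practice.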