We propose a set of machine learning (ML)-based scoring measures for feature selection. We tested these measures on documents from two well-known corpora, comparing them with other measures previously applied for this purpose. In particular, we analyzed which measure obtains the best overall classification performance in terms of properties such as precision and recall, emphasizing the extent to which some statistical properties of the corpus affect performance. The results show that some of our measures outperform the traditional measures in certain situations.