A Bayesian Hierarchical Model for Comparing Average F1 Scores | IEEE Conference Publication