Classification-based approaches to data analysis are attracting wide interest and increasing adoption within the neuroscience community. "Brain decoding", "multi-voxel pattern analysis" and "brain-computer interface" are prominent examples of this trend. The core problem of these investigations is hypothesis testing, i.e., finding evidence within neural correlates of some effect produced by the stimulation protocol. A classification algorithm is trained on the recorded data to learn how to discriminate between different stimuli. The misclassification rate of its predictions is then estimated and used to carry out the statistical test. This generic classification problem can be implemented in several ways, depending on the exact neuroscientific question under investigation. However, some implementations produce biased estimates because of circular analysis, which could invalidate the conclusions of the scientific study. The most suitable implementation of the classification problem must therefore be used in order to avoid biases, to detect weak stimulus-related information within noise, and to give the proper answer to the neuroscientific question at hand. In this work we propose different implementations of the classification-based approach for the case in which it comprises a variable selection step together with a classification step. For each implementation we investigate the associated bias. Analyses are conducted on synthetic data and on MEG data from a covert spatial attention task. The effects of the different implementations of the classification algorithm are quantified by means of the expected misclassification rate. The results show the importance of adopting a proper error rate estimation process.
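The circular analysis issue mentioned above can be illustrated with a minimal sketch. It is not the paper's own procedure: the nearest-centroid classifier, the mean-difference selection score, the fold layout, and all function names and parameter values below are assumptions chosen for illustration. The data are pure noise, so the true misclassification rate is 0.5. Selecting variables on all the data before cross-validation (circular) is compared with selecting them inside each training fold (nested).

```python
import random


def make_noise_data(n_samples, n_features, rng):
    """Pure-noise data: the features carry no information about the labels."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # balanced, alternating labels
    return X, y


def class_means(X, y, idx, features):
    """Per-class mean of the given features over the samples in idx."""
    means = {}
    for label in (0, 1):
        rows = [X[i] for i in idx if y[i] == label]
        means[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    return means


def select_features(X, y, idx, k):
    """Keep the k features with the largest absolute difference in class means."""
    all_features = range(len(X[0]))
    m = class_means(X, y, idx, all_features)
    scores = [abs(a - b) for a, b in zip(m[0], m[1])]
    return sorted(all_features, key=lambda f: scores[f], reverse=True)[:k]


def predict(X, means, features, i):
    """Nearest-centroid prediction for sample i (squared Euclidean distance)."""
    dist = {
        label: sum((X[i][f] - c) ** 2 for f, c in zip(features, means[label]))
        for label in (0, 1)
    }
    return min(dist, key=dist.get)


def cv_error(X, y, k, n_folds, select_inside):
    """Cross-validated error rate; select_inside toggles nested vs circular selection."""
    n = len(X)
    all_idx = list(range(n))
    # Consecutive sample pairs (one per class) share a fold, so folds stay balanced.
    fold_of = [(i // 2) % n_folds for i in all_idx]
    # Circular variant: the selection step sees the test samples as well.
    features_all = None if select_inside else select_features(X, y, all_idx, k)
    errors = 0
    for fold in range(n_folds):
        train = [i for i in all_idx if fold_of[i] != fold]
        test = [i for i in all_idx if fold_of[i] == fold]
        features = select_features(X, y, train, k) if select_inside else features_all
        means = class_means(X, y, train, features)
        errors += sum(predict(X, means, features, i) != y[i] for i in test)
    return errors / n


def compare(n_repeats=5, n_samples=40, n_features=500, k=10, n_folds=5, seed=0):
    """Average both error estimates over several independent noise datasets."""
    rng = random.Random(seed)
    circular, nested = [], []
    for _ in range(n_repeats):
        X, y = make_noise_data(n_samples, n_features, rng)
        circular.append(cv_error(X, y, k, n_folds, select_inside=False))
        nested.append(cv_error(X, y, k, n_folds, select_inside=True))
    return sum(circular) / n_repeats, sum(nested) / n_repeats


circular_error, nested_error = compare()
print("circular estimate:", circular_error)
print("nested estimate:  ", nested_error)
```

Under these assumptions the nested estimate should stay close to the chance level of 0.5, while the circular estimate is expected to be optimistically low even though the data contain no signal. That gap is the kind of bias that can lead to a false rejection of the null hypothesis.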
Note: As originally published there was an error in this document. The text in this paper is not in proper order. Specifically, in the algorithm related to "Process B" (p. 2, Section II, next-to-last paragraph), the content of the second line must be moved so as to become the fourth line.