Research and development of new information assurance (IA) techniques and technologies is ongoing and varied. Each new proposal and technique arrives with great promise and anticipated success as research teams work to develop innovative responses to emerging threats. Unfortunately, these techniques frequently fall short of expectations when deployed, due to difficulties with false alarms, trouble operating in a non-idealized or new domain, or flexibility-limiting assumptions that are valid only for specific input sets. We believe these failures stem from fundamental problems with the experimental method used to evaluate the effectiveness of new ideas and techniques. This work explores the effect of a poorly understood data synthesis process on the evaluation of IA devices. The point of an evaluation is to independently determine what a detector can and cannot detect, i.e., the metric of detection. This can be done only when the data contains carefully controlled ground truth. We broadly define the term “similarity class” to facilitate discussion of the different ways data (and, more specifically, test data) can be similar, and use these ideas to illustrate the prerequisites for correct evaluation of anomaly detectors. We focus on how anomaly detectors function and should be evaluated in two specific domains with disparate system architectures and data: a sensor and data transport network for airframe tracking and display, and a deep space mission spacecraft command link. Finally, we present empirical evidence illustrating the effectiveness of our approach in these domains, and introduce the entropy of a time-series sensor as a critical measure of data similarity for test data in these domains.
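The closing idea, using the entropy of a time-series sensor as a measure of data similarity, can be sketched in a few lines. The helper below is hypothetical (it is not from the paper): it estimates Shannon entropy by discretizing sensor samples into equal-width bins, so that synthetic test data whose entropy diverges from the operational data's entropy would be flagged as dissimilar under these assumptions.

```python
import math
from collections import Counter

def series_entropy(samples, bins=16):
    """Estimate the Shannon entropy (in bits) of a time-series sensor
    by discretizing sample values into equal-width bins."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins or 1.0  # avoid zero width for a constant series
    counts = Counter(min(int((x - lo) / width), bins - 1) for x in samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A constant sensor carries no information (entropy 0 bits), while a
# uniformly varied sensor approaches the maximum log2(bins) = 4 bits.
flat = [5.0] * 100
varied = [i % 16 for i in range(96)]
print(series_entropy(flat))
print(series_entropy(varied))
```

Comparing such entropy estimates between synthesized test data and real operational traffic is one concrete way to check that the two belong to the same similarity class before using the synthetic data to evaluate a detector.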