Abstract:
In outlier hypothesis testing, one aims to detect outlying sequences among a given set of sequences, where most sequences are generated i.i.d. from a nominal distribution while outlying sequences (outliers) are generated i.i.d. from a different anomalous distribution. Most existing studies focus on discrete-valued sequences, where each data sample takes values in a finite set. To account for practical scenarios in which data sequences usually take real values and the number of outlying sequences is unknown, we study outlier hypothesis testing for continuous sequences when multiple outliers may exist and both the nominal and anomalous distributions are unknown. Specifically, we propose distribution-free tests and prove that the probabilities of misclassification error, false reject, and false alarm decay exponentially fast for three different test designs: a fixed-length test, a sequential test, and a two-phase test. In a fixed-length test, one fixes the sample size of each observed sequence; in a sequential test, one takes a sample from each sequence per unit time until a reliable decision can be made; in a two-phase test, one adapts the sample size between two different fixed values. Remarkably, the two-phase test achieves a good balance between test design complexity and theoretical performance.
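The abstract does not state which statistic the tests use, so the following is only a hedged sketch of how the three designs could differ in structure. As an assumption, it swaps in a Gaussian-kernel maximum mean discrepancy (MMD) score, one common distribution-free two-sample statistic, and scores each sequence by its median MMD to all other sequences, which presumes outliers are fewer than half of the remaining sequences. All names (`mmd2`, `scores`, `fixed_length_test`, `two_phase_test`, `sequential_test`) and parameters (`bw`, `threshold`, `margin`, `n1`, `n2`, `n0`, `n_max`) are hypothetical; this is not the paper's construction and carries none of its exponential-decay guarantees.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence, Set, Tuple

# Hypothetical sampler type: n -> the first n samples of every observed sequence.
Sampler = Callable[[int], List[Sequence[float]]]


def mmd2(x: Sequence[float], y: Sequence[float], bw: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD under a Gaussian kernel."""
    def k(a: float, b: float) -> float:
        return math.exp(-((a - b) ** 2) / (2.0 * bw * bw))
    kxx = sum(k(a, b) for a in x for b in x) / (len(x) * len(x))
    kyy = sum(k(a, b) for a in y for b in y) / (len(y) * len(y))
    kxy = sum(k(a, b) for a in x for b in y) / (len(x) * len(y))
    return kxx + kyy - 2.0 * kxy


def scores(seqs: List[Sequence[float]]) -> List[float]:
    """Score each sequence by its median MMD^2 to every other sequence.

    If outliers are a minority, a nominal sequence is compared mostly with
    other nominal sequences, so its median stays small; an outlier's median
    is driven by its nominal peers and stays large.
    """
    m = len(seqs)
    d = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            d[i][j] = d[j][i] = mmd2(seqs[i], seqs[j])
    return [statistics.median(d[i][j] for j in range(m) if j != i) for i in range(m)]


def fixed_length_test(seqs: List[Sequence[float]], threshold: float) -> Set[int]:
    """Flag every sequence whose score exceeds the threshold (no outlier count needed)."""
    return {i for i, s in enumerate(scores(seqs)) if s > threshold}


def _confident(s: List[float], threshold: float, margin: float) -> bool:
    # True when every score lies outside the band [threshold - margin, threshold + margin].
    return all(abs(v - threshold) > margin for v in s)


def two_phase_test(sample: Sampler, n1: int, n2: int, threshold: float,
                   margin: float) -> Tuple[Set[int], int]:
    """Decide with n1 samples if confident; otherwise decide with n2 > n1 samples."""
    s = scores(sample(n1))
    if _confident(s, threshold, margin):
        return {i for i, v in enumerate(s) if v > threshold}, n1
    return fixed_length_test(sample(n2), threshold), n2


def sequential_test(sample: Sampler, threshold: float, margin: float,
                    n0: int = 10, n_max: int = 200) -> Tuple[Set[int], int]:
    """Add one sample per sequence per step until confident or n_max is reached."""
    n = n0
    while True:
        s = scores(sample(n))
        if n >= n_max or _confident(s, threshold, margin):
            return {i for i, v in enumerate(s) if v > threshold}, n
        n += 1


if __name__ == "__main__":
    random.seed(0)
    M, OUTLIERS, N_MAX = 6, {1, 4}, 100
    # Illustrative data: nominal ~ N(0, 1), anomalous ~ N(2, 1); the tests never use these parameters.
    streams = [[random.gauss(2.0 if i in OUTLIERS else 0.0, 1.0) for _ in range(N_MAX)]
               for i in range(M)]
    prefix = lambda n: [s[:n] for s in streams]
    print(fixed_length_test(prefix(N_MAX), threshold=0.25))
    print(two_phase_test(prefix, n1=20, n2=N_MAX, threshold=0.25, margin=0.1))
    print(sequential_test(prefix, threshold=0.25, margin=0.1, n0=10, n_max=N_MAX))
```

In this sketch the fixed-length test always pays for the full sample size, the sequential test re-evaluates after every new sample, and the two-phase test evaluates at most twice: it stops at `n1` when no score falls near the threshold and otherwise falls back to `n2`. Thresholding each score separately is what lets the rule return any number of outliers, including none.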
Published in: IEEE Transactions on Information Theory (Early Access)