
# Classification With Finite Memory Revisited


We consider the class of strong-mixing probability laws with positive transition probabilities defined on doubly infinite sequences over a finite alphabet $A$. A device called the classifier (or discriminator) observes a training sequence whose probability law $Q$ is unknown. The classifier's task is to consider a second probability law $P$ and decide whether $P = Q$, or whether $P$ and $Q$ are sufficiently different according to some appropriate criterion $\Delta(Q,P) > \Delta$. If the classifier has an unlimited amount of training data available, this is a simple matter. Here, however, we study the case where the training data are limited to $N$ letters. We define a function $N_\Delta(Q|P)$, which quantifies the minimum sequence length needed to distinguish $Q$ from $P$, and the class $M(N_\Delta)$ of all pairs of probability laws $(Q,P)$ that satisfy $N_\Delta(Q|P) \le N_\Delta$ for some given positive number $N_\Delta$. It is shown that every pair $(Q,P)$ of probability laws that are sufficiently different according to the $\Delta$ criterion is contained in $M(N_\Delta)$. We demonstrate that for any universal classifier there exists some $Q$ for which the probability of classification error $\lambda(Q) = 1$ for some $N$-sequence emerging from $Q$ and some $P$ with $(Q,P) \in M^\circ(N_\Delta)$, $\Delta(Q,P) > \Delta$, if $N < N_\Delta$. Conversely, we introduce a classification algorithm that is essentially optimal in the sense that for every $(Q,P) \in M(N_\Delta)$, the probability of classification error $\lambda(Q)$ vanishes uniformly with $N$ for every $P$ with $(Q,P) \in M^\circ(N_\Delta)$ if $N \ge N_\Delta^{1 + O(\log\log N_\Delta / \log N_\Delta)}$. The proposed algorithm finds the largest empirical conditional divergence over a set of contexts that appear in the tested $N$-sequence. Its computational complexity is $O(N^3)$.

We also introduce a second, simplified context classification algorithm with a computational complexity of only $O(N(\log N)^4)$ that is efficient in the sense that for *every* pair $(Q,P) \in M(N_\Delta)$, the *pairwise* probability of classification error $\lambda(Q,P)$ vanishes with $N$ if $N \ge N_\Delta^{1 + O(\log\log N_\Delta / \log N_\Delta)}$. Conversely, $\lambda(Q,P) = 1$ for at least some $(Q,P) \in M(N_\Delta)$ if $N < N_\Delta$.
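To give an informal flavor of the approach, the sketch below implements the core idea of comparing empirical conditional distributions over contexts and maximizing a divergence across them. It is not the paper's algorithm: the fixed context length `k`, the add-one smoothing, the decision threshold `delta`, and all function names are illustrative assumptions.

```python
from collections import Counter, defaultdict
from math import log

def context_counts(seq, k):
    """Count, for each length-k context in seq, the symbols that follow it."""
    counts = defaultdict(Counter)
    for i in range(k, len(seq)):
        counts[seq[i - k:i]][seq[i]] += 1
    return counts

def empirical_conditional_divergence(train, test, k, alphabet):
    """Largest empirical conditional KL divergence, maximized over contexts
    appearing in both sequences (add-one smoothing avoids zero probabilities)."""
    cq, cp = context_counts(train, k), context_counts(test, k)
    best = 0.0
    for ctx in cq.keys() & cp.keys():
        nq = sum(cq[ctx].values()) + len(alphabet)
        np_ = sum(cp[ctx].values()) + len(alphabet)
        d = 0.0
        for a in alphabet:
            q = (cq[ctx][a] + 1) / nq   # smoothed empirical Q(a | ctx)
            p = (cp[ctx][a] + 1) / np_  # smoothed empirical P(a | ctx)
            d += q * log(q / p)
        best = max(best, d)
    return best

def classify(train, test, k=2, delta=0.1, alphabet=("0", "1")):
    """Declare 'same law' iff the largest context divergence stays below delta."""
    return empirical_conditional_divergence(train, test, k, alphabet) <= delta
```

For example, `classify("01" * 200, "01" * 200)` accepts, while `classify("01" * 200, "0011" * 100)` rejects, since the two sequences share the contexts `"01"` and `"10"` but induce very different conditional distributions on the next symbol.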

- Page(s): 4413–4421
- ISSN: 0018-9448
- INSPEC Accession Number: 10215689
- DOI: 10.1109/TIT.2007.909181
- Date of Current Version: 17 December 2007
- Issue Date: Dec. 2007
- Sponsored by: IEEE Information Theory Society
- Publisher: IEEE