An axiomatic approach to the notion of similarity of sequences, which seems natural in many cases (e.g., phylogenetic analysis), is proposed. Although the sequences are not assumed to be a realization of a probabilistic process (e.g., a variable-order Markov process), it is demonstrated that any classifier that fully complies with the proposed similarity axioms must be based on modeling the training data, contained in a (long) individual training sequence, via a suffix tree with no more than O(N) leaves (or, alternatively, a table with O(N) entries), where N is the length of the test sequence. Some common classification algorithms may be slightly modified to comply with the proposed axiomatic conditions and the resulting organization of the training data, thus yielding a formal justification for their good empirical performance without relying on any a priori (sometimes unjustified) probabilistic assumption. One such case is discussed in detail.