CLASS: a general approach to classifying categorical sequences

4 Author(s)
Kelil, A. (Dept. of Comput. Sci., Univ. de Sherbrooke, Sherbrooke, QC, Canada); Nordell-Markovits, A.; Zaralahy, P.O.Y.; Shengrui Wang

The rapid growth of available data in the form of categorical sequences, such as biological sequences, natural language texts, and network and retail transactions, makes the classification of categorical sequences increasingly important. The main challenge is to identify significant features hidden behind the chronological and structural dependencies that characterize their intrinsic properties. Almost all existing algorithms designed for this task match patterns in chronological order, but categorical sequences often share similar features in non-chronological order. In addition, these algorithms have serious difficulty outperforming domain-specific algorithms. In this paper we propose CLASS, a general approach for the classification of categorical sequences. By using an effective matching scheme called SPM (Significant Patterns Matching), CLASS is able to capture the intrinsic properties of categorical sequences. Furthermore, the use of Latent Semantic Analysis captures semantic relations using global information extracted from a large number of sequences, rather than merely comparing pairs of sequences. Moreover, CLASS employs a classifier called SNN (Significant Nearest Neighbours), inspired by the K Nearest Neighbours approach with a dynamic estimation of K, which reduces both false positives and false negatives in the classification. Extensive tests performed on a range of datasets from different fields show that CLASS is often competitive with domain-specific approaches.
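
The abstract does not give the details of SPM or SNN, so the following is only a minimal sketch of the general idea of a nearest-neighbour vote in which K is estimated per query rather than fixed. Everything in it is an assumption made for illustration: contiguous n-gram counts with cosine similarity stand in for SPM (and, unlike SPM, they do not capture non-chronological matches), the Latent Semantic Analysis step is omitted, and the "largest gap" rule for choosing K is a hypothetical heuristic, not the authors' SNN. The names `ngram_profile`, `cosine`, and `classify_dynamic_k` are likewise hypothetical.

```python
from collections import Counter
from math import sqrt


def ngram_profile(seq, n=2):
    """Count contiguous n-grams (an assumed, simplified stand-in for SPM)."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify_dynamic_k(query, labelled, n=2):
    """Majority vote among the neighbours above the largest similarity gap.

    `labelled` is a list of (sequence, label) pairs. K is not fixed in
    advance: it is set to the number of neighbours ranked before the
    largest drop between consecutive similarities (a hypothetical rule).
    """
    q = ngram_profile(query, n)
    scored = sorted(
        ((cosine(q, ngram_profile(s, n)), label) for s, label in labelled),
        key=lambda pair: pair[0],
        reverse=True,
    )
    sims = [sim for sim, _ in scored]
    gaps = [sims[i] - sims[i + 1] for i in range(len(sims) - 1)]
    k = gaps.index(max(gaps)) + 1 if gaps else len(sims)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0], k


# Toy data, invented for this example only.
training = [("ababab", "X"), ("abababab", "X"), ("cdcdcd", "Y"), ("dcdcdc", "Y")]
label, k = classify_dynamic_k("babab", training)
print(label, k)
```

On this toy data the two "X" sequences are far more similar to the query than the two "Y" sequences, so the gap rule keeps only those two neighbours. A fixed K of 3 or 4 would have let unrelated sequences vote, which is the kind of error a per-query estimate of K aims to reduce.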

Published in:

Canadian Journal of Electrical and Computer Engineering (Volume 34, Issue 4)