Most clustering methods assume that each attribute of an object takes a single value. In many fields, however, an attribute value may be a set or a bag of values. One example is the result set of a database query, which can be viewed as a set of attributes whose values may themselves be sets or bags of data. The problem of clustering queries can therefore be expressed as an intersection problem on sets whose elements may also be sets or bags. This paper gives a method for computing the similarity between queries and presents a clustering method based on it. The algorithm reads each query q in sequence, either assigning q to an existing cluster or creating a new cluster for q. Finally, the application of the algorithm to database intrusion detection is shown, and experimental results on synthetic and real data sets are reported.
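The single-pass procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's method: the abstract does not give the similarity measure, so a plain Jaccard coefficient over result rows stands in for it, each cluster is represented by its first member, and the names `jaccard`, `single_pass_cluster`, and `threshold` are hypothetical.

```python
from typing import FrozenSet, Hashable, List, Set

# Assumption: a query is represented by its result set, modelled as a set of
# rows, where each row is itself a (frozen) set of values.
Row = FrozenSet[Hashable]
ResultSet = Set[Row]


def jaccard(a: ResultSet, b: ResultSet) -> float:
    """Stand-in similarity: |a ∩ b| / |a ∪ b| over whole rows."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def single_pass_cluster(queries: List[ResultSet], threshold: float = 0.5) -> List[List[int]]:
    """Read queries in order; join the most similar cluster or start a new one.

    Returns clusters as lists of query indices. The first index in each list
    is used as that cluster's representative (an assumption of this sketch).
    """
    clusters: List[List[int]] = []
    for i, q in enumerate(queries):
        best_idx, best_sim = -1, 0.0
        for idx, members in enumerate(clusters):
            sim = jaccard(q, queries[members[0]])
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx >= 0 and best_sim >= threshold:
            clusters[best_idx].append(i)   # assign q to an existing cluster
        else:
            clusters.append([i])           # create a new cluster for q
    return clusters


if __name__ == "__main__":
    q0 = {frozenset({1, 2}), frozenset({3})}
    q1 = {frozenset({1, 2}), frozenset({3}), frozenset({4})}
    q2 = {frozenset({9})}
    print(single_pass_cluster([q0, q1, q2], threshold=0.5))
```

Because each query is compared only against cluster representatives, the data is scanned once; the result may depend on the order in which queries arrive.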
Published in:
Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on (Volume: 4)
Date of Conference: 18-21 Aug. 2005