We describe a data collection constructed for evaluating the automatic filtering of hazardous WWW information. Existing filtering systems fall into three types: self rating, individual rating, and automatic filtering. Based on an analysis of existing systems, we propose an ideal system architecture for effective filtering. To develop our filtering system, we collected a large amount of hazardous WWW data. We had presumed that WWW pages containing few words are difficult to filter automatically, but analysis of our data collection showed that effective automatic filtering can be achieved by exploiting the hierarchical structure of HTML data. We also confirmed this hypothesis in practice through evaluation experiments using an experimental automatic filtering algorithm.
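The abstract does not specify the filtering algorithm, so the following is only an illustrative sketch of how HTML hierarchy might help with pages that contain few words: a flagged term counts for more when it appears inside a structurally prominent element. The tag weights, the term list, and the names `TAG_WEIGHTS`, `WeightedTermScorer`, and `score_page` are all hypothetical and are not taken from the paper.

```python
from html.parser import HTMLParser

# Hypothetical weights (an assumption, not from the paper): text inside
# structurally prominent tags contributes more to the page score.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "a": 1.5}
DEFAULT_WEIGHT = 1.0

# Void elements never receive an end tag, so they are not tracked.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class WeightedTermScorer(HTMLParser):
    """Accumulates a score for flagged terms, weighted by enclosing tags."""

    def __init__(self, flagged_terms):
        super().__init__()
        self.flagged_terms = {t.lower() for t in flagged_terms}
        self.open_tags = []  # stack of currently open elements
        self.score = 0.0

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray end tags.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        # Use the largest weight among all enclosing elements.
        weight = max(
            (TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.open_tags),
            default=DEFAULT_WEIGHT,
        )
        for word in data.lower().split():
            if word.strip(".,;:!?\"'()") in self.flagged_terms:
                self.score += weight


def score_page(html, flagged_terms):
    """Return the weighted count of flagged terms found in an HTML string."""
    scorer = WeightedTermScorer(flagged_terms)
    scorer.feed(html)
    scorer.close()
    return scorer.score
```

A caller would compare `score_page(html, terms)` against a threshold chosen on labeled data. Under this scheme, a short page whose only text is a heading or link label can still accumulate a meaningful score, which is one plausible reading of why structure helps when word counts are low.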
Published in:
Internet Workshop, 1999. IWS 99
Date of Conference: 1999