Hidden Markov models (HMMs), while widely applied in fields such as speech recognition and optical character recognition, have not been used for post-classification of search engine results. We explore the use of HMMs to optimize search engine tasks, focusing specifically on how to construct a new model structure that improves the classification of web pages. We show that a manually constructed model with this new structure, containing only two states and two classes of observations per field, can produce good classification results, and we discuss strategies for learning the model structure automatically from data. We also demonstrate that using the new model structure to classify search results, across several search engines and several different search keywords, provides a significant improvement in search accuracy. Our models are applied to the task of post-classifying the web pages selected by the search engine Google, and achieve a classification accuracy of 93.4.
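As a rough illustration of what a two-state HMM with two observation classes per field might look like, the sketch below runs the standard normalized forward algorithm over a page's fields and picks the more probable final state. It is not the model described in this work: the state names (`relevant`, `irrelevant`), the observation classes (`match`, `no_match`), the idea of one observation per page field, and every probability value are assumptions chosen only to make the example concrete.

```python
# Minimal sketch of a two-state HMM classifier (illustrative assumptions only).
# State names, observation classes, and all probabilities are hypothetical;
# they are NOT taken from the paper.

STATES = ("relevant", "irrelevant")

# Assumed initial state distribution.
START = {"relevant": 0.5, "irrelevant": 0.5}

# Assumed transition probabilities between consecutive fields of a page.
TRANS = {
    "relevant": {"relevant": 0.8, "irrelevant": 0.2},
    "irrelevant": {"relevant": 0.2, "irrelevant": 0.8},
}

# Assumed emission probabilities: each field yields one of two observation
# classes, e.g. whether the search keyword occurs in that field.
EMIT = {
    "relevant": {"match": 0.7, "no_match": 0.3},
    "irrelevant": {"match": 0.2, "no_match": 0.8},
}


def _normalize(dist):
    """Scale a dict of non-negative weights so that they sum to 1."""
    total = sum(dist.values())
    return {state: weight / total for state, weight in dist.items()}


def forward_posterior(observations):
    """Return P(final state | observations) using the forward algorithm."""
    if not observations:
        raise ValueError("at least one observation is required")
    # Initialization: start distribution weighted by the first emission.
    alpha = _normalize(
        {s: START[s] * EMIT[s][observations[0]] for s in STATES}
    )
    # Recursion: propagate through transitions, then weight by the emission.
    for obs in observations[1:]:
        alpha = _normalize(
            {
                s: EMIT[s][obs] * sum(alpha[p] * TRANS[p][s] for p in STATES)
                for s in STATES
            }
        )
    return alpha


def classify(observations):
    """Label a page with the most probable final state."""
    posterior = forward_posterior(observations)
    return max(STATES, key=lambda s: posterior[s])


if __name__ == "__main__":
    # One observation per hypothetical page field (e.g. title, URL, snippet).
    page_fields = ["match", "match", "match"]
    print(classify(page_fields), forward_posterior(page_fields))
```

In this sketch, a page is represented as a short sequence of per-field observations and passed to `classify`. In practice the parameters would be estimated from labeled search results rather than set by hand.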