This paper presents a technical description of a solution to the International Conference on Data Mining (ICDM) 2012 Contest, Consumer Products Contest #1. The contest provided a dataset comprising thousands of text items, a product catalog with over fifteen million products, and hundreds of manually annotated product mentions to support data-driven approaches. The task was to identify product mentions in a large corpus of user-generated web text and to disambiguate those mentions against the product catalog. The solution is an ensemble-based algorithm for processing textual content. It uses Conditional Random Fields together with a special approach for recognizing product mentions. The solution finished in third place in the contest.