This paper presents a heuristic approach to refining a corpus using regular expressions, and discusses its possible applications in the field of opinion mining. A corpus (plural: corpora) is a collection of linguistic data. The proposed work is based on a corpus of reviews, specifically product reviews. The reviews are contained in HTML files that are readily available on popular review sites such as Cnet.com. The revolution in information technology has opened a new era in the development of the language industries. The versatility of technological development, along with the availability of translations in different languages, has led to the use of such corpora in specific machine learning mechanisms as well as in various automatic translation applications. The prime objective of researchers, as well as of naive users, is to rapidly develop machine learning systems that are both accurate and effective. Creating an exact dataset is often a tedious job because accurate corpora for a given research task are scarce. We therefore propose an algorithm for creating a corpus for the opinion mining research field.
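As an illustration of what regular-expression-based refinement of HTML review files might look like, the following is a minimal Python sketch. It is not the algorithm proposed in the paper, whose expressions are not given here; the function name `refine_review` and all four patterns are assumptions made for this example. It removes script and style blocks, comments, and remaining tags, then decodes HTML entities and normalizes whitespace.

```python
import html
import re

# Hypothetical patterns chosen for illustration; the paper's own
# regular expressions are not stated in this abstract.
SCRIPT_STYLE = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
TAG = re.compile(r"<[^>]+>")
WHITESPACE = re.compile(r"\s+")


def refine_review(raw_html: str) -> str:
    """Reduce one HTML review page (or fragment) to plain review text."""
    text = SCRIPT_STYLE.sub(" ", raw_html)  # drop embedded code and CSS
    text = COMMENT.sub(" ", text)           # drop HTML comments
    text = TAG.sub(" ", text)               # drop all remaining tags
    text = html.unescape(text)              # decode entities such as &amp;
    return WHITESPACE.sub(" ", text).strip()


if __name__ == "__main__":
    sample = "<p>Great battery &amp; <b>poor</b> screen.</p>"
    print(refine_review(sample))
```

Replacing each tag with a space, rather than an empty string, keeps words in adjacent elements from being joined together; the final whitespace pass then collapses the extra spaces. A real review site would likely need site-specific patterns to isolate the review body from navigation and advertising markup.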