Modern statistical techniques in natural language processing are limited in their applications because they lose most of the semantic information contained in text documents. Fuzzy techniques have been proposed to correct this problem by modelling the relationships between words while accommodating the ambiguities of natural languages. However, these techniques are currently either restricted to modelling the effects of simple words or specialized to a single domain. In this paper, we propose a novel statistical-fuzzy methodology that represents the actions described in a variety of text documents by modelling the relationships between subject-verb-object triplets. The paper first presents the technique used to accurately extract the triplets from the text, the equations needed to compute the statistics of the subject-verb and verb-object pairs, the formulas needed to interpolate the fuzzy membership functions from these statistics, and those needed to defuzzify the membership value of unseen triplets. Taken together, these sets of equations constitute a comprehensive system that allows the meaning of text documents to be quantified and evaluated, while being general enough to be applied to any domain. The paper then experimentally demonstrates the validity of the new methodology by applying it to the implementation of a fuzzy classifier designed especially for this research. This classifier is trained using a section of the Brown Corpus, and its efficiency is tested with a corpus of 20 unseen documents drawn from three different domains. The positive results obtained from these experimental tests confirm the soundness of our new approach and show that it is a promising avenue of research.
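
To make the idea of pair statistics over subject-verb-object triplets concrete, the following is a minimal sketch and not the equations of the paper, which are not reproduced in this abstract. It assumes that triplets have already been extracted, uses plain relative frequencies as the pair statistics, and scores an unseen triplet with the minimum of its two pair frequencies, a common generic choice for fuzzy conjunction. The function names `pair_statistics` and `triplet_score` and the toy data are hypothetical.

```python
from collections import Counter


def pair_statistics(triplets):
    """Return relative frequencies of subject-verb and verb-object pairs.

    Assumption: relative frequency stands in for the paper's statistics.
    """
    sv_counts = Counter((s, v) for s, v, _ in triplets)
    vo_counts = Counter((v, o) for _, v, o in triplets)
    total = len(triplets)
    sv_freq = {pair: n / total for pair, n in sv_counts.items()}
    vo_freq = {pair: n / total for pair, n in vo_counts.items()}
    return sv_freq, vo_freq


def triplet_score(triplet, sv_freq, vo_freq):
    """Score a (possibly unseen) triplet from its two pair frequencies.

    Assumption: min() is used as a generic fuzzy AND; pairs never
    observed in training contribute 0.0.
    """
    s, v, o = triplet
    return min(sv_freq.get((s, v), 0.0), vo_freq.get((v, o), 0.0))


# Toy training triplets (hypothetical data, for illustration only).
triplets = [
    ("dog", "chase", "cat"),
    ("dog", "chase", "ball"),
    ("cat", "chase", "mouse"),
    ("dog", "eat", "food"),
]
sv_freq, vo_freq = pair_statistics(triplets)

# The triplet below never occurs in the training data, but both of its
# pairs do, so it still receives a non-zero score.
unseen_score = triplet_score(("dog", "chase", "mouse"), sv_freq, vo_freq)
```

The point of the sketch is only that a triplet absent from the training data can still be scored as long as its subject-verb and verb-object pairs were observed separately.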