Recognizing disease-name phrases in medical literature and clinical records is useful for extracting medical information. Because manually annotated domain data is rarely available, applying machine learning approaches to this task is difficult. We propose a method based on the maximum entropy model for recognizing these phrases, in which domain knowledge is integrated into the statistical method to improve the classifier. To introduce a variety of features into the maximum entropy model, the phrases are analyzed from a linguistic point of view. An n-gram method is used, and the head word is extracted as a feature. By analyzing the lexical spectrum of the phrase structure, we exploit the clue that some words can be mapped to human anatomy concepts and related words. In addition, the phrases contain word collocations whose members need not directly co-occur. These collocations form trigger pairs, which can be used as features. Within a hierarchical framework, an anatomy taxonomy is used as a kind of prior lexical feature. Using information from the attributes of anatomy objects and information around the right boundaries of the phrases, trigger-pair mechanisms are integrated into the maximum entropy model. The experiment shows that the method offers a new approach to recognizing disease-name phrases.
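The feature types named above can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this work: the abstract does not give the feature templates, so the toy taxonomy `ANATOMY_TAXONOMY`, the function names `extract_features` and `maxent_probabilities`, the feature-name strings, and the example weights are all hypothetical. The sketch also assumes that the head word is the rightmost token of the phrase and that a trigger pair is any ordered pair of non-adjacent words in the phrase.

```python
import math
from itertools import combinations

# Hypothetical toy anatomy taxonomy: word -> path from general to specific concept.
ANATOMY_TAXONOMY = {
    "liver": ("organ", "digestive_system", "liver"),
    "hepatic": ("organ", "digestive_system", "liver"),
    "lung": ("organ", "respiratory_system", "lung"),
    "pulmonary": ("organ", "respiratory_system", "lung"),
}


def extract_features(tokens, n=2):
    """Map a candidate phrase (a list of tokens) to a set of binary feature names."""
    tokens = [t.lower() for t in tokens]
    features = set()

    # Word n-grams of order 1..n.
    for size in range(1, n + 1):
        for i in range(len(tokens) - size + 1):
            features.add("ngram=" + "_".join(tokens[i:i + size]))

    # Head word: assumed here to be the rightmost token of the phrase.
    if tokens:
        features.add("head=" + tokens[-1])

    # Anatomy features: one feature per level of the taxonomy path,
    # so general and specific concepts can both receive weights.
    for tok in tokens:
        for depth, concept in enumerate(ANATOMY_TAXONOMY.get(tok, ())):
            features.add(f"anat{depth}={concept}")

    # Trigger pairs: ordered pairs of words that are not adjacent.
    for i, j in combinations(range(len(tokens)), 2):
        if j - i > 1:
            features.add(f"trigger={tokens[i]}->{tokens[j]}")

    return features


def maxent_probabilities(features, weights, labels):
    """Conditional maximum entropy model: p(y|x) is proportional to
    exp(sum of the weights of the active (feature, label) pairs)."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in features) for y in labels}
    max_score = max(scores.values())  # subtract the maximum for numerical stability
    exp_scores = {y: math.exp(s - max_score) for y, s in scores.items()}
    z = sum(exp_scores.values())
    return {y: v / z for y, v in exp_scores.items()}


# Usage with hand-picked illustrative weights (not learned from data).
feats = extract_features(["chronic", "hepatic", "failure"])
weights = {("head=failure", "DISEASE"): 1.5, ("anat2=liver", "DISEASE"): 1.0}
probs = maxent_probabilities(feats, weights, ["DISEASE", "OTHER"])
```

In practice the weights would be estimated from annotated phrases rather than set by hand. Emitting one anatomy feature per taxonomy level is one simple way to let a hierarchy act as a prior lexical feature, since a word unseen in training can still activate the features of its parent concepts.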