Large amounts of electronic clinical data contain important information in free-text format. To help guide medical decision-making, this text needs to be processed and coded efficiently. In this research, we investigate techniques to improve the classification of Emergency Department computed tomography (CT) reports. The proposed system uses Natural Language Processing (NLP) to generate structured output from patient reports and then applies machine learning techniques to code for the presence of clinically important injuries in victims of traumatic orbital fractures. Topic modeling of the corpora is also used as an alternative representation of the patient reports. Our results show that both NLP and topic modeling improve on raw text classification. Among the NLP features, filtering the codes using modifiers produces the best performance. Topic modeling, on the other hand, shows mixed results: topic vectors provide good dimensionality reduction and achieve classification results comparable to those obtained with NLP features, but binary topic classification fails to improve upon raw text classification.
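The idea of filtering NLP codes by their modifiers can be illustrated with a minimal sketch. It assumes the NLP step emits, for each report, a list of (code, modifiers) pairs, and that a code is kept as a feature only when none of its modifiers mark it as negated, uncertain, or historical. The code names, the modifier keys (`certainty`, `temporal`), the excluded values, and the helper functions are all hypothetical illustrations, not the paper's actual system or schema.

```python
from typing import Dict, List, Tuple

# Hypothetical NLP output: each finding is a (code, modifiers) pair.
Finding = Tuple[str, Dict[str, str]]

# Assumed modifier values meaning a finding should NOT count as present.
EXCLUDE = {"certainty": {"negative", "uncertain"}, "temporal": {"historical"}}


def keep(modifiers: Dict[str, str]) -> bool:
    """Return True unless some modifier takes an excluded value."""
    return not any(modifiers.get(key) in bad for key, bad in EXCLUDE.items())


def to_features(findings: List[Finding], vocab: List[str],
                use_modifiers: bool = True) -> List[int]:
    """Binary feature vector over a fixed code vocabulary.

    With use_modifiers=False every extracted code counts as present;
    with use_modifiers=True only codes that pass the modifier filter do.
    """
    present = {code for code, mods in findings
               if not use_modifiers or keep(mods)}
    return [1 if code in present else 0 for code in vocab]


# Hypothetical vocabulary and report.
vocab = ["orbital_fracture", "hematoma", "sinus_fluid"]
report = [
    ("orbital_fracture", {"certainty": "positive"}),
    ("hematoma", {"certainty": "negative"}),  # e.g. "no hematoma"
    ("sinus_fluid", {"certainty": "positive", "temporal": "historical"}),
]

print(to_features(report, vocab))                       # [1, 0, 0]
print(to_features(report, vocab, use_modifiers=False))  # [1, 1, 1]
```

Without the filter, a negated mention such as "no hematoma" would still switch on the `hematoma` feature; the filtered vectors could then be passed to any standard classifier.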