This paper defines measures for quantifying the degree of relatedness of information of interest, and the importance of new information, found within a large number of free text documents, and demonstrates their merit. These measures are used to identify and rank free text documents that contain information of interest related to a reference document and, in some cases, new information of interest. The relatedness measures consider the semantic content (e.g., people, vehicles, events, organizations, objects, and locations with their descriptive attributes) as well as the semantic context between semantic content items and key entities such as events and temporal items. Additional links between a reference graph and related sub-graphs of a comparison graph identify knowledge that augments the known semantic text. Graph structures are initially generated from syntactic links and ontological class hierarchies, and are then augmented by inferred links resulting from triggered DL-Safe rules and abductive hypotheses. Inferred context broadens the potential for detecting related information. The approach is tested on a large set of free text emails exchanged between law enforcement detectives seeking leads to solve cases, but the research has broad applicability to other domains such as intelligence collection, investigative reporting, and media monitoring.
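The abstract does not state the formulas behind its measures. As a purely illustrative sketch, assuming each document's graph is reduced to a set of (subject, relation, object) triples, the hypothetical Python code below scores relatedness as Jaccard overlap with the reference graph and treats comparison triples that attach to a reference entity, but are absent from the reference, as candidate new information. This is a simple stand-in technique, not the paper's measures: it ignores ontological class hierarchies, DL-Safe rule inference, and abductive hypotheses, and all names (`Triple`, `relatedness`, `new_information`, `rank`) are assumptions.

```python
from __future__ import annotations

from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical edge of a semantic graph: subject --relation--> obj."""
    subject: str
    relation: str
    obj: str


def entities(graph: set[Triple]) -> set[str]:
    """All nodes (subjects and objects) mentioned in a triple set."""
    return {t.subject for t in graph} | {t.obj for t in graph}


def relatedness(reference: set[Triple], comparison: set[Triple]) -> float:
    """Stand-in measure: Jaccard overlap of the two triple sets (0.0 if both empty)."""
    union = reference | comparison
    if not union:
        return 0.0
    return len(reference & comparison) / len(union)


def new_information(reference: set[Triple], comparison: set[Triple]) -> set[Triple]:
    """Comparison triples missing from the reference that touch a reference entity."""
    known = entities(reference)
    return {
        t for t in comparison - reference
        if t.subject in known or t.obj in known
    }


def rank(reference: set[Triple],
         documents: dict[str, set[Triple]]) -> list[tuple[str, float, int]]:
    """Sort documents by relatedness, then by the number of new triples (both descending)."""
    scored = [
        (doc_id, relatedness(reference, graph), len(new_information(reference, graph)))
        for doc_id, graph in documents.items()
    ]
    return sorted(scored, key=lambda s: (s[1], s[2]), reverse=True)
```

For example, with a reference containing `Triple("suspect_A", "drives", "red_sedan")` and a comparison document that repeats that triple and adds `Triple("red_sedan", "registered_to", "J_Doe")`, the added triple is reported as new information because it attaches to the known entity `red_sedan`. Ranking on relatedness first and on the count of new triples second is one plausible ordering; weighting the two scores together would be an equally reasonable choice.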