The speed, ubiquity, and potential anonymity of Internet media - email, Web sites, and Internet forums - make them ideal communication channels for militant groups and terrorist organizations. Analyzing Web content has therefore become increasingly important to the intelligence and security agencies that monitor these groups. Authorship analysis can assist this activity by automatically extracting linguistic features from online messages and evaluating stylistic details for patterns of terrorist communication. However, authorship analysis techniques are rooted in work with literary texts, which differ significantly from online communication. To explore these problems, we modified an existing framework for analyzing online authorship and applied it to Arabic and English Web forum messages associated with known extremist groups. We developed a special multilingual model - the set of algorithms and related features - to identify Arabic messages, gearing this model toward the language's unique characteristics. Furthermore, we incorporated a complex message extraction component to allow the use of a more comprehensive set of features tailored specifically toward online messages. Evaluating the linguistic features of Web messages and comparing them to known writing styles offers the intelligence community a tool for identifying patterns of terrorist communication.
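As a rough illustration of what "automatically extracting linguistic features from online messages" can look like, the following minimal Python sketch computes a few simple stylistic measures for a single message. It is not the framework or feature set described above: the function name `extract_features`, the `FUNCTION_WORDS` list, and the specific measures (average word length, type-token ratio, punctuation ratio, function-word frequencies) are assumptions chosen for illustration only.

```python
import re
from collections import Counter
from typing import Dict

# Hypothetical, illustrative function-word list; the source does not
# specify which words or features its model uses.
FUNCTION_WORDS = ("the", "of", "and", "to", "in")


def extract_features(message: str) -> Dict[str, float]:
    """Return a small dictionary of stylistic features for one message."""
    # \w+ is Unicode-aware for Python str patterns, so it also matches
    # runs of non-Latin letters.
    words = re.findall(r"\w+", message.lower())
    n_words = len(words)
    n_chars = len(message)
    counts = Counter(words)

    # Count characters that are neither alphanumeric nor whitespace.
    n_punct = sum(1 for c in message if not c.isalnum() and not c.isspace())

    features = {
        # Lexical features: word length and vocabulary richness.
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "type_token_ratio": len(counts) / n_words if n_words else 0.0,
        # Share of the message made up of punctuation-like characters.
        "punctuation_ratio": n_punct / n_chars if n_chars else 0.0,
    }
    # Relative frequency of each (assumed) function word.
    for fw in FUNCTION_WORDS:
        features["fw_" + fw] = counts[fw] / n_words if n_words else 0.0
    return features


if __name__ == "__main__":
    for name, value in extract_features("The cat and the dog.").items():
        print(name, round(value, 3))
```

Feature dictionaries of this kind could then be compared against profiles built from messages of known authorship; how that comparison is done, and which features suit Arabic text, is left to the model described in the paper.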