
A Novel Framework of Identifying Chinese Jargons for Telegram Underground Markets


Notes: This article was mistakenly omitted from the original submission to IEEE Xplore. It is now included as part of the conference record.

Abstract:

As one of the most popular instant messaging (IM) applications, Telegram had reached 500 million monthly active users (MAU) by January 2021. However, its security and openness have also made it a popular platform for transactions in underground markets. Moreover, cybercriminals usually use jargon instead of sensitive terms when they communicate in Telegram groups. Jargon identification currently relies on manual work that is time-consuming and lags behind actual usage. To solve this problem, this paper proposes a Chinese Jargons Identification Framework (CJI-Framework) to identify jargon automatically. First, we collect chat history from targeted Telegram groups to build a corpus called TUMCC, which is the first Chinese corpus in the field of jargon identification. Second, we extract seven new features, which fall into three categories: Vectors-based Features (VF), Lexical analysis-based Features (LF), and Dictionary analysis-based Features (DF), to distinguish Chinese jargon from commonly used words. Furthermore, we use a word vector projection method and a transfer learning method to improve the quality of the word vectors generated from the corpora. In our experiments, the CJI-Framework achieves an F1-score of 89.66% on jargon identification. This work provides an effective method for identifying Chinese jargon in Telegram underground markets and will be helpful for cybercrime investigation. It may also help with jargon identification on other similar communication platforms and in other languages.
Date of Conference: 19-22 July 2021
Date Added to IEEE Xplore: 22 November 2021
Conference Location: Athens, Greece
