Because of the Internet's anonymity, openness, and freedom, two kinds of security problems have become widespread in online virtual spaces: user ID theft and article counterfeiting, both of which can fuel cybercrime and online deception. Applying authorship and article identification techniques to online message content is therefore increasingly necessary. In this paper, we propose a feature-based identification method that automatically verifies the authenticity of an online author or article from its distinctive writing style. The method is based on hypothesis testing and uses a function-word feature set specifically tailored to Chinese online messages. To evaluate the effectiveness of this automatic recognition method, we conducted several experiments on datasets extracted from a Chinese web forum, statistically analyzing the content of registered users' posts and replies. The experimental results show that the method achieves excellent identification performance.
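As a rough illustration of the idea summarized above, the following Python sketch counts a few function words in two texts and computes a Pearson chi-square statistic for the hypothesis that both texts share the same function-word distribution. Everything specific here is an assumption made for illustration: the abstract does not give the paper's feature set, test statistic, or decision threshold, so the word list `FUNCTION_WORDS`, the helper names, and the choice of a chi-square homogeneity test are hypothetical stand-ins rather than the authors' method.

```python
# Hypothetical function-word list; the paper's actual feature set is not given in the abstract.
FUNCTION_WORDS = ["的", "了", "是", "在", "和"]


def function_word_counts(text, words=FUNCTION_WORDS):
    """Count occurrences of each function word in `text`.

    Simplification: plain substring counting, with no Chinese word segmentation.
    """
    return [text.count(w) for w in words]


def chi_square_statistic(counts_a, counts_b):
    """Pearson chi-square statistic for homogeneity of two count vectors."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each text must contain at least one function word")
    grand_total = total_a + total_b
    stat = 0.0
    for a, b in zip(counts_a, counts_b):
        column = a + b
        if column == 0:
            continue  # word absent from both texts: contributes nothing
        expected_a = total_a * column / grand_total
        expected_b = total_b * column / grand_total
        stat += (a - expected_a) ** 2 / expected_a
        stat += (b - expected_b) ** 2 / expected_b
    return stat


def consistent_with_same_author(text_a, text_b, critical_value):
    """Return True if the null hypothesis (same writing style) is not rejected.

    `critical_value` is supplied by the caller for the chosen significance
    level and degrees of freedom; no threshold from the paper is assumed.
    """
    stat = chi_square_statistic(function_word_counts(text_a),
                                function_word_counts(text_b))
    return stat <= critical_value
```

In this sketch a small statistic means the two function-word profiles are statistically indistinguishable, while a large one suggests different writing styles. A realistic system would need proper word segmentation and a much larger feature set than the five words assumed here.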
Published in: Computational Science and Engineering (CSE), 2011 IEEE 14th International Conference on
Date of Conference: 24-26 Aug. 2011