Feature selection based file type identification algorithm

4 Author(s)
Ding Cao (Zhengzhou Information Science and Technology Institute, Henan, 450002, China) ; Junyong Luo ; Meijuan Yin ; Huijie Yang

Identifying the true type of an arbitrary file is very important in information security. Methods based on file extensions or magic numbers are easily spoofed; a more reliable approach analyzes the file's binary content. We propose an algorithm that builds a model for each file type by applying n-gram analysis to the binary contents of a set of known input files. We also design a novel feature selection evaluation function that extracts signatures from the models, and we use those signatures to recognize the true type of unknown files. We deliberately avoid relying on the structure or keywords of any specific file type, so that the approach can be applied to file types in general. Experiments show that the proposed approach is promising, especially when the feature selection evaluation function is applied.
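The pipeline described in the abstract (per-type n-gram models, signature extraction, then matching) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not define their feature selection evaluation function, so the `score` used here (in-type frequency minus the highest frequency in any other type) is a stand-in assumption, as are the function names, the bigram default `n=2`, the signature size `k`, and the overlap-based matching rule.

```python
from collections import Counter


def ngram_profile(data: bytes, n: int = 2) -> Counter:
    """Relative frequency of each byte n-gram in `data`."""
    total = len(data) - n + 1
    if total <= 0:
        return Counter()
    counts = Counter(data[i:i + n] for i in range(total))
    return Counter({gram: c / total for gram, c in counts.items()})


def build_model(samples, n: int = 2) -> Counter:
    """Average the n-gram profiles of known samples of one file type."""
    model = Counter()
    for sample in samples:
        for gram, freq in ngram_profile(sample, n).items():
            model[gram] += freq / len(samples)
    return model


def select_signature(models, file_type, k: int = 8) -> set:
    """Keep the k n-grams that best separate `file_type` from the rest.

    ASSUMPTION: the scoring rule below is a placeholder, not the
    evaluation function proposed in the paper.
    """
    own = models[file_type]

    def score(gram):
        others = [m[gram] for t, m in models.items() if t != file_type]
        return own[gram] - max(others, default=0.0)

    return set(sorted(own, key=score, reverse=True)[:k])


def identify(data: bytes, models, signatures, n: int = 2) -> str:
    """Return the type whose signature n-grams overlap most with `data`."""
    profile = ngram_profile(data, n)

    def match(file_type):
        model = models[file_type]
        return sum(min(profile[g], model[g]) for g in signatures[file_type])

    return max(signatures, key=match)
```

In use, one would call `build_model` once per known file type, derive a signature per type with `select_signature`, and pass both dictionaries to `identify` along with the bytes of the unknown file. Restricting the comparison to signature n-grams is what the feature selection step buys: matching considers only the n-grams that distinguish a type, rather than the full model.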

Published in:

Intelligent Computing and Intelligent Systems (ICIS), 2010 IEEE International Conference on (Volume: 3)

Date of Conference:

29-31 Oct. 2010