Abstract:
Applications are among the most frequently targeted attack surfaces, and they must be secured at the source code level, early in the development phase. Developers' inherited programming culture preserves code-writing patterns within large organizations and developer communities, which opens an opportunity to use solutions complementary to SAST (Static Application Security Testing) to identify insecure code early in development. We propose an Intermediate Representation that is strict enough to preserve the security vulnerability patterns defined by MITRE in the Common Weakness Enumeration, yet flexible enough not to depend strongly on the lexical and syntactic structure of the programming language, following instead the way programmers write code. The current research phase applies semantic clustering, based on Word Embeddings, to the instructions (keywords) found in C/C++ programs; the resulting clusters are carried by the (numerical) Intermediate Representation to various classifiers for security vulnerability pattern detection. We show that security patterns are well preserved despite the generalization of keywords through semantic clustering. This opens an opportunity for innovation in identifying security vulnerability patterns in a way that depends more on programmers' code-writing behavior than on language-specific structure.
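The idea of generalizing keywords through semantic clustering can be illustrated with a minimal sketch. It is not the authors' implementation: the 2-D embedding values, the similarity threshold, the greedy clustering rule, and the names `TOY_EMBEDDINGS`, `cluster_keywords`, and `to_intermediate_representation` are all assumptions made for illustration. Real embeddings would be learned from a C/C++ corpus.

```python
import math

# Hypothetical, hand-made 2-D embeddings (assumption for illustration only).
TOY_EMBEDDINGS = {
    "strcpy": (0.9, 0.1),
    "strcat": (0.85, 0.15),
    "memcpy": (0.8, 0.2),
    "malloc": (0.1, 0.9),
    "calloc": (0.15, 0.85),
    "free": (0.2, 0.8),
}


def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def cluster_keywords(embeddings, threshold=0.9):
    """Greedy clustering: a keyword joins the first cluster whose
    representative vector is similar enough, else it starts a new cluster."""
    representatives = []
    assignment = {}
    for word, vec in embeddings.items():
        for cluster_id, rep in enumerate(representatives):
            if cosine(vec, rep) >= threshold:
                assignment[word] = cluster_id
                break
        else:
            representatives.append(vec)
            assignment[word] = len(representatives) - 1
    return assignment


def to_intermediate_representation(tokens, assignment, unknown_id=-1):
    """Replace each keyword with its cluster id; unknown tokens get unknown_id."""
    return [assignment.get(tok, unknown_id) for tok in tokens]


CLUSTERS = cluster_keywords(TOY_EMBEDDINGS)
```

Under these toy values, the sequences `["malloc", "strcpy", "free"]` and `["calloc", "memcpy", "free"]` map to the same numerical sequence, which is the kind of pattern preservation under keyword generalization that the abstract describes. A classifier would then consume these numerical sequences instead of raw tokens.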
Date of Conference: 26-28 May 2021
Date Added to IEEE Xplore: 26 July 2021