Detecting Cryptography Misuses With Machine Learning: Graph Embeddings, Transfer Learning and Data Augmentation in Source Code Related Tasks

Abstract:

Cryptography is a ubiquitous tool in secure software development, used to guarantee security requirements. However, software developers have scarce knowledge of cryptography and rely on limited support tools that cannot properly detect bad uses of cryptography, which leads to vulnerabilities in software. In this work, we extend the so far scarce use of machine learning to detect cryptography misuse in source code by applying a state-of-the-art deep learning model (i.e., code2vec) through transfer learning to generate features that feed machine learning models. In addition, we compare this approach with previous ones across different types of binary models. We also adapt code obfuscation to serve as data augmentation in machine learning tasks related to source code. Finally, we show that, through transfer learning, code2vec can be a competitive feature generator for cryptography misuse detection, and that simple code obfuscation can be used to generate data that enhances the training of machine learning models in source code related tasks.
Published in: IEEE Transactions on Reliability (Volume: 72, Issue: 4, December 2023)
Page(s): 1678 - 1689
Date of Publication: 07 February 2023

I. Introduction

Software services are consumed daily by people in many areas of activity. In each of these areas, security underlies the services in order to protect users' data. In modern software, cryptography is the tool generally used to achieve data protection. It provides properties such as confidentiality, integrity, and authenticity, which cannot be achieved without the use of cryptographic primitives in software development. However, using cryptography is not a simple task. Most software developers have limited knowledge of cryptography and do not use it properly. Also, most cryptographic application programming interfaces (APIs) have poor usability, with documentation that is difficult to understand and to use [1], [2]. To address this problem, software development companies rely on supporting tools to detect incorrect uses of cryptography. These tools, though, are limited: they detect at most one-third of incorrect uses of cryptography (or cryptography misuses) when complex misuse cases are included [3], with little difference in performance when only simple cases are included [4], [5]. As a result, these misuses persist in source code and often introduce vulnerabilities. For example, it is estimated that most Android applications have at least one cryptography misuse in their source code [6]. These vulnerabilities can be exploited and cause damage to software companies and users alike. There is therefore an urgent need to improve cryptography misuse detection tools, as they play an important role in supporting secure software development. Improving them will reduce security breach incidents, as there will be fewer vulnerabilities to exploit. Finally, with robust cryptography misuse detection tools, software developers who lack proper knowledge of cryptography will still be able to write secure software without constantly needing an expert.
