Conferences >2023 38th IEEE/ACM Internatio...

When Less is Enough: Positive and Unlabeled Learning Model for Vulnerability Detection

Download PDF
Download References
Request Permissions
Save to
Alerts

Abstract:

Automated code vulnerability detection has gained increasing attention in recent years. The deep learning (DL)-based methods, which implicitly learn vulnerable code patte...Show More

Metadata

Abstract:

Automated code vulnerability detection has gained increasing attention in recent years. The deep learning (DL)-based methods, which implicitly learn vulnerable code patterns, have proven effective in vulnerability detection. The performance of DL-based methods usually relies on the quantity and quality of labeled data. However, the current labeled data are generally automatically collected, such as crawled from human-generated commits, making it hard to ensure the quality of the labels. Prior studies have demonstrated that the non-vulnerable code (i.e., negative labels) tends to be unreliable in commonly-used datasets, while vulnerable code (i.e., positive labels) is more determined. Considering the large numbers of unlabeled data in practice, it is necessary and worth exploring to leverage the positive data and large numbers of unlabeled data for more accurate vulnerability detection. In this paper, we focus on the Positive and Unlabeled (PU) learning problem for vulnerability detection and propose a novel model named PILOT, i.e., Positive and unlabeled Learning mOdel for vulnerability deTection. PILOT only learns from positive and unlabeled data for vulnerability detection. It mainly contains two modules: (1) A distance-aware label selection module, aiming at generating pseudo-labels for selected unlabeled data, which involves the inter-class distance prototype and progressive fine-tuning; (2) A mixed-supervision representation learning module to further alleviate the influence of noise and enhance the discrimination of representations. Extensive experiments in vulnerability detection are conducted to evaluate the effectiveness of PILOT based on real-world vulnerability datasets. The experimental results show that PILOT outperforms the popular weakly supervised methods by 2.78%-18.93% in the PU learning setting. Compared with the state-of-the-art methods, PILOT also improves the performance of 1.34%-12.46 % in F1 score metrics in the supervised setting. In additio...

Published in: 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE)

Date of Conference: 11-15 September 2023

Date Added to IEEE Xplore: 08 November 2023

ISBN Information:

ISSN Information:

DOI: 10.1109/ASE56229.2023.00144

Conference Location: Luxembourg, Luxembourg

Contents

References is not available for this document.

When Less is Enough: Positive and Unlabeled Learning Model for Vulnerability Detection

Abstract:

Metadata

Abstract:

ISSN Information:

References

IEEE Account

Purchase Details

Profile Information

Need Help?

When Less is Enough: Positive and Unlabeled Learning Model for Vulnerability Detection

Alerts

Abstract:

Metadata

Abstract:

ISSN Information:

References

IEEE Account

Purchase Details

Profile Information

Need Help?