A C/C++ Code Vulnerability Dataset with Code Changes and CVE Summaries
MSR - Data Showcase
In this paper, we present a C/C++ code vulnerability dataset collected from open-source projects. We crawled the public Common Vulnerabilities and Exposures (CVE) database and the CVE-related code repositories. From the CVE database, we collected descriptive information about each vulnerability, e.g., CVE IDs, CVE severity scores, and CVE summaries. Using the CVE information and the associated public Git repository links, we downloaded all of the code repositories and extracted the vulnerability-related code changes. In total, our dataset contains 3754 code vulnerabilities spanning 91 different vulnerability types, extracted from 348 Git projects. All of this information is stored in CSV format with a clear structure. The code changes and the CVE descriptive information are mapped to each other, so the dataset can be used in many research areas, e.g., vulnerability detection and identification of vulnerability-fixing patches.
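As an illustration of how a CSV file of this kind might be consumed, the following minimal sketch loads a few rows, counts vulnerabilities per type, and filters by severity score. The column names (`cve_id`, `cwe_id`, `score`, `summary`, `commit_id`), the use of a CWE identifier as the vulnerability type, and the sample rows are all hypothetical placeholders; the actual schema of the dataset may differ.

```python
import csv
import io
from collections import Counter

# Placeholder data only: column names and rows are assumptions made for
# illustration and are NOT taken from the actual dataset.
SAMPLE_CSV = """cve_id,cwe_id,score,summary,commit_id
CVE-0000-0001,CWE-119,7.5,Example buffer error,abc123
CVE-0000-0002,CWE-20,5.0,Example input validation issue,def456
CVE-0000-0003,CWE-119,9.3,Another example buffer error,0a1b2c
"""


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def count_by_type(rows):
    """Count entries per vulnerability type (assumed column: cwe_id)."""
    return Counter(row["cwe_id"] for row in rows)


def filter_by_min_score(rows, min_score):
    """Keep rows whose severity score (assumed column: score) is >= min_score."""
    return [row for row in rows if float(row["score"]) >= min_score]


rows = load_rows(SAMPLE_CSV)
type_counts = count_by_type(rows)
severe = filter_by_min_score(rows, 7.0)

for row in severe:
    # Each row links CVE metadata to the (hypothetical) fixing commit.
    print(row["cve_id"], row["cwe_id"], row["commit_id"])
```

Because each row carries both the CVE metadata and a reference to the code change, the same pattern could be extended to select training examples for vulnerability detection or patch identification, once the real column names are substituted.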