MSR 2020
Mon 29 - Tue 30 June 2020
co-located with ICSE 2020
Tue 30 Jun 2020 14:00 - 14:12 at MSR:Zoom - ML4SE Chair(s): Kevin Moran

Central to software composition analysis is a database of vulnerabilities of open-source libraries. Security researchers curate this database from various data sources, including bug tracking systems, commits, and mailing lists. In this article, we report the design and implementation of a machine learning system that assists this curation by automatically predicting the vulnerability-relatedness of each data item. It supports a complete pipeline, from data collection through model training and prediction to the validation of new models before deployment. It is executed iteratively to generate better models as new input data become available. It is enhanced by self-training, which significantly and automatically increases the size of the training dataset, opportunistically maximizing the improvement in model quality at each iteration. We devised a new "deployment stability" metric to evaluate the quality of new models before they are deployed into production. We experimentally evaluate the improvement in model performance over one iteration, with a maximum PR AUC improvement of 27.59%. Ours is the first such study across a variety of data sources. We discover that adding the features of the corresponding commits to the features of issues/pull requests improves the precision at the recall values that matter. We demonstrate the effectiveness of self-training alone, with a 10.50% PR AUC improvement, and we discover that there is no uniform ordering of word2vec parameter sensitivity across data sources. We show how the deployment stability metric helped to discover an error.
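The abstract credits self-training with automatically growing the training dataset. As a rough illustration of that general technique only (not the authors' pipeline, whose models and features are not described here), the Python sketch below pseudo-labels unlabeled items that a classifier predicts with high confidence, adds them to the training set, and retrains until no further item is confident enough. The nearest-centroid classifier, the softmax-over-distances confidence, the `threshold` of 0.9, and the helper names `fit_centroids`, `predict_with_confidence`, and `self_train` are all hypothetical choices made for this example.

```python
"""Minimal self-training (pseudo-labelling) sketch.

Hypothetical assumptions, not taken from the paper: a nearest-centroid
classifier over numeric feature vectors, a softmax over negative
distances as the confidence score, and a fixed confidence threshold.
"""
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Centroids = Dict[int, List[float]]


def fit_centroids(xs: List[Vector], ys: List[int]) -> Centroids:
    """Compute the mean feature vector of each class label."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in zip(xs, ys):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict_with_confidence(centroids: Centroids, x: Vector) -> Tuple[int, float]:
    """Return (label, confidence) via a softmax over negative distances."""
    labels = sorted(centroids)
    weights = [math.exp(-math.dist(x, centroids[y])) for y in labels]
    best = max(range(len(labels)), key=lambda i: weights[i])
    return labels[best], weights[best] / sum(weights)


def self_train(
    labeled_x: List[Vector],
    labeled_y: List[int],
    unlabeled_x: List[Vector],
    threshold: float = 0.9,  # hypothetical confidence cut-off
    max_rounds: int = 5,
) -> Tuple[Centroids, List[Vector], List[int]]:
    """Grow the training set with confidently pseudo-labelled items."""
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        centroids = fit_centroids(xs, ys)  # retrain on the current set
        remaining: List[Vector] = []
        added = 0
        for x in pool:
            label, conf = predict_with_confidence(centroids, x)
            if conf >= threshold:
                xs.append(x)  # accept the prediction as a pseudo-label
                ys.append(label)
                added += 1
            else:
                remaining.append(x)  # stays unlabeled for the next round
        pool = remaining
        if added == 0:  # no confident predictions left; stop early
            break
    return fit_centroids(xs, ys), xs, ys
```

On toy data with labeled points `[0, 0]` (class 0) and `[5, 5]` (class 1), the unlabeled points `[0.5, 0.2]` and `[4.8, 5.1]` are pseudo-labelled in the first round, while the equidistant point `[2.5, 2.5]` never reaches the threshold and stays out of the training set. A higher threshold adds fewer items but reduces the risk of training on wrong pseudo-labels.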

Tue 30 Jun
Times are displayed in time zone: (UTC) Coordinated Universal Time

msr-2020-papers
14:00 - 15:00: Technical Papers - ML4SE at MSR:Zoom
Chair(s): Kevin Moran (George Mason University)

Q/A & Discussion of Session Papers over Zoom (Joining info available on Slack)

msr-2020-papers 14:00 - 14:12
Live Q&A
Chen Yang (Veracode, Inc.), Andrew Santosa (Veracode, Inc.), Ang Ming Yi, Abhishek Sharma (Singapore Management University, Singapore), Asankhaya Sharma (Veracode, Inc.), David Lo (Singapore Management University)
msr-2020-papers 14:12 - 14:24
Live Q&A
Rhys Compton (University of Waikato), Eibe Frank (Department of Computer Science, University of Waikato), Panos Patros, Abigail Koay (University of Waikato)
msr-2020-papers 14:24 - 14:36
Live Q&A
Abdulkarim Khormi (Florida State University, USA - Jazan University, KSA), Mohammad Alahmadi (Florida State University), Sonia Haiduc (Florida State University)
msr-2020-papers 14:36 - 14:48
Live Q&A
Gustavo Pinto (UFPA), Breno Miranda (Federal University of Pernambuco), Supun Dissanayake (The University of Adelaide), Marcelo d'Amorim (Federal University of Pernambuco), Christoph Treude (The University of Adelaide), Antonia Bertolino (CNR-ISTI)
msr-2020-papers 14:48 - 15:00
Live Q&A
Sakib Haque (University of Notre Dame), Alexander LeClair (University of Notre Dame), Lingfei Wu (IBM Research), Collin McMillan (University of Notre Dame)