The Data Showcase provides a forum to share and discuss important data sets that underpin the work of the Mining Software Repositories community.
Mon 29 Jun (all times in UTC)

11:00 - 12:00 | Session: Build, CI, & Dependencies (room MSR:Zoom). Chair: Raula Gaikovina Kula (NAIST). Q&A and discussion of session papers over Zoom (joining info available on Slack).

- 11:00 (12m, live Q&A): A Tale of Docker Build Failures: A Preliminary Study (Technical Paper). Yiwen Wu, Yang Zhang, Tao Wang, Huaimin Wang (National University of Defense Technology, China). Pre-print available.
- 11:12 (12m, live Q&A): Using Others' Tests to Avoid Breaking Updates (Technical Paper). Suhaib Mujahid, Rabe Abdalkareem, Emad Shihab (Concordia University), Shane McIntosh (McGill University). Pre-print available.
- 11:24 (12m, live Q&A): A Dataset of Dockerfiles (Data Showcase). Jordan Henkel (University of Wisconsin–Madison), Christian Bird, Shuvendu K. Lahiri (Microsoft Research), Thomas Reps (University of Wisconsin–Madison, USA).
- 11:36 (12m, live Q&A): Empirical Study of Restarted and Flaky Builds on Travis CI (Technical Paper). Thomas Durieux (KTH Royal Institute of Technology, Sweden), Claire Le Goues, Michael Hilton (Carnegie Mellon University, USA), Rui Abreu (Instituto Superior Técnico, U. Lisboa & INESC-ID). DOI and pre-print available.
- 11:48 (12m, live Q&A): LogChunks: A Data Set for Build Log Analysis (Data Showcase). Carolin Brandt, Annibale Panichella, Andy Zaidman (Delft University of Technology), Moritz Beller (Facebook, USA). Pre-print available.
Tue 30 Jun (all times in UTC)

11:00 - 12:00 | Session: Security (room MSR:Zoom2). Chair: Dimitris Mitropoulos (Athens University of Economics and Business). Q&A and discussion of session papers over Zoom (joining info available on Slack).

- 11:00 (12m, live Q&A): Did You Remember To Test Your Tokens? (Technical Paper). Danielle Gonzalez (Rochester Institute of Technology, USA), Michael Rath (Technische Universität Ilmenau), Mehdi Mirakhorli (Rochester Institute of Technology). DOI and pre-print available.
- 11:12 (12m, live Q&A): Automatically Granted Permissions in Android Apps (Technical Paper). Paolo Calciati (IMDEA Software Institute), Konstantin Kuznetsov (Saarland University, CISPA), Alessandra Gorla (IMDEA Software Institute), Andreas Zeller (CISPA Helmholtz Center for Information Security).
- 11:24 (12m, live Q&A): PUMiner: Mining Security Posts from Developer Question and Answer Websites with PU Learning (Technical Paper). Triet Le, David Hin, Roland Croft, Muhammad Ali Babar (The University of Adelaide). DOI and pre-print available.
- 11:36 (12m, live Q&A): A C/C++ Code Vulnerability Dataset with Code Changes and CVE Summaries (Data Showcase). Jiahao Fan, Yi Li, Shaohua Wang (New Jersey Institute of Technology, USA), Tien N. Nguyen (University of Texas at Dallas).
- 11:48 (12m, live Q&A): The Impact of a Major Security Event on an Open Source Project: The Case of OpenSSL (Technical Paper). James Walden (Northern Kentucky University). Pre-print available.
Call for Papers
Data Showcase papers should describe data sets that are curated by their authors and made available for others to use. Ideally, these data sets should be of value to others in the community, should be preprocessed or filtered in some way, and should come with an easy-to-understand schema. Data showcase papers are expected to include:
- a description of the data source,
- a description of the methodology used to gather the data (including provenance and the tool used to create/generate/gather the data, if any),
- a description of the storage mechanism, including a schema if applicable,
- if the data has been used by the authors or others, a description of how this was done including references to previously published papers,
- a description of the originality of the data set (that is, even if the data set has been used in a published paper, its complete description must be unpublished),
- ideas for future research questions that could be answered using the data set,
- ideas for further improvements that could be made to the data set, and
- any limitations and/or challenges in creating or using the data set.
The data set should be made available at the time the paper is submitted for review, but will be treated as confidential until publication of the paper. Upon publication of the paper, at the latest, the authors should archive the data in a persistent repository that can provide a digital object identifier (DOI), such as zenodo.org, figshare.com, Archive.org, or an institutional repository. In this way the data becomes citable; the DOI-based citation of the data set should be included in the camera-ready version of the paper.
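To illustrate, a DOI-based citation of an archived data set might look like the following BibTeX entry; the authors, title, and DOI here are placeholders, not a real record:

```bibtex
% Hypothetical entry: all names and the DOI are placeholders.
@misc{doe2020dataset,
  author    = {Jane Doe and John Roe},
  title     = {A Curated Data Set of Example Build Logs},
  year      = {2020},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.0000000},
  url       = {https://doi.org/10.5281/zenodo.0000000}
}
```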
Data showcase papers are not:
- empirical studies,
- tool demos, or
- data sets that are
  - based on poorly explained or untrustworthy heuristics for data collection, or
  - the result of trivial application of generic tools.
If custom tools have been used to create the data set, we expect the paper to be accompanied by the source code of the tools, along with clear documentation on how to run them to recreate the data set. The tools should be open source and accompanied by an appropriate license; the source code should be citable, i.e., refer to a specific release and have a DOI. GitHub provides an easy way to make source code citable. If you cannot provide the source code, or the source code clause is not applicable (e.g., because the data set consists of qualitative data), please briefly explain why this is not possible.
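As one possible setup, the GitHub–Zenodo integration archives a tagged release of a repository and mints a DOI for it; a `.zenodo.json` file in the repository root can supply the deposit metadata. The following is a sketch with placeholder values, not a prescribed configuration:

```json
{
  "title": "Tools for recreating the example data set",
  "description": "Collection and preprocessing scripts for the data set.",
  "license": "MIT",
  "upload_type": "software",
  "creators": [
    { "name": "Doe, Jane", "affiliation": "Example University" }
  ]
}
```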
Submission
Submit your data paper (maximum 4 pages, plus 1 additional page of references) to EasyChair on or before February 6th, 2020 (abstract due January 30th).
Submitted papers will undergo single-blind peer review. We opt for single-blind review (as opposed to the double-blind review of the main track) because of the requirement above to describe how the data has been used in previous studies, including bibliographic references to those studies; such references are likely to disclose the authors' identity.
To make research data sets and research software accessible and citable, we further encourage authors to adhere to the FAIR principles, i.e., data should be Findable, Accessible, Interoperable, and Reusable.
The submission must conform to the ACM Conference Proceedings Formatting Guidelines (https://www.acm.org/publications/proceedings-template). LaTeX users must use the provided acmart.cls and ACM-Reference-Format.bst without modification, enable the conference format in the preamble of the document (i.e., \documentclass[sigconf,review]{acmart}), and use the ACM reference format for the bibliography (i.e., \bibliographystyle{ACM-Reference-Format}). The review option adds line numbers, thereby allowing referees to refer to specific lines in their comments.
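Putting these requirements together, a minimal document skeleton could look like the sketch below; the title, author, and file names are placeholders:

```latex
% Conference format; the review option adds line numbers for referees.
\documentclass[sigconf,review]{acmart}

\begin{document}

\title{A Data Set of Placeholder Artifacts}
\author{Jane Doe}
\affiliation{\institution{Example University}\country{Country}}

\maketitle

Body text, citing the archived data set where appropriate \cite{example2020}.

% ACM reference format, using the provided ACM-Reference-Format.bst.
\bibliographystyle{ACM-Reference-Format}
\bibliography{references}

\end{document}
```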
Papers submitted for consideration should not have been published elsewhere and should not be under review or submitted for review elsewhere for the duration of consideration. ACM plagiarism policies and procedures shall be followed for cases of double submission. The submission must also comply with the IEEE Policy on Authorship. Please read the ACM Policy and Procedures on Plagiarism (https://www.acm.org/publications/policies/plagiarism) and the IEEE Plagiarism FAQ (https://www.ieee.org/publications/rights/plagiarism/plagiarism-faq.html) before submitting.
To submit, please use the EasyChair link.
Upon notification of acceptance, all authors of accepted papers will be asked to complete a copyright form and will receive further instructions for preparing their camera ready versions. At least one author of each paper is expected to register and present the results at the MSR 2020 conference. All accepted contributions will be published in the conference electronic proceedings.
A selection of the best papers will be invited for extension to a special issue of the Empirical Software Engineering (EMSE) journal.
Important Dates
Abstracts Due: January 30, 2020, 23:59 AOE
Papers Due: February 6, 2020, 23:59 AOE
Author Notification: March 2, 2020
Camera Ready: March 16, 2020, 23:59 AOE
Organization
Olga Baysal, Carleton University, Canada
Bogdan Vasilescu, Carnegie Mellon University, USA