MSR 2020
Mon 5 - Tue 6 October 2020 Yongsan-gu, Seoul, South Korea
co-located with ICSE 2020

The Mining Software Repositories (MSR) conference is the premier conference for data science, machine learning, and artificial intelligence in software engineering. The goal of the conference is to improve software engineering practices by uncovering interesting and actionable information about software systems and projects using the vast amounts of software data held in source control systems, defect tracking systems, code review repositories, archived communications between project personnel, question-and-answer sites, CI build servers, and run-time telemetry. Mining this information can help to understand software development and evolution, software users, and runtime behavior; support the maintenance of software systems; improve software design/reuse; empirically validate novel ideas and techniques; support predictions about software development; and exploit this knowledge in planning future development.

The goal of this two-day international conference is to advance the science and practice of software engineering with data-driven techniques. The 17th International Conference on Mining Software Repositories is co-located with ICSE 2020 in Seoul, South Korea, and will be held on May 25-26, 2020.

The important dates for the Technical Track papers are:

  • Abstract deadline: Thursday January 9, 2020, 23:59 AOE
  • Papers deadline: Thursday January 16, 2020, 23:59 AOE (No deadline extension or grace periods will be provided. Please plan accordingly)
  • Author Response Period: February 18 - 21, 2020
  • Author Notification: Monday March 2, 2020
  • Camera Ready: Monday March 16, 2020, 23:59 AOE

Please see the Call for Papers for all the details.


Accepted Papers

Call for Papers


The technical track of MSR 2020 solicits novel, high quality submissions on a wide range of topics, including (but not limited to):

  • Analysis of software data with the goal of improving software productivity and reliability
  • Analysis and modeling of runtime information to optimize deployment, delivery and error handling in software development processes
  • Analysis of change patterns and trends to assist in future development
  • Analysis of natural language artifacts in software data
  • Analysis of software ecosystems and mining of software data across multiple projects
  • Approaches, applications, and tools for mining software data
  • Artificial intelligence for software engineering
  • Characterization, classification, and prediction of software defects based on analysis of software data
  • Characterization of bias in mining and guidelines to ensure the quality of results
  • Data science for software projects
  • Empirical studies on extracting data from large long-lived and/or industrial projects
  • Machine learning for software engineering
  • Meta-models, exchange formats, and infrastructure tools to facilitate the sharing of extracted data and to encourage reuse and repeatability
  • Methods of integrating mined data from various historical sources
  • Mining code review data
  • Mining execution traces and logs
  • Mining human and social aspects of development
  • Mining interaction data
  • Mining mobile app stores and app reviews
  • Mining software licensing and copyrights
  • Models for social and development processes in large software projects
  • Models of software project evolution based on historical repository data
  • Models and processes for improving the quality of machine learning pipelines
  • Natural language processing in software engineering
  • Prediction and modeling of software quality
  • Privacy and ethics in mining software data
  • Release engineering, including continuous integration, delivery and deployment
  • Search-driven software development, including search techniques to assist developers in finding suitable components and code fragments for reuse, and software search engines
  • Software analytics
  • Software engineering for artificial intelligence and machine learning
  • Energy efficiency of software
  • Studies of programming language features and their usage
  • Techniques and tools for capturing new forms of software data such as effort data, fine-grained changes, and refactoring
  • Techniques to model reliability and defect occurrences
  • Visualization techniques and models of mined data

Types of Technical Track Submissions

We accept both full (10 pages plus 2 additional pages of references) and short (4 pages plus 1 additional page of references) papers. Furthermore, in order to facilitate the reviewing process of your paper’s contribution, you should select one of the following paper categories:

1. Research Paper

Full research papers are expected to describe new methodologies and/or provide novel research results, and should be evaluated scientifically. While a high degree of technical rigor is expected for long papers, short research papers should discuss controversial issues in the field, or describe interesting or thought-provoking ideas that are not yet fully developed. Accepted short papers will be presented in a short lightning talk.

Relevant review criteria:

  • novelty
  • soundness of approach
  • relevance to the conference (+ clarity of relation with related work)
  • quality of presentation
  • quality of evaluation [for long papers]
  • ability to replicate [for long papers]

2. Practice Experience

MSR encourages the submission of papers that report on both positive and negative experiences of applying software analytics strategies in an industry/open source organization context. Adapting existing algorithms or proposing new algorithms or approaches for practical use is considered a plus.

Relevant review criteria:

  • quality of empirical evaluation
  • explicit discussion on the usefulness/impact of the approach in practice
  • explicit discussion of any adaptations required by the application of existing/new approach in practice
  • quality of presentation
  • relevance to the conference (+ clarity of relation with related work)

3. Reusable Tool

MSR actively promotes and recognizes the creation and use of tools that are designed and built not only for a specific research project, but for the MSR community as a whole. Those tools enable other researchers to jumpstart their own research efforts, and also enable reproducibility of earlier work.

Reusable Tool papers can be descriptions of tools built by the authors that can be used by other researchers, and/or descriptions of the use of tools built by others to obtain some specific research results in the area of mining software repositories.

Relevant review criteria:

  • evaluation of usefulness/reusability of the tool [for long papers]
  • novelty
  • quality of presentation (details on tool’s internals, usage, etc.)
  • relevance to the conference (+ clarity of relation with related work)
  • availability of the tool, clear installation instructions and example data set that allow the reviewers to run the tool

Submission Process

All types of technical papers will be peer-reviewed according to the specified review criteria, so authors must choose the paper type that matches the paper’s major contributions. Submissions should follow the ACM Conference Proceedings Formatting Guidelines. LaTeX users must use the provided acmart.cls and ACM-Reference-Format.bst without modification, enable the conference format in the preamble of the document (i.e., \documentclass[sigconf,review]{acmart}), and use the ACM reference format for the bibliography (i.e., \bibliographystyle{ACM-Reference-Format}). The review option adds line numbers, thereby allowing referees to refer to specific lines in their comments.
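As a rough illustration of the LaTeX setup described above, a minimal submission skeleton might look like the following. The title, author placeholder, section text, and bibliography file name (refs.bib) are assumptions made for illustration only; the document class options and bibliography style are the ones named above.

```latex
% Minimal sketch of a submission file; all content below is placeholder text.
% Conference format with line numbers for reviewers, as required above.
\documentclass[sigconf,review]{acmart}

\begin{document}

\title{Placeholder Title}

% Double-blind review: no real names or affiliations in the submitted PDF.
\author{Anonymous Author(s)}
\affiliation{%
  \institution{Anonymous Institution}
  \country{Anonymous Country}}

% In acmart the abstract must appear before \maketitle.
\begin{abstract}
Placeholder abstract.
\end{abstract}

\maketitle

\section{Introduction}
Placeholder text.

% ACM reference format for the bibliography, as required above.
% refs.bib is a hypothetical file name.
\bibliographystyle{ACM-Reference-Format}
\bibliography{refs}

\end{document}
```

The acmart class also offers an anonymous option that suppresses author information; whether to use it instead of placeholder author text is left to the authors, since the call itself only specifies the sigconf and review options.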

Papers submitted for consideration should not have been published elsewhere and should not be under review or submitted for review elsewhere for the duration of consideration. ACM plagiarism policies and procedures shall be followed for cases of double submission. The submission must also comply with the IEEE Policy on Authorship. Please read the ACM Policy and Procedures on Plagiarism and the IEEE Plagiarism FAQ before submitting.

Upon notification of acceptance, all authors of accepted papers will be asked to complete a copyright form and will receive further instructions for preparing their camera ready versions. At least one author of each paper is expected to register and present the results at the MSR 2020 conference. All accepted contributions will be published in the conference electronic proceedings.

A selection of the best papers will be invited to an EMSE Special Issue. The authors of accepted papers that show outstanding contributions to the FOSS community will have a chance to self-nominate their paper for the MSR FOSS Impact Paper Award.

IMPORTANT: MSR 2020 follows the double-blind submission model. Submissions should not reveal the identity of the authors in any way. This means that authors should:

  • leave out author names and affiliations from the body and metadata of the submitted pdf
  • ensure that any citations to related work by themselves are written in the third person, for example “the prior work of XYZ” as opposed to “our prior work [2]”
  • not refer to their personal, lab or university website; similarly, care should be taken with personal accounts on github, bitbucket, Google Drive, etc.
  • not upload unblinded versions of their paper on archival websites during bidding/reviewing; however, uploading unblinded versions prior to submission is allowed and sometimes unavoidable (e.g., thesis)
  • not advertise their submission number or paper topic on social media accounts. Please be careful about posting your paper number, a description of your submitted paper, or any other information that may make it easy for reviewers to identify your submission.

Please note that double-blind submission should not be an excuse for hiding replication packages or data sets from reviewers, since that effectively hinders the peer-review process. Since access to data and scripts is essential during peer review, we strongly recommend archiving data sets on online archival sites (instructions are available in the Open Science Policy below). Some of these sites also issue a DOI, which makes the data set citable.

Submission Link

Technical papers must be submitted through EasyChair.

Open Science Policy

Openness in science is key to fostering progress via transparency, reproducibility and replicability. Our steering principle is that all research output should be accessible to the public and that empirical studies should be reproducible. In particular, we actively support the adoption of open data and open source principles. The following guidelines are recommendations and not mandatory. Your choice to use open science or not will not affect the review process for your paper. However, to increase reproducibility and replicability, we encourage all contributing authors to disclose:

  • the source code of relevant software used or proposed in the paper, including that used to retrieve and analyze data
  • the data used in the paper (e.g., evaluation data, anonymized survey data, etc.)
  • instructions for other researchers describing how to reproduce or replicate the results

Already upon submission, authors can privately share their anonymized data and software on preserved archives such as Zenodo or Figshare (a tutorial is available; please make sure that any links shared during peer review are anonymized). Zenodo accepts up to 50GB per dataset (more upon request). There is no need to use Dropbox or Google Drive. Once the paper is accepted, an option can be toggled to publish the data and scripts with an official DOI. Zenodo and Figshare accounts can easily be linked with GitHub repositories to automatically archive software releases. In the unlikely case that authors need to upload terabytes of data, a different archival service may be used.

After acceptance, we encourage authors to self-archive pre-prints of their papers in open, preserved repositories. This is legal and allowed by all major publishers, including ACM and IEEE, and it lets anybody in the world reach your paper. Note that you are usually not allowed to self-archive the PDF of the published article (that is, the publisher proof or the Digital Library version). Instead, use the manuscript with reviewer comments addressed, but before applying the camera-ready instructions and templates. Feel free to contact the MSR 2020 PC or proceedings chairs for more details.

Please note that the success of the open science initiative depends on the willingness (and possibilities) of authors to disclose their data and that all submissions will undergo the same review process independent of whether or not they disclose their analysis code or data. We encourage authors who cannot disclose industrial or otherwise non-public data, for instance due to non-disclosure agreements, to provide an explicit (short) statement in the paper.


Accepted Papers

  • From Innovations to Prospects: What Is Hidden Behind Cryptocurrencies?
  • Traceability Support for Multi-Lingual Software Projects
  • An Empirical Study of Method Chaining in Java
  • What is Software? An Empirical, Descriptive Study of Artifacts
  • PUMiner: Mining Security Posts from Developer Question and Answer Websites with PU Learning
  • SoftMon: A Tool to Compare Similar Open-source Software from a Performance Perspective
  • Embedding Java Classes with code2vec: Improvements from Variable Obfuscation
  • Can We Use SE-specific Sentiment Analysis Tools in a Cross-Platform Setting?
  • Using Large-Scale Anomaly Detection on Code to Improve Kotlin Compiler
  • AIMMX: Artificial Intelligence Model Metadata Extractor
  • Improved Automatic Summarization of Subroutines via Attention to File Context
  • Forking Without Clicking: on How to Identify Software Repository Forks
  • Visualization of Methods Changeability Based on VCS Data
  • Painting Flowers: Reasons for Using Single-State State Machines in Model-Driven Engineering
  • Investigating Severity Thresholds for Test Smells
  • Detecting Video Game-Specific Bad Smells in Unity Projects
  • A Study of Potential Code Borrowing and License Violations in Java Projects on GitHub
  • On the Relationship between User Churn and Software Issues
  • Developer-Driven Code Smell Prioritization
  • RTPTorrent: An Open-source Dataset for Evaluating Regression Test Prioritization
  • Beyond the Code: Mining Self-Admitted Technical Debt in Issue Tracker Systems
  • Empirical Study of Restarted and Flaky Builds on Travis CI
  • A Machine Learning Approach for Vulnerability Curation
  • Ethical Mining – A Case Study on MSR Mining Challenges
  • Capture the Feature Flag: Detecting Feature Flags in Open-Source
  • An Empirical Study on Regular Expression Bugs
  • The Impact of a Major Security Event on an Open Source Project: The Case of OpenSSL
  • Need for tweet. How open-source developers use Twitter to talk about their GitHub work
  • On the Prevalence, Impact, and Evolution of SQL Code Smells in Data-Intensive Systems
  • A Study on the Accuracy of OCR Engines for Source Code Transcription from Programming Screencasts
  • Automatically Granted Permissions in Android apps
  • A Soft Alignment Model for Bug Deduplication
  • Did You Remember To Test Your Tokens?
  • Challenges in Chatbot Development: A Study of Stack Overflow Posts
  • The State of the ML-universe: 10 Years of Artificial Intelligence & Machine Learning Software Development on GitHub
  • A Large-Scale Comparative Evaluation of IR-Based Tools for Bug Localization
  • Behind the Intents: An In-depth Empirical Study on Software Refactoring in Modern Code Review
  • Using Others’ Tests to Avoid Breaking Updates
  • Characterizing and Identifying Composite Refactorings: Concepts, Heuristics and Patterns
  • Detecting and Characterizing Bots that Commit Code
  • The Scent of Deep Learning Code: An Empirical Study
  • Boa Views: Easy Modularization and Sharing of MSR Analyses
  • Polyglot and Distributed Software Repository Mining with BARRAGE
  • What is the Vocabulary of Flaky Tests?
  • A Tale of Docker Build Failures: A Preliminary Study