MSR 2020
Mon 29 - Tue 30 June 2020
co-located with ICSE 2020

The International Conference on Mining Software Repositories (MSR) has hosted a mining challenge since 2006. With this challenge, we call upon everyone interested to apply their tools to a common dataset. The challenge is for researchers and practitioners to bravely use their mining tools and approaches on a dare.

Mon 29 Jun
Times are displayed in time zone: (UTC) Coordinated Universal Time

10:30 - 11:00: Technical Papers - Programming Languages & Models at MSR:Zoom
Chair(s): Dimitris Kolovos (University of York)

Q/A & Discussion of Session Papers over Zoom (Joining info available on Slack)

10:30 - 10:36 (Registered Reports)
Live Q&A
Jürgen Cito (MIT), Jiasi Shen (Massachusetts Institute of Technology), Martin Rinard (MIT)
10:36 - 10:42 (Technical Papers)
Live Q&A
Jason Tsay (IBM Research), Alan Braz (IBM Research), Martin Hirzel (IBM Research), Avraham Shinnar (IBM Research), Todd Mummert
10:42 - 10:48 (Technical Papers)
Live Q&A
Timofey Bryksin (JetBrains Research, Saint Petersburg State University), Victor Petukhov (JetBrains, ITMO University), Ilya Alexin, Stanislav Prikhodko, Alexey Shpilman, Vladimir Kovalenko (TU Delft), Nikita Povarov (JetBrains)
10:48 - 10:54 (Technical Papers)
Live Q&A
Tomoki Nakamaru (Graduate School of Information Science and Technology, The University of Tokyo), Tomomasa Matsunaga, Tetsuro Yamazaki (Graduate School of Information Science and Technology, The University of Tokyo), Soramichi Akiyama (Department of Creative Informatics, The University of Tokyo), Shigeru Chiba (The University of Tokyo)
10:54 - 11:00 (Technical Papers)
Live Q&A
Nan Yang (Eindhoven University of Technology, The Netherlands), Pieter Cuijpers, Ramon Schiffelers (Eindhoven University of Technology and ASML, the Netherlands), Johan Lukkien, Alexander Serebrenik (Eindhoven University of Technology)
10:30 - 11:00: Technical Papers - Refactoring & Testing at MSR:Zoom2
Chair(s): Mauricio Aniche (Delft University of Technology, Netherlands)

Q/A & Discussion of Session Papers over Zoom (Joining info available on Slack)

10:30 - 10:37 (Technical Papers)
Live Q&A
Leonardo Da Silva Sousa (Carnegie Mellon University, USA), Diego Cedrim (Pontifical Catholic University of Rio de Janeiro), Alessandro Garcia (PUC-Rio), Willian Oizumi (PUC-Rio), Ana Carla Bibiano (PUC-Rio), Daniel Oliveira (PUC-Rio), Miryung Kim (University of California, Los Angeles), Anderson Oliveira (PUC-Rio)
10:37 - 10:45 (Technical Papers)
Live Q&A
Matheus Paixao (University of Fortaleza), Anderson Uchôa (Pontifical Catholic University of Rio de Janeiro (PUC-Rio)), Ana Carla Bibiano (PUC-Rio), Daniel Oliveira (PUC-Rio), Alessandro Garcia (PUC-Rio), Jens Krinke (University College London), Emilio Arvonio
10:45 - 10:52 (Data Showcase)
Live Q&A
Federico Corò, Roberto Verdecchia (Vrije Universiteit Amsterdam), Emilio Cruciani, Breno Miranda (Federal University of Pernambuco), Antonia Bertolino (CNR-ISTI)
10:52 - 11:00 (Data Showcase)
Live Q&A
András Kicsi, László Vidács (University of Szeged, Hungary), Tibor Gyimothy
11:00 - 12:00: Technical Papers - Build, CI, & Dependencies at MSR:Zoom
Chair(s): Raula Gaikovina Kula (NAIST)

Q/A & Discussion of Session Papers over Zoom (Joining info available on Slack)

11:00 - 11:12 (Technical Papers)
Live Q&A
Yiwen Wu (National University of Defense Technology), Yang Zhang (National University of Defense Technology, China), Tao Wang (National University of Defense Technology), Huaimin Wang
11:12 - 11:24 (Technical Papers)
Live Q&A
Suhaib Mujahid (Concordia University), Rabe Abdalkareem (Concordia University, Montreal, Canada), Emad Shihab (Concordia University), Shane McIntosh (McGill University)
11:24 - 11:36 (Data Showcase)
Live Q&A
Jordan Henkel (University of Wisconsin–Madison), Christian Bird, Shuvendu K. Lahiri (Microsoft Research), Thomas Reps (University of Wisconsin-Madison, USA)
11:36 - 11:48 (Technical Papers)
Live Q&A
Thomas Durieux (KTH Royal Institute of Technology, Sweden), Claire Le Goues (Carnegie Mellon University), Michael Hilton (Carnegie Mellon University, USA), Rui Abreu (Instituto Superior Técnico, U. Lisboa & INESC-ID)
11:48 - 12:00 (Data Showcase)
Live Q&A
Carolin Brandt (Delft University of Technology), Annibale Panichella (Delft University of Technology), Andy Zaidman (TU Delft), Moritz Beller (Facebook, USA)
12:00 - 13:00: Technical Papers - Code Smells at MSR:Zoom
Chair(s): Alessandro Garcia (PUC-Rio)

Q/A & Discussion of Session Papers over Zoom (Joining info available on Slack)

12:00 - 12:10 (Technical Papers)
Live Q&A
12:10 - 12:20 (Technical Papers)
Live Q&A
Davide Spadini (Delft University of Technology, Netherlands), Martin Schvarcbacher, Ana Maria Oprescu, Magiel Bruntink (Software Improvement Group), Alberto Bacchelli (University of Zurich)
12:20 - 12:30 (Technical Papers)
Live Q&A
Biruk Asmare Muse, Masud Rahman (Dalhousie University), Csaba Nagy (Software Institute - USI, Lugano), Anthony Cleve (University of Namur), Foutse Khomh (Polytechnique Montréal), Giuliano Antoniol (Polytechnique Montréal)
12:30 - 12:40 (Registered Reports)
Live Q&A
Mouna Abidi, Moses Openja, Foutse Khomh (Polytechnique Montréal)
12:40 - 12:50 (Technical Papers)
Live Q&A
Hadhemi Jebnoun, Masud Rahman (Dalhousie University), Foutse Khomh (Polytechnique Montréal), Houssem Ben Braiek
12:50 - 13:00 (Technical Papers)
Live Q&A
Fabiano Pecorelli (University of Salerno), Fabio Palomba (University of Salerno), Foutse Khomh (Polytechnique Montréal), Andrea De Lucia (University of Salerno)
12:00 - 13:00: Mining Challenge - MSR Mining Challenge at MSR:Zoom2
Chair(s): Antoine Pietri (Inria), Stefano Zacchiroli (Université de Paris and Inria), Diomidis Spinellis (Athens University of Economics and Business)

Q/A & Discussion of Session Papers over Zoom (Joining info available on Slack)

12:00 - 12:20 (Mining Challenge)
Live Q&A
12:20 - 12:40 (Mining Challenge)
Live Q&A
Avijit Bhattacharjee (University of Saskatchewan, Canada), Sristy Sumana Nath, Shurui Zhou (Carnegie Mellon University, USA / University of Toronto, CA), Debasish Chakroborti, Banani Roy (University of Saskatchewan), Chanchal K. Roy (University of Saskatchewan), Kevin Schneider (University of Saskatchewan)
12:40 - 13:00 (Mining Challenge)
Live Q&A
Gabor Antal, Márton Keleti, Peter Hegedus (University of Szeged)
13:00 - 13:15: MSR Plenary - "Opening" & Awards at MSR:Zoom
Chair(s): Sunghun Kim (Hong Kong University of Science and Technology), Georgios Gousios (Delft University of Technology), Sarah Nadi (University of Alberta)

Live on YouTube: https://www.youtube.com/watch?v=Qvf7mHa-YYs

13:00 - 13:15 (MSR Plenary)
Day opening
Sunghun Kim (Hong Kong University of Science and Technology), Sarah Nadi (University of Alberta), Georgios Gousios (Delft University of Technology)
14:30 - 15:30: Technical Papers - Bugs & Issues at MSR:Zoom
Chair(s): Francisco Servant (Virginia Tech)

Q/A & Discussion of Session Papers over Zoom (Joining info available on Slack)

14:30 - 14:40 (Data Showcase)
Live Q&A
Cristiano Politowski (Concordia University, Canada), Fabio Petrillo (University of Quebec at Chicoutimi), Yann-Gaël Guéhéneuc (Concordia University and Polytechnique Montréal), Gabriel Cavalheiro Ullmann (UNIJUI - Universidade Regional do Noroeste do Estado do Rio Grande do Sul), Josias De Andrade Werly
14:40 - 14:50 (Technical Papers)
Live Q&A
Omar El Zarif, Daniel Alencar Da Costa (University of Otago), Safwat Hassan (Queen's University, Kingston, Canada), Ying Zou (Queen's University, Kingston, Ontario)
14:50 - 15:00 (Technical Papers)
Live Q&A
15:00 - 15:10 (Technical Papers)
Live Q&A
15:10 - 15:20 (Data Showcase)
Live Q&A
Rafael-Michael Karampatsis (The University of Edinburgh), Charles Sutton (Google Research)
15:20 - 15:30 (Registered Reports)
Live Q&A
Steffen Herbold (University of Göttingen), Alexander Trautsch (University of Göttingen), Benjamin Ledel
16:30 - 17:30: Technical Papers - GitHub & OSS Datasets at MSR:Zoom
Chair(s): Olga Baysal (Carleton University)

Q/A & Discussion of Session Papers over Zoom (Joining info available on Slack)

16:30 - 16:38 (Data Showcase)
Live Q&A
Xunhui Zhang (National University of Defense Technology, China), Ayushi Rastogi (Postdoctoral researcher at TU Delft), Yue Yu (College of Computer, National University of Defense Technology, Changsha 410073, China)
16:38 - 16:47 (Data Showcase)
Live Q&A
Usman Ashraf, Christoph Mayr-Dorn (Johannes Kepler University Linz), Alexander Egyed (Johannes Kepler University, Linz), Sebastiano Panichella
16:47 - 16:55 (Data Showcase)
Live Q&A
Audris Mockus, Zoe Kotti (Athens University of Economics and Business), Diomidis Spinellis (Athens University of Economics and Business), Gabriel Dusing
16:55 - 17:04 (Data Showcase)
Live Q&A
Diomidis Spinellis (Athens University of Economics and Business), Zoe Kotti (Athens University of Economics and Business), Konstantinos Kravvaritis, Georgios Theodorou, Panos Louridas (Athens University of Economics and Business)
17:04 - 17:12 (Data Showcase)
Live Q&A
Diomidis Spinellis (Athens University of Economics and Business), Zoe Kotti (Athens University of Economics and Business), Audris Mockus
17:12 - 17:21 (Data Showcase)
Live Q&A
Tanner Fry, Tapajit Dey, Andrey Karnauch (University of Tennessee Knoxville), Audris Mockus
17:21 - 17:30 (Data Showcase)
Live Q&A
Maëlick Claes (University of Oulu), Mika Mäntylä (University of Oulu)
16:30 - 17:00: Technical Papers - Platforms & Datasets at MSR:Zoom2
Chair(s): Moritz Beller (Facebook, USA)

Q/A & Discussion of Session Papers over Zoom (Joining info available on Slack)

16:30 - 16:37 (Technical Papers)
Live Q&A
Toni Mattis (Hasso Plattner Institute, University of Potsdam), Patrick Rein (Hasso Plattner Institute), Falco Dürsch, Robert Hirschfeld (Hasso-Plattner-Institut (HPI), Germany)
16:37 - 16:45 (Technical Papers)
Live Q&A
Konstantinos Barmpis, Patrick Neubauer (University of York, UK), Jonathan Co, Dimitris Kolovos (University of York), Nicholas Matragkas, Richard Paige (McMaster University)
16:45 - 16:52 (Technical Papers)
Live Q&A
Che Shian Hung, Robert Dyer (University of Nebraska - Lincoln)
16:52 - 17:00 (Registered Reports)
Live Q&A
Antoine Pietri (Inria), Guillaume Rousseau (Université de Paris and Inria), Stefano Zacchiroli (Université de Paris and Inria)

Tue 30 Jun
Times are displayed in time zone: (UTC) Coordinated Universal Time

10:30 - 11:00: Technical Papers - Apps & Bots at MSR:Zoom2
Chair(s): Ivano Malavolta (Vrije Universiteit Amsterdam)

Q/A & Discussion of Session Papers over Zoom (Joining info available on Slack)

10:30 - 10:37 (Data Showcase)
Live Q&A
Pei Liu, Li Li (Monash University, Australia), Yanjie Zhao, Xiaoyu Sun, John Grundy (Monash University)
10:37 - 10:45 (Data Showcase)
Live Q&A
10:45 - 10:52 (Technical Papers)
Live Q&A
Tapajit Dey, Sara Mousavi, Eduardo Ponce (University of Tennessee - Knoxville), Tanner Fry, Bogdan Vasilescu (Carnegie Mellon University), Anna Filippova, Audris Mockus (University of Tennessee - Knoxville)
10:52 - 11:00 (Technical Papers)
Live Q&A
Ahmad Abdellatif (Concordia University), Diego Costa (Concordia University, Canada), Khaled Badran (Concordia University), Rabe Abdalkareem (Concordia University, Montreal, Canada), Emad Shihab (Concordia University)
10:30 - 11:00: Technical Papers - Evolution at MSR:Zoom
Chair(s): Jürgen Cito (MIT)

Q/A & Discussion of Session Papers over Zoom (Joining info available on Slack)

10:30 - 10:37 (Technical Papers)
Live Q&A
Jens Meinicke (Carnegie Mellon University), Juan Hoyos (Universidad Nacional de Colombia), Bogdan Vasilescu (Carnegie Mellon University), Christian Kästner (Carnegie Mellon University)
10:37 - 10:45 (Technical Papers)
Live Q&A
Antoine Pietri (Inria), Guillaume Rousseau (Université de Paris and Inria), Stefano Zacchiroli (Université de Paris and Inria)
10:45 - 10:52 (Technical Papers)
Live Q&A
Sergey Svitkov, Timofey Bryksin (JetBrains Research, Saint Petersburg State University)
10:52 - 11:00 (Data Showcase)
Live Q&A
Themistoklis Diamantopoulos (Electrical and Computer Engineering Dept, Aristotle University of Thessaloniki), Michail Papamichail, Thomas Karanikiotis, Kyriakos Chatzidimitriou (Aristotle University of Thessaloniki), Andreas Symeonidis (Aristotle University of Thessaloniki)
11:00 - 12:00: Technical Papers - Quality at MSR:Zoom
Chair(s): Jens Krinke (University College London)

Q/A & Discussion of Session Papers over Zoom (Joining info available on Slack)

11:00 - 11:12 (Technical Papers)
Live Q&A
Laerte Xavier (Universidade Federal de Minas Gerais (UFMG)), Fabio da Silva Ferreira, Rodrigo Brito, Marco Tulio Valente (Federal University of Minas Gerais, Brazil)
11:12 - 11:24 (Technical Papers)
Live Q&A
Peipei Wang (North Carolina State University, USA), Chris Brown (North Carolina State University), Jamie Jennings (North Carolina State University), Kathryn Stolee (North Carolina State University)
11:24 - 11:36 (Registered Reports)
Live Q&A
Pavlína Wurzel Gonçalves, Enrico Fregnan, Tobias Baum, Kurt Schneider (Leibniz Universität Hannover, Software Engineering Group), Alberto Bacchelli (University of Zurich)
11:36 - 11:48 (Technical Papers)
Live Q&A
11:48 - 12:00 (Technical Papers)
Live Q&A
Yaroslav Golubev (JetBrains Research, ITMO University), Maria Eliseeva, Nikita Povarov (JetBrains), Timofey Bryksin (JetBrains Research, Saint Petersburg State University)
11:00 - 12:00: Technical Papers - Security at MSR:Zoom2
Chair(s): Dimitris Mitropoulos (Athens University of Economics and Business)

Q/A & Discussion of Session Papers over Zoom (Joining info available on Slack)

11:00 - 11:12 (Technical Papers)
Live Q&A
Danielle Gonzalez (Rochester Institute of Technology, USA), Michael Rath (Technische Universität Ilmenau), Mehdi Mirakhorli (Rochester Institute of Technology)
11:12 - 11:24 (Technical Papers)
Live Q&A
Paolo Calciati (IMDEA Software Institute), Konstantin Kuznetsov (Saarland University, CISPA), Alessandra Gorla (IMDEA Software Institute), Andreas Zeller (CISPA Helmholtz Center for Information Security)
11:24 - 11:36 (Technical Papers)
Live Q&A
Triet Le Huynh Minh (The University of Adelaide), David Hin, Roland Croft, Muhammad Ali Babar (The University of Adelaide)
11:36 - 11:48 (Data Showcase)
Live Q&A
Jiahao Fan (New Jersey Institute of Technology, USA), Yi Li (New Jersey Institute of Technology, USA), Shaohua Wang (New Jersey Institute of Technology, USA), Tien N. Nguyen (University of Texas at Dallas)
11:48 - 12:00 (Technical Papers)
Live Q&A
James Walden (Northern Kentucky University)
14:00 - 15:00: Technical Papers - ML4SE at MSR:Zoom
Chair(s): Kevin Moran (William & Mary/George Mason University)

Q/A & Discussion of Session Papers over Zoom (Joining info available on Slack)

14:00 - 14:12 (Technical Papers)
Live Q&A
Chen Yang (Veracode, Inc.), Andrew Santosa (Veracode, Inc.), Ang Ming Yi, Abhishek Sharma (Singapore Management University, Singapore), Asankhaya Sharma (Veracode, Inc.), David Lo (Singapore Management University)
14:12 - 14:24 (Technical Papers)
Live Q&A
Rhys Compton (University of Waikato), Eibe Frank (Department of Computer Science, University of Waikato), Panos Patros, Abigail Koay (University of Waikato)
14:24 - 14:36 (Technical Papers)
Live Q&A
Abdulkarim Khormi (Florida State University, USA - Jazan University, KSA), Mohammad Alahmadi (Florida State University), Sonia Haiduc (Florida State University)
14:36 - 14:48 (Technical Papers)
Live Q&A
Gustavo Pinto (UFPA), Breno Miranda (Federal University of Pernambuco), Supun Dissanayake (The University of Adelaide), Marcelo d'Amorim (Federal University of Pernambuco), Christoph Treude (The University of Adelaide), Antonia Bertolino (CNR-ISTI)
14:48 - 15:00 (Technical Papers)
Live Q&A
Sakib Haque (University of Notre Dame), Alexander LeClair (University of Notre Dame), Lingfei Wu (IBM Research), Collin McMillan (University of Notre Dame)
16:00 - 17:00: Technical Papers - Developer Collaboration at MSR:Zoom
Chair(s): Bogdan Vasilescu (Carnegie Mellon University)

Q/A & Discussion of Session Papers over Zoom (Joining info available on Slack)

16:00 - 16:10 (Technical Papers)
Live Q&A
16:10 - 16:20 (Technical Papers)
Live Q&A
Nicole Novielli (University of Bari), Fabio Calefato (University of Bari), Davide Dongiovanni (University of Bari), Daniela Girardi (University of Bari), Filippo Lanubile (University of Bari)
16:20 - 16:30 (Data Showcase)
Live Q&A
Esteban Parra (Florida State University), Ashley Ellis, Sonia Haiduc (Florida State University)
16:30 - 16:40 (Registered Reports)
Live Q&A
Ingrid Nunes (Universidade Federal do Rio Grande do Sul (UFRGS), Brazil), Christoph Treude (The University of Adelaide), Fabio Calefato (University of Bari)
16:40 - 16:50 (Data Showcase)
Live Q&A
Preetha Chatterjee (University of Delaware, USA), Kostadin Damevski (Virginia Commonwealth University), Nicholas A. Kraft (UserVoice), Lori Pollock
16:50 - 17:00 (Technical Papers)
Live Q&A
Yalin Liu (University of Notre Dame), Jinfeng Lin (University of Notre Dame), Jane Cleland-Huang (University of Notre Dame)
16:00 - 17:00: Technical Papers - Visions & Reflections at MSR:Zoom2
Chair(s): Venera Arnaoudova (Washington State University)

Q/A & Discussion of Session Papers over Zoom (Joining info available on Slack)

16:00 - 16:15 (Technical Papers)
Live Q&A
Danielle Gonzalez (Rochester Institute of Technology, USA), Thomas Zimmermann (Microsoft Research), Nachiappan Nagappan (Microsoft Research)
16:15 - 16:30 (Technical Papers)
Live Q&A
Nicolas Gold (University College London), Jens Krinke (University College London)
16:30 - 16:45 (Technical Papers)
Live Q&A
Ang Jia (Xi'an Jiaotong University), Ming Fan (Xi'an Jiaotong University), Xi Xu, Di Cui (Xi'an Jiaotong University), Wenying Wei, Zijiang Yang (Western Michigan University), Kai Ye, Ting Liu (Xi'an Jiaotong University)
16:45 - 17:00 (Technical Papers)
Live Q&A

Call for Papers

This year, the challenge is about mining the Software Heritage Graph Dataset, a very large dataset containing the development history of publicly available software, at the granularity used by state-of-the-art distributed version control systems. Included software artifacts were retrieved from major collaborative development platforms (e.g., GitHub, GitLab) and package repositories (e.g., PyPI, Debian, npm), and stored in a uniform representation: a fully-deduplicated Merkle DAG linking together source code files organized in directories, commits tracking evolution over time, up to full snapshots of version control system (VCS) repositories as observed by Software Heritage during periodic crawls.
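
To make the idea of a deduplicated Merkle DAG concrete, here is a minimal sketch in Python. It is a toy model, not the Software Heritage implementation: the `Store` class, the `hash_node` helper, and the way directory entries are serialized are all assumptions made for illustration. Each node is identified by a hash of its content, and a directory's hash covers the identifiers of its children, so identical files and subtrees are stored only once no matter how many repositories contain them.

```python
import hashlib


def hash_node(kind: str, payload: bytes) -> str:
    """Hash a node; the kind prefix keeps file and directory ids distinct."""
    return hashlib.sha1(kind.encode() + b"\0" + payload).hexdigest()


class Store:
    """Toy content-addressed store (hypothetical): one entry per distinct node."""

    def __init__(self):
        self.nodes = {}

    def add_file(self, data: bytes) -> str:
        node_id = hash_node("file", data)
        self.nodes[node_id] = data
        return node_id

    def add_dir(self, entries: dict) -> str:
        # entries maps a name to a child id; sorting makes the id deterministic
        lines = [f"{name} {child}" for name, child in sorted(entries.items())]
        node_id = hash_node("dir", "\n".join(lines).encode())
        self.nodes[node_id] = dict(entries)
        return node_id


store = Store()

# Two "repositories" with identical trees collapse into the same nodes.
readme = store.add_file(b"hello\n")
src_a = store.add_dir({"main.c": store.add_file(b"int main(){}\n")})
repo_a = store.add_dir({"README": readme, "src": src_a})

src_b = store.add_dir({"main.c": store.add_file(b"int main(){}\n")})
repo_b = store.add_dir({"README": readme, "src": src_b})
nodes_after_two_repos = len(store.nodes)

# A third tree that changes only the README reuses the whole src subtree.
repo_c = store.add_dir({"README": store.add_file(b"hello, world\n"), "src": src_a})

print(repo_a == repo_b, nodes_after_two_repos, len(store.nodes))
```

In this sketch the third tree adds only two nodes (the new file and the new root directory), which is the sharing effect that the dataset's uniform representation relies on.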

Analyses can be based on the Software Heritage Graph Dataset alone or expanded to also include data from other resources such as GHTorrent, the Ultimate Debian Database, or any other dataset about software artifacts included in the dataset (e.g., previous studies about npm, PyPI, etc.). Note that the dataset does not contain the source code files themselves, but refers to them using persistent identifiers that can be used to cross-reference source code files referenced in previous studies/datasets or even retrieve source code of interest from Software Heritage.
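
As an illustration of how such identifiers can support cross-referencing, the sketch below computes an identifier for a file's content. It assumes that the persistent identifiers follow the SWHID scheme, in which a content identifier has the form `swh:1:cnt:<hash>` and the hash is the Git blob SHA-1 of the file; check the dataset documentation before relying on this. The function names are hypothetical.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """SHA-1 of the data as a Git blob object: header 'blob <size>\\0' + bytes."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def content_identifier(data: bytes) -> str:
    """Build a content identifier, assuming the SWHID 'swh:1:cnt:' prefix."""
    return "swh:1:cnt:" + git_blob_sha1(data)


# The empty file is a convenient check, since Git's empty-blob hash is well known.
print(content_identifier(b""))
```

Under that assumption, a file found in another dataset can be hashed locally and looked up by identifier, without needing the file contents to be part of the challenge dataset.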

The overall goal is to study public software development, expanding the scope of analysis of previous studies to a novel scale thanks to: (1) a good approximation of the entire corpus of publicly available software, (2) blending together related development histories in a single graph, and (3) abstracting over VCS and package differences, offering a canonical representation of source code artifacts.

Questions that are, to the best of our knowledge, not sufficiently answered and could be answered using this year's dataset include:

  • Scale: Can previous software mining results be reproduced when looking at all the projects of a given kind rather than the “most starred”? At what point is sampling sufficient?
  • Cross-repository analysis: How can forking and duplication patterns inform us about software health and risks? How can community forks be distinguished from personal-use forks? What are good predictors of the success of a community fork?
  • Cross-origin analysis: Is software evolution consistent across different version control systems? Are there VCS-specific development patterns? How does a migration from one VCS to another affect development patterns? Is there a relationship between development cycles and package manager releases?
  • Graph structure: How tightly coupled are the different layers of the graph? What is the deduplication efficiency across different programming languages? When and where do source code files or directories tend to be reused? How is code shared between different forges?
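
One of the graph-structure questions above, deduplication efficiency across programming languages, can be sketched as a simple metric. The code below is only an illustration under stated assumptions: the input format (pairs of file path and content identifier), the use of the file extension as a stand-in for the programming language, and the sample data are all hypothetical and are not taken from the dataset schema.

```python
import os
from collections import defaultdict


def dedup_ratio_by_extension(occurrences):
    """Return, per file extension, the fraction of occurrences that are duplicates.

    occurrences: iterable of (path, content_id) pairs, one per file occurrence.
    A ratio of 0.0 means every occurrence has distinct content.
    """
    ids_by_ext = defaultdict(list)
    for path, content_id in occurrences:
        ext = os.path.splitext(path)[1] or "(none)"
        ids_by_ext[ext].append(content_id)
    return {
        ext: 1 - len(set(ids)) / len(ids)
        for ext, ids in ids_by_ext.items()
    }


# Hypothetical sample: three of the four .py occurrences share the same content.
sample = [
    ("a/main.py", "id1"),
    ("b/main.py", "id1"),
    ("c/util.py", "id2"),
    ("d/copy.py", "id1"),
    ("a/main.c", "id3"),
    ("b/main.c", "id4"),
]
print(dedup_ratio_by_extension(sample))
```

On real data the same computation would run over the file-occurrence and content tables of the dataset, and a proper language classifier would replace the extension heuristic.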

These are just some of the questions that could be answered using the Software Heritage Graph Dataset. We encourage challenge participants to adapt the above research questions or formulate their own about any hidden knowledge that still defeats discovery in the treasure trove of our collective software commons!

How to Participate in the Challenge

First, familiarize yourself with the Software Heritage Graph Dataset:

Then, use the dataset to answer your research questions, report your findings in a four-page data challenge paper (see information below) and submit your abstract and paper in time (see important dates below). If your paper is accepted, present your results at MSR 2020 in Seoul, South Korea!

Submission

A challenge paper should describe the results of your work by providing an introduction to the problem you address and why it is worth studying, the version of the dataset you used, the approach and tools you used, your results and their implications, and conclusions. Make sure your report highlights the contributions and the importance of your work. See also our open science policy regarding the publication of software and additional data you used for the challenge.

Challenge papers must not exceed 4 pages, plus 1 additional page containing only references, and must conform to the MSR 2020 format and submission guidelines. Each submission will be reviewed by at least three members of the program committee. Submissions should follow the ACM Conference Proceedings Formatting Guidelines (https://www.acm.org/publications/proceedings-template). LaTeX users must use the provided acmart.cls and ACM-Reference-Format.bst without modification, enable the conference format in the preamble of the document (i.e., \documentclass[sigconf,review]{acmart}), and use the ACM reference format for the bibliography (i.e., \bibliographystyle{ACM-Reference-Format}). The review option adds line numbers, thereby allowing referees to refer to specific lines in their comments.

IMPORTANT: MSR 2020 follows the double-blind submission model. Submissions should not reveal the identity of the authors in any way. This means that authors should:

  • leave out author names and affiliations from the body and metadata of the submitted PDF
  • ensure that any citations to related work by themselves are written in the third person, for example “the prior work of XYZ [2]” as opposed to “our prior work [2]”
  • not refer to their personal, lab or university website; similarly, care should be taken with personal accounts on GitHub, Google Drive, etc.
  • not upload unblinded versions of their paper on archival websites during bidding/reviewing. However, uploading unblinded versions prior to submission is allowed and sometimes unavoidable (e.g., thesis).

Authors having further questions on double blind reviewing are encouraged to contact the Mining Challenge Chairs via email.

Papers must be submitted electronically through EasyChair, should not have been published elsewhere, and should not be under review or submitted for review elsewhere for the duration of consideration. ACM plagiarism policy and procedures shall be followed for cases of double submission. The submission must also comply with the IEEE Policy on Authorship.

Upon notification of acceptance, all authors of accepted papers will receive further instructions for preparing their camera ready versions. At least one author of each accepted paper is expected to register and present the results at MSR 2020 in Seoul, South Korea. All accepted contributions will be published in the electronic conference proceedings.

The dataset as object of study for the challenge can be cited through reference [MSR20DC] below, while the Software Heritage dataset itself and its schema can be referenced via [MSR19SH], which also contains additional sample queries.

@inproceedings{MSR20DC,
  author = {Antoine Pietri and Diomidis Spinellis and Stefano Zacchiroli},
  title = {The {Software Heritage Graph Dataset}: Large-scale Analysis of Public Software Development History},
  publisher = {IEEE},
  year = {2020},
  booktitle = {MSR 2020: The 17th International Conference on Mining Software Repositories},
  preprint = {https://upsilon.cc/~zack/research/publications/msr-2020-challenge.pdf}
}

@inproceedings{MSR19SH,
  author = {Antoine Pietri and Diomidis Spinellis and Stefano Zacchiroli},
  title = {The Software Heritage Graph Dataset: Public software development under one roof},
  publisher = {IEEE},
  year = {2019},
  doi = {10.1109/MSR.2019.00030},
  pages = {138-142},
  booktitle = {MSR 2019: The 16th International Conference on Mining Software Repositories},
  preprint = {https://upsilon.cc/~zack/research/publications/msr-2019-swh.pdf}
}

Important Dates

  • Abstracts due: January 30, 2020 (AOE)
  • Papers due: February 6, 2020 (AOE)
  • Author notification: March 2, 2020 (AOE)
  • Camera ready: March 16, 2020 (AOE)
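
All dates above are given as AOE (Anywhere on Earth), which corresponds to UTC-12. The sketch below converts the listed dates to UTC. It assumes that "due" means the end of the stated calendar day in AOE; that reading is a common convention and is not stated in this call. The helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Anywhere on Earth is the UTC-12 offset.
AOE = timezone(timedelta(hours=-12), "AoE")


def aoe_deadline_utc(year: int, month: int, day: int) -> datetime:
    """Last second of the given day in AOE, expressed in UTC (assumed reading)."""
    end_of_day = datetime(year, month, day, 23, 59, 59, tzinfo=AOE)
    return end_of_day.astimezone(timezone.utc)


deadlines = {
    "Abstracts due": (2020, 1, 30),
    "Papers due": (2020, 2, 6),
    "Author notification": (2020, 3, 2),
    "Camera ready": (2020, 3, 16),
}

for label, ymd in deadlines.items():
    print(f"{label}: {aoe_deadline_utc(*ymd).isoformat()}")
```

Under this reading, each deadline falls at 11:59:59 UTC on the day after the listed date.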

Open Science Policy

Openness in science is key to fostering progress via transparency, reproducibility and replicability. Our steering principle is that all research output should be accessible to the public and that empirical studies should be reproducible. In particular, we actively support the adoption of open data and open source principles. To increase reproducibility and replicability, we encourage all contributing authors to disclose:

  • the source code of the software they used to retrieve and analyze the data
  • the (anonymized and curated) empirical data they retrieved in addition to the challenge dataset
  • a document with instructions for other researchers describing how to reproduce or replicate the results

Already upon submission, authors can privately share their anonymized data and software on preservation archives such as Zenodo, Figshare (see instructions), and Software Heritage (see instructions). After acceptance, data and software should be made public and referenceable. We also encourage authors to self-archive pre- and postprints of their papers in open, preserved repositories such as arXiv.org.

Best Mining Challenge Paper Award

All submissions will undergo the same review process independent of whether or not they disclose their analysis code or data. However, only accepted papers for which code and data are available on preservation archives, as described in the open science policy above, will be considered for the best mining challenge paper award.

Best Student Presentation Award

As in previous years, there will be a public vote during the conference to select the best mining challenge presentation. This award often goes to authors of compelling work who present an engaging story to the audience. To increase student involvement, only students can compete for this award.

Organization