MSR 2020
Mon 29 - Tue 30 June 2020
co-located with ICSE 2020
Mon 29 Jun 2020 16:55 - 17:04 at MSR:Zoom - Github & OSS Datasets Chair(s): Olga Baysal

We present a dataset of open source software developed mainly by enterprises rather than volunteers. It can be used to address known generalizability concerns and to perform research on open source business software development. Based on the premise that an enterprise’s employees are likely to contribute to a project developed by their organization using the email account it provides, we mine domain names associated with enterprises from open data sources as well as through white- and blacklisting, and apply them through three heuristics to identify 17,252 enterprise GitHub projects. We provide these as a dataset detailing their provenance and properties. A manual evaluation of a dataset sample shows an identification accuracy of 89%. Through an exploratory data analysis we found that projects are staffed by a plurality of enterprise insiders, who appear to be pulling more than their weight, and that in a small percentage of relatively large projects development happens exclusively through enterprise insiders.
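The premise in the abstract, that enterprise employees tend to commit with their employer's email account, can be illustrated with a minimal Python sketch. It is not one of the dataset's three heuristics, which the abstract does not spell out. The domain lists, the function names, and the 0.5 threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical domain lists for illustration only; the dataset's actual
# whitelist and blacklist are not reproduced here.
ENTERPRISE_DOMAINS = {"example-corp.com", "acme.example"}
BLACKLISTED_DOMAINS = {"gmail.com", "users.noreply.github.com"}


def email_domain(email):
    """Return the lower-cased domain of an email address, or None if malformed."""
    _, sep, domain = email.strip().rpartition("@")
    return domain.lower() if sep and domain else None


def enterprise_commit_share(commit_emails):
    """Fraction of commits whose author email uses a listed enterprise domain."""
    counts = Counter()
    for email in commit_emails:
        domain = email_domain(email)
        if domain is None or domain in BLACKLISTED_DOMAINS:
            counts["other"] += 1
        elif domain in ENTERPRISE_DOMAINS:
            counts["enterprise"] += 1
        else:
            counts["other"] += 1
    total = sum(counts.values())
    return counts["enterprise"] / total if total else 0.0


def looks_enterprise_driven(commit_emails, threshold=0.5):
    """Assumed rule: flag a project when the enterprise share exceeds the threshold."""
    return enterprise_commit_share(commit_emails) > threshold
```

A project's commit author emails would be passed in as a list of strings. The strict comparison against the threshold is a design choice of this sketch, so a project split exactly in half is not flagged.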

Mon 29 Jun
Times are displayed in time zone: (UTC) Coordinated Universal Time

16:30 - 17:30: Github & OSS Datasets at MSR:Zoom
Tracks: Technical Papers / Registered Reports / Keynote / MSR Awards / FOSS Award / Education / Data Showcase / Mining Challenge / MSR Challenge Proposals / Ask Me Anything
Chair(s): Olga Baysal (Carleton University)

Q/A & Discussion of Session Papers over Zoom (Joining info available on Slack)

16:30 - 16:38
Live Q&A
A New Dataset for Pull Request Acceptance
MSR - Data Showcase
A: Xunhui Zhang (National University of Defense Technology, China); A: Ayushi Rastogi (Postdoctoral researcher at TU Delft); A: Yue Yu (College of Computer, National University of Defense Technology, Changsha 410073, China)
Pre-print Media Attached
16:38 - 16:47
Live Q&A
A Mixed Graph-Relational Dataset of Socio-technical Interactions in Open Source Systems
MSR - Data Showcase
A: Usman Ashraf; A: Christoph Mayr-Dorn (Johannes Kepler University Linz); A: Alexander Egyed (Johannes Kepler University, Linz); A: Sebastiano Panichella
Media Attached
16:47 - 16:55
Live Q&A
A Complete Set of Related Git Repositories Identified via Community Detection Approaches Based on Shared Commits
MSR - Data Showcase
A: Audris Mockus; A: Zoe Kotti (Athens University of Economics and Business); A: Diomidis Spinellis (Athens University of Economics and Business); A: Gabriel Dusing
Media Attached
16:55 - 17:04
Live Q&A
A Dataset of Enterprise-Driven Open Source Software
MSR - Data Showcase
A: Diomidis Spinellis (Athens University of Economics and Business); A: Zoe Kotti (Athens University of Economics and Business); A: Konstantinos Kravvaritis; A: Georgios Theodorou; A: Panos Louridas (Athens University of Economics and Business)
DOI Pre-print Media Attached
17:04 - 17:12
Live Q&A
A Dataset for GitHub Repository Deduplication
MSR - Data Showcase
A: Diomidis Spinellis (Athens University of Economics and Business); A: Zoe Kotti (Athens University of Economics and Business); A: Audris Mockus
DOI Pre-print Media Attached
17:12 - 17:21
Live Q&A
A Dataset and an Approach for Identity Resolution of 38 Million Author IDs extracted from 2B Git Commits
MSR - Data Showcase
A: Tanner Fry; A: Tapajit Dey; A: Andrey Karnauch (University of Tennessee Knoxville); A: Audris Mockus
Pre-print Media Attached
17:21 - 17:30
Live Q&A
20-MAD - 20 years of issues and commits of Mozilla and Apache Development
MSR - Data Showcase
A: Maëlick Claes (University of Oulu); A: Mika Mäntylä (University of Oulu)
Media Attached