MSR 2020
Mon 29 - Tue 30 June 2020
co-located with ICSE 2020
Mon 29 Jun 2020 17:21 - 17:30 at MSR:Zoom - Github & OSS Datasets Chair(s): Olga Baysal

Data from long-lived, high-profile projects is valuable for research on successful software engineering in the wild. A dataset that links the different software repositories of such projects enables deeper investigations. This paper presents 20-MAD, a dataset linking the commit and issue data of Mozilla and Apache projects. It includes over 20 years of information about 765 projects, 3.4M commits, 2.3M issues, and 17.3M issue comments. The data contains all the typical information about source code commits (e.g., lines added and removed, message, and commit time) and issues (status, severity, votes, and summary). For sentiment analysis, the data has been preprocessed to include emoticons and valence and arousal scores. Linking code repository and issue tracker information allows inferring timezone information for issue tracker timestamps that lack a timezone. To our knowledge, this is the largest linked dataset, in both size and project lifetime, that is not based on GitHub.
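The abstract does not say how the timezone inference is done. The following is only a minimal sketch of one plausible approach, not the authors' method. It assumes that commit timestamps carry a UTC offset, that issue tracker timestamps are stored as naive UTC values, and that author identities have already been matched across the two systems. The function names `most_common_offsets` and `attach_timezone` are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def most_common_offsets(commits):
    """Map each author to the UTC offset (in minutes) seen most often in their commits.

    `commits` is an iterable of (author, timezone-aware datetime) pairs.
    """
    counts = {}
    for author, committed_at in commits:
        offset = committed_at.utcoffset()
        if offset is None:
            continue  # skip commits without timezone information
        minutes = int(offset.total_seconds() // 60)
        counts.setdefault(author, Counter())[minutes] += 1
    return {author: c.most_common(1)[0][0] for author, c in counts.items()}


def attach_timezone(author, naive_utc, offsets):
    """Convert a naive UTC issue timestamp to the author's inferred local time.

    Returns None when the author has no known commits (assumption: no fallback).
    """
    minutes = offsets.get(author)
    if minutes is None:
        return None
    local_zone = timezone(timedelta(minutes=minutes))
    return naive_utc.replace(tzinfo=timezone.utc).astimezone(local_zone)


# Illustrative data only; not taken from the dataset.
plus_two = timezone(timedelta(hours=2))
plus_three = timezone(timedelta(hours=3))
commits = [
    ("alice", datetime(2020, 1, 1, 10, 0, tzinfo=plus_two)),
    ("alice", datetime(2020, 1, 2, 11, 0, tzinfo=plus_two)),
    ("alice", datetime(2020, 1, 3, 9, 0, tzinfo=plus_three)),
]
offsets = most_common_offsets(commits)
local = attach_timezone("alice", datetime(2020, 1, 5, 8, 0), offsets)
```

Taking the most frequent offset per author is a deliberately simple choice. It ignores daylight saving changes and travel, so a real analysis would likely look at offsets from commits close in time to each issue event.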

Conference Day
Mon 29 Jun

Displayed time zone: (UTC) Coordinated Universal Time

16:30 - 17:30
Github & OSS Datasets
Technical Papers / Registered Reports / Keynote / MSR Awards / FOSS Award / Education / Data Showcase / Mining Challenge / MSR Challenge Proposals / Ask Me Anything at MSR:Zoom
Chair(s): Olga Baysal (Carleton University)

Q/A & Discussion of Session Papers over Zoom (Joining info available on Slack)

16:30
8m
Live Q&A
A New Dataset for Pull Request Acceptance
MSR - Data Showcase
A: Xunhui Zhang (National University of Defense Technology, China), A: Ayushi Rastogi (University of Groningen, The Netherlands), A: Yue Yu (College of Computer, National University of Defense Technology, Changsha 410073, China)
Pre-print Media Attached
16:38
8m
Live Q&A
A Mixed Graph-Relational Dataset of Socio-technical Interactions in Open Source Systems
MSR - Data Showcase
A: Usman Ashraf, A: Christoph Mayr-Dorn (Johannes Kepler University Linz), A: Alexander Egyed (Johannes Kepler University, Linz), A: Sebastiano Panichella
Media Attached
16:47
8m
Live Q&A
A Complete Set of Related Git Repositories Identified via Community Detection Approaches Based on Shared Commits
MSR - Data Showcase
A: Audris Mockus, A: Zoe Kotti (Athens University of Economics and Business), A: Diomidis Spinellis (Athens University of Economics and Business), A: Gabriel Dusing
Media Attached
16:55
8m
Live Q&A
A Dataset of Enterprise-Driven Open Source Software
MSR - Data Showcase
A: Diomidis Spinellis (Athens University of Economics and Business), A: Zoe Kotti (Athens University of Economics and Business), A: Konstantinos Kravvaritis, A: Georgios Theodorou, A: Panos Louridas (Athens University of Economics and Business)
DOI Pre-print Media Attached
17:04
8m
Live Q&A
A Dataset for GitHub Repository Deduplication
MSR - Data Showcase
A: Diomidis Spinellis (Athens University of Economics and Business), A: Zoe Kotti (Athens University of Economics and Business), A: Audris Mockus
DOI Pre-print Media Attached
17:12
8m
Live Q&A
A Dataset and an Approach for Identity Resolution of 38 Million Author IDs extracted from 2B Git Commits
MSR - Data Showcase
A: Tanner Fry, A: Tapajit Dey, A: Andrey Karnauch (University of Tennessee Knoxville), A: Audris Mockus
Pre-print Media Attached
17:21
8m
Live Q&A
20-MAD - 20 years of issues and commits of Mozilla and Apache Development
MSR - Data Showcase
A: Maëlick Claes (University of Oulu), A: Mika Mäntylä (University of Oulu)
Media Attached