Data from long-lived, high-profile projects is valuable for research on successful software engineering in the wild. A dataset that links the different software repositories of such projects enables deeper investigations. This paper presents 20-MAD, a dataset linking the commit and issue data of Mozilla and Apache projects. It covers over 20 years of history across 765 projects, with 3.4M commits, 2.3M issues, and 17.3M issue comments. The data contains all the typical information about source code commits (e.g., lines added and removed, message, and commit time) and issues (e.g., status, severity, votes, and summary). To support sentiment analysis, the data has been preprocessed to include emoticons as well as valence and arousal scores. Linking code repository and issue tracker information allows inferring timezone information for issue tracker timestamps that lack a timezone. To our knowledge, this is the largest linked dataset not based on GitHub, both in size and in project lifetime.
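The timezone inference mentioned in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not taken from 20-MAD itself: developer identities have already been linked between the commit and issue data, issue tracker timestamps are naive values in UTC, and a developer's most frequent commit UTC offset is a reasonable proxy for their local timezone. The helper names `dominant_offset` and `localize_issue_time` are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def dominant_offset(commit_times: list[datetime]) -> timedelta:
    """Return the most frequent UTC offset among a developer's
    timezone-aware commit timestamps (hypothetical helper)."""
    if not commit_times:
        raise ValueError("developer has no linked commits")
    counts = Counter(t.utcoffset() for t in commit_times)
    return counts.most_common(1)[0][0]


def localize_issue_time(issue_ts: datetime, commit_times: list[datetime]) -> datetime:
    """Attach an inferred local timezone to a naive issue timestamp.

    Assumption: the issue tracker stores naive timestamps in UTC, so we
    first mark the value as UTC and then convert it to the developer's
    dominant commit offset.
    """
    offset = dominant_offset(commit_times)
    as_utc = issue_ts.replace(tzinfo=timezone.utc)
    return as_utc.astimezone(timezone(offset))


# Usage with made-up timestamps: two commits at UTC+2, one at UTC-5.
commits = [
    datetime(2019, 3, 1, 10, 0, tzinfo=timezone(timedelta(hours=2))),
    datetime(2019, 3, 2, 11, 30, tzinfo=timezone(timedelta(hours=2))),
    datetime(2019, 3, 3, 9, 15, tzinfo=timezone(timedelta(hours=-5))),
]
print(localize_issue_time(datetime(2019, 3, 4, 8, 0), commits).isoformat())
# prints 2019-03-04T10:00:00+02:00
```

Taking the mode keeps occasional commits made while travelling from skewing the result, but a single offset per developer cannot follow daylight-saving changes or a permanent relocation; computing the mode per time period would be one possible refinement.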
Xunhui Zhang, National University of Defense Technology, China; Ayushi Rastogi, University of Groningen, The Netherlands; Yue Yu, College of Computer, National University of Defense Technology, Changsha 410073, China