MSR 2020
Mon 29 - Tue 30 June 2020
co-located with ICSE 2020
Mon 29 Jun 2020 10:36 - 10:42 at MSR:Zoom - Programming Languages & Models Chair(s): Dimitris Kolovos

Despite all of the power that machine learning and artificial intelligence (AI) models bring to applications, much of AI development is currently a fairly ad hoc process. Software engineering and AI development share many of the same languages and tools, but AI development as an engineering practice is still in its early stages. Mining software repositories of AI models enables insight into the current state of AI development. However, much of the relevant metadata around models is not easily extractable directly from repositories and requires deduction or domain knowledge. This paper presents a library called AIMMX that enables simplified AI Model Metadata eXtraction from software repositories. The library has five extractor modules for AI model-specific metadata: model name, associated datasets, references, AI frameworks used, and model domain. We evaluated AIMMX against 7,998 open source models from three sources: model zoos, arXiv AI papers, and state-of-the-art AI papers. AIMMX extracted metadata with 87% precision and 83% recall. As preliminary examples of how AI model metadata extraction enables studies and tools to advance engineering support for AI development, this paper presents an exploratory analysis of data and method reproducibility over the models in the evaluation dataset and a catalog tool for discovering and managing models. Our analysis suggests that while data reproducibility may be relatively poor, with 42% of models in our sample citing their datasets, method reproducibility is more common at 72% of models in our sample, particularly among state-of-the-art models. Our collected models are searchable in a catalog that uses existing metadata to enable advanced discovery features for efficiently finding models.
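For readers unfamiliar with the precision and recall figures quoted in the abstract, the following is a minimal sketch of how such scores can be computed for set-valued metadata such as "AI frameworks used". It is not taken from AIMMX and does not reproduce the paper's evaluation procedure: the function name `precision_recall`, the choice of micro-averaging, and the toy repository data are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple


def precision_recall(
    extracted: Dict[str, Set[str]],
    truth: Dict[str, Set[str]],
) -> Tuple[float, float]:
    """Micro-averaged precision and recall over per-model sets of values.

    Hypothetical helper: pools all extracted and all ground-truth values
    across models before dividing (an assumption, not the paper's method).
    """
    true_pos = 0      # extracted values that also appear in the ground truth
    n_extracted = 0   # total number of extracted values
    n_truth = 0       # total number of ground-truth values
    for model in set(extracted) | set(truth):
        got = extracted.get(model, set())
        want = truth.get(model, set())
        true_pos += len(got & want)
        n_extracted += len(got)
        n_truth += len(want)
    precision = true_pos / n_extracted if n_extracted else 0.0
    recall = true_pos / n_truth if n_truth else 0.0
    return precision, recall


# Toy, made-up data: frameworks per repository (not from the paper's dataset).
extracted = {
    "repo-a": {"tensorflow", "keras"},
    "repo-b": {"pytorch"},
    "repo-c": {"caffe"},
}
truth = {
    "repo-a": {"tensorflow"},
    "repo-b": {"pytorch"},
    "repo-c": {"pytorch", "mxnet"},
}

precision, recall = precision_recall(extracted, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision here is the share of extracted values that are correct, and recall is the share of ground-truth values that were found; a per-model (macro) average would be an equally plausible choice and can give different numbers.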

The library is open source and currently available at:

Mon 29 Jun
Times are displayed in time zone: (UTC) Coordinated Universal Time

10:30 - 11:00: Technical Papers - Programming Languages & Models at MSR:Zoom
Chair(s): Dimitris Kolovos (University of York)

Q/A & Discussion of Session Papers over Zoom (Joining info available on Slack)

msr-2020-Registered-Reports 10:30 - 10:36
Live Q&A
Jürgen Cito (MIT), Jiasi Shen (Massachusetts Institute of Technology), Martin Rinard (MIT)
msr-2020-papers 10:36 - 10:42
Live Q&A
Jason Tsay (IBM Research), Alan Braz (IBM Research), Martin Hirzel (IBM Research), Avraham Shinnar (IBM Research), Todd Mummert
msr-2020-papers 10:42 - 10:48
Live Q&A
Timofey Bryksin (JetBrains Research, Saint Petersburg State University), Victor Petukhov (JetBrains, ITMO University), Ilya Alexin, Stanislav Prikhodko, Alexey Shpilman, Vladimir Kovalenko (TU Delft), Nikita Povarov (JetBrains)
msr-2020-papers 10:48 - 10:54
Live Q&A
Tomoki Nakamaru (Graduate School of Information Science and Technology, The University of Tokyo), Tomomasa Matsunaga, Tetsuro Yamazaki (Graduate School of Information Science and Technology, The University of Tokyo), Soramichi Akiyama (Department of Creative Informatics, The University of Tokyo), Shigeru Chiba (The University of Tokyo)
msr-2020-papers 10:54 - 11:00
Live Q&A
Nan Yang (Eindhoven University of Technology, The Netherlands), Pieter Cuijpers, Ramon Schiffelers (Eindhoven University of Technology and ASML, the Netherlands), Johan Lukkien, Alexander Serebrenik (Eindhoven University of Technology)