What constitutes Software? An Empirical, Descriptive Study of Artifacts
MSR - Technical Paper
The term software is ubiquitous, yet we as a community do not seem to have a clear understanding of what software actually is. Imprecise definitions of software do not help other professions, in particular those acquiring and sourcing software from third parties, when they must decide what exactly the potential deliverables are. In this paper, we investigate which artifacts constitute software by analyzing 23 715 repositories from GitHub. We categorize the artifacts found into high-level categories, such as code, data, and documentation (and into 19 more concrete categories), and we confirm the notion held by others that software is more than just source code or programs, for which the term is often used as a synonym. With this work, we provide an empirical study of more than 13 million artifacts and a taxonomy of artifact categories. We conclude that software most often consists of variously distributed amounts of code in different forms (source code, binary code, scripts, etc.), data (configuration files, images, databases, etc.), and documentation (user documentation, licenses, etc.).
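To make the idea of categorizing repository artifacts concrete, the following is a minimal sketch of how file paths might be assigned to the three high-level categories named above. It is an illustration only and not the classification procedure used in the study: the extension table `EXTENSION_CATEGORY`, the name table `NAME_CATEGORY`, the fallback category "other", and the functions `categorize` and `category_shares` are all assumptions introduced here.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical extension-to-category table (an assumption for illustration,
# not the taxonomy of the study).
EXTENSION_CATEGORY = {
    ".py": "code", ".c": "code", ".java": "code", ".sh": "code", ".jar": "code",
    ".json": "data", ".yml": "data", ".png": "data", ".sqlite": "data",
    ".md": "documentation", ".txt": "documentation",
}

# Hypothetical well-known file names, matched on the name without its extension.
NAME_CATEGORY = {"license": "documentation", "readme": "documentation"}


def categorize(path: str) -> str:
    """Return an assumed high-level category for one artifact path."""
    p = PurePosixPath(path)
    stem = p.stem.lower()
    if stem in NAME_CATEGORY:
        return NAME_CATEGORY[stem]
    # Anything not covered by the tables falls into the assumed "other" bucket.
    return EXTENSION_CATEGORY.get(p.suffix.lower(), "other")


def category_shares(paths):
    """Return the fraction of artifacts per category for one repository."""
    counts = Counter(categorize(p) for p in paths)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


if __name__ == "__main__":
    repo = ["src/main.py", "config/settings.yml", "docs/guide.md", "LICENSE"]
    print(category_shares(repo))
```

A lookup by file name and extension is the simplest possible heuristic; a real classification would likely also need content inspection, since extensions can be missing or misleading.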