A Large-Scale Comparative Evaluation of IR-Based Tools for Bug Localization
MSR - Technical Paper
This paper reports on a large-scale comparative evaluation of IR-based tools for automatic bug localization. We divide the tools in our evaluation into three generations: (1) the first-generation tools, now almost a decade old, which are based purely on Bag-of-Words (BoW) modeling of software libraries; (2) the second-generation tools, which augment BoW-based modeling with two additional pieces of information: historical data, such as change history, and structured information, such as class names and method names; and (3) the most recent third-generation tools, which additionally exploit proximity, order, and semantic relationships between the terms. The original authors of all three generations of tools tested them on relatively small datasets that typically consisted of no more than a few thousand bug reports. An even more serious shortcoming is that those evaluations involved only Java code libraries. The goal of the present paper is to present a comprehensive large-scale evaluation of all three generations of bug-localization tools with code libraries in multiple languages. Our study involves over 20,000 bug reports drawn from a diverse collection of Java, C/C++, and Python projects.