A Large-Scale Comparative Evaluation of IR-Based Tools for Bug Localization (MSR 2020 - Technical Papers)

Who

Shayan Akbar, Avinash Kak

Track

MSR 2020 Technical Papers

Time Zone

All times are given in Coordinated Universal Time (UTC).

When

Mon 29 Jun 2020 15:00 - 15:10 at MSR:Zoom - Bugs & Issues. Chair(s): Francisco Servant

Abstract

This paper reports on a large-scale comparative evaluation of IR-based tools for automatic bug localization. We have divided the bug localization tools in our evaluation into three generations: (1) first-generation tools, now almost a decade old, which are based purely on Bag-of-Words (BoW) modeling of software libraries; (2) second-generation tools, which augment BoW-based modeling with two additional pieces of information: historical data, such as change history, and structured information, such as class names and method names; and (3) the most recent, third-generation tools, which also exploit proximity, order, and semantic relationships between terms. It is important to realize that the original authors of all three generations of tools tested them on relatively small datasets that typically consisted of no more than a few thousand bug reports. More seriously, those evaluations involved only Java code libraries. The goal of the present paper is to present a comprehensive large-scale evaluation of all three generations of bug localization tools on code libraries in multiple languages. Our study involves over 20,000 bug reports drawn from a diverse collection of Java, C/C++, and Python projects.
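To make the first-generation idea concrete, here is a minimal sketch of BoW-based bug localization. It assumes a classic vector-space model with TF-IDF weights and cosine similarity, which is one common BoW formulation; the abstract does not say which weighting each evaluated tool uses. The names `tokenize`, `tf_idf_vectors`, `cosine`, and `rank_files`, the camelCase-splitting tokenizer, and the smoothed IDF are all hypothetical, illustrative choices and are not taken from any of the evaluated tools.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split identifiers at camelCase boundaries, drop punctuation, lowercase."""
    # Illustrative tokenizer only; real tools also stem and remove stop words.
    return [w.lower() for w in re.findall(r"[A-Za-z][a-z]*|[0-9]+", text)]


def tf_idf_vectors(
    docs: dict[str, list[str]],
) -> tuple[dict[str, dict[str, float]], dict[str, float]]:
    """Build a sparse TF-IDF vector per document plus the shared IDF table."""
    n = len(docs)
    df: Counter = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # document frequency: count each term once per file
    # Assumed smoothed IDF (+1) so terms present in every file keep some weight.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vecs = {
        name: {t: c * idf[t] for t, c in Counter(tokens).items()}
        for name, tokens in docs.items()
    }
    return vecs, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_files(bug_report: str, files: dict[str, str]) -> list[tuple[str, float]]:
    """Rank source files by BoW similarity to a bug report, best match first."""
    docs = {name: tokenize(src) for name, src in files.items()}
    vecs, idf = tf_idf_vectors(docs)
    # Query terms that never occur in any file carry no IDF and are ignored.
    query = {
        t: c * idf[t] for t, c in Counter(tokenize(bug_report)).items() if t in idf
    }
    scores = [(name, cosine(query, vec)) for name, vec in vecs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical two-file "library" and bug report, for illustration only.
    files = {
        "LoginController.java": "class LoginController { void login(User user) "
        "{ session.authenticate(user.password); } }",
        "ReportPrinter.java": "class ReportPrinter { void printReport(Report report) "
        "{ printer.send(report.pages); } }",
    }
    bug = "Login fails: authenticate throws error when user password is empty"
    for name, score in rank_files(bug, files):
        print(f"{name}: {score:.3f}")
```

In this toy example, every bug-report term that also appears in the code (`login`, `authenticate`, `user`, `password`) occurs only in `LoginController.java`, so that file ranks first and the other file scores zero. Second- and third-generation tools would add further signals on top of such a textual score (for example, change history, class and method names, or term proximity and order), which this sketch deliberately omits.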