A Soft Alignment Model for Bug Deduplication
MSR - Technical Paper
Bug tracking systems (BTSs) are widely used in software projects. An important task in such systems is identifying duplicate bug reports, i.e., distinct reports related to the same software issue. For several reasons, users frequently report bugs that have already been reported, which makes manual triage impractical in large BTSs. In this paper, we present a novel deep learning network based on soft-attention alignment to improve duplicate bug report detection. Our model can dynamically focus on distinct segments of the bug reports while generating report representations. This architecture is more flexible than previous approaches, which compute report representations with limited or no data exchange. We evaluate our model on four well-known datasets derived from the BTSs of four popular open-source projects. Our experimental evaluation relies on a ranking-based methodology, which is closer to real-world scenarios than the decision-making methodologies used in many previous works. The evaluation demonstrates that our model outperforms state-of-the-art systems and strong baselines in different scenarios. Finally, we report ablation studies that confirm our hypothesis that a more flexible architecture helps extract relevant information from bug reports.
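To make the notion of soft-attention alignment concrete, the following is a minimal generic sketch and not the architecture evaluated in this paper. It assumes each report is already encoded as a list of token vectors, and that alignment scores are plain dot products. The names `soft_align`, `report_a`, and `report_b` and the toy embeddings are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def soft_align(query, candidate):
    """For each query token vector, return an attention-weighted
    average of the candidate token vectors.

    Assumption: the alignment score is a dot product; a trained
    model would typically use learned projections instead.
    """
    dim = len(candidate[0])
    aligned = []
    for q in query:
        # One weight per candidate token; weights sum to 1.
        weights = softmax([dot(q, c) for c in candidate])
        mixed = [
            sum(w * c[k] for w, c in zip(weights, candidate))
            for k in range(dim)
        ]
        aligned.append(mixed)
    return aligned


# Toy two-dimensional "embeddings" (hypothetical data).
report_a = [[1.0, 0.0], [0.0, 1.0]]
report_b = [[5.0, 0.0], [0.0, 5.0], [0.0, 0.0]]

aligned_a = soft_align(report_a, report_b)
```

In this sketch, each token of `report_a` receives a summary of `report_b` weighted toward the tokens it matches best, so the representation of one report depends on the content of the other.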