International Conference
inproceedings
Benefits and pitfalls of token-level SZZ: An empirical study on OSS projects
Abstract

SZZ is the de facto standard method for identifying bug-inducing commits. Its accuracy relies heavily on source code management systems such as Git, because it traces the history of source code changes (i.e., commit histories) back to bug-inducing commits. However, these systems have been reported to introduce biases into commit histories because they store only line-level changes. Such coarse-grained line-level changes are known to cause failures in tracking the commit history accurately and to reduce the performance of SZZ. To address this challenge, we examine the accuracy of SZZ with token-level changes, which provide finer-grained information for tracing commit histories than line-level ones, and we discuss the potential benefits and pitfalls of using token-level changes for SZZ. In experiments on 68 OSS projects, we found that SZZ with token-level histories identifies two new bug-inducing commits that are missed when using line-level histories. Furthermore, our manual analysis of the identified commits indicates that token-level histories reduce false-positive bug-inducing commits caused by source code formatting and whitespace changes. However, this improvement in detecting bug-inducing commits comes at the cost of a 0.081 decrease in overall accuracy, as measured by the F1 score. Finally, we summarize three potential benefits and five pitfalls of using token-level and line-level tracking for SZZ.
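The contrast between line-level and token-level histories can be illustrated with a small sketch. It is not the tooling used in the study: the regex tokenizer, the helper names `changed_lines` and `changed_tokens`, and the sample snippets are all assumptions made for illustration, and Python's standard `difflib` stands in for a real diff engine. The sketch only shows why a formatting-only edit looks like a change at line granularity but not at token granularity.

```python
import difflib
import re


def tokenize(source: str) -> list[str]:
    # Hypothetical tokenizer: runs of word characters, or single symbols.
    # Whitespace is discarded, so layout changes produce no tokens.
    return re.findall(r"\w+|[^\w\s]", source)


def changed_lines(old: str, new: str) -> list[int]:
    """0-based indices of lines in `old` that a line-level diff rewrites or deletes."""
    old_lines, new_lines = old.splitlines(), new.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    changed = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            changed.extend(range(i1, i2))
    return changed


def changed_tokens(old: str, new: str) -> list[str]:
    """Tokens of `old` that a token-level diff rewrites or deletes."""
    old_toks, new_toks = tokenize(old), tokenize(new)
    matcher = difflib.SequenceMatcher(a=old_toks, b=new_toks, autojunk=False)
    changed = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            changed.extend(old_toks[i1:i2])
    return changed


# Sample snippets (invented for this sketch).
before = "if (x > 0) {\n    y = x+1;\n}\n"
# Formatting-only edit: brace moved to its own line, spaces added around '+'.
after_format = "if (x > 0)\n{\n    y = x + 1;\n}\n"
# Semantic edit: the comparison constant changes from 0 to 1.
after_fix = "if (x > 1) {\n    y = x+1;\n}\n"

print(changed_lines(before, after_format))   # lines touched by reformatting
print(changed_tokens(before, after_format))  # tokens touched by reformatting
print(changed_tokens(before, after_fix))     # tokens touched by the real edit
```

In this sketch the formatting-only edit rewrites the first two lines at line granularity but changes no tokens, whereas the semantic edit changes exactly one token. A line-based history would therefore attribute those two lines to the formatting commit, which is the kind of false positive the abstract describes.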
Copyright © 2025 omzn.aquatan.net a.k.a. Osamu Mizuno All rights reserved.

The publications displayed in this list are related to SEL@KIT members only.