Thesis
An Empirical Study of Feature Engineering on Software Defect Prediction
  • March 2021
  • Ph.D. thesis / Kyoto Institute of Technology
    Abstract

    Software products are pivotal to daily life, supporting infrastructure, work, and communication. Defects in such products may therefore cause widespread catastrophes; indeed, several accidents caused by software defects have been reported. Because of this importance, software developers carefully manage product quality through software quality assurance (SQA) activities (e.g., software testing, code review, and CI/CD). For example, software testing checks whether a software product meets all of its requirements. However, software products have recently become enormous in size and depend on numerous environments, so it is difficult to inspect an entire product through SQA activities. Defect prediction identifies defective software entities (e.g., files) using a defect prediction model. Such a model enables developers to focus their SQA activities on the entities predicted to be defective and to reveal more defects than they would by applying SQA activities without it. Hence, defect prediction has attracted interest from practitioners and researchers and has become an active research area in software engineering. Defect prediction models are usually machine learning models trained on software features of past software entities. Since these models rely on such features, prior studies have applied feature engineering to defect prediction to improve prediction performance. Feature engineering is the process of creating or improving features using domain knowledge. For example, several studies have extracted new features from a software product. However, defect prediction still has challenges that feature engineering can address: (1) the comparison of feature reduction techniques, (2) the use of the context lines of source code as features, and (3) the use of semantic properties as features with a deep learning model in change-level defect prediction.
In this thesis, to address these challenges, we (1) conducted a large-scale empirical comparison of feature reduction and selection techniques, (2) constructed context features extracted from context lines, and (3) used semantic properties with a deep learning model in change-level defect prediction. Our results show that (1) feature reduction and selection techniques improve prediction performance while reducing the number of features, (2) context features improve prediction performance, and (3) semantic features with a deep learning model significantly outperform a previous deep learning model.
