PMAN4 - SEL@KIT

Thesis

phdthesis

An Empirical Study of Feature Engineering on Software Defect Prediction

March 2021
Ph.D. thesis / Kyoto Institute of Technology

Masanari Kondo

No URL available

Abstract

Software products are pivotal to daily life, underpinning infrastructure, work, and communication. Defects in such products can therefore cause widespread catastrophes; indeed, several accidents caused by software defects have been reported. Because software is this important, developers carefully manage its quality through software quality assurance (SQA) activities (e.g., software testing, code review, and CI/CD). For example, software testing checks whether a software product meets all of its requirements. However, software products have recently become enormous in size and depend on numerous environments, so it is difficult to inspect an entire product through SQA activities alone. Defect prediction uses a defect prediction model to identify software entities (e.g., files) that are likely to be defective. Such a model enables developers to focus their SQA activities on the entities predicted to be defective and thereby reveal more defects than they would without the model. Hence, defect prediction has attracted interest from practitioners and researchers and has become an active research area in software engineering. Defect prediction models are usually machine learning models trained on software features of past software entities. Because machine learning models rely on such features, prior studies have applied feature engineering to defect prediction to improve prediction performance. Feature engineering is the process of creating or improving features using domain knowledge. For example, several studies have derived new features from a software product. However, defect prediction still has challenges that feature engineering can address: (1) the comparison of feature reduction techniques, (2) using the context lines of source code as features, and (3) using semantic properties as features with a deep learning model in change-level defect prediction.
In this thesis, to address these challenges, we (1) conducted a large-scale empirical comparison of feature reduction and selection techniques, (2) constructed context features derived from context lines, and (3) used semantic properties with a deep learning model for change-level defect prediction. Our results show that (1) feature reduction and selection techniques improve prediction performance while reducing the number of features, (2) context features improve prediction performance, and (3) semantic features with a deep learning model significantly outperform a previous deep learning model.
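
As a rough illustration of what feature selection means in this setting, the sketch below ranks file-level features by the absolute value of their Pearson correlation with a defect label and keeps the top k. It is not the thesis's method: the filter criterion, the feature names (lines_of_code, num_past_changes, num_authors, comment_ratio), and the six data rows are all hypothetical and chosen only to keep the example small and dependency-free.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_top_k(rows, labels, names, k):
    """Rank features by |correlation with the defect label| and keep the top k."""
    scores = {}
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores[name] = abs(pearson(column, labels))
    ranked = sorted(names, key=lambda name: scores[name], reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: one row per file; label 1 means "defective".
FEATURE_NAMES = ["lines_of_code", "num_past_changes", "num_authors", "comment_ratio"]
ROWS = [
    [120, 3, 1, 0.30],
    [950, 14, 4, 0.10],
    [300, 5, 2, 0.25],
    [1500, 20, 5, 0.05],
    [80, 1, 1, 0.30],
    [700, 11, 3, 0.20],
]
LABELS = [0, 1, 0, 1, 0, 1]

selected, scores = select_top_k(ROWS, LABELS, FEATURE_NAMES, k=2)
print("selected features:", selected)
for name in FEATURE_NAMES:
    print(f"{name}: {scores[name]:.3f}")
```

A defect prediction model would then be trained only on the selected columns. Feature reduction techniques differ in that they construct new, combined features rather than keeping a subset of the original ones.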

Tags

Empirical Study, Feature Engineering, Software Defect Prediction, Doctoral Thesis, PhDThesis

Files

Published version

BibTeX

This list shows only publications related to members of SEL@KIT.