Predicting fault-prone modules with a spam filtering technique

Detecting fault-prone modules in source code is important for assuring software quality. Most previous approaches to fault-prone module detection are based on software metrics. Such approaches, however, have difficulty collecting the metrics and constructing mathematical models from them.

To mitigate these difficulties, we propose a novel approach for detecting fault-prone modules using a spam filtering technique, named Fault-Prone Filtering. Driven by the growing need to detect spam e-mail, spam filtering has developed into a convenient and effective text-mining technique. In our approach, source code modules are treated as text files and passed directly to the spam filter, which classifies each module as fault-prone or not.
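The idea of treating a module as plain text can be illustrated with a minimal sketch. It assumes a simple naive Bayes token classifier as a stand-in for the spam filter; the filter actually used in the papers is not named here, and the class name `NaiveBayesModuleFilter`, the labels `FP`/`NFP`, the tokenizer, and the two training snippets are all hypothetical choices made only for illustration.

```python
import math
import re
from collections import Counter
from typing import List

# Hypothetical tokenizer: identifiers/keywords, integer literals, single symbols.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source: str) -> List[str]:
    """Split raw source text into tokens, exactly as one would split e-mail text."""
    return TOKEN_RE.findall(source)


class NaiveBayesModuleFilter:
    """Toy two-class filter: fault-prone (FP) vs. not fault-prone (NFP).

    This is an assumed stand-in for a spam filter, not the authors' tool.
    """

    LABELS = ("FP", "NFP")

    def __init__(self) -> None:
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.module_counts = {label: 0 for label in self.LABELS}

    def train(self, source: str, label: str) -> None:
        """Add one labeled module (as text) to the training data."""
        self.token_counts[label].update(tokenize(source))
        self.module_counts[label] += 1

    def log_score(self, source: str, label: str) -> float:
        """Log prior plus summed log token likelihoods, with add-one smoothing."""
        counts = self.token_counts[label]
        total = sum(counts.values())
        vocab = len(set(self.token_counts["FP"]) | set(self.token_counts["NFP"]))
        n_modules = sum(self.module_counts.values())
        score = math.log((self.module_counts[label] + 1) / (n_modules + 2))
        for token in tokenize(source):
            # The extra +1 in the denominator reserves mass for unseen tokens.
            score += math.log((counts[token] + 1) / (total + vocab + 1))
        return score

    def classify(self, source: str) -> str:
        """Return the label with the higher score."""
        return max(self.LABELS, key=lambda label: self.log_score(source, label))


# Made-up training data: one module labeled faulty, one labeled clean.
flt = NaiveBayesModuleFilter()
flt.train("int div(int a, int b) { return a / b; }", "FP")
flt.train("int add(int a, int b) { return a + b; }", "NFP")
print(flt.classify("int ratio(int x, int y) { return x / y; }"))
```

In this toy setting the only token that separates the two classes is the operator, so the new module is scored by which operator it shares with the training modules. A real filter would be trained on many modules labeled from a project's fault history.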

To show the usefulness of our approach, we conducted an experiment using a large source code repository of a Java-based open source project. The experimental results show that our approach classifies about 85% of software modules correctly. The results also indicate that fault-prone modules can be detected at relatively low cost at an early stage.

Related papers:

  • A. Yamada and O. Mizuno, "Classification of Bug Injected and Fixed Changes Using a Text Discriminator," ACIS International Journal of Software Innovation, 3(1), pp. 50-62, January 2015.
  • O. Mizuno, N. Kawashima, and K. Kawamoto, "Fault-Prone Module Prediction Approaches Using Identifiers in Source Code," ACIS International Journal of Software Innovation, 3(1), pp. 36-49, January 2015.
  • O. Mizuno and H. Hata, "A Metric to Detect Fault-Prone Software Modules Using Text Classifier," International Journal of Reliability and Safety, 7(1), pp. 17-31, February 2013.
  • H. Hata, O. Mizuno, T. Kikuno, "Fault Prediction on Fine-Grained Modules Based on Historical Metrics," Trans. of Information Processing Society of Japan, 53(6), pp. 1635-1643, June 2012.
  • O. Mizuno and M. Nakai, "Can Faulty Modules Be Predicted by Warning Messages of Static Code Analyzer?," Advances in Software Engineering, 2012(924923), 8 pages, May 2012.
  • O. Mizuno, H. Hata, "A Comparative Study on Fault-Prone Module Prediction between Spam-Filter Based Approach and Complexity Metrics Based Approach," IEICE Transactions on Information and Systems, J94-D(1), pp. 409-412, January 2011.
  • O. Mizuno and H. Hata, "A Hybrid Fault-Proneness Detection Approach Using Text Filtering and Static Code Analysis," International Journal of Advancements in Computing Technology, 2(5), pp. 1-12, December 2010.
  • H. Hata, O. Mizuno, and T. Kikuno, "Fault-Prone Module Detection Using Large-Scale Text Features Based on Spam Filtering," Empirical Software Engineering, 15(2), pp. 147-165, April 2010. (JCR: 1.612 (2009))
  • O. Mizuno and H. Hata, "Prediction of Fault-Prone Modules Using a Text Filtering Based Metric," International Journal of Software Engineering and Its Application, 4(1), pp. 43-52, January 2010.
  • O. Mizuno and T. Kikuno, "Prediction of Fault-Prone Software Modules Using a Generic Text Discriminator," IEICE Trans. on Information and Systems, E91-D(4), pp. 888-896, April 2008. (JCR: 0.369 (2008))
  • O. Mizuno, T. Kikuno, "Fault-Prone Filtering: a Simple Approach to Predict Fault-Prone Modules Using Spam Filter," SEC journal, 4(1), pp. 6-15, February 2008.
  • T. Fujiwara, O. Mizuno, and P. Leelaprute, "Fault-Prone Byte-Code Detection Using Text Classifier," In Proc. of 16th International Conference on Product-Focused Software Process Improvement (PROFES2015), 1st International Workshop on Processes, Methods, and Tools for Engineering Embedded Systems, LNCS(9459), pp. 415-430, December 2015. (Bozen-Bolzano, Italy)
  • K. Mori and O. Mizuno, "An Implementation of Just-In-Time Fault-Prone Prediction Technique Using Text Classifier," In Proc. of the 39th IEEE Computers, Software, and Applications Conference (COMPSAC 2015), pp. 609-612, July 2015. (Taichung, Taiwan)
  • O. Mizuno and Y. Hirata, "A Cross-Project Evaluation of Text-Based Fault-Prone Module Prediction," In Proc. of 6th International Workshop on Empirical Software Engineering in Practice (IWESEP2014), pp. 43-48, November 2014. (Osaka, Japan) (Acceptance rate: 56%, 10/18)
  • A. Yamada and O. Mizuno, "A Text Filtering Based Approach to Classify Bug Injected and Fixed Changes," In Proc. of 12th International Conference on Software Engineering Research, Management and Applications (SERA2014), pp. 680-686, August 2014. (Kitakyushu, Japan) (Acceptance rate: 59%, 19/32)
  • N. Kawashima and O. Mizuno, "Predicting Fault-Prone Modules by Word Occurrence in Identifiers," In Proc. of 12th International Conference on Software Engineering Research, Management and Applications (SERA2014), Studies in Computational Intelligence, 578, pp. 87-98, August 2014. (Kitakyushu, Japan) (Acceptance rate: 59%, 19/32)
  • O. Mizuno, "On Effects of Tokens in Source Code to Accuracy of Fault-Prone Module Prediction," In Proc. of the 17th International Computer Science and Engineering Conference (ICSEC2013), pp. 103-108, September 2013. (Bangkok, Thailand) (Acceptance rate: 57%, 73/128)
  • K. Kawamoto and O. Mizuno, "Predicting Fault-Prone Modules Using the Length of Identifiers," In Proc. of 4th International Workshop on Empirical Software Engineering in Practice (IWESEP 2012), pp. 30-34, October 2012. (Osaka, Japan) (Acceptance rate: 57%, 8/14)
  • Y. Hirata and O. Mizuno, "Investigating Effects of Tokens on Detecting Fault-Prone Modules by Text Filtering," In Proc. of 22nd International Symposium on Software Reliability Engineering (ISSRE2011), Supplemental proceedings, 3-2, November 2011. (Hiroshima, Japan)
  • M. Nakai and O. Mizuno, "Fault-Prone Module Prediction by Filtering Warning Messages of Static Code Analyzer," In Proc. of the Joint Conference of the 21st International Workshop on Software Measurement and the 6th International Conference on Software Process and Product Measurement (IWSM/MENSURA2011), Fast abstracts, pp. 18-21, November 2011. (Nara, Japan)
  • Y. Hirata and O. Mizuno, "Do Comments Explain Codes Adequately? -- Investigation by Text Filtering --," In Proc. of 8th Working Conference on Mining Software Repositories (MSR2011), pp. 242-245, May 2011. (Honolulu, HI, USA)
  • O. Mizuno and Y. Hirata, "Fault-Prone Module Prediction Using Contents of Comment Lines," In Proc. of International Workshop on Empirical Software Engineering in Practice 2010 (IWESEP2010), pp. 39-44, December 2010. (NAIST, Nara, Japan) (Acceptance rate: 66%)
  • O. Mizuno and H. Hata, "An Empirical Comparison of Fault-Prone Module Detection Approaches: Complexity Metrics and Text Feature Metrics," In Proc. of 34th Annual IEEE Computer Software and Applications Conference (COMPSAC2010), pp. 248-249, July 2010. (Seoul, Korea)
  • O. Mizuno and H. Hata, "An Integrated Approach to Detect Fault-Prone Modules Using Complexity and Text Feature Metrics," In Proc. of 2010 International Conference on Advanced Science and Technology (AST2010), LNCS 6059, pp. 457-468, June 2010. (Miyazaki, Japan)
  • O. Mizuno and H. Hata, "Yet Another Metric for Predicting Fault-Prone Modules," In Proc. of 2009 International Conference on Advanced Software Engineering & Its Applications (ASEA2009), CCIS 59, pp. 296-304, December 2009. (Cheju, Korea)
  • H. Hata, O. Mizuno, and T. Kikuno, "Comparative Study of Fault-Proneness Filtering with PMD," In Proc. of 19th International Symposium on Software Reliability Engineering (ISSRE2008), pp. 317-318, November 2008. (Seattle/Redmond, WA, USA) (Acceptance rate: 37%, 11/30)
  • H. Hata, O. Mizuno, and T. Kikuno, "An Extension of Fault-Prone Filtering Using Precise Training and a Dynamic Threshold," In Proc. of 5th Working Conference on Mining Software Repositories (MSR2008), pp. 89-97, May 2008. (Leipzig, Germany) (Acceptance rate: 19%)
  • T. Kondou, O. Mizuno, and T. Kikuno, "Investigating Factors Affecting Accuracy of Fault-Prone Filtering," In Proc. of 18th International Symposium on Software Reliability Engineering (ISSRE2007), Supplemental proceedings, CD-ROM, November 2007. (Trollhattan, Sweden)
  • T. Yagi, O. Mizuno, and T. Kikuno, "Analysing Effect of Pre-Training in Fault-Prone Prediction Using Spam Filter," In Proc. of 18th International Symposium on Software Reliability Engineering (ISSRE2007), Supplemental proceedings, CD-ROM, November 2007. (Trollhattan, Sweden)
  • O. Mizuno, S. Ikami, S. Nakaichi, and T. Kikuno, "Fault-Prone Filtering: Detection of Fault-Prone Modules Using Spam Filtering Technique," In Proc. 1st International Symposium on Empirical Software Engineering and Measurement (ESEM2007), pp. 374-383, September 2007. (Madrid, Spain) (Acceptance rate: 41%, 44/107)
  • O. Mizuno and T. Kikuno, "Training on Errors Experiment to Detect Fault-Prone Software Modules by Spam Filter," In The 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE2007), pp. 405-414, September 2007. (Dubrovnik, Croatia) (Acceptance rate: 17%, 43/251)
  • M. Kimoto, O. Mizuno, and T. Kikuno, "Extraction of Fault-Prone Modules Based on Fault Tracking Data from Open Source Software Repository," In 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN2007), Supplemental Proceedings, pp. 366-367, June 2007. (Edinburgh, UK)
  • O. Mizuno, S. Ikami, S. Nakaichi, and T. Kikuno, "Spam Filter Based Approach for Finding Fault-Prone Software Modules," In Proc. of Fourth International Workshop on Mining Software Repositories (MSR07), p. 4, May 2007. (Minneapolis, MN, USA) (Acceptance rate: 51%)
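  • K. Mori, O. Mizuno, "Prototype of a Just-In-Time Bug Prediction Tool Based on a Spam Filter," Software Symposium 2015, pp. 37-46, June 2015. (in Japanese)
  • T. Fujiwara, O. Mizuno, "Fault Prediction by Text Classification Using Bytecode," Software Symposium 2015, pp. 80-88, June 2015. (in Japanese)
  • N. Kawashima, O. Mizuno, "Fault-Prone Module Prediction Using Word Information in Identifiers," Proc. of Software Symposium 2014, pp. 72-80, June 2014. (Akita, Japan) (in Japanese)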
  • H. Hata, O. Mizuno, T. Kikuno, "Historical Metrics Based Fault-Prone Module Prediction for Fine-Grained Modules," Software Engineering Symposium 2011 (SES2011), 4, September 2011.
  • M. Nakai, O. Mizuno, "A Proposal of a Method for Predicting Fault-Injected Modules Using Static Source Code Analysis Results," Software Symposium 2011, 09_Research Paper (Online only), June 2011. (Nagasaki, Japan) (in Japanese)
  • H. Hata, O. Mizuno, T. Kikuno, "Application of Machine Learning Without Negative Examples to Fault-Prone Module Detection," Software Engineering Symposium 2009, pp. 133-138, September 2009. (Tokyo, Japan)
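  • O. Mizuno, "Fault-proneness Filtering: A Method for Predicting Fault-Injected Software Modules Based on a Spam Filter," 生産と技術, 61(1), pp. 38-43, January 2009. (in Japanese)
  • Y. Hirata, O. Mizuno, "Analysis of the Effect of Comment Lines on Fault-Prone Module Detection Based on Text Classification," IPSJ SIG Technical Report, Software Engineering (SE), 2010-SE-170(10), pp. 1-8, November 2010. (Osaka University, Japan) (in Japanese)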
  • H. Liu, O. Mizuno, T. Kikuno, "A Comparative Study of Fault-Prone Module Detection Methods -- Fault-Proneness Filtering and Logistic Regression --," Technical Report of IEICE, 108(384, KBSE2008-47), pp. 61-66, January 2009. (Tokyo, Japan)
  • R. Morii, O. Mizuno, T. Kikuno, "Identifying Fault-Prone Tokens in Source Code Modules with Spam-Filtering Technique," Technical Report of IEICE, 108(64, SS2008-4), pp. 19-24, May 2008. (Miyazaki, Japan)
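  • T. Yagi, O. Mizuno, T. Kikuno, "Prediction of Fault-Prone Modules Using a SPAM Filter -- Accuracy Evaluation Using Training Results from Different Projects," Proc. of the 4th Workshop of ソフトウェア信頼性研究会, pp. 35-43, June 2007. (Matsuyama, Japan) (in Japanese)
  • S. Ikami, S. Nakaichi, O. Mizuno, T. Kikuno, "Prediction of Fault-Prone Source Code Modules Using Text Classifier," Technical Report of IEICE, 106(522, SS2006-75), pp. 25-30, February 2007. (Aichi Prefectural University, Nagoya, Japan)
  • N. Kawashima, "A Proposal of a Fault-Prone Module Prediction Method Using Word Information in Identifiers," Graduation thesis, Kyoto Institute of Technology, February 2014. (in Japanese)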
  • Y. Hirata, "Application of Trend of Tokens in Source Code Modules to Fault-Prone Module Prediction," Master thesis, Kyoto Institute of Technology, 2012.