Predicting fault-prone modules is an important research theme in software engineering. We have proposed an approach that predicts fault-prone modules using a spam filtering technique, named fault-prone filtering [2].
Although fault-prone filtering achieved high accuracy and high recall in [2], it remained unclear which kinds of tokens in source code affect the accuracy of prediction. In this paper, we therefore perform an experiment to investigate how sensitive the prediction accuracy is to each kind of token. To do so, we classified the tokens in source code into the following categories: Reserved, Identifier, Separator, Operator, and Literal. The experiment showed that the most effective token category was “Identifier”.
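
As an illustration of the token categorization described above, the following is a minimal sketch in Python, not the tokenizer used in the experiment. It assumes a Java-like input language; the reserved-word list is abbreviated, and the names RESERVED, TOKEN_RE, categorize, and keep_only are hypothetical. Comments in the source code are not handled.

```python
import re

# Assumption: an abbreviated reserved-word list for a Java-like language.
RESERVED = {"if", "else", "for", "while", "return", "int",
            "class", "public", "static", "void", "new"}

# One named alternative per lexical class; "Word" is split into
# Reserved and Identifier after matching.
TOKEN_RE = re.compile(r"""
    (?P<Literal>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|\d+(?:\.\d+)?)
  | (?P<Word>[A-Za-z_]\w*)
  | (?P<Separator>[(){}\[\];,.])
  | (?P<Operator>[-+*/%=<>!&|^~?:]+)
""", re.VERBOSE)


def categorize(source):
    """Return a list of (category, token) pairs for the given source text."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "Word":
            kind = "Reserved" if text in RESERVED else "Identifier"
        tokens.append((kind, text))
    return tokens


def keep_only(source, category):
    """Return only the tokens of one category, e.g. as input to a filter."""
    return [text for kind, text in categorize(source) if kind == category]
```

For example, keep_only("int count = x + 1;", "Identifier") yields only the tokens count and x. Restricting the input of a text classifier to one category at a time in this way is one possible means of examining how much each category contributes to prediction.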