Machine Learning for Code Smell Detection

We experimented with several machine learning techniques for detecting code smells in Java code.


Datasets

The following datasets accompany publication [3] (password: AF-MZ-mlcsd-severity-2016):


Datasets

We provide the following datasets from the work done during our experiments:

  • The metrics extracted from the classes and methods of 74 systems of the Qualitas Corpus: metrics.zip
  • The manual evaluation we performed and used as a training set:

We highly recommend reading at least one of the papers listed at the end of the page to understand how these datasets were created.

Furthermore, a tool that supports the creation of these datasets is now available: WekaNose

Metric definitions

We applied machine learning algorithms to datasets in which each source code artifact (a class or a method) is represented by a large set of metrics. The list and definitions of the metrics we used are reported in a separate document.

Download Metric Definitions
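
As a rough illustration of what representing an artifact through metrics looks like, the sketch below treats each class as a vector of metric values with a manually assigned label, and classifies a new class by its nearest labelled neighbour. Everything in it is an assumption made for the example: the three metric names, the sample values, and the choice of a 1-nearest-neighbour classifier are hypothetical and are not taken from the datasets or from the algorithms evaluated in the papers.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, pstdev

# Hypothetical metric names, used only for illustration; the real metric
# set is the one described in the metric definitions document.
METRICS = ("loc", "wmc", "atfd")


@dataclass(frozen=True)
class Sample:
    name: str        # class or method identifier
    metrics: tuple   # one value per entry in METRICS
    smelly: bool     # manual label: True if evaluated as a code smell


def fit_scaler(samples):
    """Return per-metric (mean, std) so each metric weighs equally in distances."""
    columns = list(zip(*(s.metrics for s in samples)))
    # Fall back to 1.0 when a metric is constant, to avoid dividing by zero.
    return [(mean(c), pstdev(c) or 1.0) for c in columns]


def scale(metrics, scaler):
    """Standardise one metric vector with the fitted (mean, std) pairs."""
    return tuple((v - m) / sd for v, (m, sd) in zip(metrics, scaler))


def predict(train, scaler, metrics):
    """1-nearest-neighbour: copy the label of the closest training sample."""
    target = scale(metrics, scaler)

    def distance(sample):
        scaled = scale(sample.metrics, scaler)
        return sqrt(sum((a - b) ** 2 for a, b in zip(scaled, target)))

    return min(train, key=distance).smelly


# Made-up rows standing in for a manually labelled training set.
TRAIN = [
    Sample("OrderManager", (1200, 95, 14), True),
    Sample("ReportEngine", (900, 70, 11), True),
    Sample("Point", (40, 4, 0), False),
    Sample("Money", (85, 9, 1), False),
]

SCALER = fit_scaler(TRAIN)
```

Calling `predict(TRAIN, SCALER, (1000, 80, 12))` classifies an unseen class from its metric values alone. The metrics are standardised first because raw values such as lines of code would otherwise dominate the distance.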

  1. Arcelli Fontana, Francesca, Marco Zanoni, Alessandro Marino, and Mika V. Mäntylä. 2013. “Code Smell Detection: Towards a Machine Learning-Based Approach.” In Proceedings of the 29th IEEE International Conference on Software Maintenance (ICSM 2013), 396–99. Eindhoven, The Netherlands: IEEE Computer Society. doi:10.1109/ICSM.2013.56.
  2. Arcelli Fontana, Francesca, Mika V. Mäntylä, and Marco Zanoni. 2015. “Comparing and Experimenting Machine Learning Techniques for Code Smell Detection.” Empirical Software Engineering, June, 1–49. doi:10.1007/s10664-015-9378-4.
  3. Arcelli Fontana, Francesca, and Marco Zanoni. 2017. “Code Smell Severity Classification Using Machine Learning Techniques.” Knowledge-Based Systems 128. doi:10.1016/j.knosys.2017.04.014.
  4. Azadi, Umberto, Francesca Arcelli Fontana, and Marco Zanoni. 2018. “Machine Learning Based Code Smell Detection through WekaNose.” In Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings (ICSE ’18), 288–89. New York, NY, USA: ACM. doi:10.1145/3183440.3194974.