We experimented with several machine learning techniques for detecting code smells in Java code.
Code smell severity classification
The following datasets are related to the publication  (password: AF-MZ-mlcsd-severity-2016):
Code smell detection (binary classification)
We provide several datasets from the work done during the experiments:
The results of the detection of different Advisors: advisor_detection.zip
The metrics extracted from the classes and methods of 74 systems of the Qualitas Corpus: metrics.zip
The manual evaluation we performed and used as a training set:
- evaluation_dataset.zip (datasets related to publication )
- binary class datasets (datasets related to publication  + datasets related to Long Parameter List and Switch Statement)
We highly recommend reading at least one of the papers listed at the end of the page to understand how these datasets were created.
Furthermore, a tool that supports the creation of these datasets is now available: WekaNose
We applied machine learning algorithms to datasets that represent source code artifacts (classes and methods) through a large set of metrics. The list and definitions of the metrics used are reported in a separate document.
Download Metric Definitions
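To make the idea of "artifacts represented through metrics" concrete, the following minimal sketch shows one way a labelled metric table could be turned into a detection rule. It is an illustration only: the metric names (`LOC`, `CYCLO`, `NOP`), the values, and the labels are all invented, and the learner is a deliberately simple one-threshold rule (a decision stump), not one of the algorithms used in the publications or in WekaNose.

```python
# Hypothetical training rows: one metric vector per method plus a manual label
# (True = affected by the smell). All numbers are invented for illustration.
METRIC_NAMES = ("LOC", "CYCLO", "NOP")
TRAINING = [
    ((12, 2, 1), False),
    ((25, 4, 2), False),
    ((40, 6, 3), False),
    ((180, 25, 4), True),
    ((220, 31, 6), True),
    ((150, 18, 2), True),
]


def train_stump(rows):
    """Return (metric_index, threshold) for the rule 'value > threshold => smelly'
    that misclassifies the fewest training rows."""
    best = None
    for index in range(len(rows[0][0])):
        values = sorted({features[index] for features, _ in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            errors = sum(
                (features[index] > threshold) != label for features, label in rows
            )
            if best is None or errors < best[0]:
                best = (errors, index, threshold)
    if best is None:
        raise ValueError("no metric has two distinct values to split on")
    return best[1], best[2]


def predict(stump, features):
    """Apply the learned one-threshold rule to a new metric vector."""
    index, threshold = stump
    return features[index] > threshold


stump = train_stump(TRAINING)
print(METRIC_NAMES[stump[0]], stump[1])
print(predict(stump, (200, 20, 3)))
```

On real data the rows would come from the downloadable datasets and a proper learner would replace the stump; the sketch only shows the shape of the input (metric vectors with manual labels) and of the output (a binary decision per artifact).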
- Arcelli Fontana, Francesca, Marco Zanoni, Alessandro Marino, and Mika V. Mäntylä. 2013. “Code Smell Detection: Towards a Machine Learning-Based Approach.” In Proceedings of the 29th IEEE International Conference on Software Maintenance (ICSM 2013), 396–99. Eindhoven, The Netherlands: IEEE Computer Society. doi:10.1109/ICSM.2013.56.
- Arcelli Fontana, Francesca, Mika V. Mäntylä, and Marco Zanoni. 2015. “Comparing and Experimenting Machine Learning Techniques for Code Smell Detection.” Empirical Software Engineering, June, 1–49. doi:10.1007/s10664-015-9378-4.
- Arcelli Fontana, Francesca, and Marco Zanoni. 2017. “Code Smell Severity Classification Using Machine Learning Techniques.” Knowledge-Based Systems 128. doi:10.1016/j.knosys.2017.04.014.
- Azadi, Umberto, Francesca Arcelli Fontana, and Marco Zanoni. 2018. “Machine Learning Based Code Smell Detection through WekaNose.” In Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings (ICSE ’18), 288–89. New York, NY, USA: ACM. doi:10.1145/3183440.3194974.