Towards a New Hybrid Synthetic Minority Oversampling Technique for Imbalanced Problem in Software Defect Prediction
| dc.contributor.author | Saheed, Y.K. | |
| dc.contributor.author | Abdulsalam, S.O. | |
| dc.contributor.author | Ibrahim, M.B. | |
| dc.contributor.author | Baba, U.A. | |
| dc.date.accessioned | 2025-10-29T12:14:21Z | |
| dc.date.available | 2025-10-29T12:14:21Z | |
| dc.date.issued | 2024 | |
| dc.description.abstract | The software industry strives to improve software quality through continuous bug prediction, bug elimination, and module fault prediction. This issue has attracted researchers’ interest because of its relevance to the software industry. Software Defect Prediction (SDP) datasets are frequently highly skewed, making it difficult for classifiers to recognize defective instances. The machine learning (ML) community has put a lot of effort into solving the problem of learning from imbalanced SDP data, though less effort has been made in empirical software engineering. Over-sampling is one of many recent solutions to this problem. This strategy balances the number of defective and non-defective cases by creating new defective instances. Unfortunately, these methods can yield non-diverse synthetic instances and a large number of unneeded noisy instances, which compounds the class imbalance problem. We therefore propose a hybrid Synthetic Minority Oversampling Technique (HSMOTE) to address the problem of imbalance in SDP; the method uses Extra Trees (ET), Random Forest (RF), and Extreme Gradient Boosting (XGBoost) for classification. We develop and deploy the proposed method on National Aeronautics and Space Administration (NASA) data, evaluating its performance on three datasets: JM1, KC1, and PC3. We compared its accuracy, precision, AUC, recall, F-measure, and Matthews Correlation Coefficient with those of existing SDP techniques. The findings from the simulations on the JM1 data showed that the proposed techniques outperform the current best models. The proposed SMOTE+RF technique surpasses the existing techniques with an accuracy of 93.69%, an AUC of 82.70%, and an F-measure of 32.98%.
Similarly, the proposed SMOTE+XGBoost method outperforms the existing techniques with an accuracy of 93.432%, an AUC of 82.64%, and an F-measure of 34.13%, while SMOTE+ET achieved an accuracy of 93.43%, an AUC of 77.68%, and an F-measure of 31.90%. Keywords: Synthetic Minority Oversampling Technique, Class Imbalance, Oversampling Method, Software Defect Prediction, Imbalance Data | |
| dc.identifier.uri | https://kwasuspace.kwasu.edu.ng/handle/123456789/6289 | |
| dc.language.iso | en | |
| dc.publisher | IEEE Xplore - Proceedings of 5th International Conference on Data Analytics for Business and Industry (ICDABI), University of Bahrain, October 23-24, 2024 | |
| dc.title | Towards a New Hybrid Synthetic Minority Oversampling Technique for Imbalanced Problem in Software Defect Prediction | |
| dc.type | Article |
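The abstract describes over-sampling as balancing defective and non-defective cases by creating new defective instances. As a rough illustration only, the sketch below shows the basic SMOTE-style interpolation idea in plain Python. It is not the authors' HSMOTE, whose details are not given in this record; the function name `smote_like_oversample`, its parameters, and the toy data are all hypothetical.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours.

    Hypothetical helper for illustration; not the method from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority samples (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy example (made-up feature values): 3 defective vs. 8 non-defective modules.
defective = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]]
n_non_defective = 8

synthetic = smote_like_oversample(defective, n_new=n_non_defective - len(defective))
balanced_defective = defective + synthetic
print(len(balanced_defective), "defective instances after oversampling")
```

Because every synthetic point lies on a segment between two existing defective samples, this plain form can produce the low-diversity and noisy instances the abstract mentions. In practice, a maintained implementation would normally be used in place of a hand-written helper like this one.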
Files
Original bundle
- Name:
- 22-ICDABI Bahrain_NHSMOTE Software Defect Problem.pdf
- Size:
- 656.92 KB
- Format:
- Adobe Portable Document Format