Class Imbalanced Learning Using the Synthetic Minority Over-sampling Technique – Nominal (SMOTE-N) Algorithm on a Childhood Tuberculosis Dataset

Authors

  • Yulia Ery Kurniawati, Institut Teknologi dan Bisnis Kalbis

DOI:

https://doi.org/10.24002/jbi.v10i2.2441

Abstract

Class Imbalanced Learning (CIL) is the process of learning data representations and extracting information from data with a severely skewed class distribution, in order to support effective decision-making. SMOTE-N is a data-level CIL approach based on over-sampling: it generates synthetic instances to balance the number of instances in the minority class. This research applied SMOTE-N to the Childhood Tuberculosis Dataset, which has class imbalance. Over-sampling was chosen to avoid losing important information, because the Childhood Tuberculosis Dataset has a small number of instances. A Naïve Bayes Classifier was applied to the balanced dataset to evaluate the resulting model. The results show that SMOTE-N can improve CIL performance metrics.
Keywords: Class Imbalance Learning, Over-sampling, SMOTE-N, Naïve Bayes Classifier
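
The over-sampling step summarized in the abstract can be illustrated with a minimal sketch in Python. It assumes SMOTE-N as it is commonly described: neighbours of a minority instance are found with a Value Difference Metric (VDM) over nominal features, and each synthetic feature value is the majority vote of the instance and its k nearest minority neighbours. The function names, the toy rows, and the feature values are hypothetical and are not taken from the Childhood Tuberculosis Dataset; this is not the author's implementation.

```python
import random
from collections import Counter, defaultdict


def vdm_tables(rows, labels):
    """For each feature, map every value to its class-conditional frequencies."""
    classes = sorted(set(labels))
    tables = []
    for j in range(len(rows[0])):
        counts = defaultdict(Counter)
        for row, y in zip(rows, labels):
            counts[row[j]][y] += 1
        tables.append({
            value: [c[cls] / sum(c.values()) for cls in classes]
            for value, c in counts.items()
        })
    return tables


def vdm_distance(a, b, tables):
    """Sum, over features, of the L1 gap between class-conditional frequencies."""
    return sum(
        sum(abs(p - q) for p, q in zip(t[x], t[y]))
        for x, y, t in zip(a, b, tables)
    )


def smote_n(rows, labels, minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority rows by majority vote among neighbours."""
    rng = random.Random(seed)
    tables = vdm_tables(rows, labels)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority_rows))
        base = minority_rows[base_idx]
        others = [r for i, r in enumerate(minority_rows) if i != base_idx]
        neighbours = sorted(
            others, key=lambda r: vdm_distance(base, r, tables)
        )[:k]
        group = [base] + neighbours
        # Each feature takes the most frequent value among base + neighbours.
        synthetic.append(tuple(
            Counter(column).most_common(1)[0][0] for column in zip(*group)
        ))
    return synthetic


# Hypothetical toy data: three nominal features, 4 minority vs 8 majority rows.
rows = [
    ("yes", "yes", "low"), ("yes", "yes", "normal"),
    ("yes", "no", "low"), ("no", "yes", "low"),
    ("no", "no", "normal"), ("no", "no", "normal"),
    ("no", "no", "low"), ("yes", "no", "normal"),
    ("no", "yes", "normal"), ("no", "no", "normal"),
    ("yes", "no", "normal"), ("no", "no", "low"),
]
labels = ["positive"] * 4 + ["negative"] * 8

synthetic = smote_n(rows, labels, minority="positive", n_new=4, k=2)
balanced_rows = rows + synthetic
balanced_labels = labels + ["positive"] * len(synthetic)
print(Counter(balanced_labels))
```

Because no majority instance is discarded, the balanced set keeps every original row, which is the reason the abstract gives for preferring over-sampling on a small dataset. A classifier such as Naïve Bayes would then be trained on `balanced_rows` and `balanced_labels`.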




Published

2019-10-30