Top-k Feature Selection for Hepatitis Disease Detection Using the Naïve Bayes Algorithm

DOI:

https://doi.org/10.24002/jbi.v11i1.2456

Abstract

Hepatitis, a public health problem worldwide, is an inflammatory disease of the liver caused by viruses, bacterial infections, or chemical substances, including drugs and alcohol. In this study, the weight of each attribute in a high-dimensional hepatitis dataset was calculated using the weight information gain method. Attributes were then selected using the top-k method, and the selected attributes were classified with the Naïve Bayes algorithm. Of the 20 attributes, the 9 with the highest weights (top-9) were selected, giving an accuracy of 85.57%. The results can inform decision making in work involving feature selection, the Naïve Bayes algorithm, and the prediction of hepatitis.
Keywords: data mining, weight information gain, Naïve Bayes algorithm
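
The pipeline described in the abstract (weight each attribute by information gain, keep the top-k attributes, then classify with Naïve Bayes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes categorical attributes, uses standard information gain as the attribute weight, adds Laplace smoothing to the classifier, and runs on a made-up toy table rather than the hepatitis dataset. All function, class, and variable names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log, log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(column, labels):
    """Entropy reduction from splitting the labels on one categorical attribute."""
    total = len(labels)
    groups = defaultdict(list)
    for value, label in zip(column, labels):
        groups[value].append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def top_k_features(rows, labels, k):
    """Return the indices of the k highest-weighted attributes and all weights."""
    n_features = len(rows[0])
    weights = [information_gain([r[j] for r in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: weights[j], reverse=True)
    return ranked[:k], weights


class CategoricalNaiveBayes:
    """Naive Bayes for categorical attributes with Laplace (add-one) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        n_features = len(rows[0])
        # value_counts[j][c][v] = number of class-c rows whose attribute j equals v
        self.value_counts = [defaultdict(Counter) for _ in range(n_features)]
        self.n_values = [len({r[j] for r in rows}) for j in range(n_features)]
        for row, label in zip(rows, labels):
            for j, value in enumerate(row):
                self.value_counts[j][label][value] += 1
        return self

    def predict(self, row):
        best_label, best_score = None, float("-inf")
        for c, n_c in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods
            score = log(n_c / self.total)
            for j, value in enumerate(row):
                count = self.value_counts[j][c][value]
                score += log((count + 1) / (n_c + self.n_values[j]))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Toy data (made up for illustration): three categorical attributes per record.
rows = [
    ("yes", "high", "m"), ("yes", "high", "f"), ("yes", "high", "m"), ("yes", "low", "f"),
    ("no", "low", "m"), ("no", "low", "f"), ("no", "low", "m"), ("no", "high", "f"),
]
labels = ["die", "die", "die", "die", "live", "live", "live", "live"]

# Step 1 and 2: weight every attribute, then keep the top-k (k = 2 here).
selected, weights = top_k_features(rows, labels, k=2)

# Step 3: train Naive Bayes on the selected attributes only.
reduced = [tuple(r[j] for j in selected) for r in rows]
model = CategoricalNaiveBayes().fit(reduced, labels)

print("weights:", [round(w, 4) for w in weights])
print("selected attribute indices:", selected)
print("prediction:", model.predict(("yes", "high")))
```

In practice k is a tuning choice: one would repeat steps 2 and 3 for several values of k and compare accuracy on held-out data, which is how a result such as "top-9 out of 20 attributes" would typically be arrived at.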




Published

2020-05-01