Top-k Feature Selection for Hepatitis Detection Using the Naïve Bayes Algorithm
DOI:
https://doi.org/10.24002/jbi.v11i1.2456
Abstract. Hepatitis, a public health problem worldwide, is an inflammatory liver disease caused by viruses, bacterial infection, or chemical substances, including drugs and alcohol. In this research, the weight of each attribute in a high-dimensional hepatitis dataset was calculated using the weight information gain method. The attributes were then selected using the top-k method and classified using the Naïve Bayes algorithm. The results show that, of the 20 attributes, the 9 with the highest weights (top-9) were selected, with an accuracy of 85.57%. These findings can inform decision making in areas related to feature selection and the Naïve Bayes algorithm, as well as in predicting hepatitis.
Keywords: data mining, weight information gain, Naïve Bayes algorithm
Abstract (translated from the Indonesian). Hepatitis is a public health problem worldwide. It is an inflammatory disease of the liver caused by viruses, bacterial infection, and chemical substances, including drugs and alcohol. In this study, the weight of each attribute in a high-dimensional hepatitis dataset is calculated using the weight information gain method. Once the weights have been calculated, attributes are selected using the top-k method, and classification is then performed using the Naïve Bayes algorithm. The results show that, of the 20 attributes, the top 9 were selected, with an accuracy of 85.57%. This study can serve as input for consideration and decision making in fields related to feature selection methods and the Naïve Bayes algorithm, and in predicting hepatitis.
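The pipeline described in the abstract (weight each attribute by information gain, keep the top k, classify with Naïve Bayes) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses only the Python standard library, assumes every attribute is categorical, applies Laplace (add-one) smoothing, which the source does not mention, and runs on a tiny made-up dataset. All function names and data values are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Label entropy minus the expected entropy after splitting on one column."""
    n = len(labels)
    groups = defaultdict(list)
    for value, label in zip(column, labels):
        groups[value].append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def top_k_features(rows, labels, k):
    """Return (indices of the k highest-weighted columns, all weights)."""
    n_features = len(rows[0])
    weights = [information_gain([r[j] for r in rows], labels)
               for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: weights[j], reverse=True)
    return ranked[:k], weights


def train_naive_bayes(rows, labels, features):
    """Collect the class counts and per-class value counts Naive Bayes needs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    domains = defaultdict(set)           # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for j in features:
            value_counts[(j, label)][row[j]] += 1
            domains[j].add(row[j])
    return class_counts, value_counts, domains, features


def predict(model, row):
    """Return the class with the highest log-posterior for one row."""
    class_counts, value_counts, domains, features = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for j in features:
            # Assumption: add-one smoothing so unseen values do not zero out a class.
            seen = value_counts[(j, label)][row[j]]
            score += math.log((seen + 1) / (count + len(domains[j])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: column 0 separates the classes perfectly,
# column 1 is weakly informative, column 2 carries no information.
rows = [
    ("yes", "high", "a"), ("yes", "high", "b"),
    ("yes", "low", "a"), ("yes", "low", "b"),
    ("no", "low", "a"), ("no", "low", "b"),
    ("no", "high", "a"), ("no", "low", "b"),
]
labels = ["positive"] * 4 + ["negative"] * 4

selected, weights = top_k_features(rows, labels, k=2)
model = train_naive_bayes(rows, labels, selected)
print("selected columns:", selected)
print("prediction:", predict(model, ("yes", "low", "a")))
```

In this sketch the value of k is fixed by hand. In practice it would be chosen by comparing classification accuracy across several values of k, which is how a result such as top-9 would be identified.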
License
Copyright of this journal is assigned to Jurnal Buana Informatika as the journal publisher, with the author's knowledge, while the moral rights to the publication remain with the author. All printed and electronic publications are open access for educational, research, and library purposes. The editorial board is not responsible for copyright violations arising from uses other than those purposes. Any part of this journal (printed or online) may be reproduced only with written permission from Jurnal Buana Informatika.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.