Implementasi LightGBM dengan KNN Imputation untuk Deteksi Dini Risiko Kehamilan

Implementation of LightGBM with KNN Imputation for Early Detection of Pregnancy Risk

Authors

  • Syahnur Alawiyah, Universitas Islam Negeri Sunan Ampel Surabaya
  • Dian Yuliati, Universitas Islam Negeri Sunan Ampel Surabaya
  • Nurissaida Ulinnuha, Universitas Islam Negeri Sunan Ampel Surabaya

DOI:

https://doi.org/10.24002/jbi.v17i1.14454

Keywords:

High-Risk Pregnancy, KNNI, LightGBM, SKCV

Abstract

Pregnancy risk is a critical issue in maternal health that contributes to high rates of maternal and infant morbidity and mortality, so accurate analytical methods are needed for early detection. This study develops and evaluates a classification model for pregnancy risk levels that uses K-Nearest Neighbor Imputation (KNNI) to handle missing values and LightGBM as the main classification method. The model was optimized through parameter tuning and evaluated using Stratified K-Fold Cross-Validation (SKCV). The proposed model achieved an accuracy of 97.64%, indicating strong performance in classifying pregnancy risk levels. The approach therefore has the potential to be developed into a decision support system for maternal health.
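The two preprocessing and evaluation steps named in the abstract, KNN imputation and stratified fold assignment, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the feature columns (age, systolic blood pressure, blood sugar), the sample values, the labels, and the settings k=2 and n_folds=3 are all hypothetical, and the LightGBM classifier itself is left out. Distances are computed only over features observed in both rows and rescaled to the full feature count, which is one common convention for data with gaps.

```python
import math
from collections import defaultdict


def nan_euclidean(a, b):
    """Euclidean distance over features present in both rows, rescaled to full width."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows, k=2):
    """Replace each None with the mean of that feature over the k nearest donor rows."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are other rows that actually observe feature j.
            donors = sorted(
                (nan_euclidean(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d != math.inf]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


def stratified_folds(labels, n_folds=3):
    """Deal the indices of each class round-robin so folds keep the class mix."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds


# Hypothetical records: [age, systolic blood pressure, blood sugar]
records = [
    [25.0, 120.0, 7.0],
    [26.0, None, 7.2],
    [40.0, 150.0, 11.0],
    [41.0, 152.0, None],
    [24.0, 118.0, 6.8],
    [39.0, 148.0, 10.5],
]
risk = ["low", "low", "high", "high", "low", "high"]

print(knn_impute(records, k=2))
print(stratified_folds(risk, n_folds=3))
```

In a full evaluation, one reasonable design choice is to fit the imputation on the training folds only and then apply it to the held-out fold, so that no information from the test records leaks into the filled-in values. The abstract does not state whether the study did this.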

Published

2026-04-28