Male Fertility Classification using Machine Learning and Oversampling Techniques

Authors

  • Aloysius Gonzaga Pradnya Sidhawara, Universitas Atma Jaya Yogyakarta

DOI:

https://doi.org/10.24002/jbi.v15i1.8718

Keywords:

male fertility, classification, machine learning, SMOTE, ADASYN

Abstract

Machine learning methods have been applied to male fertility diagnosis in recent years. By enabling early detection of infertility cases, this technology offers potential benefits to the medical field. This study presents an experimental investigation of whether oversampling and feature selection can improve the performance of shallow classifiers in classifying male fertility on the Fertility Dataset. Two oversampling techniques (SMOTE and ADASYN), two scalers (MinMax and Standard), and two feature selection methods (SelectKBest and SelectFromModel) were used to improve classifier performance. The results show that the machine learning models perform better on the oversampled dataset than on the original dataset. Random Forest performed best on the SMOTE test set, with 90% accuracy and a recall of 89% for the Normal class and 100% for the Altered class. The features selected by SelectKBest (Accidents or trauma, Age, and High Fevers) have also been identified in prior studies as factors affecting male fertility.
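The abstract describes a scale, oversample, select, classify workflow. The following is a minimal, hypothetical sketch of that kind of workflow under stated assumptions, not the author's code. It uses a synthetic imbalanced stand-in for the Fertility Dataset (generated with `make_classification`), a hand-rolled SMOTE so that it depends only on NumPy and scikit-learn, MinMax scaling, `SelectKBest` with `k=3` (chosen only because the abstract names three selected features), and a Random Forest. The step order and the choice to apply SMOTE to the training split only are also assumptions; the abstract does not say at which stage resampling was applied. The label encoding `NORMAL, ALTERED = 0, 1` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import MinMaxScaler

NORMAL, ALTERED = 0, 1  # hypothetical label encoding for the two diagnosis classes


def smote(X, y, minority_label, k=5, rng=None):
    """Minimal SMOTE: add synthetic minority rows until both classes are equal in size.

    Each synthetic row lies on the segment between a random minority sample
    and one of its k nearest minority neighbours.
    """
    rng = np.random.default_rng(rng)
    X_min = X[y == minority_label]
    n_needed = int(np.sum(y != minority_label)) - len(X_min)
    if n_needed <= 0:
        return X, y
    k = min(k, len(X_min) - 1)
    # Ask for k + 1 neighbours because each sample is its own nearest neighbour.
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X_min).kneighbors(X_min)
    base = rng.integers(0, len(X_min), size=n_needed)
    neighbour = idx[base, rng.integers(1, k + 1, size=n_needed)]
    gap = rng.random((n_needed, 1))
    X_new = X_min[base] + gap * (X_min[neighbour] - X_min[base])
    y_new = np.full(n_needed, minority_label, dtype=y.dtype)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])


def run_experiment(seed=0):
    # Synthetic, imbalanced stand-in for the Fertility Dataset (an assumption).
    X, y = make_classification(n_samples=100, n_features=9, n_informative=4,
                               n_redundant=0, weights=[0.88], flip_y=0,
                               random_state=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)

    # Fit the scaler on the training split only, then transform both splits.
    scaler = MinMaxScaler().fit(X_tr)
    X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

    # Oversample the training split only; the test split keeps its real imbalance.
    X_bal, y_bal = smote(X_tr, y_tr, minority_label=ALTERED, rng=seed)

    # Keep the three features with the highest ANOVA F-value.
    selector = SelectKBest(f_classif, k=3).fit(X_bal, y_bal)
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf.fit(selector.transform(X_bal), y_bal)
    pred = clf.predict(selector.transform(X_te))

    # average=None returns one recall value per class, in the order of `labels`.
    recall = recall_score(y_te, pred, labels=[NORMAL, ALTERED], average=None)
    return {
        "train_counts_after_smote": np.bincount(y_bal),
        "selected_features": selector.get_support(indices=True),
        "accuracy": accuracy_score(y_te, pred),
        "recall_normal": recall[0],
        "recall_altered": recall[1],
    }


if __name__ == "__main__":
    for name, value in run_experiment().items():
        print(f"{name}: {value}")
```

Reporting recall per class (`average=None`) mirrors how the abstract reports Normal and Altered separately, since overall accuracy alone can look high on an imbalanced set even when the minority class is missed. In practice, the imbalanced-learn package provides `SMOTE` and `ADASYN` with a `fit_resample(X, y)` method. Swapping in `StandardScaler` or `SelectFromModel` follows the same pattern: fit on the training data, then transform both splits.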

Published

2024-04-01