Development of a Hybrid EfficientNet–Vision Transformer Model for Image-Based Diagnosis of Oral and Dental Diseases

Authors

  • Juvenus, Universitas Atma Jaya Yogyakarta
  • Aloysius Gonzaga Pradnya Sidhawara, Universitas Atma Jaya Yogyakarta
  • Patricia Ardanari, Universitas Atma Jaya Yogyakarta

DOI:

https://doi.org/10.24002/jiaj.v7i1.14150

Keywords:

Oral and Dental Diseases, Image Classification, EfficientNet, Vision Transformer, CutMix Augmentation

Abstract

Oral and dental diseases are health problems that require early detection to prevent further complications, yet manual image-based diagnosis remains prone to subjectivity and interpretation errors. This study designs and develops a deep learning model based on a hybrid EfficientNet–Vision Transformer approach to classify images of oral and dental diseases accurately and consistently. In the hybrid model, EfficientNet serves as a local feature extractor and a Vision Transformer captures global contextual information. The dataset consists of six disease classes with 1,800 images each (10,800 images in total). Training used the Adam optimizer with a learning rate of 0.0001, early stopping, and CutMix data augmentation; EfficientNet-B3 and a standalone Vision Transformer served as baseline comparators. In evaluation, the hybrid model achieved the highest accuracy among the compared models, 93.69%, with stable and balanced performance. This indicates its potential as a diagnostic decision support system, although the size and distribution of the dataset remain limitations.
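
The abstract names CutMix augmentation but not its settings. The following is a minimal, dependency-free sketch of the standard CutMix step described by Yun et al. (2019): a random box from image B is pasted into image A, and the one-hot labels are mixed in proportion to the area each image keeps. It assumes α = 1 (so λ is drawn uniformly from (0, 1)), images stored as H × W nested lists, and one-hot labels over the six classes. The helper names `rand_box` and `cutmix` are hypothetical, and this is an illustration, not the authors' implementation.

```python
import math
import random


def rand_box(height, width, lam, rng):
    """Sample a box covering roughly (1 - lam) of the image, clipped to its borders."""
    cut_ratio = math.sqrt(1.0 - lam)
    cut_h = int(height * cut_ratio)
    cut_w = int(width * cut_ratio)
    # Box centre is uniform over the image; the box may be clipped at the edges.
    cy = rng.randrange(height)
    cx = rng.randrange(width)
    y1 = max(cy - cut_h // 2, 0)
    y2 = min(cy + cut_h // 2, height)
    x1 = max(cx - cut_w // 2, 0)
    x2 = min(cx + cut_w // 2, width)
    return y1, y2, x1, x2


def cutmix(img_a, label_a, img_b, label_b, alpha=1.0, rng=random):
    """Paste a random box from img_b into img_a and mix the one-hot labels by area.

    Images are H x W nested lists (pixels may be any value, e.g. RGB tuples);
    labels are one-hot lists of equal length.
    Assumption: alpha = 1.0 by default; the abstract does not state alpha.
    """
    height, width = len(img_a), len(img_a[0])
    lam = rng.betavariate(alpha, alpha)
    y1, y2, x1, x2 = rand_box(height, width, lam, rng)
    mixed = [row[:] for row in img_a]  # copy rows so img_a is left unchanged
    for y in range(y1, y2):
        mixed[y][x1:x2] = img_b[y][x1:x2]
    # Recompute lambda from the actual (possibly clipped) box area.
    lam = 1.0 - (y2 - y1) * (x2 - x1) / (height * width)
    mixed_label = [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed, mixed_label, lam


if __name__ == "__main__":
    rng = random.Random(0)
    img_a = [[0] * 8 for _ in range(8)]  # toy 8 x 8 "image" of zeros
    img_b = [[1] * 8 for _ in range(8)]  # toy 8 x 8 "image" of ones
    y_a = [1, 0, 0, 0, 0, 0]  # six classes, as in the dataset
    y_b = [0, 1, 0, 0, 0, 0]
    img, y, lam = cutmix(img_a, y_a, img_b, y_b, rng=rng)
    print(f"lambda = {lam:.3f}, mixed label = {y}")
```

In a real training pipeline the same operation would run on batched image tensors (for example with torchvision's `transforms.v2.CutMix`). Recomputing λ from the clipped box keeps the label weights equal to the actual pixel proportions taken from each image.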

 

Published

2026-05-30