Performance Comparison of Deep Learning Models to Detect Covid-19

The outbreak of the disease caused by the SARS-CoV-2 coronavirus has affected dozens of countries. The disease has spread rapidly, become a serious threat, and disrupted various sectors of life. Along with technological developments, various deep learning models have been developed to classify chest X-ray images as Normal or Covid-19, such as Inception V3, Inception V4, and MobileNet. These models have each been reported separately to classify Covid-19 well. However, their performance in classifying Covid-19 has not been compared on the same data. This research aims to compare the performance of these three deep learning models in classifying Covid-19 based on X-ray images. The method involves data collection, pre-processing, training, and testing using the three models. On a dataset of 2,169 images, all three models were able to classify Covid-19 based on X-ray images. The results showed that the MobileNet model achieved the best performance, with an average accuracy of 99.67%, precision of 99.77%, recall of 99.38%, specificity of 99.38%, and F-score of 99.67%. The Inception V3 model obtained an average accuracy of 99.62%, precision of 99.65%, recall of 99.5%, specificity of 99.5%, and F-score of 99.52%, while the Inception V4 model obtained an average accuracy of 97.79%, precision of 98.11%, recall of 90.18%, specificity of 90.18%, and F-score of 97.25%. Furthermore, the MobileNet model showed stable training and validation curves, which is attributed to its large number of layers. The results suggest that the more layers a model has, the better the accuracy obtained.


Introduction
Covid-19 (Coronavirus Disease 2019) first emerged as a cluster of pneumonia cases of unknown cause. It was first identified in Hubei province, China, in December 2019. The new type of coronavirus spread rapidly and became a pandemic. The usual symptoms of the disease are fever, cough, shortness of breath, and fatigue [1], [2], [3]. The spread of Covid-19 is a severe threat to countries worldwide and can disrupt or even destroy various sectors of life [4], [5].
Along with technological developments, Covid-19 examination has made use of imaging modalities such as X-ray and CT scan images together with artificial neural networks [6], [7]. One type of artificial neural network used in Covid-19 detection is deep learning, and the most popular deep learning algorithm is the Convolutional Neural Network (CNN) [8], [9], [10], [11]. A deep learning model is trained on Covid-19 and Normal X-ray images and then used to classify new X-ray images. The application of deep learning therefore helps detect Covid-19 at an early stage [12].
Researchers have analyzed and detected Covid-19 using deep learning on X-ray images since 2020. One group used Convolutional Neural Network models including VGG16, VGG19, InceptionV3, MobileNetV2, ResNet50, and DenseNet121 to classify Normal, Pneumonia, and Covid-19 chest X-ray images [6]. They performed three binary classifications, namely Covid-19 vs Normal, Covid-19 vs Pneumonia, and Normal vs Pneumonia, using thousands of publicly available chest X-ray images. Among the models, Inception V3 and MobileNet achieved good classification results. Another research group proposed pre-trained CNN models using ResNet50, InceptionV3, and Inception_ResNetV2 [13]. Since the dataset contained only 50 Covid-19 and 50 Normal X-ray images, they applied transfer learning to overcome the limited data. Further research on Covid-19 used the pre-trained CNN models VGG16 and InceptionV3. That research aimed to develop a fast, accurate, and low-cost diagnostic system to detect Covid-19 from chest X-rays. The dataset consisted of 2,905 chest X-ray images with 219 confirmed positive cases of Covid-19, 1,345 positive pneumonia cases, and 1,361 normal images. The results showed that the Inception V3 model provided the highest accuracy, 99.35%, for the two binary classifications (Normal vs Covid-19 and Covid-19 vs Pneumonia), compared to 97.71% for the VGG16 model [14].
Inception V3 and MobileNet are two of the deep learning models that have been reported to give good classification results [6], [13], [14]. As a successor to Inception V3, Inception V4 simplifies the design with a more uniform architecture [15]. However, a comparison of their performance in classifying Covid-19 on the same dataset has not been reported. Therefore, this study aims to compare their performance in terms of accuracy, precision, recall, specificity, and F-score, using the same X-ray image dataset.

Deep Learning Framework
Deep learning is a branch of artificial intelligence that uses artificial neural networks to learn the characteristics of large datasets and provides a very robust architecture for supervised learning. It belongs to the machine learning techniques that combine feature extraction from training data with dedicated learning algorithms to classify images and recognize sounds [16]. Deep learning requires training data, and larger datasets generally give more accurate results.
Inception V3 is a convolutional neural network architecture for analyzing images and detecting objects. Compared to the previous version, Inception V3 focuses on computational efficiency, reducing the number of parameters and the cost of hardware resources. Figure 1 shows the step-by-step parts of the Inception V3 architecture: (1) factorized convolution to increase computational efficiency, (2) smaller scale convolutions to train faster, (3) asymmetric convolutions to reduce the number of parameters, (4) an auxiliary classifier, which is a small CNN inserted between layers that acts as a regularizer, and (5) grid size reduction [17], [18]. The complete architecture is shown in Figure 2 [17].
Figure 1. Step-by-step development of Inception V3: (a) smaller scale convolutions for faster training, (b) asymmetric convolutions to reduce the number of parameters, (c) an auxiliary classifier, which is a small CNN inserted between layers that acts as a regularizer, and (d) grid size reduction [17], [18].
Inception V4 is a pure Inception architecture with no residual connections and approximately the same image recognition performance as Inception-ResNet-v2. Inception V4 has a simpler architecture with more uniform modules than Inception V3, as shown in Figure 3 [19]. It has three modules, namely Inception-A, Inception-B, and Inception-C, followed by average pooling and ending with a fully connected layer as the classification layer.
MobileNet is a deep learning architecture that can handle a large amount of data. The key element of the MobileNet architecture is a particular layer called depth-wise separable convolution, which reduces complexity and the number of parameters to produce a lighter model, as shown in Figure 4 [20].
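The parameter savings behind the factorized and asymmetric convolutions of Inception V3 and the depth-wise separable convolution of MobileNet can be illustrated with a short weight count. The following Python sketch is only an illustration: the helper names and the channel sizes (64 input and 128 output channels) are hypothetical and are not taken from the models compared in this study, and bias terms are ignored.

```python
def conv_params(kernel_h, kernel_w, c_in, c_out):
    """Number of weights in a standard convolution (bias ignored)."""
    return kernel_h * kernel_w * c_in * c_out


def depthwise_separable_params(kernel_h, kernel_w, c_in, c_out):
    """Weights in a depth-wise convolution followed by a 1x1 point-wise convolution."""
    depthwise = kernel_h * kernel_w * c_in  # one spatial filter per input channel
    pointwise = c_in * c_out                # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical channel sizes, chosen only for illustration.
C_IN, C_OUT = 64, 128

# Depth-wise separable convolution versus a standard 3x3 convolution.
standard_3x3 = conv_params(3, 3, C_IN, C_OUT)
separable_3x3 = depthwise_separable_params(3, 3, C_IN, C_OUT)

# Factorized convolution: one 5x5 layer replaced by two stacked 3x3 layers.
single_5x5 = conv_params(5, 5, C_IN, C_IN)
two_3x3 = 2 * conv_params(3, 3, C_IN, C_IN)

# Asymmetric convolution: one 3x3 layer replaced by a 1x3 layer and a 3x1 layer.
single_3x3 = conv_params(3, 3, C_IN, C_IN)
asymmetric = conv_params(1, 3, C_IN, C_IN) + conv_params(3, 1, C_IN, C_IN)

print("standard 3x3:", standard_3x3, "separable 3x3:", separable_3x3)
print("single 5x5:", single_5x5, "two 3x3:", two_3x3)
print("single 3x3:", single_3x3, "1x3 + 3x1:", asymmetric)
```

With these assumed sizes, the separable layer needs roughly one eighth of the weights of the standard 3x3 layer, which is the sense in which MobileNet is a lighter model.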

Method
This research involved five main stages: data collection, pre-processing, training, testing, and result analysis. The data used in this study were X-ray images obtained from kaggle.com. The dataset amounted to 2,169 X-ray images divided into two classes, namely Covid-19 and Normal. Pre-processing is the stage of image preparation that makes the data suitable for training and testing. It included resizing the images to 224x224 pixels and dividing them into 5 folds for training and testing. K-fold cross-validation is a method for evaluating the performance of a model or algorithm [21]. The training stage created a deep learning model of the Covid-19 classifier. The configuration included a stochastic gradient descent optimizer with a learning rate of 1e-4 and a momentum of 0.9, with categorical_crossentropy as the loss. The training process used 100 epochs and a batch size of 32. The model was then saved and used for the validation process. During training, validation was carried out in a 5-fold scheme using 20% of the data to check whether the trained models performed well.
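The data division described above can be sketched in a few lines of Python. This is a minimal sketch under an assumption that the text does not state explicitly: 20% of the 2,169 images are first held out for testing, and the remainder is split into 5 folds, each used once for validation. The function name and the random seed are hypothetical. Under this assumption the sketch reproduces the split sizes reported in the Result and Discussion section (1,388 training, 347 validation, and 434 testing images).

```python
import random


def hold_out_then_k_fold(n_samples, test_fraction=0.2, k=5, seed=42):
    """Hold out a test set, then build k train/validation splits on the rest.

    Returns (test_indices, [(train_indices, val_indices), ...]).
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle

    n_test = round(n_samples * test_fraction)
    test = indices[:n_test]
    remaining = indices[n_test:]

    # Deal the remaining indices into k folds of (nearly) equal size.
    folds = [remaining[i::k] for i in range(k)]

    splits = []
    for i in range(k):
        val = folds[i]
        train = [idx for j in range(k) if j != i for idx in folds[j]]
        splits.append((train, val))
    return test, splits


test_idx, splits = hold_out_then_k_fold(2169)
print("test:", len(test_idx))
for fold, (train_idx, val_idx) in enumerate(splits, start=1):
    print("fold", fold, "train:", len(train_idx), "validation:", len(val_idx))
```

Each index appears in exactly one validation fold, so every non-test image is used for validation once and for training four times.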
The analysis in this study compared the training, validation, and testing performance of the three models using the confusion matrix. The confusion matrix was calculated for two classes, namely Covid-19 and Normal, as shown in Table 1, and the performance metrics compared were accuracy, precision, recall, specificity, and F-score, following Table 2. The metrics were computed by weighting all classes equally in each fold. For training, the accuracy values and the per-epoch graphs were compared. For testing, the results of all performance metrics were compared. The higher the values of the performance metrics, the better the model performed.
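The five performance parameters can be computed directly from the four cells of a two-class confusion matrix. The Python sketch below uses the standard definitions. The counts in the example are hypothetical and are not results from this study, and the macro-averaging function is only one possible reading of treating all classes equally in each fold.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard metrics from confusion-matrix counts for one positive class."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_score": f_score,
    }


def macro_average(tp, fp, fn, tn):
    """Average the metrics over both classes, each taken in turn as positive.

    Assumption: this equal class weighting is an interpretation, not a
    procedure confirmed by the text.
    """
    covid_as_positive = binary_metrics(tp, fp, fn, tn)
    # With Normal as the positive class, the roles of the cells swap.
    normal_as_positive = binary_metrics(tn, fn, fp, tp)
    return {name: (covid_as_positive[name] + normal_as_positive[name]) / 2
            for name in covid_as_positive}


# Hypothetical counts for one fold (not taken from the experiments).
per_class = binary_metrics(tp=95, fp=2, fn=5, tn=98)
averaged = macro_average(tp=95, fp=2, fn=5, tn=98)
print(per_class)
print(averaged)
```

Under macro-averaging in a two-class problem, averaged recall and averaged specificity are always equal, because the recall of one class is the specificity of the other.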

Result and Discussion
The dataset was divided into 1,388 images for training, 347 images for validation, and 434 images for testing. The full dataset was used for the training, validation, and testing of each model. The validation results on 347 images for each model are shown in Table 3. The validation accuracy was 99.68% for Inception V3 and MobileNet, whereas it was slightly lower, 97.93%, for Inception V4. The validation loss was 0.013%, 0.072%, and 0.012% for Inception V3, Inception V4, and MobileNet, respectively. According to the accuracy and loss in the validation process, Inception V3 and MobileNet had better training results than Inception V4. This may be an effect of the simpler architecture of Inception V4 compared with Inception V3 and MobileNet.
Figure 5 shows the validation graphs of accuracy and loss for the Inception V3, Inception V4, and MobileNet models. Inception V3 and MobileNet showed excellent and stable validation accuracy and loss, while Inception V4 showed overshoot over the epochs. Figure 5 agrees with the numeric results in Table 3. Because Inception V4 has a simpler architecture than Inception V3, it uses small-scale convolutions and a reduced grid size to train faster. In addition, the auxiliary classifier between layers in Inception V4 produces a more stable training result.
Table 4 presents the performance metrics of Inception V3 testing on 434 images. The results showed that the largest values of the performance parameters were at fold = 3 (precision and specificity of 100%), while the smallest were at fold = 5 (precision, recall, and F-score of 99.20%). The Inception V3 results showed an average accuracy of 99.62%, precision of 98.94%, recall of 99.69%, specificity of 99.62%, and F-score of 99.30%. According to the testing results, Inception V3 showed an excellent ability to classify Covid-19 and Normal X-ray images.
Compared to Inception V3, Inception V4 gave slightly lower results, as shown in Table 5. It achieved an accuracy of 97.79%, precision of 98.76%, recall of 98.36%, specificity of 98.17%, and F-score of 95.92%. The best results were obtained from the MobileNet model, as shown in Table 6. This model showed an almost perfect ability, indicated by average values of 99.67% accuracy, 98.76% precision, 100% recall, 99.55% specificity, and 99.37% F-score. The five parameters consistently agree on the superior ability of MobileNet to classify the data. This is most likely due to the depth-wise separable convolution, which handles a large amount of training data effectively.
The ability of each model to classify the Covid-19 and Normal datasets is shown in Table 7. The table shows that Inception V3 and MobileNet had higher performance in classifying the Covid-19 and Normal X-ray image datasets. MobileNet gave superior performance in accuracy, precision, and F-score, while Inception V4 performed slightly below the Inception V3 and MobileNet models. This performance difference is attributed to the 87 layers of the MobileNet model: the more layers a model has, the better the accuracy obtained. In addition, MobileNet has a particular layer called depth-wise separable convolution, which is used to reduce complexity. Therefore, the MobileNet model was more efficient and much lighter than the Inception V3 and Inception V4 models in the Covid-19 classification.