A Survey of Face Recognition Based on Convolutional Neural Network

Face recognition is one of the most interesting research topics in the field of computer vision. In recent years, deep learning methods, especially the Convolutional Neural Network (CNN), have progressed rapidly, and face recognition is one of the areas in which CNNs have been successful. Face recognition by computer is a technique that enables the computer to recognize faces in an image automatically. Many researchers have conducted work on the subject. This survey presents research on face recognition based on the Convolutional Neural Network published in the last five years, with the aim of identifying the innovations that have emerged in the field. The basic theory of the Convolutional Neural Network, the stages of face recognition, and descriptions of the databases used in the various studies are also discussed. We hope this survey provides the reader with additional knowledge of face recognition based on the Convolutional Neural Network.


Introduction
The human face is one of the unique parts of the body, and a person can recognize others through it. Along with voice recognition, retinal scanning, and fingerprint recognition, face recognition is a form of biometric identification [1]. Face recognition has become one of the most interesting research topics in the field of computer vision, and various studies have been conducted with the aim of enabling computers to recognize a person's face [2]. Moreover, interest in face recognition research is driven by demand for practical applications of the technology.

Convolutional Layer
The convolutional layer is the primary layer of a CNN and contains several filters, also called kernels. A filter is a matrix of numbers that produces a feature map from the input layer (for the first layer) or from the feature map of the previous layer. Filters are convolved with the input image to produce a feature map. Convolution is a mathematical operation that multiplies two matrices element by element and sums the results. Each filter is connected to the input through a small area called the receptive field, whose extent is determined by the size of the filter. The values in the filter change during the learning (training) process. After the convolution results are obtained, each value is passed through an activation function [6]. The size of the output of the convolutional layer is determined by several parameters: the depth (number of filters), the stride (the amount by which the filter shifts), and zero padding (which adjusts the spatial size of the output) [7]. Fig. 2 shows an image convolved with a filter to produce a feature map.
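The convolution operation and the output-size parameters described above can be sketched in a few lines of NumPy. This is an illustrative implementation, not code from any of the surveyed papers, and the image and filter values are arbitrary:

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """Slide the kernel over the image; at each position, multiply
    element by element and sum to get one feature-map value."""
    if padding:
        image = np.pad(image, padding)
    kh, kw = kernel.shape
    # output size: (W - F + 2P) / S + 1 for each spatial dimension
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            region = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = (region * kernel).sum()
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])                  # a vertical-edge filter
feature_map = conv2d(image, kernel)
print(feature_map.shape)  # (3, 3): (5 - 3)/1 + 1 = 3
```

With padding of 1, the output of the same call would stay 5x5, illustrating how zero padding preserves the spatial size.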
Several fully connected layers transform the 2D feature maps into a 1D feature vector. A fully connected layer works like an ordinary neural network layer [8]. After the fully connected layers comes the output layer. The number of outputs depends on the goal to be achieved; for example, if the CNN is used to recognize objects of four classes, the output layer has four outputs. The fully connected layer and the output layer are connected through a softmax function, which converts the features of the previous layer into probability values over the existing classes [5].
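The softmax conversion from raw scores to class probabilities can be sketched as follows; the logit values here are hypothetical, chosen only to illustrate a four-class output layer:

```python
import numpy as np

def softmax(logits):
    """Turn raw scores from the last fully connected layer into probabilities."""
    z = logits - logits.max()   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# hypothetical raw scores for a four-class recognition problem
logits = np.array([2.0, 1.0, 0.5, -1.0])
probs = softmax(logits)
# probs is non-negative and sums to 1; the largest logit keeps the largest
# probability, so class 0 is the prediction here
print(probs.argmax())  # 0
```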

Face Recognition
Face recognition is a form of biometric identification, alongside voice recognition, retinal scanning, and fingerprint recognition [1]. It is one of the challenges in the fields of pattern recognition and computer vision, and one of the most widely studied topics in recent years [9], [10], [5]. The trend is driven by increasing demand for face recognition technology in law enforcement, commercial use [10], human-computer interaction, access control, surveillance [11], video conferencing [1], authentication on mobile devices, payment transactions, autonomous cars [9], automatic attendance systems, and digital entertainment [12]. Face recognition has become popular because it does not require physical contact with the scanning device and tends to be relatively inexpensive [1].
Face recognition by computer is a computational technique for automatically recognizing a person in an image [3]. A person can be identified through certain features that make their face different from other people's faces [13]. Based on its application, face recognition can be categorized into face identification and face verification. In face verification, the computer is given a pair of face images and must determine whether the faces in both images belong to the same person [14]. Face identification is the process of finding the identity of a face in an image from a collection of faces whose identities are stored in a database [10].

Stages of Face Recognition
Face recognition is carried out through four stages: face detection, face alignment, face feature extraction, and face matching/classification [3], [2]. A diagram of the face recognition stages is shown in Fig. 5.

Face Detection
Face detection is the primary step in analyzing faces, preceding tasks such as face alignment, face modeling, face recognition, facial expression recognition, facial pose tracking, and recognizing a person's gender or age from their face [15]. The goal of a face detection algorithm is that, given an image, the computer displays the locations of the faces found in it, marked with boxes [16]. Fig. 6 shows an example of face detection in an image. The faces of the people in the image are successfully detected, displayed with a colored box surrounding each face and a confidence level, even though their positions differ and they are not facing the camera straight on. A face detector must be able to detect faces that are not in a frontal position, under varied lighting conditions, with different expressions, different skin colors, occlusions, different face sizes, low image resolution, and the presence of various other objects in the image [15], [16], [17]. To date, several methods have been used to detect faces in an image [18]. Face detection in images began with the work of Viola and Jones [18], who applied Haar-like features combined with multilevel classifiers trained using the AdaBoost learning algorithm to detect faces quickly and accurately [19]. However, the weakness of the Viola-Jones method is its difficulty in detecting faces from different viewpoints, in blurred images, or when faces are partially covered [17], [19], [20]. Another method is the Deformable Part Model (DPM) proposed by Felzenszwalb et al., which models the information contained between parts of the face [17]. The weakness of DPM is that it requires extensive computing resources [19].
Recent research shows that deep learning using the Convolutional Neural Network (CNN) has achieved success in computer vision, including in detecting faces. The reason for CNN's superiority over previous methods is that it can automatically learn features that represent complex visual variations from a large amount of training data [21]. Several studies have used CNNs to detect faces. Li et al. used a CNN cascade, which uses more than one CNN [22]. The use of six CNNs enables the computer to detect faces more accurately; however, the weakness of the approach is that the training process is complicated and heavy, because the six CNNs must be trained separately [20]. Qin et al. proposed a joint training method to optimize the CNN cascade training process for the same goal of detecting faces [23]. Sawat and Hegadi proposed a method for detecting faces by combining a CNN with a Cubic Support Vector Machine [24].

Face Alignment
The stage after detecting the face is to align it. Face alignment is also known as detecting face landmarks [25]. Detecting face landmarks is needed to align the face to the front, which can increase the accuracy of face recognition. The main face landmark points include the eyes, nose, and mouth.
The shape and location of these landmarks give each face a unique pattern [26]. Fig. 7 shows the detection of face landmark points: the face on the left shows 20 landmark points, while the one on the right shows 68.
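A minimal example of how landmarks drive alignment: from the two eye landmarks one can compute the tilt of the face and rotate the image to level the eyes. The coordinates below are hypothetical, and this sketch covers only the rotation angle, not the full alignment transform:

```python
import numpy as np

def eye_alignment_angle(left_eye, right_eye):
    """Angle (in degrees) by which the face is tilted, measured from the
    line connecting the two eye landmarks; rotating the image by the
    negative of this angle levels the eyes."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return float(np.degrees(np.arctan2(dy, dx)))

# hypothetical landmark coordinates (x, y) in pixels
left_eye, right_eye = (30.0, 42.0), (70.0, 38.0)
angle = eye_alignment_angle(left_eye, right_eye)
# here the right eye is slightly higher, so the angle is a few degrees negative
```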

Face Feature Extraction
The feature extraction stage is essential in face recognition [27]. As the name suggests, its primary purpose is to extract the features of the face, that is, to take and store the most important information about it. This information takes the form of the geometric distribution and shape of the mouth, nose, eyes, and other features that make a face unique. Face features are represented as vectors that are used in the next stage [10], [28]. Several methods have been proposed to extract face features, including Principal Component Analysis (PCA), Independent Component Analysis (ICA), the Local Binary Pattern (LBP), histograms, and, most recently and most effectively, the CNN [10].
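As a concrete illustration of one classical extractor mentioned above, the Local Binary Pattern compares each pixel with its eight neighbors to produce a code; histograms of these codes over image regions form the face descriptor. This is a minimal sketch with toy values, not the implementation of any surveyed study:

```python
import numpy as np

def lbp_code(patch):
    """Local Binary Pattern code of the center pixel of a 3x3 patch:
    each neighbor that is >= the center contributes one bit."""
    center = patch[1, 1]
    # the 8 neighbors, clockwise from the top-left corner
    neighbors = patch[[0, 0, 0, 1, 2, 2, 2, 1], [0, 1, 2, 2, 2, 1, 0, 0]]
    bits = (neighbors >= center).astype(int)
    return int((bits * 2 ** np.arange(8)).sum())

patch = np.array([[5, 9, 1],
                  [4, 6, 7],
                  [2, 3, 8]])
print(lbp_code(patch))  # 26
# a face descriptor is then the histogram of such codes over image regions
```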

Face Matching / Classification
This stage compares the face features extracted in the previous stage with the face features contained in the database. Face recognition has two types of application: face identification and face verification [28]. Face verification is a one-to-one matching process, in which a test image is compared with an image from the database to determine whether they show the same person. Face identification is a one-to-many matching process, in which a test image is compared with a set of faces in the database to find the most likely match [10].
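The one-to-one and one-to-many matching processes can be sketched with cosine similarity over feature vectors. The 4-dimensional vectors and the threshold below are toy values standing in for real face embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(feat_a, feat_b, threshold=0.5):
    """One-to-one matching: same person if similarity exceeds a threshold."""
    return cosine_similarity(feat_a, feat_b) >= threshold

def identify(probe, gallery):
    """One-to-many matching: index of the most similar enrolled face."""
    scores = [cosine_similarity(probe, g) for g in gallery]
    return int(np.argmax(scores))

# toy 4-dimensional "feature vectors"
alice = np.array([0.9, 0.1, 0.0, 0.4])
bob   = np.array([0.1, 0.8, 0.5, 0.0])
probe = np.array([0.8, 0.2, 0.1, 0.5])
print(verify(probe, alice))           # True
print(identify(probe, [alice, bob]))  # 0
```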

Face recognition based on Convolutional Neural Network
Various studies on face recognition have been carried out. This section discusses several studies that focus on face recognition using the Convolutional Neural Network, selected from those published between 2015 and 2020. One such study proposed NIRFaceNet [5], a modification of GoogLeNet used for face recognition whose input is Near-Infrared (NIR) images. The researchers chose NIR images because they are robust to lighting changes. The dataset used is the CASIA NIR database. A strength of this research is that the researchers added image variations to the dataset: motion blur, Gaussian blur, salt-and-pepper noise, and Gaussian noise. These variations were added so that the model could recognize faces in unclear image conditions. The model is based on GoogLeNet but consists of only 8 layers instead of the original 27, which makes training faster. The test results show that NIRFaceNet obtained 100% accuracy in recognizing faces with a neutral expression and normal position, 98.28% with facial expressions and different facial positions, and between 96.02% and 98.48% on blurred and noisy images.
Ben Fredj et al. trained a CNN to recognize faces in an uncontrolled environment, in the sense that the face images are noisy or partially covered [29]. The researchers used data augmentation in the form of flipping, histogram modification, noise, blur, lighting differences, partial occlusion, and cropped face parts. Data augmentation increases the number of images in the dataset and adds variation; in this respect, the research in [29] has something in common with the research in [5]. Softmax loss and center loss were used for training. The dataset used in the study is CASIA-WebFace. The accuracy obtained is 99.2% when tested on the LFW dataset and 96.83% on the YTF dataset.
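The augmentation operations recurring in these studies (flipping, noise, lighting changes) can be sketched as simple array transforms. This is an illustrative NumPy sketch on a toy 8x8 grayscale "face", not the pipeline of [29]:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    """Generate simple variations of one face image: flip, noise, lighting."""
    flipped = image[:, ::-1]                                          # horizontal flip
    noisy = np.clip(image + rng.normal(0, 10, image.shape), 0, 255)   # Gaussian noise
    darker = np.clip(image * 0.6, 0, 255)                             # lighting change
    return [flipped, noisy, darker]

face = rng.integers(0, 256, (8, 8)).astype(float)   # toy grayscale image
variants = augment(face)
print(len(variants))  # 3 extra training samples from one image
```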
Pei et al. built a student attendance system using face recognition based on deep learning [30]. The problem they faced was the difficulty of obtaining a large amount of training data, and data augmentation was used as a solution to increase the number of usable images. The dataset images were modified through geometric transformations (enlargement, translation, rotation), lighting changes, and mean, median, Gaussian, and bilateral filters. The dataset is privately owned, with 3538 student face images for training and 372 for testing. The researchers used the VGG-16 CNN architecture. The accuracy obtained was 86.3%; after the researchers increased the amount of training data by capturing faces from video, the accuracy increased to 98.1%.
Moon et al. trained a CNN to recognize a person's face at different distances [31]. The dataset was created by the researchers themselves and contains 12 individuals with 270 images each. Each person's face was captured at distances from 1 meter to 9 meters, with 30 images taken at each distance. The average accuracy obtained in this study was 88.9%.
Zheng et al. built a face recognition system using a Deep Convolutional Neural Network and Vector of Locally Aggregated Descriptors (VLAD) feature encoding [32]. The CNN was trained using the CASIA-WebFace dataset and tested using the IJB-A and JANUS CS2 datasets. Data augmentation was used to extend the dataset by flipping the images horizontally. The highest accuracy achieved in face verification testing was 97.90% on the IJB-A dataset and 96.66% on the JANUS CS2 dataset; in face identification testing, it was 96.4% on IJB-A and 96.90% on JANUS CS2.

Hu et al. used three Convolutional Neural Networks arranged in parallel and proposed a Diversity Combination method for face recognition [33]. Inception-ResNet-v1 was used as the basic CNN architecture. The three CNNs were used for feature extraction, and Diversity Combination is a strategy to adaptively adjust the weight of each CNN and make joint classification decisions. VGG2-Face, MS-Celeb-1M, and CASIA-WebFace were used to train the CNNs. Face matching was tested using the CASIA NIR-VIS 2.0 and Oulu-CASIA NIR-VIS datasets, on which the accuracy obtained was 98.9% and 99.8%, respectively.
Binti Mat Kasim et al. used CNNs to recognize celebrity faces [34]. The authors used three types of CNN architecture, an ordinary CNN, AlexNet, and GoogLeNet, in order to compare the accuracy obtained by each. The dataset used for training is the CelebFaces dataset. When tested, the ordinary CNN achieved 99.72% accuracy, while AlexNet and GoogLeNet achieved 100%.
Bendjillali et al., in their research, compared three types of architecture: VGG16, ResNet50, and Inception-v3 [35]. The Viola-Jones algorithm was used to detect the face, and the authors enhanced the contrast of the training images to determine its impact on face recognition accuracy.

Another study proposed a method to overcome the problem of input image scale, enabling the CNN to recognize faces in low-resolution images and thereby increasing its performance [36]. Training was carried out using the CASIA-WebFace database, and evaluation used the LFW database, a standard dataset for face recognition evaluation, as well as a private CCTV dataset. The results show that the model achieves a face matching accuracy of 98.87%.
Chandran et al. used a CNN and a multi-class Support Vector Machine (SVM) to recognize children's faces [37]. The architecture is based on VGG-Face, with the multi-class SVM used as a substitute for softmax. The authors built the face database used for training and testing, which contains 846 face images of 43 children. The test results show an accuracy of 99.41%.
Khan et al. used a Convolutional Neural Network to detect and recognize faces [38]. The Region Proposal Network, a part of the R-CNN object detector, was used to detect faces in images. The dataset used for training is LFW, and the authors applied data augmentation by flipping each image in the dataset. The accuracy achieved was 97.9%.
Hu et al. proposed a method for increasing the number of images in a dataset through image synthesis [4]. Two CNN architectures were used to test the resulting accuracy: CNN-S and CNN-L, with CNN-L having more layers than CNN-S. The datasets used for training are LFW and CASIA NIR-VIS 2.0. The test results show that the highest accuracy was obtained with CNN-L: 95.77% on the LFW dataset and 85.05% on the CASIA NIR-VIS 2.0 dataset.
Ding proposed a CNN model called the Trunk-Branch Ensemble CNN (TBE-CNN) for video-based face recognition [39]. The model, based on GoogLeNet, was designed to extract information both from the holistic face image and from patches taken around facial components. The researcher used the CASIA-WebFace database for training, with data augmentation such as horizontal flipping and added Gaussian noise. Testing was carried out using the PaSC, COX Face, and YouTube Faces databases, achieving up to 98% accuracy on PaSC, 94.96% on YouTube Faces, and 99.33% on COX Face.

Singh and Om used a CNN to recognize the faces of newborn babies [42]. The CNN architecture used is very simple, consisting of two convolutional layers, two pooling layers, and a fully connected layer.

Ríos-Sánchez et al. compared four CNN models: FaceNet, OpenFace, gb2s_Model1, and gb2s_Model2 [43]. All models are based on GoogLeNet, with the last two created by the researchers. The four models were used to recognize faces from only one sample per person, with the goal of determining face recognition accuracy when very little data is available. The LFW database was used to train gb2s_Model1 and gb2s_Model2. Testing was carried out using the Extended Yale B, ORL, BioID, EUCFI, PrintAttack, gb2sµMOD_Face_Dataset, gb2sTablet, gb2s_Selfies, and gb2s_IDCards databases, the last three of which are private. The results show that the highest False Match Rate (FMR) and False Non-Match Rate (FNMR) were obtained using OpenFace.
Zhou et al. used ResNet-face18, a CNN model for face recognition modified from ResNet [44]. The research introduces a softmax function named double additive margin softmax loss (DAM-Softmax), and CASIA-WebFace is used to train the model, with the CFP-FP, CALFW, and CPLFW datasets used for testing. The researchers compared three softmax variants: Softmax, AM-Softmax, and DAM-Softmax. The highest accuracy was obtained with DAM-Softmax: 90.17% on CALFW, 82.08% on CPLFW, and 93.26% on CFP-FP.
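The additive-margin idea underlying AM-Softmax (which DAM-Softmax extends with a dynamic margin) can be sketched as follows. The margin and scale values are illustrative defaults, and this is the generic AM-Softmax formulation rather than the exact loss of [44]:

```python
import numpy as np

def am_softmax_probs(cos_theta, target, margin=0.35, scale=30.0):
    """Additive-margin softmax: subtract a margin from the target class's
    cosine similarity before scaling, which forces the network to learn
    tighter, more separable face embeddings."""
    logits = scale * cos_theta.astype(float)
    logits[target] = scale * (cos_theta[target] - margin)
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

cos_theta = np.array([0.7, 0.3, 0.1])  # cosines between an embedding and each class weight
p_plain = am_softmax_probs(cos_theta, target=0, margin=0.0)
p_margin = am_softmax_probs(cos_theta, target=0, margin=0.35)
# with the margin, the target class looks "harder", so its probability drops,
# and training must increase cos_theta[0] to compensate
print(p_margin[0] < p_plain[0])  # True
```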
Another work used three different CNN models: MTCNN, a self-designed CNN, and IFaceNet [45]. MTCNN is used to detect faces in an image, the self-designed CNN determines whether the face in the image is fake, and IFaceNet, a modification of FaceNet, recognizes the faces judged genuine. The CNNs were trained using a private dataset called NenuLD. The system has an accuracy of 99.8% in determining whether a face is fake, the face recognition accuracy is 99.7%, and the researchers state that the total accuracy of the proposed system is 99.5%.

In another study, a nearest-neighbor classifier was used to classify faces from features extracted with AlexNet, ZF-5net, and GoogLeNet. On the ORL database, the highest accuracy reached 92.74% with AlexNet and 87.68% with ZF-5net; on the AR database, 69.44% with AlexNet and 72.52% with ZF-5net. GoogLeNet produced accuracies of up to 93.54% on ORL and 76.17% on AR.
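Nearest-neighbor classification over extracted features, as used in the study above, reduces to a distance computation. The toy 2-D vectors below stand in for CNN feature vectors, and the names are hypothetical:

```python
import numpy as np

def nearest_neighbor(probe_feat, gallery_feats, labels):
    """Assign the probe the label of the closest gallery feature (Euclidean)."""
    dists = np.linalg.norm(gallery_feats - probe_feat, axis=1)
    return labels[int(np.argmin(dists))]

gallery = np.array([[0.1, 0.9],    # "alice", sample 1
                    [0.8, 0.2],    # "bob"
                    [0.2, 0.85]])  # "alice", sample 2
labels = ["alice", "bob", "alice"]
print(nearest_neighbor(np.array([0.15, 0.88]), gallery, labels))  # alice
```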
Nimbarte and Bhoyar addressed one of the problems in face recognition: aging [50]. If a person's face changes with age relative to the image stored in the database, the face recognition system may no longer recognize it. The researchers proposed a CNN architecture to overcome this problem, with the CNN used as a feature extractor and an SVM used to classify faces. The FGNET and MORPH (Album II) datasets were used to train and test the model. The accuracy obtained was 76.5% on the FGNET dataset and 92.5% on MORPH (Album II).
Kamencay et al. compared the performance of a CNN with Principal Component Analysis (PCA), Local Binary Patterns Histograms (LBPH), and K-Nearest Neighbor (K-NN) in face recognition [51]. The database used in the study is the ORL database. The highest accuracy was obtained using the CNN, up to 98.3%, while PCA, LBPH, and K-NN achieved 85.6%, 88.9%, and 81.4%, respectively.

Another study used a Convolutional Neural Network model based on the lightened CNN for face recognition [53]. The lightened CNN was trained using the CASIA-WebFace database. Four databases were used in the study: the AR face, Extended Yale B, FERET, and LFW face databases. The accuracy reached 100% on AR face, 88.3% on Extended Yale B, 93.9% on FERET, and 74% on LFW.
Chen et al. [54] tested two CNN models, DCNNS and DCNNL. The DCNNL model is based on the AlexNet architecture, while DCNNS is based on the architecture in the study by Chen et al. [55]. DCNN-based face detection was also used in this study, namely the Deep Pyramid Deformable Parts Model for Face Detection (DP2MFD). The study used the IJB-A and JANUS CS2 databases, on which the accuracy reached 98.8% and 98.6%, respectively.
Chen et al., in their study, proposed a CNN with deep transformation learning [56]. The method increases the robustness and discriminative power of the extracted features. Face detection and face landmark detection were performed using MTCNN and Dlib. The datasets used for training are FaceScrub, CACD2000, and CASIA-WebFace. Tests were carried out using the LFW and IJB-A datasets: the accuracy obtained on LFW is 99.16%, and the identification accuracy on IJB-A is 93.1%.

In another study, the MegaFace database was used for identification testing, with validation accuracies of 99.83% on LFW, 98.57% on AgeDB, and 95.85% on CFP. The identification accuracy on MegaFace is 88.74% using ACNN-Res50 and 98.35% using ACNN-Res101, while the verification accuracy is 91.12% using ACNN-Res50 and 98.42% using ACNN-Res101.
Khan et al. discussed building a student attendance recording system using face recognition [58]. The proposed system is expected to replace manual and biometric attendance systems, which take a long time to record attendance. The researchers used YOLO v3 to detect faces, chosen because it processes faster than R-CNN, and the Microsoft Azure Face API to recognize the detected faces. A database containing students' faces, with twenty photos per student, is used for recognition. When tested, the system achieved up to 100% accuracy, which indicates that it can substitute for a manual recording system.
Nakajima, Moshnyaga, and Hashimoto compared the performance of two facial recognition approaches: a CNN and Local Binary Pattern Histograms (LBP) [59]. Both approaches were run on a Raspberry Pi and trained using a private dataset of 12 classes with 50 facial images each; 540 images were used for training and 60 for verification. The CNN achieved a highest accuracy of 100% and an average accuracy of 96%, while the LBP achieved a highest accuracy of 76% and an average of 64%. The study therefore concludes that the CNN achieved more robust recognition than the LBP.
Hussain et al. proposed an authentication system using face recognition for the medical and healthcare area [60]. The proposed system comprises face detection, facial feature extraction, and classification. Face detection was done using the Haar cascade technique. Three feature extraction methods were compared: a pre-trained ResNet-50, VGG-16, and the Local Binary Pattern Histogram (LBPH). Lastly, a Support Vector Machine (SVM) was used for classification. A total of 8422 face images of 100 individuals were used, of which 70% were for training, 15% for testing, and 15% for validation. ResNet-50 + SVM achieved an accuracy of 99.56%, VGG-16 + SVM 98.49%, and LBPH 98.47%; hence, ResNet-50 + SVM achieved the highest accuracy.
Farhi, Abbasi, and Rehman proposed a face recognition-based identity management system for office and academic environments [61]. The proposed system comprises face detection, facial feature extraction, and classification. The work utilized MTCNN for face detection and the well-known FaceNet to extract facial features; like the work of [60], classification was done using a Support Vector Machine (SVM). The authors experimented with different angles, distances, and illumination. The proposed system achieved 97.1%-98.8% accuracy with face positioning between -15 and +15 degrees. In normal light conditions at a distance of 4-5 meters, it achieved 98% to 99% accuracy; however, in low light conditions, the accuracy dropped to 96.47%.
Xing, Wang, and Zheng proposed a VGG-16 with an improved pooling method for face recognition [62]. The work uses image size normalization and a de-averaging operation to preprocess the image data. Furthermore, Gabor wavelet transform-based image enhancement was used to reduce noise, while histogram homogenization was used to reduce the effect of light and shadow. The improved pooling method proposed in the work is improved stochastic pooling, which enhances the generalization of the network and the abstraction process. Face detection was carried out using a combination of Haar features with AdaBoost and the deep learning-based Faster R-CNN. The proposed method was evaluated on a self-built dataset and LFW, reaching an accuracy of 97.2%.
Akter et al. proposed a framework to detect autistic children through facial recognition [63]. The facial recognition was done through improved transfer learning using pre-trained CNN models: DenseNet121, ResNet50, VGG16, VGG19, MobileNet-V1, and MobileNet-V2. Besides the pre-trained CNNs, the work experimented with several machine learning classifiers, such as AdaBoost, k-Nearest Neighbor (kNN), Decision Tree, Logistic Regression, Gradient Boosting, Naïve Bayes, Support Vector Machine, Multi-layer Perceptron, and Random Forest. The pre-trained models were modified by adding three batch normalization layers and two fully connected layers before the output layer. The dataset used consists of 2936 facial images of normal and autistic children. The highest accuracy achieved was 91% on the test set using the improved MobileNet-V1. The authors also used k-means clustering to classify binary sub-types of autism, achieving 92.10% with the improved MobileNet-V1.
During the Covid-19 pandemic, a new challenge for facial recognition emerged because people must wear face masks, and several works have tried to address it. Talahua et al. proposed a facial recognition system that can recognize a person even when they are wearing a face mask [65]. The system uses OpenCV for face detection, MobileNetV2 for face mask-wearing recognition, FaceNet as the facial feature extractor, and a feedforward multilayer perceptron for classification. A total of 13,359 images were used to train the face recognition. The highest accuracy achieved is 99.52% for facial recognition with a mask and 99.96% without.
Deng et al. proposed a facial recognition algorithm called MFCosface to recognize masked faces based on a large margin cosine loss [66]. In their work, MTCNN was used for face detection, the base architecture is Inception-ResNet-v1, and the proposed large margin cosine loss was used to train the model. VGGFace2_m was used for training, while CASIA-FaceV5_m, LFW_m, RMFD, and MFR2 were used for testing; some faces in these datasets were generated with a mask. The accuracy achieved on LFW_m, CF_m, MFR2, and RMFD is 99.33%, 97.03%, 98.50%, and 92.15%, respectively.
Ullah et al. proposed a unified framework for mask detection and masked facial recognition [67]. The authors built a custom CNN called DeepMaskNet, which comprises 17 layers, including the input and classification layers. A large-scale dataset called the masked detection and masked facial recognition (MDMFR) dataset was used for training and testing. DeepMaskNet achieved 100% accuracy in face mask detection and 93.33% in face recognition, outperforming other state-of-the-art models.
Song et al. proposed the Spartan Face Mask Detection and Facial Recognition system to address the challenges of mask detection, mask type classification, mask position classification, and identity recognition that emerged during the Covid-19 pandemic [68]. The proposed system uses MTCNN to detect faces in an image, FaceNet to extract embedded facial features, and a Support Vector Machine and XGBoost as classifiers for facial recognition. Training and testing were done using a total of 2000 images. FaceNet + SVM achieved 97% accuracy on the test set and 100% on the training set, while FaceNet + XGBoost achieved 88% and 100%, respectively.

Face Recognition Database
The face database is important in face recognition; it contains the face images used to train the Convolutional Neural Network. A variety of face databases are available online and can be downloaded by researchers or developers who need them. Table 2 lists the databases used in several studies to train and test facial recognition, along with their information.

Conclusion
This survey discusses face recognition based on the Convolutional Neural Network. Face recognition is one of the challenges in pattern recognition and computer vision, and one of the most widely studied topics in recent years. The studies discussed seek innovations in face recognition by proposing various architectures, modifying the images in the dataset, or combining several methods. The main goal is to obtain a high level of accuracy so that the face recognition system performs well. This survey also discusses some of the face databases used in these studies, which range in size from hundreds of images to millions. We hope that through this survey readers gain additional knowledge about face recognition based on the Convolutional Neural Network.

Acknowledgement
We would like to thank Universitas Universal for funding this research.