Article

Optimization of Convolutional Neural Networks Architectures Using PSO for Sign Language Recognition

by Jonathan Fregoso, Claudia I. Gonzalez * and Gabriela E. Martinez
Division of Graduate Studies and Research, Tijuana Institute of Technology, Tijuana 22414, Mexico
* Author to whom correspondence should be addressed.
Axioms 2021, 10(3), 139; https://doi.org/10.3390/axioms10030139
Submission received: 29 May 2021 / Revised: 12 June 2021 / Accepted: 25 June 2021 / Published: 29 June 2021
(This article belongs to the Special Issue Various Deep Learning Algorithms in Computational Intelligence)

Abstract: This paper presents an approach to designing convolutional neural network (CNN) architectures using the particle swarm optimization (PSO) algorithm. Adjusting the hyper-parameters and finding the optimal architecture of a CNN is an important challenge: network performance, and the possibility of obtaining an efficient learning model for a particular problem, depend on the hyper-parameter values, and selecting them implies exploring a huge and complex search space. Heuristic-based searches are well suited to this type of problem; therefore, the main contribution of this work is to apply the PSO algorithm to find the optimal CNN parameters, namely the number of convolutional layers, the filter size used in the convolution process, the number of convolutional filters, and the batch size. Two optimization approaches are described. In the first, the parameters obtained by PSO are kept identical in every convolutional layer, and the objective function evaluated by PSO is the classification rate. In the second, PSO generates different parameters for each layer, and the objective function combines the recognition rate with the Akaike information criterion, which helps to find the best network performance with the minimum number of parameters. The optimized architectures are implemented in three case studies of sign language databases: the Mexican Sign Language alphabet, the American Sign Language MNIST, and the American Sign Language alphabet. According to the results, the proposed methodologies achieve favorable results, with recognition rates higher than 99%, and are competitive with other state-of-the-art approaches.

1. Introduction

Deep neural networks have demonstrated their capacity to solve classification problems using hierarchical models, millions of parameters, and learning from large databases. Convolutional neural networks (CNNs) are a special class of deep neural networks consisting of several convolution, pooling, and fully connected layers; they have proven to be robust methods for image and video processing, classification, and pattern recognition. In recent years, CNNs have attracted attention for achieving superior results in various computer vision applications, such as medicine, aerospace, natural language processing and robotics [1,2].
CNNs are widely used in industry; however, when designing CNN architectures, we face challenges that include the high computational cost of information processing and finding the optimal CNN parameters (architecture) for each problem [3]. CNN architectures comprise numerous parameters and, depending on their configuration, can produce very different classification results when applied to the same task; setting the hyper-parameter values is usually based on random search, running several tests, or adjusting them manually, which makes for a complex search process. To address this challenge, several researchers have proposed evolutionary computation approaches to automatically design optimal CNN architectures and increase their performance [4,5]. In Sun et al. [6,7], an evolutionary approach is implemented to automatically obtain CNN architectures, achieving good results against state-of-the-art architectures. In Ma et al. [8], the authors analyze different methodologies based on evolutionary computing techniques for optimizing CNN architectures; these were tested on benchmark data sets and achieved competitive results. Baldominos et al. [9] implement an approach to automatically design CNN architectures using genetic algorithms (GA) in conjunction with grammatical evolution.
In the state of the art, a variety of meta-heuristics have been applied to optimize CNN hyper-parameters, including the fuzzy gravitational search algorithm (FGSA) [10,11,12], harmony search (HS) [13], differential evolution (DE) [14], the microcanonical optimization algorithm [15], the whale optimization algorithm [16], and the tree growth algorithm framework [17], to mention a few.
In other research works, the PSO algorithm has been used to optimize CNN architectures, obtaining favorable results in different applications. In Sun et al. [18], Singh et al. [19] and Wang et al. [20], PSO is applied to automatically design CNN architectures; these approaches are tested on well-known benchmark datasets, and the results obtained are competitive against state-of-the-art architectures. Besides this, PSO has been implemented in other fields of machine learning, including the optimization of different types of artificial neural network architectures, giving favorable solutions for a plethora of problems [21,22]. In [23], PSO optimizes models of modular neural networks and is applied to obtain the blood pressure trend. In [24], a hybrid ANN-PSO method is applied to model electricity price forecasting for the Indian energy exchange. In [25], PSO variants are applied to generate optimal modular neural network architectures, obtaining competitive results for human recognition. In [26], the PSO algorithm is used to optimize deep neural network architectures and is tested on image classification tasks. In [27], a new hybrid classification paradigm based on PSO is presented and applied to the prediction of medical diagnoses and prognoses. Furthermore, in [28], the artificial bee colony (ABC) algorithm [29] and PSO are used to optimize multilayer perceptron (MLP) neural networks; the approach is applied to estimate the heating and cooling loads of energy-efficient buildings, and the authors report that PSO outperforms ABC, improving the MLP performance. From the listed works, we can note the advantages that PSO offers in the optimization process, increasing performance in different tasks.
In research related to CNN approaches applied to sign language recognition, we find the work presented in [30], where a CNN model with stochastic pooling is implemented for recognition of Chinese sign language fingerspelling, achieving a recognition rate of 89.32 ± 1.07%. In [31], a CNN method for Arabic sign language (ArSL) recognition is applied, with the authors reporting a precision of 90.02%. In [32], a 3D-CNN approach is applied to large-vocabulary sign language recognition; images are captured through a Kinect, and the authors report an effectiveness of 88.7%.
The contribution of this research work is a hybrid methodology in which the PSO algorithm is applied to find the optimal design of parameters for CNN architectures. This work presents two optimization approaches; in both, the parameters considered are the number of convolution layers, the filter size used in each convolutional layer, the number of convolution filters, and the batch size. In the first approach, the parameters are kept identical across the layers, and the objective function is given by the recognition rate. In the second approach, the aim is to allow more varied searches in the architectures that the PSO produces; in this case, the values for each convolution layer can be completely different, and the objective function combines the highest recognition rate with the lowest Akaike information criterion (AIC); the latter helps to obtain robust network performance with the minimum number of parameters, since the AIC penalizes the number of parameters used in each training. The optimized architectures are tested with three sign language databases: the Mexican Sign Language (MSL) alphabet, the American Sign Language (ASL) alphabet [33], and the American Sign Language MNIST (ASL MNIST) [34]. This research aims to promote work in the soft computing area on tools that help the deaf community toward a more inclusive society [35].
The structure of this work is organized as follows. Section 2 presents the general theory of convolutional neural networks. Section 3 introduces PSO theory, including definitions, functionality, and the main equations. Section 4 details the methodology of the two PSO-CNN optimization approaches. Section 5 describes an analysis of the experimental results achieved after the optimized architectures are implemented for the three databases; it also presents a statistical test comparing the two optimization proposals and a comparative analysis against other CNN approaches focused on sign language recognition. Finally, Section 6 presents the conclusions and future work.

2. Convolutional Neural Networks

Biologically inspired computational models can far outperform earlier forms of artificial intelligence and machine learning. One of the most successful ANN (artificial neural network) architectures is the CNN, which is mainly used to solve difficult image-based pattern recognition tasks.
CNNs are a specialized type of ANN with supervised learning that process their layers by emulating the visual cortex. This allows the recognition of characteristic patterns in the input data and makes it possible to identify objects through a set of specialized, hierarchically organized hidden layers: the first layers detect curves and lines, and as deeper layers are reached it becomes possible to recognize more complex shapes, such as a silhouette or a person’s face.
These types of networks are designed specifically for image processing. The design of their architecture emulates the behavior of the visual cortex of the brain when processing and recognizing images [36]. Their main function is to locate and learn the characteristic patterns of the information, such as curves, lines, and color tones, through the application of convolution layers, which facilitate the identification and classification of objects [37,38].
The basic CNN architecture is presented in Figure 1; it consists of five layers: the input, convolution, non-linearity (ReLU), pooling, and classification layers [39,40], which are described in the following subsections.
CNNs are widely implemented in applications that require artificial vision techniques. Although the results that have been obtained are very promising, they incur high computational costs; it is therefore essential to implement techniques that increase their performance. For this reason, an optimization of the CNN parameters is presented to improve the recognition percentage and reduce computational cost. Figure 2 shows some of the parameters that can be optimized in each CNN layer [41].

2.1. Input Layer

This is the first layer of a CNN; here, the images or videos to be processed by the neural network are entered so that their characteristics can be extracted. All information is stored in two-dimensional matrices. To increase the effectiveness of the algorithms and reduce the computational cost, it is recommended to preprocess the images before training, for example by segmentation, normalization of pixel values, extraction of object or background characteristics to keep the most relevant information, or conversion to grayscale.
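As a rough illustration of this preprocessing step (not the exact pipeline used in this work), a minimal sketch using OpenCV and NumPy could look as follows; the 32 × 32 target size and the grayscale conversion are assumptions matching the image dimensions reported in Section 5.

```python
import cv2
import numpy as np

def preprocess(image_path, size=(32, 32)):
    """Illustrative preprocessing: grayscale conversion, resizing and pixel-value normalization."""
    img = cv2.imread(image_path)                   # image loaded as a BGR NumPy array
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # keep only intensity information
    gray = cv2.resize(gray, size)                  # bring every image to a common dimension
    return gray.astype(np.float32) / 255.0         # normalize pixel values to the range [0, 1]
```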

2.2. Convolution Layer

One of the most distinctive processes of this type of network is the convolution. It consists of taking a group of pixels from the input image and computing the dot product with a kernel to produce new images called feature maps; these maps are distinct and depend on the type and size of the convolution filter applied to the image.
Among the important capabilities that the kernel provides are the detection of lines, edges, focus, blur, curves, and colors, among others. This is achieved by performing the convolution between the image and the kernel, multiplying the filter values pixel by pixel with those of the image while moving the filter from left to right; this representation can be appreciated in Figure 3 [42], where * denotes the convolution operation.
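To make the sliding dot product concrete, the following NumPy sketch computes a feature map by a plain valid convolution; it is an illustrative example, not the implementation used in this work, and the vertical-edge kernel is a hypothetical choice.

```python
import numpy as np

def feature_map(image, kernel):
    """Slide the kernel over the image and take the dot product at each position (valid convolution)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = image[r:r + kh, c:c + kw]
            out[r, c] = np.sum(patch * kernel)     # element-wise product of patch and kernel, then sum
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)     # a simple 3 x 3 vertical-edge detector
print(feature_map(image, edge_kernel))             # a 2 x 2 feature map
```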

2.3. Non-Linearity Layer

The activation function in the convolutional layer has the same purpose as the activation used in any neural network; commonly, a non-linear function is applied to the feature maps. There are different activation functions; one of the most used in this type of model is the rectified linear unit (ReLU), which returns zero if it receives an input value less than zero, while for any value greater than zero it returns the same value [41,42].
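The ReLU behaviour described above can be written in a single NumPy expression; a small sketch:

```python
import numpy as np

def relu(x):
    # Returns zero for negative inputs and the input value itself otherwise
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))   # [0.  0.  0.  1.5 3. ]
```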

2.4. Pooling Layer

The pooling operation is used to reduce the dimensionality of the network; in other words, it reduces the number of parameters, which shortens training time and combats over-fitting [41]. Among the most used types of pooling are: (1) mean pooling, which takes the arithmetic mean of the values; (2) max pooling, which selects the pixel with the largest value in the feature map; and (3) sum pooling, which takes the sum of all the elements present in the feature map.
The pooling operation is usually done with a 2 × 2 filter. Assuming we have a 4 × 4 feature map (obtained after the convolution layer), the operation is carried out as follows: first, the feature map is divided into four segments with the size of the filter (2 × 2); second, in each segment a value is selected according to the pooling type (mean, max, or sum). An example is illustrated in Figure 4.
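A minimal NumPy sketch of this 2 × 2 pooling on a 4 × 4 feature map is shown below; the numeric values are hypothetical and only illustrate how each segment is reduced.

```python
import numpy as np

def pool2x2(feature_map, mode="max"):
    """Split a feature map into non-overlapping 2 x 2 segments and reduce each one."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)   # group the map into 2 x 2 segments
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "mean":
        return blocks.mean(axis=(1, 3))
    return blocks.sum(axis=(1, 3))                       # "sum" pooling

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 0],
                 [7, 2, 9, 8],
                 [3, 1, 4, 6]], dtype=float)
print(pool2x2(fmap, "max"))                              # [[6. 4.] [7. 9.]]
```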

2.5. Classifier Layer

This layer appears in the CNN architecture after all the convolutional and pooling layers; it is a fully connected layer that interprets the feature representations obtained by the previous layers and performs the high-level reasoning. It follows a similar principle to the conventional multilayer perceptron, and in this layer the CNN recognizes and classifies the images that form the output. In a multiclass classification problem, this fully connected layer has the same number of outputs as there are classes in the case study to be solved. The softmax function has become one of the most popular options for the classification task due to its effectiveness [42].
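For illustration, the softmax function that turns the outputs of this fully connected layer into class probabilities can be sketched as follows (a generic example, not the authors' implementation):

```python
import numpy as np

def softmax(logits):
    # Subtract the maximum for numerical stability, then normalize the exponentials
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])     # hypothetical outputs for a three-class problem
print(softmax(scores))                 # probabilities that sum to 1; the largest score dominates
```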

3. Particle Swarm Optimization

PSO is a stochastic algorithm based on swarm intelligence and inspired by the way birds forage for food; each bird is represented by a particle that “moves” in a multidimensional search space and “adjusts” its movement based on its own experience and that of its neighbors.
Each particle represents a possible solution to the problem and can be considered as “an individual element in a flock” [43]. PSO uses local and global information to find the best solution, using a fitness function and the velocities at which the particles move.
PSO is prone to premature convergence and to falling into local optima; therefore, since its introduction in 1995 by Kennedy and Eberhart [44], various optimization variants have been proposed [45,46,47,48].
Algorithm 1. The PSO algorithm
Initialize the parameters of the problem (a random population).
while (completion criteria are not met)
begin
  For each particle i do
  begin
    Update the velocity x_i using Equation (2).
    Update the position p_i using Equation (1).
    Evaluate the fitness value of the particle.
    If necessary, update pbest_i(t) and gbest(t) using Equations (3) and (4).
  end
end
Algorithm 1 describes the process carried out by the PSO. The algorithm is defined by the equations that update the position, Equation (1), and the velocity, Equation (2).
$p_i(t+1) = p_i(t) + x_i(t+1)$  (1)
In Equation (1), $p_i(t)$ is the position of particle $i$ at time $t$ within the search space. By adding a velocity $x_i(t+1)$, it is possible to change the position of the particle [45].
$x_i(t+1) = \omega x_i(t) + c_1 r_1 \left( y_i - p_i(t) \right) + c_2 r_2 \left( \hat{y} - p_i(t) \right)$  (2)
In Equation (2), $x$ represents the velocity and $i$ the particle. The parameters $c_1$ and $c_2$ define the cognitive and social factors, respectively; $r_1$ and $r_2$ are random values in the interval [0, 1]; $\omega$ is the inertia weight; the best position of the particle ($pbest_i$) is denoted by $y_i$; and the best global position ($gbest$) by $\hat{y}$.
The swarm is assumed to consist of n particles, so an objective function f is implemented to perform the computation of particle fitness with a maximization task. The personal and global best values are updated using Equations (3) and (4), respectively, at a time t [48].
Thus, for $i \in \{1, \dots, n\}$:
$pbest_i(t+1) = \begin{cases} pbest_i(t) & \text{if } f(pbest_i(t)) \geq f(p_i(t+1)) \\ p_i(t+1) & \text{if } f(pbest_i(t)) < f(p_i(t+1)) \end{cases}$  (3)
$gbest(t+1) = \max\{ f(y), f(gbest(t)) \}, \quad \text{where } y \in \{ pbest_0(t), pbest_1(t), \dots, pbest_n(t) \}$  (4)
According to Equations (1) and (2), the movements of the particle in the search space are illustrated in Figure 5.
The red and yellow circles represent the movement that a particle makes when the parameters c1 and c2 are updated. When c1 > c2, the particle moves in the direction of the yellow circle; in this case, the swarm performs the exploration process, “flying” across the search space to find the area containing the global optimum.
This movement allows the particles to make long displacements, thus covering the whole search space. When c2 > c1, the particle moves towards the red circle. This is where the exploitation process takes place: the swarm “flies” within the best area of the search space, making small movements that allow an intensive search [49].
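A minimal NumPy sketch of the update rules in Equations (1)–(4) is given below; it is a generic maximization PSO written for illustration, not the exact implementation used in this work, and the test function is a hypothetical example.

```python
import numpy as np

def pso(fitness, bounds, n_particles=10, n_iterations=10, w=0.8, c1=1.5, c2=1.5):
    """Minimal particle swarm optimization (maximization) following Equations (1)-(4)."""
    lo = np.array([b[0] for b in bounds], dtype=float)
    hi = np.array([b[1] for b in bounds], dtype=float)
    dim = len(bounds)
    pos = lo + np.random.rand(n_particles, dim) * (hi - lo)    # random initial positions
    vel = np.zeros((n_particles, dim))
    pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[np.argmax(pbest_fit)].copy()
    for _ in range(n_iterations):
        r1 = np.random.rand(n_particles, dim)
        r2 = np.random.rand(n_particles, dim)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)   # Equation (2)
        pos = np.clip(pos + vel, lo, hi)                                    # Equation (1), kept in bounds
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit                                          # Equation (3)
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[np.argmax(pbest_fit)].copy()                          # Equation (4)
    return gbest, pbest_fit.max()

# Example: maximize a simple concave function over [-5, 5] x [-5, 5]
best_position, best_value = pso(lambda x: -np.sum((x - 1.0) ** 2), [(-5, 5), (-5, 5)])
```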

4. Convolutional Neural Network Architecture Optimized by PSO

This section presents two optimization approaches in which the PSO algorithm is applied to optimize the parameters of CNN architectures; these approaches are denoted PSO-CNN-I and PSO-CNN-II. The first objective is to select the most relevant parameters influencing good CNN performance, and then to implement the PSO algorithm to find optimal values for these parameters.
The parameters to be optimized were selected after evaluating the performance of a CNN in an experimental study in which the parameters were changed manually. As mentioned above, different CNN parameter values produce a variety of results for the same task; for this reason, the aim is to find the optimal architectures. The parameters listed below were chosen to be optimized in this work.
  • The number of convolutional layers;
  • The filter size or filter dimension used in each convolutional layer;
  • The number of filters used to extract the feature maps (the number of convolution filters);
  • The batch size: the number of images that are fed into the CNN in each training block.
The general methodology of the proposal is presented in Figure 6. The “training and optimization” block is the most important part of the whole process; here the CNN is initialized and the parameter optimization is integrated by applying the PSO algorithm. In this process, the PSO is initialized according to the parameters given for the execution (explained below) and generates the particles. Each particle is a possible solution, and its position contains the parameters to be optimized, so each solution evaluation involves a complete CNN training.
The training process is an iterative cycle that ends when all the particles generated by the PSO have been evaluated in every generation. The computational cost is high and depends on the database size, the particle size, the number of PSO iterations, and the number of particles in each iteration. That is to say, if the PSO is executed with 10 particles and 10 iterations, the CNN training process is executed 100 times. The steps to optimize the CNN with the PSO algorithm are illustrated in Figure 7 and explained as follows.
  • Input database to train the CNN. This step consists of selecting the database to be processed and classified by the CNN (ASL alphabet, ASL MNIST, or MSL alphabet). It is important to mention that all the elements of each database need to keep a similar structure or characteristics; in other words, images with the same scale and color gamut (grayscale, RGB, CMYK), the same pixel dimensions, and a similar file format (JPEG, PNG, TIFF, BMP, etc.).
  • Generate the particle population for the PSO algorithm. The PSO parameters are set, including the number of iterations, the number of particles, the inertia weight, the cognitive constant (W1), and the social constant (W2); the parameters used in the experimentation are presented in Table 8. This step involves the design of the particles, whose structures are presented in Tables 1 and 3 according to the two optimization architectures proposed in this paper.
  • Initialize the CNN architecture. With the parameters obtained from the PSO (the number of convolution layers, the filter size, the number of convolution filters, and the batch size), the CNN is initialized and, in conjunction with the additional parameters specified in Table 8, is ready to train on the input database.
  • CNN training and validation. The CNN reads and processes the input database, taking images for training, validation, and testing; this step produces a recognition rate and the AIC value, which are returned to the PSO as part of the objective function.
  • Evaluate the objective function. The PSO algorithm evaluates the objective function to determine the best value. Since this research considers two approaches, in the first the objective function is only the recognition rate (Equation (5)), while in the second it consists of the recognition rate and the AIC value (Equation (6)); a sketch of one such particle evaluation is given after this list.
  • Update PSO parameters. At each iteration, each particle updates its velocity and position depending on its own best-known position (Pbest) in the search space and the best-known position of the whole swarm (Gbest).
  • The process is repeated, evaluating all the particles, until the stop criterion is met (in this case, the number of iterations).
  • Finally, the optimal solution is selected. In this process, the particle represented by Gbest is the optimal one for the CNN model.
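To make the evaluation step concrete, the sketch below shows one possible form of the particle evaluation for a PSO-CNN-I style particle, assuming a Keras/TensorFlow backend with one-hot encoded labels; the optimizer, the number of epochs, and the pooling after each convolution are assumptions for illustration, not the authors' exact configuration (the fixed settings are given in Table 8).

```python
import tensorflow as tf

def evaluate_particle(particle, x_train, y_train, x_val, y_val, num_classes, epochs=10):
    """Decode a particle, build and train the CNN, and return the recognition rate as fitness."""
    # filter_size is the kernel dimension after mapping the particle index through Table 2
    n_layers, n_filters, filter_size, batch_size = (int(v) for v in particle)
    model = tf.keras.Sequential([tf.keras.Input(shape=x_train.shape[1:])])
    for _ in range(n_layers):
        # In PSO-CNN-I the same filter number and filter size are used in every convolutional layer
        model.add(tf.keras.layers.Conv2D(n_filters, filter_size, padding="same", activation="relu"))
        model.add(tf.keras.layers.MaxPooling2D())
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=0)
    _, accuracy = model.evaluate(x_val, y_val, verbose=0)
    return 100.0 * accuracy   # the recognition rate used as the objective value (Equation (5))
```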

4.1. PSO-CNN Optimization Process (PSO-CNN-I)

This first approach, which we identify as PSO-CNN-I, consists of implementing a particle with four positions, one for each parameter to be optimized (Figure 8). Table 1 details the particle composition. Position x1 corresponds to the number of layers, with a search space from 1 to n; that is, the method can produce architectures with a minimum of one layer and a maximum of n (for the purposes of this work, n = 3). Position x2 represents the number of convolution filters used to extract the characteristics, with a search space of 32 to 128 filters. Position x3 is the filter size; its search space is from 1 to 4, where the value represents an index that is mapped with the values of Table 2 to obtain the filter size (i.e., if the particle generates a value of 1 this represents a filter size of 3 × 3, a value of 2 corresponds to a filter size of 5 × 5, and so on for each value). The last position, x4, represents the batch size and is initialized considering a search space from 32 to 256. In this optimization process, the parameters are kept identical across layers; that is, if the PSO execution generates a particle with 3 convolutional layers (x1), 50 filters (x2), a filter dimension of 3 × 3 (x3), and a batch size of 50 (x4), the same number of filters (x2) and filter size (x3) apply to all three convolution layers of the CNN.
In this process, the objective function, defined by Equation (5), is given by the recognition rate (precision) that the CNN returns after being trained with the parameters generated by the PSO.
$\text{Objective function} = \text{Recognition Rate}$  (5)

4.2. PSO-CNN Optimization Process (PSO-CNN-II)

In this second proposal, identified as PSO-CNN-II, the particle structure consists of eight positions, as presented in Figure 9, where each position represents a parameter to be optimized. The difference from the previous approach (PSO-CNN-I in Section 4.1) is that it allows more varied searches in the architectures that the PSO produces, because in this case the values for each convolution layer can be completely different. Table 3 details the particle composition, the description of each position, and the search space used. As we can see in Table 3, positions x3, x5 and x7 represent an index with an integer value between 1 and 4, and depending on the value taken by the PSO, a mapping is made with the values presented in Table 2.
According to the values to optimize in this new approach, position x1 controls the number of convolution layers and the activation of positions x2 to x7. If the PSO generates a particle with a value of one for x1, only positions x2 and x3 are activated, to generate the number of filters of convolutional layer 1 and the filter size to use in this layer. If the PSO produces a particle with a value of three in position x1, positions x2 to x7 are activated to generate the number of filters of convolutional layer 1 (x2), the filter size of layer 1 (x3), the number of filters of layer 2 (x4), the filter size of layer 2 (x5), the number of filters of layer 3 (x6), and the filter size of layer 3 (x7), respectively; these values can be completely different from each other, so this methodology helps to produce more heterogeneous CNN architectures.
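As an illustration of this activation logic (not the authors' code), the particle could be decoded as follows; the example values correspond to the particle later shown in Figure 10.

```python
def decode_particle_ii(particle):
    """Decode an eight-position PSO-CNN-II particle into per-layer CNN settings."""
    filter_sizes = {1: (3, 3), 2: (5, 5), 3: (7, 7), 4: (9, 9)}   # index-to-size mapping from Table 2
    n_layers = int(particle[0])                                   # x1: number of convolutional layers
    layers = []
    for layer in range(n_layers):
        n_filters = int(particle[1 + 2 * layer])                  # x2, x4, x6
        size_index = int(particle[2 + 2 * layer])                 # x3, x5, x7
        layers.append({"filters": n_filters, "filter_size": filter_sizes[size_index]})
    batch_size = int(particle[7])                                 # x8
    return layers, batch_size

# Three layers: (100 filters, 3 x 3), (85 filters, 5 x 5), (50 filters, 7 x 7); batch size 32
print(decode_particle_ii([3, 100, 1, 85, 2, 50, 3, 32]))
```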
Another difference between this proposal and the previous one (PSO-CNN-I) is the objective function: here we use the recognition rate together with the Akaike information criterion (AIC). The AIC penalizes the architectures according to the number of parameters used; that is to say, the model is penalized when it needs more parameters. The objective function therefore seeks the highest recognition rate and the lowest AIC. The AIC is defined in Equation (6).
$AIC = 2k - 2\ln(L)$  (6)
According to our problem, in Equation (6) $k$ is the number of parameters of the model (the number of layers and the filter numbers) and $L$ is the maximum value of the recognition that the CNN can reach; in this case, the value is 100. Figure 10 illustrates an example of a particle generated by PSO.
Based on the structure in Figure 10, we have a three-layer convolutional architecture, where the first layer consists of 100 convolution filters with a filter size of 3 × 3, the second layer has 85 convolution filters with a filter size of 5 × 5, and the third convolution layer has 50 convolutional filters with a filter size of 7 × 7. Finally, the batch size is 32. The CNN is trained with these values and the recognition rate is calculated; additionally, the AIC is obtained from the parameters in positions x1, x2, x4 and x6, which represent the number of convolution layers and the number of filters of each convolutional layer. After applying Equation (6), this architecture produces the AIC given in Equation (7).
$AIC = 2(3 + 100 + 85 + 50) - 2\ln(100) = 466.7897$  (7)
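The arithmetic behind Equation (7) can be verified directly with a short sketch:

```python
import math

# k counts the convolutional layers plus the filters of each layer; L is the maximum
# recognition value the CNN can reach (100), as defined for Equation (6).
k = 3 + 100 + 85 + 50
aic = 2 * k - 2 * math.log(100)
print(round(aic, 4))   # 466.7897, matching Equation (7)
```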
Assuming there are two architectures with the same recognition rate but different AICs (Table 4), the model takes the architecture with the lowest AIC, as this penalizes the number of parameters needed to train the network and thus produces optimized, simpler architectures.

5. Experiments and Results

This section describes the three databases used in the case studies (ASL alphabet, ASL MNIST, and MSL alphabet), the static parameters used to set up the PSO algorithm and the CNN process, the experimental results obtained with the two optimization approaches (PSO-CNN-I and PSO-CNN-II), and a comparative analysis against other approaches.

5.1. Sign Language Databases Used in the Study Cases

The characteristics of the sign databases are described below.

5.1.1. American Sign Language (ASL Alphabet)

The ASL alphabet consists of 87,000 color images with a dimension of 200 × 200 pixels. This database contains 29 classes, labeled in a range from 0 to 28, with a one-to-one assignment for each letter of the American alphabet A–Z (0 to 25; that is, 0 = A and 25 = Z); the other three classes correspond to the space, delete, and null symbols (26 to 28; i.e., 26 = space, 27 = delete, and 28 = null). Table 5 presents the general description of the ASL alphabet database, and Figure 11 illustrates a sample of the images.

5.1.2. American Sign Language (ASL MNIST)

ASL MNIST consists of a collection of 34,627 grayscale images with a dimension of 28 × 28 pixels. This database has 24 labeled classes in a range from 0 to 25, with an assignment for each letter of the alphabet A–Z (class 9 = J and class 25 = Z were excluded because they involve gestural movements). Table 6 presents a description of this database, and Figure 12 illustrates a sample of the sign images.

5.1.3. Mexican Sign Language (MSL Alphabet)

The MSL alphabet database was obtained from a group of 18 people, including deaf students and sign language translation teachers; the students are part of an inclusive group in a high school in Mexico. This database consists of 21 classes covering the letters of the MSL alphabet that do not involve movement, as illustrated in Figure 13. Ten images were captured for each letter, giving a total of 3780 grayscale images with a dimension of 32 × 32 pixels. Table 7 displays a general overview of the MSL alphabet database.

5.2. Parameters Used in the Experimentation

In the CNN parameter settings, some static parameters were used, including the learning function, the activation function in the classifying layer, the non-linearity activation function, and the number of epochs. The fixed parameters considered in the PSO configuration are the number of particles, the number of iterations, the inertia weight, and the social and cognitive constants. The static parameters used for PSO and CNN are presented in Table 8. The dynamic parameters optimized by PSO are the number of convolutional layers, the size of the filters used in each convolutional layer, the number of convolutional filters, and the batch size (Tables 1 and 3).

5.3. Optimization Results Obtained by the PSO-CNN-I Approach

This section presents the simulation results produced after the CNN architecture is optimized with the approach described in Section 4.1. The experimentation consists of 30 executions carried out on the three databases; the aim is to obtain the optimal CNN architecture, that is, the minimum parameters necessary to maximize the recognition rate.
The first experiment was applied to the ASL alphabet (Table 5), using a distribution of 80% of the total images for training and 20% for testing. Table 9 shows the values achieved after 30 executions, where the highest recognition rate was 99.87% and the mean was 99.58%. Based on the results, the optimal architecture found by the PSO was as follows: three convolutional layers, 128 filters per layer with a filter size of 7 × 7, and a batch size of 256.
In another test, the PSO-CNN-I approach was applied to the ASL MNIST database; Table 10 presents the results achieved by the CNN, where the best recognition rate was 99.98% and the mean was 99.53%. According to this analysis, the optimal architecture for this study case has two convolutional layers with 117 convolutional filters in both layers, a filter size of 7 × 7, and a batch size of 129.
Table 11 presents the experimental results obtained when the approach is applied to the MSL alphabet database. As we can see in Table 11, the best accuracy reached by the CNN was 99.37%, with a mean of 99.10%. In this case, the optimal architecture is as follows: one convolutional layer with 122 convolutional filters, a filter size of 3 × 3, and a batch size of 128.

5.4. Optimization Results Obtained by the PSO-CNN-II Approach

The results presented in this section consist of 30 executions of the PSO-CNN-II approach applied in the ASL alphabet, ASL MNIST, and MSL alphabet databases; the objective is to maximize the recognition rate and minimize the value of AIC.
The experimental results obtained on the ASL alphabet database after applying the PSO-CNN-II optimization approach (Section 4.2) are presented in Table 12. In this test, the database was split so that 70% of the data was kept for the training phase and 30% for testing. Table 12 shows the best recognition rate, 99.23%, and a mean of 98.69%. The best architecture found by the PSO for the CNN had the following structure: three convolutional layers, where the first layer had 84 convolutional filters of size 3 × 3, the second layer had 128 convolutional filters of size 9 × 9, and the third layer had 128 convolutional filters of size 7 × 7. In this approach, the objective function is composed of the recognition rate and the AIC value; that is, the best recognition rate is evaluated first and then the AIC value. If the CNN achieves two or more architectures with the same recognition rate, the process takes the architecture with the minimum AIC, with the goal of achieving an optimal architecture with the fewest parameters and the highest recognition rate.
Table 13 presents the results obtained when PSO-CNN-II was applied to the ASL MNIST database. In this test, the best recognition rate was 99.80%, with an AIC value of 506.79 and a mean of 99.48%. The optimal parameters found by the PSO were the following: a two-layer CNN architecture, where the first layer had 128 convolution filters with a filter size of 5 × 5, the second layer had 128 convolutional filters with a filter size of 9 × 9, and the batch size was 128.
In another experiment, the optimization approach was applied to the MSL alphabet database over 30 simulations. The results are presented in Table 14, where the best recognition rate was 99.45% with an AIC of 248.79; the overall mean for this study case was 98.91%. In this optimization, a one-layer CNN architecture was obtained, with 128 convolutional filters, a 3 × 3 filter size, and a batch size of 154.

5.5. Statistical Test between PSO-CNN-I and PSO-CNN-II Optimization Process

Table 15 presents a summary of the results obtained after the two approaches were applied to the three databases. Good results were achieved in all cases. For the ASL alphabet and the ASL MNIST, the PSO-CNN-I optimization approach was better, with mean values of 99.58% and 99.53%, respectively; for the MSL alphabet database, the PSO-CNN-II optimization method achieved the highest recognition rate (99.45%), with a mean value of 98.91%. However, if the results are analyzed with respect to the AIC value, for the ASL MNIST and the MSL alphabet the PSO-CNN-I reached the lowest values, with AICs of 462.79 and 236.80, respectively, while for the ASL alphabet the PSO-CNN-II achieved a better AIC value. A low AIC value means that the CNN architecture required fewer parameters, so it is important to determine what is most relevant for a given problem: the CNN accuracy, or configuring the CNN architectures with minimal parameters so that they can be implemented in real-time systems.
To confirm whether significant evidence exists between the architectures and to identify which is better, the Wilcoxon signed-rank test was applied [50]; this is a non-parametric test recommended when the numerical data are not normally distributed, as is the case with the experimental results of metaheuristic algorithms. The Wilcoxon test was performed to compare the PSO-CNN-I and PSO-CNN-II optimization processes, considering the results presented in Tables 9–14. The general description of the values used to execute the Wilcoxon test is presented in Table 16 and described below:
  • A confidence level of 95% (α = 0.05).
  • The null hypothesis (H0): the PSO-CNN-I architecture (μ1) is equal to the PSO-CNN-II architecture (μ2), expressed as H0: μ1 = μ2.
  • The alternative hypothesis (H1): the PSO-CNN-I architecture (μ1) is greater than the PSO-CNN-II architecture (μ2), expressed as H1: μ1 > μ2 (the affirmation).
  • The objective is to reject the null hypothesis (H0) and support the alternative hypothesis (H1).
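As an illustration of how such a comparison could be reproduced (the p-values in this work were computed with SPSS), the sketch below uses SciPy; the arrays holding the 30 per-run recognition rates of each approach are assumed to be available from the corresponding tables.

```python
from scipy.stats import wilcoxon

def compare_approaches(rates_pso_cnn_1, rates_pso_cnn_2, alpha=0.05):
    """One-sided Wilcoxon signed-rank test; H1 states that PSO-CNN-I outperforms PSO-CNN-II."""
    statistic, p_value = wilcoxon(rates_pso_cnn_1, rates_pso_cnn_2, alternative="greater")
    reject_null = p_value < alpha          # reject H0 at the 95% confidence level
    return statistic, p_value, reject_null
```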
The first Wilcoxon test was applied to the ASL alphabet results (Tables 9 and 12). Table 17 shows R+, R−, and the p-value (the p-values were computed using SPSS), where R+ represents the sum of ranks for the problems in which the first algorithm outperformed the second, and R− the sum of ranks for the opposite case. The results indicate an R+ of 455, an R− of 10, and a p-value of <0.001. Because the p-value is less than the alpha value of α = 0.05, we support the alternative hypothesis with a 95% level of confidence, and we can affirm that the PSO-CNN-I architecture is better than the PSO-CNN-II.
Table 18 presents the results after the Wilcoxon test was applied to the ASL MNIST results (Tables 10 and 13). This test gives R+ = 245.5, R− = 189.5, and a p-value of 0.545. Since the p-value is greater than the alpha value of α = 0.05, the null hypothesis cannot be rejected at the 95% confidence level; therefore, there is not enough evidence to determine that the PSO-CNN-I architecture is better than the PSO-CNN-II.
Finally, Table 19 presents the Wilcoxon test for the results of the MSL alphabet (Tables 11 and 14). The results indicate R+ = 291, R− = 115, and a p-value of 0.045. Since the p-value is less than the alpha value of α = 0.05, we support the alternative hypothesis with a 95% level of confidence, and we can affirm that the PSO-CNN-I architecture is better than the PSO-CNN-II.

5.6. State-of-the-Art Analysis Comparison

To obtain more evidence about the performance of the optimization approaches presented in this paper, we make a comparative analysis (Table 20) against state-of-the-art research in which CNN models are applied to alphabet sign language recognition. The results presented in Table 20 are the best recognition rate values reported by the authors, detailed as follows: Zhao et al. [51] report an accuracy of 89.32%; their CNN architecture has two convolutional layers and two pooling layers, the batch size is 150, and training runs for 80 iterations. The authors generated their own ASL database, captured from five people and covering 24 letters of the alphabet, with each person's letters comprising about 528 photos.
Rathi [52] presents an optimization of a transfer learning model (based on CNNs) applied to the ASL MNIST database, using 27,455 images of 24 letters of the ASL alphabet. The data split was as follows: 80% of the data for training, 10% for testing, and 10% for validation, with a training batch size of 100. The best recognition rate reported by the author was 95.03%.
In Bin et al. [53], an architecture of four convolutional layers and two pooling layers is presented. The database was generated by the researchers themselves, taking characteristics of the ASL MNIST and consisting of 4800 images; the best accuracy reported by the authors was 95.00%.
Dionisio et al. [54] reported a recognition rate of 97.64% for the ASL MNIST with a six-layer convolutional architecture, three pooling layers, a filter size of 3 × 3, and a batch size of 128. The database was divided using 10% of the data for testing, 10% for validation, and 80% for training.
Finally, we present the recognition rates achieved by our two approaches, PSO-CNN-I and PSO-CNN-II, with best values of 99.98% and 99.80%, respectively. For PSO-CNN-I, the best architecture obtained for the ASL MNIST had two layers with 117 filters per layer, a filter size of 7 × 7, and a batch size of 129; for PSO-CNN-II, it had two layers, with 128 filters of size 5 × 5 in the first layer and 128 filters of size 9 × 9 in the second, and a batch size of 128.
As one can observe in Table 20, the highest performance was obtained by the proposed model (PSO-CNN-I) with a value of 99.98%, achieving an advantage over the rest of the approaches.

6. Conclusions and Future Work

In summary, this paper presents two approaches to optimize CNN architectures by implementing the PSO algorithm, applied to sign language recognition. The main contribution was to find optimal values for several CNN hyper-parameters; the proposals include the number of convolutional layers, the size of the filter used in each convolutional layer, the number of convolutional filters, and the batch size. According to the experimentation and the results obtained with the two PSO-CNN optimization methodologies, we can conclude that the recognition rate increased in all the case studies carried out, providing robust performance with the minimum parameters. Overall, the recognition rates achieved on the three databases were as follows: for the ASL MNIST database, the best value was 99.98% with an average of 99.53% using the PSO-CNN-I approach; for the ASL alphabet database, the best accuracy was 99.87% with an average of 99.58% using PSO-CNN-I; and for the MSL alphabet, the best value was 99.45% with an average of 98.91% after applying the PSO-CNN-II approach. After a comparative analysis against other state-of-the-art works focused on sign language recognition (ASL and MSL), we can confirm that the optimization approaches of this work present competitive results.
This research focused on optimizing the number of convolutional layers, the filter size used in each convolutional layer, the number of convolutional filters, and the batch size. The results provide evidence of the importance of applying optimization algorithms to find the optimal parameters of convolutional neural network architectures.
As future work, the PSO algorithm could be applied to optimize other CNN hyper-parameters, other versions of the PSO algorithm could be implemented, or different evolutionary computation techniques could be explored, to produce more robust CNN architectures to be applied to sign language datasets used in other countries. In the experimental tests, the images were introduced as static images, but we are considering working with input images in real time or captured through video. On the other hand, our idea is to implement this proposal in the development of assisted communication tools and to contribute to human–computer interaction applications that can support the deaf community.

Author Contributions

Individual contributions by the authors are the following: formal analysis, C.I.G.; conceptualization, writing—review and editing G.E.M. and C.I.G.; methodology, G.E.M.; investigation, software, data curation, and writing—original draft preparation, J.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

We thank CONACYT for the financial support provided with the scholarship number: 954950 and our gratitude to the program of the Division of Graduate Studies and Research of the Tijuana Institute of Technology.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hemanth, J.D.; Deperlioglu, O.; Kose, U. An enhanced diabetic retinopathy detection and classification approach using deep convolutional neural network. Neural Comput. Appl. 2020, 32, 707–721. [Google Scholar] [CrossRef]
  2. Li, P.; Li, J.; Wang, G. Application of Convolutional Neural Network in Natural Language Processing. IEEE Access 2018, 64–70. [Google Scholar] [CrossRef]
  3. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2015, arXiv:1409.1556. [Google Scholar]
  4. Liang, S.D. Optimization for Deep Convolutional Neural Networks: How Slim Can It Go? IEEE Trans. Emerg. Top. Comput. Intell. 2020, 4, 171–179. [Google Scholar] [CrossRef]
  5. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; The MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
  6. Sun, Y.; Xue, B.; Zhang, M.; Yen, G.G. Evolving Deep Convolutional Neural Networks for Image Classification. IEEE Trans. Evol. Comput. 2020, 24, 394–407. [Google Scholar] [CrossRef] [Green Version]
  7. Sun, Y.; Yen, G.G.; Yi, Z. Evolving Unsupervised Deep Neural Networks for Learning Meaningful Representations. IEEE Trans. Evol. Comput. 2019, 23, 89–103. [Google Scholar] [CrossRef]
  8. Ma, B.; Li, X.; Xia, Y.; Zhang, Y. Autonomous deep learning: A genetic DCNN designer for image classification. Neurocomputing 2020, 379, 152–161. [Google Scholar] [CrossRef] [Green Version]
  9. Baldominos, A.; Saez, Y.; Isasi, P. Evolutionary convolutional neural networks: An application to handwriting recognition. Neurocomputing 2018, 283, 38–52. [Google Scholar] [CrossRef]
  10. Poma, Y.; Melin, P.; Gonzalez, C.I.; Martinez, G.E. Optimization of Convolutional Neural Networks Using the Fuzzy Gravitational Search Algorithm. J. Autom. Mob. Robot. Intell. Syst. 2020, 14, 109–120. [Google Scholar] [CrossRef]
  11. Poma, Y.; Melin, P.; Gonzalez, C.I.; Martinez, G.E. Filter Size Optimization on a Convolutional Neural Network Using FGSA. In Intuitionistic and Type-2 Fuzzy Logic Enhancements in Neural and Optimization Algorithms; Springer: Cham, Switzerland, 2020; Volume 862, pp. 391–403. [Google Scholar]
  12. Poma, Y.; Melin, P.; Gonzalez, C.I.; Martinez, G.E. Optimal Recognition Model Based on Convolutional Neural Networks and Fuzzy Gravitational Search Algorithm Method. In Hybrid Intelligent Systems in Control, Pattern Recognition and Medicine; Springer: Cham, Switzerland, 2020; Volume 827, pp. 71–81. [Google Scholar]
  13. Lee, W.-Y.; Park, S.-M.; Sim, K.-B. Optimal hyperparameter tuning of convolutional neural networks based on the parameter-setting-free harmony search algorithm. Optik 2018, 172, 359–367. [Google Scholar] [CrossRef]
  14. Wang, B.; Sun, Y.; Xue, B.; Zhang, M. A hybrid differential evolution approach to designing deep convolutional neural networks for image classification. In Proceedings of the Australasian Joint Conference on Artificial Intelligence, Wellington, New Zealand, 11–14 December 2018; Springer: Cham, Switzerland, 2018; pp. 237–250. [Google Scholar]
  15. Gülcü, A.; KUş, Z. Hyper-Parameter Selection in Convolutional Neural Networks Using Microcanonical Optimization Algorithm. IEEE Access 2020, 8, 52528–52540. [Google Scholar] [CrossRef]
  16. Zhang, N.; Cai, Y.; Wang, Y.; Tian, Y.; Wang, X.; Badami, B. Skin cancer diagnosis based on optimized convolutional neural network. Artif. Intell. Med. 2020, 102, 101756. [Google Scholar] [CrossRef]
  17. Tuba, E.; Bacanin, N.; Jovanovic, R.; Tuba, M. Convolutional Neural Network Architecture Design by the Tree Growth Algorithm Framework. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 17–19 July 2019; pp. 1–8. [Google Scholar]
  18. Sun, Y.; Xue, B.; Zhang, M.; Yen, G.G. A particle swarm optimization based flexible convolutional autoencoder for image classification. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2295–2309. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  19. Singh, P.; Chaudhury, S.; Panigrahi, B.K. Hybrid MPSO-CNN: Multi-level Particle Swarm optimized hyperparameters of Convolutional Neural Network. Swarm Evol. Comput. 2021, 63, 100863. [Google Scholar] [CrossRef]
  20. Wang, B.; Sun, Y.; Xue, B.; Zhang, M. Evolving deep convolutional neural networks by variable-length particle swarm optimization for image classification. In Proceedings of the 2018 IEEE Congress on Evolutionary Computation (CEC), Rio de Janeiro, Brazil, 8–13 July 2018; Volume 1–8. [Google Scholar]
  21. Gonzalez, B.; Melin, P.; Valdez, F. Particle Swarm Algorithm for the Optimization of Modular Neural Networks in Pattern Recognition. Hybrid Intell. Syst. Control Pattern Recognit. Med. 2019, 827, 59–69. [Google Scholar]
  22. Varela-Santos, S.; Melin, P. Classification of X-Ray Images for Pneumonia Detection Using Texture Features and Neural Networks. In Intuitionistic and Type-2 Fuzzy Logic Enhancements in Neural and Optimization Algorithms: Theory and Applications; Springer: Cham, Switzerland, 2020; Volume 862, pp. 237–253. [Google Scholar]
  23. Miramontes, I.; Melin, P.; Prado-Arechiga, G. Particle Swarm Optimization of Modular Neural Networks for Obtaining the Trend of Blood Pressure. In Intuitionistic and Type-2 Fuzzy Logic Enhancements in Neural and Optimization Algorithms: Theory and Applications; Springer: Cham, Switzerland, 2020; Volume 862, pp. 225–236. [Google Scholar]
  24. Peter, S.E.; Reglend, I.J. Sequential wavelet-ANN with embedded ANN-PSO hybrid electricity price forecasting model for Indian energy exchange. Neural Comput. Appl. 2017, 28, 2277–2292. [Google Scholar] [CrossRef]
  25. Sánchez, D.; Melin, P.; Castillo, O. Comparison of particle swarm optimization variants with fuzzy dynamic parameter adaptation for modular granular neural networks for human recognition. J. Intell. Fuzzy Syst. 2020, 38, 3229–3252. [Google Scholar] [CrossRef]
  26. Fernandes, F.E.; Yen, G.G. Particle swarm optimization of deep neural networks architectures for image classification. Swarm Evol. Comput. 2019, 49, 62–74. [Google Scholar] [CrossRef]
  27. Santucci, V.; Milani, A.; Caraffini, F. An Optimisation-Driven Prediction Method for Automated Diagnosis and Prognosis. Mathematics 2019, 7, 1051. [Google Scholar] [CrossRef] [Green Version]
  28. Zhou, G.; Moayedi, H.; Bahiraei, M.; Lyu, Z. Employing artificial bee colony and particle swarm techniques for optimizing a neural network in prediction of heating and cooling loads of residential buildings. J. Clean. Prod. 2020, 254, 120082. [Google Scholar] [CrossRef]
  29. Karaboga, D.; Gorkemli, B.; Ozturk, C.; Karaboga, N. A comprehensive survey: Artificial bee colony (ABC) algorithm and applications. Artif. Intell. Rev. 2014, 42, 21–57. [Google Scholar] [CrossRef]
  30. Xianwei, J.; Lu, M.; Wang, S.-H. An eight-layer convolutional neural network with stochastic pooling, batch normalization and dropout for fingerspelling recognition of Chinese sign language. Spinger Multimed. Tools Appl. 2019, 79, 15697–15715. [Google Scholar]
  31. Hayami, S.; Benaddy, M.; El Meslouhi, O.; Kardouchi, M. Arab Sign language Recognition with Convolutional Neural Networks. In Proceedings of the 2019 International Conference of Computer Science and Renewable Energies (ICCSRE), Agadir, Morocco, 22–24 July 2019. [Google Scholar]
  32. Huang, J.; Zhou, W.; Li, H.; Li, W. Attention-Based 3D-CNNs for Large-Vocabulary Sign Language Recognition. IEEE Trans. Circ. Syst. Video Technol. 2019, 29, 2822–2832. [Google Scholar] [CrossRef]
  33. Kaggle. American Sign Language Dataset. 2018. Available online: https://www.kaggle.com/grassknoted/asl-alphabet (accessed on 10 February 2020).
  34. Kaggle. Sign Language MNIST. 2017. Available online: https://www.kaggle.com/datamunge/sign-language-mnist (accessed on 8 February 2020).
  35. Rastgoo, R.; Kiani, K.; Escalera, S. Sign Language Recognition: A Deep Survey. Expert Syst. Appl. 2021, 164, 113794. [Google Scholar] [CrossRef]
  36. Hubel, D.H.; Wiesel, T.N. Receptive fields of single neurons in the cat’s striate cortex. J. Physiol. 1959, 148, 574–591. [Google Scholar] [CrossRef]
  37. Kim, P. Matlab Deep Learning; Apress: Seoul, Korea, 2017. [Google Scholar]
  38. Cheng, J.; Wang, P.-s.; Li, G.; Hu, Q.-h.; Lu, H.-q. Recent advances in efficient computation of deep convolutional neural networks. Front. Inf. Technol. Electron. Eng. 2018, 19, 64–77. [Google Scholar] [CrossRef] [Green Version]
  39. Zou, Z.; Shuai, B.; Wang, G. Learning Contextual Dependence with Convolutional Hierarchical Recurrent Neural Networks. IEEE Trans. Image Process. 2016, 25, 2983–2996. [Google Scholar]
  40. Fukushima, K. A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 1980, 36, 193–202. [Google Scholar] [CrossRef]
  41. Schmidhuber, J. Deep learning in neural networks: An overview. Elsevier Neural Netw. 2015, 61, 85–117. [Google Scholar] [CrossRef] [Green Version]
  42. Aggarwal, C.C. Neural Networks and Deep Learning; Springer Nature: Cham, Switzerland, 2018. [Google Scholar]
  43. Jang, J.; Sun, C.; Mizutani, E. Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence; Prentice-Hall: Upper Saddle River, NJ, USA, 1997. [Google Scholar]
  44. Kennedy, J.; Eberhart, R.C. Particle swarm optimization. In Proceedings of the IEEE International Conference on Neural Networks IV, Washington, DC, USA, 27 November–1 December 1995; pp. 1942–1948. [Google Scholar]
  45. Sandeep, R.; Sanjay, J.; Rajesh, K. A review on particle swarm optimization algorithms and their applications to data clustering. J. Artif. Intell. 2011, 35, 211–222. [Google Scholar]
  46. Hasan, J.; Ramakrishnan, S. A survey: Hybrid evolutionary algorithms for cluster analysis. Artif. Intell. Rev. 2011, 36, 179–204. [Google Scholar] [CrossRef]
  47. Fielding, B.; Zhang, L. Evolving Image Classification Architectures with Enhanced Particle Swarm Optimisation. IEEE Access 2018, 6, 68560–68575. [Google Scholar] [CrossRef]
  48. Sedighizadeh, D.; Masehian, E. A particle swarm optimization method, taxonomy and applications. Proc. Int. J. Comput. Theory Eng. 2009, 5, 486–502. [Google Scholar] [CrossRef] [Green Version]
  49. Gaxiola, F.; Melin, P.; Valdez, F.; Castro, J.R.; Manzo-Martínez, A. PSO with Dynamic Adaptation of Parameters for Optimization in Neural Networks with Interval Type-2 Fuzzy Numbers Weights. Axioms 2019, 8, 14. [Google Scholar] [CrossRef] [Green Version]
  50. Derrac, J.; García, S.; Molina, D.; Herrera, F. A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol. Comput. 2011, 1, 3–18. [Google Scholar] [CrossRef]
  51. Zhao, Y.; Wang, L. The Application of Convolution Neural Networks in Sign Language Recognition. In Proceedings of the 2018 Ninth International Conference on Intelligent Control and Information Processing (ICICIP), Wanzhou, China, 9–11 November 2018; pp. 269–272. [Google Scholar]
  52. Rathi, D. Optimization of Transfer Learning for Sign Language Recognition Targeting. Int. J. Recent Innov. Trends Comput. Commun. 2018, 6, 198–203. [Google Scholar]
  53. Bin, L.Y.; Huann, G.Y.; Yun, L.K. Study of Convolutional Neural Network in Recognizing Static American Sign Language. In Proceedings of the 2019 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), Kuala Lumpur, Malaysia, 17–19 September 2019; pp. 41–45. [Google Scholar]
  54. Rodriguez, R.; Gonzalez, C.I.; Martinez, G.E.; Melin, P. An improved Convolutional Neural Network based on a parameter modification of the convolution layer. In Fuzzy Logic Hybrid Extensions of Neural and Optimization Algorithms: Theory and Applications; Springer: Cham, Switzerland, 2021; pp. 125–147. [Google Scholar]
Figure 1. The minimal architecture of a CNN.
Figure 2. Layers and the parameters per layer of a CNN.
Figure 3. Feature maps generated by the convolution process.
Figure 4. Examples of pooling using the mean, max and sum operation.
Figure 5. Representation of the movement of the particle.
Figure 6. General CNN optimization process using the PSO algorithm. The letters A, B and C represent the illustrated sign-language signs in image form.
Figure 7. Flowchart of CNN optimization process using PSO.
Figure 8. Structure of the particle used in the PSO-CNN-I approach.
Figure 9. Structure of the particle used in the PSO-CNN-II approach.
Figure 10. Example of a particle generated by PSO.
Figure 11. A sample of the ASL alphabet database.
Figure 12. A sample of the ASL MNIST database.
Figure 13. Sample of the MSL alphabet database.
Table 1. Search spaces used to define the particle in the PSO-CNN-I approach.
Particle Coordinate | Hyper-Parameter | Search Space
x1 | Number of convolutional layers | [1, 3]
x2 | Filter number | [32, 128]
x3 | Filter size | [1, 4]
x4 | Batch size in the training | [32, 256]
Table 2. Convolutional filter dimensions for the x3 position.
x3 Value | Filter Dimensions
1 | [3, 3]
2 | [5, 5]
3 | [7, 7]
4 | [9, 9]
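To make the encoding in Tables 1 and 2 concrete, the following sketch illustrates one possible way to decode a real-valued PSO-CNN-I particle into discrete CNN hyper-parameters. The function name decode_particle and the rounding/clipping strategy are illustrative assumptions, not the exact implementation used in this work.

```python
# Illustrative sketch: decoding a PSO-CNN-I particle (x1..x4) into CNN
# hyper-parameters. Search spaces follow Tables 1 and 2; the rounding and
# clipping choices are assumptions.
FILTER_SIZES = {1: (3, 3), 2: (5, 5), 3: (7, 7), 4: (9, 9)}  # Table 2 mapping


def clip(value, low, high):
    """Keep a particle coordinate inside its search space."""
    return max(low, min(high, value))


def decode_particle(position):
    """Map a real-valued particle position to discrete CNN hyper-parameters."""
    x1, x2, x3, x4 = position
    return {
        "num_conv_layers": int(round(clip(x1, 1, 3))),             # [1, 3]
        "num_filters": int(round(clip(x2, 32, 128))),              # [32, 128]
        "filter_size": FILTER_SIZES[int(round(clip(x3, 1, 4)))],   # [1, 4] -> (k, k)
        "batch_size": int(round(clip(x4, 32, 256))),               # [32, 256]
    }


# Example: one candidate position produced by PSO.
print(decode_particle([2.7, 110.4, 3.2, 200.9]))
# -> {'num_conv_layers': 3, 'num_filters': 110, 'filter_size': (7, 7), 'batch_size': 201}
```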
Table 3. Search spaces used to define the particle in the PSO-CNN-II approach.
Particle Coordinate | Hyper-Parameter | Search Space
x1 | Convolutional layer number | [1, 3]
x2 | Filter number (layer 1) | [32, 128]
x3 | Filter size (layer 1) | [1, 4]
x4 | Filter number (layer 2) | [32, 128]
x5 | Filter size (layer 2) | [1, 4]
x6 | Filter number (layer 3) | [32, 128]
x7 | Filter size (layer 3) | [1, 4]
x8 | Batch size in the training | [32, 256]
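Once a PSO-CNN-II particle has been decoded into per-layer filter numbers and filter sizes (Table 3), a candidate network can be assembled and trained. The sketch below assumes TensorFlow/Keras; the pooling configuration, the single dense classification layer, and the number of output classes are illustrative assumptions, not the exact layer stack used in this work.

```python
# Minimal sketch (assuming TensorFlow/Keras) of building a CNN from a decoded
# PSO-CNN-II particle, where every convolutional layer has its own filter
# number and filter size (Table 3). Pooling sizes, the dense layer and the
# number of classes are illustrative assumptions.
from tensorflow.keras import layers, models


def build_cnn(layer_specs, input_shape=(32, 32, 1), num_classes=26):
    """layer_specs: list of (num_filters, kernel_size) tuples, one per conv layer."""
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    for num_filters, kernel_size in layer_specs:
        model.add(layers.Conv2D(num_filters, kernel_size, padding="same", activation="relu"))
        model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model


# Example: architecture No. 3 of Table 12 (84, 128 and 128 filters).
model = build_cnn([(84, (3, 3)), (128, (7, 7)), (128, (5, 5))])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```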
Table 4. Objective function values based on the recognition rate and the AIC value.
Architecture Number | Recognition Rate (%) | AIC Value
1 | 98.50 | 466.78
2 | 98.50 | 350.85
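For reference, the Akaike information criterion is commonly written as AIC = 2k − 2 ln(L̂), where k is the number of estimated parameters and L̂ is the maximized likelihood, so smaller values favor more parsimonious models. Table 4 shows how two architectures with the same recognition rate are separated by their AIC values. The snippet below is only a hedged sketch of how the two criteria might be combined into a single objective; the weighting factor aic_weight is an assumption and not the formulation used in this paper.

```python
# Illustrative sketch of a combined objective for PSO-CNN-II: minimize the
# classification error while penalizing model complexity through the AIC
# value. The weighting factor is an assumption, not the value used here.
def combined_fitness(recognition_rate, aic_value, aic_weight=0.01):
    """Lower is better: error in percent plus a small complexity penalty."""
    error = 100.0 - recognition_rate
    return error + aic_weight * aic_value


# The two architectures of Table 4: equal recognition rate, different AIC.
print(combined_fitness(98.50, 466.78))  # 6.1678 -> penalized (more parameters)
print(combined_fitness(98.50, 350.85))  # 5.0085 -> preferred (fewer parameters)
```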
Table 5. ASL Alphabet database description.
Name | ASL Alphabet Detail
Total images | 87,000
Images for training | 82,650
Images for test | 4350
Images size | 32 × 32
Database format | JPEG
Table 6. ASL MNIST database description.
Name | ASL MNIST Detail
Total images | 34,627
Images for training | 24,239
Images for test | 10,388
Images size | 28 × 28
Database format | CSV
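Because the ASL MNIST database is distributed as CSV files (Table 6), each record must be reshaped back into a 28 × 28 image before it can be fed to a CNN. The sketch below assumes pandas/NumPy, a 'label' column followed by 784 pixel columns, and a hypothetical file name; these assumptions should be adapted to the actual files used.

```python
# Sketch (assuming pandas/NumPy) of loading the ASL MNIST database, which is
# distributed as CSV files (Table 6). The file name and the column layout
# (a 'label' column followed by 784 pixel columns) are assumptions.
import numpy as np
import pandas as pd


def load_asl_mnist(csv_path):
    """Return images normalized to [0, 1] with shape (n, 28, 28, 1) and labels."""
    data = pd.read_csv(csv_path)
    labels = data["label"].to_numpy()
    pixels = data.drop(columns="label").to_numpy(dtype=np.float32)
    images = pixels.reshape(-1, 28, 28, 1) / 255.0
    return images, labels


# Hypothetical file name; the resulting shape depends on the split used.
x_train, y_train = load_asl_mnist("sign_mnist_train.csv")
print(x_train.shape, y_train.shape)
```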
Table 7. MSL alphabet database description.
Name | MSL Alphabet Detail
Total images | 3780
Images for training | 2646
Images for test | 1134
Images size | 32 × 32
Database format | JPG
Table 8. Static parameters for CNN and PSO.
Parameters of CNN
Learning function | Adam
Activation function (classifying layer) | Softmax
Non-linearity activation function | ReLU
Epochs | 5
Parameters of PSO
Particles | 10
Iterations | 10
Inertial weight (W) | 0.85
Social constant (W2) | 2
Cognitive constant (W1) | 2
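Using the PSO parameters of Table 8 (inertia weight W = 0.85 and cognitive/social constants equal to 2), a single particle follows the standard PSO velocity and position update equations. The following sketch is illustrative only; the random seed and the absence of velocity clamping are assumptions.

```python
# Minimal sketch of the standard PSO velocity/position update with the
# parameter values of Table 8: inertia W = 0.85, cognitive constant (W1) = 2
# and social constant (W2) = 2. The random seed is an assumption.
import numpy as np

W, C_COGNITIVE, C_SOCIAL = 0.85, 2.0, 2.0
rng = np.random.default_rng(0)


def pso_step(position, velocity, personal_best, global_best):
    """One velocity and position update for a single particle."""
    r1 = rng.random(position.shape)
    r2 = rng.random(position.shape)
    velocity = (W * velocity
                + C_COGNITIVE * r1 * (personal_best - position)
                + C_SOCIAL * r2 * (global_best - position))
    return position + velocity, velocity


# Example with a four-dimensional particle (PSO-CNN-I encoding).
pos = np.array([2.0, 64.0, 2.0, 128.0])
vel = np.zeros(4)
pos, vel = pso_step(pos, vel, personal_best=pos,
                    global_best=np.array([3.0, 128.0, 3.0, 256.0]))
print(pos)
```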
Table 9. Results achieved by the PSO-CNN-I in ASL alphabet database.
No. | No. Layers | No. Filters | Filter Size | Batch Size | Recognition Rate (%)
1 | 3 | 99 | [7 × 7] | 107 | 98.85
2 | 3 | 104 | [9 × 9] | 256 | 99.66
3 | 3 | 128 | [9 × 9] | 256 | 99.70
4 | 3 | 128 | [7 × 7] | 256 | 99.79
5 | 3 | 128 | [9 × 9] | 256 | 99.72
6 | 3 | 128 | [7 × 7] | 256 | 99.62
7 | 2 | 32 | [7 × 7] | 256 | 98.18
8 | 3 | 109 | [7 × 7] | 256 | 99.73
9 | 3 | 128 | [7 × 7] | 197 | 99.75
10 | 3 | 128 | [7 × 7] | 256 | 99.81
11 | 3 | 66 | [7 × 7] | 181 | 99.31
12 | 3 | 118 | [7 × 7] | 256 | 99.87
13 | 3 | 128 | [9 × 9] | 256 | 99.67
14 | 3 | 128 | [7 × 7] | 256 | 99.85
15 | 3 | 128 | [9 × 9] | 256 | 99.61
16 | 3 | 128 | [9 × 9] | 256 | 99.63
17 | 3 | 90 | [9 × 9] | 256 | 99.66
18 | 3 | 128 | [7 × 7] | 256 | 99.82
19 | 3 | 128 | [7 × 7] | 256 | 99.79
20 | 3 | 128 | [7 × 7] | 256 | 99.76
21 | 3 | 128 | [9 × 9] | 256 | 99.68
22 | 3 | 128 | [9 × 9] | 256 | 99.67
23 | 3 | 128 | [7 × 7] | 256 | 99.75
24 | 3 | 123 | [7 × 7] | 32 | 98.38
25 | 3 | 128 | [9 × 9] | 256 | 99.64
26 | 3 | 128 | [7 × 7] | 256 | 99.82
27 | 3 | 128 | [9 × 9] | 215 | 99.56
28 | 3 | 128 | [7 × 7] | 256 | 99.87
29 | 3 | 100 | [9 × 9] | 256 | 99.64
30 | 3 | 128 | [7 × 7] | 256 | 99.84
Mean | – | – | – | – | 99.58
Table 10. Results achieved by the PSO-CNN-I in ASL MNIST database.
No. | No. Layers | No. Filters | Filter Size | Batch Size | Recognition Rate (%)
1 | 3 | 128 | [9 × 9] | 137 | 99.27
2 | 2 | 128 | [9 × 9] | 218 | 99.54
3 | 2 | 128 | [7 × 7] | 205 | 99.52
4 | 3 | 128 | [7 × 7] | 136 | 99.33
5 | 2 | 128 | [9 × 9] | 232 | 99.59
6 | 3 | 96 | [9 × 9] | 107 | 98.82
7 | 2 | 118 | [7 × 7] | 189 | 99.36
8 | 2 | 128 | [9 × 9] | 256 | 99.59
9 | 2 | 112 | [9 × 9] | 256 | 99.49
10 | 2 | 128 | [9 × 9] | 256 | 99.60
11 | 2 | 128 | [7 × 7] | 256 | 99.59
12 | 2 | 128 | [7 × 7] | 256 | 99.61
13 | 2 | 128 | [9 × 9] | 220 | 99.67
14 | 2 | 128 | [9 × 9] | 256 | 99.57
15 | 2 | 128 | [9 × 9] | 256 | 99.51
16 | 2 | 128 | [7 × 7] | 237 | 99.55
17 | 2 | 128 | [7 × 7] | 256 | 99.61
18 | 2 | 128 | [9 × 9] | 256 | 99.58
19 | 2 | 128 | [9 × 9] | 256 | 99.53
20 | 2 | 128 | [9 × 9] | 256 | 99.65
21 | 2 | 128 | [7 × 7] | 148 | 99.42
22 | 2 | 128 | [9 × 9] | 256 | 99.51
23 | 2 | 128 | [9 × 9] | 215 | 99.53
24 | 2 | 128 | [9 × 9] | 255 | 99.56
25 | 2 | 128 | [9 × 9] | 256 | 99.65
26 | 2 | 128 | [7 × 7] | 256 | 99.57
27 | 2 | 128 | [9 × 9] | 256 | 99.53
28 | 2 | 117 | [7 × 7] | 129 | 99.98
29 | 3 | 128 | [5 × 5] | 242 | 99.87
30 | 2 | 128 | [7 × 7] | 256 | 99.55
Mean | – | – | – | – | 99.53
Table 11. Results achieved by the PSO-CNN-I in MSL alphabet database.
No. | No. Layers | No. Filters | Filter Size | Batch Size | Recognition Rate (%)
1 | 2 | 101 | [7 × 7] | 93 | 98.95
2 | 1 | 128 | [3 × 3] | 56 | 98.95
3 | 1 | 110 | [3 × 3] | 52 | 98.82
4 | 1 | 128 | [3 × 3] | 121 | 99.20
5 | 1 | 128 | [3 × 3] | 128 | 99.32
6 | 1 | 128 | [3 × 3] | 128 | 99.07
7 | 1 | 128 | [3 × 3] | 110 | 99.24
8 | 1 | 101 | [5 × 5] | 114 | 98.82
9 | 1 | 128 | [3 × 3] | 128 | 99.24
10 | 1 | 74 | [3 × 3] | 88 | 98.95
11 | 1 | 128 | [3 × 3] | 128 | 99.32
12 | 1 | 128 | [3 × 3] | 32 | 98.48
13 | 1 | 128 | [3 × 3] | 93 | 99.28
14 | 1 | 128 | [3 × 3] | 97 | 99.11
15 | 1 | 128 | [3 × 3] | 32 | 98.74
16 | 1 | 128 | [3 × 3] | 72 | 99.32
17 | 1 | 128 | [3 × 3] | 93 | 99.37
18 | 1 | 63 | [3 × 3] | 47 | 98.44
19 | 1 | 128 | [3 × 3] | 128 | 99.20
20 | 1 | 126 | [3 × 3] | 128 | 99.28
21 | 1 | 128 | [3 × 3] | 128 | 99.32
22 | 1 | 128 | [3 × 3] | 83 | 99.20
23 | 1 | 128 | [3 × 3] | 63 | 99.20
24 | 1 | 122 | [3 × 3] | 128 | 99.37
25 | 1 | 128 | [3 × 3] | 128 | 99.32
26 | 1 | 114 | [3 × 3] | 84 | 99.32
27 | 1 | 128 | [3 × 3] | 32 | 98.74
28 | 1 | 128 | [3 × 3] | 89 | 99.28
29 | 1 | 43 | [3 × 3] | 53 | 97.81
30 | 1 | 128 | [3 × 3] | 72 | 98.99
Mean | – | – | – | – | 99.10
Table 12. Results achieved by the PSO-CNN-II in ASL alphabet database.
No. | No. Layers | Layer 1 No. Filters | Layer 1 Filter Size | Layer 2 No. Filters | Layer 2 Filter Size | Layer 3 No. Filters | Layer 3 Filter Size | Batch Size | AIC Value | Recognition Rate (%)
1 | 3 | 128 | [3 × 3] | 128 | [7 × 7] | 128 | [5 × 5] | 256 | 764.78 | 98.99
2 | 3 | 128 | [5 × 5] | 121 | [5 × 5] | 128 | [5 × 5] | 213 | 750.78 | 98.73
3 | 3 | 84 | [3 × 3] | 128 | [7 × 7] | 128 | [5 × 5] | 84 | 676.78 | 99.23
4 | 2 | 45 | [5 × 5] | 128 | [7 × 7] | 0 | 0 | 0 | 340.78 | 98.15
5 | 3 | 32 | [3 × 3] | 128 | [7 × 7] | 128 | [3 × 3] | 256 | 572.78 | 98.86
6 | 3 | 128 | [3 × 3] | 128 | [7 × 7] | 128 | [5 × 5] | 256 | 764.78 | 98.96
7 | 3 | 84 | [5 × 5] | 128 | [5 × 5] | 128 | [3 × 3] | 256 | 676.78 | 98.85
8 | 3 | 128 | [3 × 3] | 128 | [7 × 7] | 128 | [5 × 5] | 256 | 764.78 | 99.02
9 | 3 | 32 | [5 × 5] | 128 | [5 × 5] | 128 | [3 × 3] | 256 | 572.78 | 98.9
10 | 3 | 124 | [3 × 3] | 128 | [7 × 7] | 128 | [3 × 3] | 256 | 756.78 | 98.64
11 | 3 | 32 | [3 × 3] | 128 | [7 × 7] | 128 | [7 × 7] | 256 | 572.78 | 98.93
12 | 3 | 32 | [3 × 3] | 128 | [7 × 7] | 128 | [3 × 3] | 256 | 572.78 | 98.53
13 | 3 | 128 | [3 × 3] | 128 | [7 × 7] | 128 | [5 × 5] | 256 | 764.78 | 99.01
14 | 3 | 73 | [7 × 7] | 128 | [7 × 7] | 108 | [3 × 3] | 256 | 614.78 | 97.91
15 | 3 | 128 | [3 × 3] | 128 | [7 × 7] | 128 | [5 × 5] | 256 | 764.78 | 99.06
16 | 2 | 128 | [3 × 3] | 128 | [7 × 7] | 0 | 0 | 0 | 506.78 | 98.23
17 | 2 | 88 | [7 × 7] | 128 | [7 × 7] | 0 | 0 | 0 | 426.78 | 97.4
18 | 3 | 32 | [5 × 5] | 128 | [7 × 7] | 128 | [5 × 5] | 256 | 572.78 | 99.06
19 | 2 | 128 | [5 × 5] | 119 | [7 × 7] | 0 | 0 | 0 | 488.78 | 98.1
20 | 3 | 116 | [3 × 3] | 128 | [5 × 5] | 128 | [7 × 7] | 252 | 740.78 | 98.96
21 | 3 | 49 | [5 × 5] | 128 | [7 × 7] | 128 | [7 × 7] | 256 | 606.78 | 98.93
22 | 2 | 128 | [3 × 3] | 128 | [7 × 7] | 0 | 0 | 0 | 506.78 | 98.19
23 | 3 | 32 | [5 × 5] | 128 | [7 × 7] | 128 | [5 × 5] | 256 | 572.78 | 98.96
24 | 2 | 32 | [5 × 5] | 128 | [7 × 7] | 0 | 0 | 0 | 314.78 | 98.04
25 | 2 | 128 | [5 × 5] | 81 | [5 × 5] | 0 | 0 | 0 | 412.78 | 98.92
26 | 3 | 32 | [5 × 5] | 128 | [7 × 7] | 128 | [3 × 3] | 256 | 572.78 | 98.58
27 | 3 | 32 | [3 × 3] | 128 | [7 × 7] | 128 | [5 × 5] | 256 | 572.78 | 98.88
28 | 3 | 128 | [3 × 3] | 128 | [7 × 7] | 128 | [5 × 5] | 256 | 764.78 | 99.02
29 | 3 | 128 | [3 × 3] | 128 | [7 × 7] | 128 | [3 × 3] | 256 | 764.78 | 99.08
30 | 3 | 128 | [3 × 3] | 128 | [7 × 7] | 128 | [3 × 3] | 256 | 764.78 | 98.71
Mean | – | – | – | – | – | – | – | – | – | 98.69
Table 13. Results achieved by the PSO-CNN-II in ASL MNIST database.
No. | No. Layers | Layer 1 No. Filters | Layer 1 Filter Size | Layer 2 No. Filters | Layer 2 Filter Size | Layer 3 No. Filters | Layer 3 Filter Size | Batch Size | AIC Value | Recognition Rate (%)
1 | 2 | 128 | [5 × 5] | 128 | [9 × 9] | 0 | 0 | 128 | 506.79 | 99.80
2 | 2 | 74 | [9 × 9] | 114 | [9 × 9] | 0 | 0 | 174 | 370.79 | 99.42
3 | 3 | 32 | [5 × 5] | 128 | [9 × 9] | 128 | [5 × 5] | 122 | 572.79 | 99.53
4 | 2 | 125 | [5 × 5] | 125 | [9 × 9] | 0 | 0 | 147 | 503.79 | 99.58
5 | 2 | 90 | [5 × 5] | 128 | [9 × 9] | 0 | 0 | 256 | 500.79 | 99.68
6 | 3 | 32 | [3 × 3] | 128 | [9 × 9] | 128 | [9 × 9] | 148 | 572.79 | 99.51
7 | 2 | 121 | [7 × 7] | 95 | [9 × 9] | 0 | 0 | 100 | 426.79 | 99.26
8 | 3 | 32 | [7 × 7] | 128 | [9 × 9] | 125 | [9 × 9] | 256 | 569.79 | 99.6
9 | 2 | 32 | [9 × 9] | 126 | [9 × 9] | 0 | 0 | 106 | 310.79 | 99.4
10 | 3 | 115 | [7 × 7] | 102 | [9 × 9] | 128 | [7 × 7] | 215 | 686.79 | 99.42
11 | 2 | 32 | [9 × 9] | 128 | [9 × 9] | 0 | 0 | 256 | 314.79 | 99.44
12 | 2 | 77 | [7 × 7] | 100 | [9 × 9] | 0 | 0 | 183 | 348.79 | 99.59
13 | 2 | 87 | [7 × 7] | 128 | [9 × 9] | 0 | 0 | 256 | 424.79 | 99.7
14 | 2 | 32 | [9 × 9] | 128 | [9 × 9] | 0 | 0 | 256 | 314.79 | 99.53
15 | 3 | 32 | [5 × 5] | 103 | [9 × 9] | 125 | [9 × 9] | 256 | 516.79 | 99.53
16 | 2 | 70 | [9 × 9] | 126 | [9 × 9] | 0 | 0 | 256 | 386.79 | 99.63
17 | 2 | 64 | [7 × 7] | 128 | [9 × 9] | 0 | 0 | 256 | 378.79 | 99.7
18 | 3 | 32 | [7 × 7] | 77 | [9 × 9] | 128 | [9 × 9] | 256 | 470.79 | 99.36
19 | 2 | 128 | [7 × 7] | 128 | [9 × 9] | 0 | 0 | 256 | 506.79 | 99.74
20 | 3 | 32 | [3 × 3] | 128 | [9 × 9] | 128 | [5 × 5] | 32 | 572.79 | 98.95
21 | 3 | 32 | [7 × 7] | 128 | [9 × 9] | 123 | [7 × 7] | 162 | 577.79 | 99.33
22 | 2 | 51 | [9 × 9] | 128 | [9 × 9] | 0 | 0 | 194 | 352.79 | 99.47
23 | 2 | 50 | [7 × 7] | 128 | [9 × 9] | 0 | 0 | 256 | 350.79 | 99.63
24 | 2 | 128 | [7 × 7] | 128 | [9 × 9] | 0 | 0 | 162 | 506.79 | 99.67
25 | 2 | 100 | [5 × 5] | 76 | [5 × 5] | 0 | 0 | 76 | 346.79 | 98.23
26 | 2 | 52 | [9 × 9] | 128 | [7 × 7] | 0 | 0 | 256 | 354.79 | 99.54
27 | 2 | 128 | [5 × 5] | 128 | [9 × 9] | 0 | 0 | 142 | 506.79 | 99.53
28 | 3 | 83 | [3 × 3] | 125 | [9 × 9] | 0 | 0 | 136 | 410.79 | 99.38
29 | 3 | 128 | [5 × 5] | 128 | [9 × 9] | 128 | [9 × 9] | 256 | 764.79 | 99.57
30 | 2 | 74 | [7 × 7] | 120 | [9 × 9] | 0 | 0 | 256 | 382.79 | 99.72
Mean | – | – | – | – | – | – | – | – | – | 99.48
Table 14. Results achieved by the PSO-CNN-II in MSL alphabet.
No. | No. Layers | Layer 1 No. Filters | Layer 1 Filter Size | Layer 2 No. Filters | Layer 2 Filter Size | Layer 3 No. Filters | Layer 3 Filter Size | Batch Size | AIC Value | Recognition Rate (%)
1 | 1 | 128 | [3 × 3] | 0 | 0 | 0 | 0 | 32 | 248.79 | 98.74
2 | 1 | 128 | [3 × 3] | 0 | 0 | 0 | 0 | 163 | 248.79 | 99.28
3 | 1 | 116 | [3 × 3] | 0 | 0 | 0 | 0 | 105 | 224.79 | 98.99
4 | 1 | 81 | [3 × 3] | 0 | 0 | 0 | 0 | 32 | 154.79 | 98.44
5 | 1 | 128 | [3 × 3] | 0 | 0 | 0 | 0 | 149 | 248.79 | 98.90
6 | 1 | 128 | [3 × 3] | 0 | 0 | 0 | 0 | 221 | 248.79 | 98.57
7 | 1 | 128 | [3 × 3] | 0 | 0 | 0 | 0 | 57 | 248.79 | 99.24
8 | 1 | 67 | [3 × 3] | 0 | 0 | 0 | 0 | 246 | 126.73 | 97.77
9 | 1 | 118 | [3 × 3] | 0 | 0 | 0 | 0 | 113 | 228.79 | 99.16
10 | 1 | 128 | [3 × 3] | 0 | 0 | 0 | 0 | 154 | 248.79 | 99.45
11 | 1 | 103 | [3 × 3] | 0 | 0 | 0 | 0 | 92 | 198.79 | 99.03
12 | 1 | 65 | [3 × 3] | 0 | 0 | 0 | 0 | 32 | 122.79 | 98.32
13 | 1 | 128 | [3 × 3] | 0 | 0 | 0 | 0 | 94 | 248.79 | 99.07
14 | 1 | 128 | [3 × 3] | 0 | 0 | 0 | 0 | 90 | 248.79 | 99.11
15 | 1 | 112 | [3 × 3] | 0 | 0 | 0 | 0 | 97 | 216.79 | 99.24
16 | 1 | 128 | [3 × 3] | 0 | 0 | 0 | 0 | 32 | 248.79 | 98.74
17 | 1 | 128 | [3 × 3] | 0 | 0 | 0 | 0 | 46 | 248.79 | 98.65
18 | 1 | 128 | [3 × 3] | 0 | 0 | 0 | 0 | 199 | 248.79 | 98.32
19 | 1 | 128 | [3 × 3] | 0 | 0 | 0 | 0 | 244 | 248.79 | 99.03
20 | 1 | 120 | [3 × 3] | 0 | 0 | 0 | 0 | 32 | 232.79 | 99.07
21 | 1 | 128 | [3 × 3] | 0 | 0 | 0 | 0 | 105 | 248.79 | 99.16
22 | 1 | 128 | [3 × 3] | 0 | 0 | 0 | 0 | 77 | 248.79 | 99.03
23 | 1 | 108 | [3 × 3] | 0 | 0 | 0 | 0 | 84 | 208.79 | 99.07
24 | 1 | 54 | [3 × 3] | 0 | 0 | 0 | 0 | 32 | 100.79 | 98.44
25 | 1 | 102 | [3 × 3] | 0 | 0 | 0 | 0 | 102 | 196.79 | 99.03
26 | 1 | 128 | [3 × 3] | 0 | 0 | 0 | 0 | 114 | 248.79 | 99.20
27 | 1 | 119 | [3 × 3] | 0 | 0 | 0 | 0 | 256 | 230.79 | 98.61
28 | 1 | 98 | [3 × 3] | 0 | 0 | 0 | 0 | 122 | 188.79 | 99.20
29 | 1 | 128 | [3 × 3] | 0 | 0 | 0 | 0 | 83 | 248.79 | 99.07
30 | 1 | 128 | [3 × 3] | 0 | 0 | 0 | 0 | 135 | 248.79 | 99.37
Mean | – | – | – | – | – | – | – | – | – | 98.91
Table 15. Summary of the results obtained in the PSO-CNN-I and PSO-CNN-II approaches.
Database | PSO-CNN-I Best | PSO-CNN-I Mean | PSO-CNN-I AIC | PSO-CNN-II Best | PSO-CNN-II Mean | PSO-CNN-II AIC
ASL alphabet | 99.87% | 99.58% | 764.79 | 99.23% | 98.69% | 676.78
ASL MNIST | 99.98% | 99.53% | 462.79 | 99.80% | 99.48% | 506.79
MSL alphabet | 99.37% | 99.05% | 236.80 | 99.45% | 98.91% | 248.79
Table 16. General description of the Wilcoxon test.
Description | Hypothesis
Null hypothesis: PSO-CNN-I architecture (μ1) = PSO-CNN-II architecture (μ2) | H0: μ1 = μ2
Alternative hypothesis: PSO-CNN-I architecture (μ1) > PSO-CNN-II architecture (μ2) | H1: μ1 > μ2 (affirmation)
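The hypotheses in Table 16 are evaluated with a one-sided Wilcoxon signed-rank test over the paired runs of the two approaches. As a hedged illustration, the sketch below assumes SciPy and, for brevity, uses only the first five ASL alphabet recognition rates from Tables 9 and 12; the results reported in Tables 17–19 are based on all 30 runs.

```python
# Sketch (assuming SciPy) of the one-sided Wilcoxon signed-rank test of
# Table 16: H0 states equal performance, H1 states PSO-CNN-I > PSO-CNN-II.
# Only the first five ASL alphabet runs from Tables 9 and 12 are used here
# for brevity; the reported tests use all 30 runs.
from scipy.stats import wilcoxon

pso_cnn_i = [98.85, 99.66, 99.70, 99.79, 99.72]    # Table 9, runs 1-5
pso_cnn_ii = [98.99, 98.73, 99.23, 98.15, 98.86]   # Table 12, runs 1-5

stat, p_value = wilcoxon(pso_cnn_i, pso_cnn_ii, alternative="greater")
print(f"W = {stat}, p-value = {p_value:.4f}")  # reject H0 when p-value < 0.05
```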
Table 17. Wilcoxon test results for the ASL alphabet.
Comparison: PSO-CNN-I (μ1) vs. PSO-CNN-II (μ2)
Database | R+ | R− | p-Value
ASL alphabet | 455 | 10 | <0.001
Table 18. Wilcoxon test results for the ASL MNIST.
Comparison: PSO-CNN-I (μ1) vs. PSO-CNN-II (μ2)
Database | R+ | R− | p-Value
ASL MNIST | 245.5 | 189.5 | 0.545
Table 19. Wilcoxon test results for the MSL alphabet.
Comparison: PSO-CNN-I (μ1) vs. PSO-CNN-II (μ2)
Database | R+ | R− | p-Value
MSL alphabet | 291 | 115 | 0.045
Table 20. State-of-the-Art Comparison.
Reference | Recognition Rate (%) | Dataset
Y. Zhao and L. Wang [51] | 89.32 | ASL own
D. Rathi [52] | 95.03 | ASL MNIST
L.Y. Bin and G.Y. Huann [53] | 95.00 | ASL own
R. Dionisio [54] | 97.64 | ASL MNIST
PSO-CNN-I | 99.98 | ASL MNIST
PSO-CNN-II | 99.80 | ASL MNIST
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
