Article

A Novel Image Recognition Method Based on DenseNet and DPRN

1 School of Software, Dalian Jiaotong University, Dalian 116028, China
2 School of Computer and Communication Engineering, Dalian Jiaotong University, Dalian 116028, China
3 School of Computer Science, China West Normal University, Nanchong 637002, China
4 School of Electronic Information and Automation, Civil Aviation University of China, Tianjin 300300, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2022, 12(9), 4232; https://doi.org/10.3390/app12094232
Submission received: 3 March 2022 / Revised: 13 April 2022 / Accepted: 20 April 2022 / Published: 22 April 2022
(This article belongs to the Special Issue Soft Computing Application to Engineering Design)

Abstract

Image recognition is one of the important branches of computer vision, with important theoretical and practical significance. To address the problems of insufficient feature use, a single type of convolution kernel, and incomplete network optimization in densely connected networks (DenseNet), a novel image recognition method based on DenseNet and deep pyramidal residual networks (DPRN) is proposed in this paper. In the proposed method, a new residual unit based on DPRN is designed, introducing the pyramid residual idea so that the output dimension is greater than the input dimension. Then, a module based on dilated convolution is designed for parallel feature extraction. Finally, the designed module is fused with DenseNet to construct the image recognition model. This model not only overcomes some of the existing problems in DenseNet but also retains the general applicability of DenseNet. The CIFAR10 and CIFAR100 datasets are selected to prove the effectiveness of the proposed method. The experimental results show that the proposed method can effectively reuse features, obtaining accuracy rates of 83.98% and 51.19% on CIFAR10 and CIFAR100, respectively. It is an effective method for dealing with images in different fields.

1. Introduction

Deep learning [1] is a field of machine learning research. Its purpose is to train computers to perform human-like behaviors such as autonomous learning, judgment, and decision making. It imitates the mechanisms of the human brain to interpret data such as images, sounds, and text [2]. With the development of the Internet and computer hardware, deep learning has found an increasingly wide range of applications in image recognition [3], and in recent years it has shown remarkable capabilities.
Image recognition [4] is a technology that uses computers to analyze and process images in order to identify targets and objects in various patterns, based on the main features of the image. There are many algorithms for image recognition, such as the improved differential evolution algorithm [5,6], the enhanced MSIQDE algorithm [7], neural network algorithms, and so on. Among them, the convolutional neural network [8] is one of the representative neural network algorithms for image recognition. The earliest convolutional neural network algorithms are the Time Delay Neural Network [9] and LeNet-5 [10]. In 2012, AlexNet [11], designed by Hinton and his student Alex Krizhevsky, won the ImageNet competition and attracted widespread attention. In the 2013 ILSVRC competition, M.D. Zeiler and others improved AlexNet and innovatively added a convolution visualization function to form ZFNet [12], which won first place in the image classification task. At first, neural networks were mainly improved by increasing depth, as with GoogLeNet [13] and VGG [14], the top two in the 2014 ILSVRC competition: VGG deepens the network to 19 layers and GoogLeNet to 22 layers, both achieving good results. As depth increases, the representational power of convolutional neural networks becomes stronger, but the problems of over-fitting, vanishing gradients, huge numbers of model parameters, and difficulty of optimization also become more and more prominent. Researchers have therefore proposed different deep convolutional neural network structures for different problems. ResNet [15] (Residual Network), proposed by He, uses skip connections in its residual blocks to alleviate the gradient vanishing problem caused by increasing depth, and it won the ILSVRC championship in image classification and target recognition in 2015. Since then, a variety of computer vision tasks have mainly used ResNet as the backbone network, increasing the number of feature maps when increasing the dimension [16]. In 2016, DPRN [17] (Deep Pyramidal Residual Networks) gradually increased the feature map dimension of all units and improved the residual module on the basis of ResNet to build a pyramid-shaped residual network; it has been applied successfully to hyperspectral image classification [18]. DenseNet [19], the deep network architecture proposed by Huang in 2017, introduced a new approach to feature reuse: each layer receives additional input from all previous layers and passes its own feature maps to all subsequent layers, so that every layer is densely connected to the others, which not only mitigates vanishing gradients but also reduces the number of parameters. Hu proposed SENet [20] (Squeeze-and-Excitation Network), an architectural unit that improves the representational ability of a network by dynamically recalibrating its channel features: a squeeze operation on the convolutional feature map yields a global feature; an excitation operation on the global feature learns the relationships between channels and produces per-channel weights; and the result is multiplied with the original feature map to obtain the final feature.
Xie [21] proposed the ResNeXt network, which replaced the three-layer convolution module of the residual network with parallel stacked modules of the same topology to reduce the number of hyperparameters and improve computational efficiency. In 2019, Zhang [22] proposed MFR-DenseNet, a multi-feature weighted network based on DenseNet for image classification; it improves the representational power of DenseNet by adaptively recalibrating channel feature responses and explicitly modeling the inter-dependencies between the features of different convolutional layers. Sabour [23] proposed CapsNet (capsule network), which is constructed from capsules and is one of the new breakthroughs in deep learning methods. In 2020, Zhang proposed ResNeSt [24], an improvement on the residual network. It designs a Split-Attention module that divides the feature map into several groups and finer-grained branches along the channel dimension, representing each group as a weighted combination of its branches. By stacking Split-Attention modules, the resulting ResNeSt network transfers more easily to tasks beyond classification [25,26,27,28,29,30,31]. In recent years, some other methods and algorithms that can deal with images have also been proposed [31,32,33,34,35,36,37,38,39,40].
Most of these algorithms are based on ResNet or DenseNet, and the image recognition methods built on them all have shortcomings. DPRN solves the shortcomings of ResNet to a certain extent, but the shortcomings of DenseNet are rarely addressed. Therefore, aiming at the problems of insufficient feature utilization, a single convolution kernel type, and incomplete network optimization in DenseNet, a new convolutional neural network model is designed by integrating DPRN into DenseNet and improving it. The experimental results show that the accuracy of the proposed model is better than that of the original model under the same conditions.
The main contributions of this paper are described as follows.
(1) A novel image recognition method based on DenseNet and DPRN is proposed.
(2) A new residual unit based on DPRN is designed, introducing the pyramid residual idea so that the output dimension is greater than the input dimension.
(3) A module based on dilated convolution is designed. It increases the variety of convolution kernels to enhance feature reuse and improve feature utilization efficiency.

2. Related Work

2.1. Residual Network

Under the premise that the neural network can converge, a traditional neural network runs into problems as the network depth increases: the accuracy first gradually rises to a critical point and then drops rapidly, which is the network degradation problem. In this regard, He from Microsoft Research proposed the Residual Network model, which provides two paths for the features transmitted from the upper layer: residual mapping and identity mapping. The residual mapping is built from a residual unit, in which features are computed through different convolution kernels and activation functions. The identity mapping directly adds the input of the unit to its output and activates the sum, connecting upper-layer features to current-layer features in the form of a skip connection. Once the network reaches its optimum, if the network is deepened further, the residual mapping tends to 0 and only the identity mapping remains, so the network stays in the optimal state and its performance does not decrease as the depth increases. Therefore, the residual network alleviates the gradient vanishing problem of deep networks and can be easily implemented with mainstream automatic differentiation deep learning frameworks. The Residual Network structure is shown in Figure 1, where X is the value output to the neuron from the previous layer.
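As a minimal PyTorch sketch of this structure (channel sizes and layer ordering are illustrative assumptions, not the exact blocks used later in this paper):

```python
# A minimal residual block: a residual mapping F(x) of stacked
# convolutions, plus the identity mapping that adds the input X back.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Residual mapping F(x): two 3x3 convolutions with BN and ReLU.
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        # Identity mapping: add the input X to the residual output,
        # then activate; if F(x) -> 0, the block passes X through.
        return torch.relu(self.body(x) + x)
```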

2.2. Deep Pyramidal Residual Networks

Deep Pyramidal Residual Networks (DPRN) was proposed by Dongyoon Han et al. in 2016. DPRN optimizes the residual unit on the basis of the residual network: it improves module performance by removing the first linear rectification (ReLU) layer and adding a batch normalization (BN) layer at the end. At the same time, DPRN proposes the idea of the bottleneck pyramid residual block; that is, the shape of the network architecture can be compared to a bottleneck pyramid, as shown in Figure 2 and Figure 3. In other words, the number of feature map channels in a module first decreases and then increases, so that the number of output features is greater than the number of input features.
The idea of the bottleneck pyramid-shaped module can be used by any network architecture to improve performance, and this design greatly improves the generalization ability of the model.
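As a rough numeric sketch of the pyramid idea (the linear widening rule and step size here are assumptions for illustration, not DPRN's exact schedule):

```python
# Hypothetical linear widening rule for a pyramidal network: each unit
# adds a fixed number of channels, so the feature-map width grows
# gradually with depth instead of doubling only at stage boundaries.
def pyramidal_widths(base=16, step=4, num_units=10):
    return [base + k * step for k in range(num_units)]

print(pyramidal_widths())  # [16, 20, 24, 28, 32, 36, 40, 44, 48, 52]
```

On the shortcut path, the channel mismatch between a unit's input and its wider output is typically reconciled by zero padding or a 1 × 1 projection before the addition.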

2.3. DenseNet

DenseNet is a deep network architecture proposed by Huang in 2017, which not only addresses network degradation, as the residual network does, but also increases the use of the features of each layer. The DenseNet network is composed of multiple DenseBlocks and Transition layers in series. Each DenseBlock is mainly composed of 1 × 1 and 3 × 3 convolution kernels, and the features of all the previous layers are densely connected with the later layers to realize feature reuse, which greatly enhances the propagation of features. The Transition layer is composed of a 1 × 1 convolution and a pooling layer, which reduces the dimension of the input data and keeps the number of feature maps from growing too large. The Transition layer structure is shown in Figure 4 and the DenseNet structure in Figure 5:
Compared with other networks, DenseNet breaks away from the stereotyped thinking of deepening the number of network layers and widening the network structure to improve network performance. From the perspective of feature reuse, setting bypasses and dense connections at the expense of increasing memory not only greatly reduces the number of parameters of the network, but also alleviates the problem of gradient disappearance to a certain extent.
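The dense connectivity and the Transition layer described above can be captured in a short PyTorch sketch (a minimal illustration with an assumed growth rate of 12, not the exact configuration used later):

```python
# Dense connectivity: each layer receives the concatenation of all
# earlier feature maps and appends its own new features to the stack.
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    def __init__(self, in_channels, growth_rate=12):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, growth_rate, 3, padding=1, bias=False),
        )

    def forward(self, x):
        # Concatenate the new features onto everything seen so far.
        return torch.cat([x, self.body(x)], dim=1)

class Transition(nn.Module):
    """1x1 convolution plus pooling, as in Figure 4."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        self.pool = nn.AvgPool2d(2)

    def forward(self, x):
        # Shrink channels and halve spatial resolution between blocks.
        return self.pool(self.reduce(x))
```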

3. Image Recognition Method

3.1. Image Recognition

Image recognition is a technology that, based on the main features of an image, uses a computer to process and analyze the image in order to identify targets and objects in different patterns. It is a practical application of deep learning algorithms and an important part of the artificial intelligence field, and it is of great significance to industrial production and people's daily life. The convolutional neural network is a core feature extraction technology in computer vision, and image recognition with DenseNet has become a classic method in this field. However, this method inevitably has some drawbacks: feature use is insufficient, the entire DenseNet model is dominated by 3 × 3 convolution kernels so the kernel types are relatively limited, and the overall structure still has room for optimization. To address these problems, a novel image recognition method based on DenseNet and deep pyramidal residual networks (DPRN) is proposed in this paper. In the proposed method, a new residual unit based on DPRN is designed, introducing the pyramid residual idea so that the output dimension is greater than the input dimension. Then, a module based on dilated convolution is designed for parallel feature extraction. Finally, the designed modules are fused with DenseNet to construct the image recognition model: some modules of DenseNet are improved and new modules are added. The overall flow of the proposed method is shown in Figure 6.
The image to be recognized is input, a model is trained with Dense Pyramidal Net, and the trained model is then used for image recognition. In addition, the results can be compared with the labels of the test images to obtain the recognition accuracy. The improvements in Dense Pyramidal Net are analyzed step by step below.

3.2. Module Improvements

The DenseBlock module is the main module in the DenseNet model, and its role is to extract features from images. However, the arrangement of the functional layers in this module is not ideal, so there is still considerable room for optimization. In this regard, this paper draws on DPRN and improves the module by deleting some layers and adjusting the convolution kernels. The improved module is shown in Figure 7:
The literature [41] pointed out that using a large number of ReLU layers in each module may negatively affect the overall performance of the module, and that removing the first ReLU in each module can improve performance to a certain extent. Therefore, the first ReLU layer of the original module is deleted. Second, the literature [13] shows that using a 1 × 1 convolution kernel to raise or lower the dimension and a 3 × 3 convolution kernel to refine features requires fewer parameters than directly using a 3 × 3 convolution kernel to extract features while changing the dimension; stacking more convolutions over a receptive field of the same size also extracts richer features at higher computational efficiency. Therefore, this paper splits the 3 × 3 convolution block in the original Dense Block into a dimension-preserving 3 × 3 convolution block and a 1 × 1 dimension-reducing convolution block. The number of convolution parameters is calculated as follows.
p = K_1 × K_2 × M × N        (1)
In the formula, p is the number of convolution parameters; K_1 and K_2 are the height and width of the convolution kernel, which are generally equal; and M and N are the input and output feature map dimensions of the convolution layer, respectively. Formula (1) ignores the bias parameters and the effect of batch normalization, simplifying the calculation. It can be seen from Formula (1) that, when the pyramid structure is used, splitting the 3 × 3 convolution reduces the number of parameters.
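As a quick numeric check of this claim under Formula (1), with hypothetical channel counts M = 128 and N = 256:

```python
# Worked check of Formula (1), ignoring bias and BN: splitting a
# dimension-changing 3x3 convolution into a width-preserving 3x3 plus
# a 1x1 projection reduces the parameter count.
def conv_params(k, m, n):
    return k * k * m * n  # p = K1 * K2 * M * N with K1 = K2 = k

M, N = 128, 256
direct = conv_params(3, M, N)                        # 3x3 doing both jobs
split = conv_params(3, M, M) + conv_params(1, M, N)  # 3x3 then 1x1
print(direct, split)  # 294912 vs. 180224
```

In this example, the split variant needs roughly 39% fewer parameters, in line with the argument above.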
In addition, DenseNet mainly processes the input features through the Dense Block module, and only one kind of convolution kernel is used for feature extraction, which is inevitably limited and can easily lose information in the image. In order to make full use of the features without adding too many parameters, a new dilated convolution block (DC Block) based on dilated convolution is designed and added to the model, as shown in Figure 8.
In the DC Block, the input data is first normalized and activated by a BN layer and a ReLU layer, then processed by a 3 × 3 dilated convolution [42] with a dilation rate of 2, and finally normalized by a BN layer before being output, completing the calculation of the entire DC Block. The equivalent convolution kernel size of a dilated convolution is calculated as:
K = k + (k − 1) × (r − 1)        (2)
where K is the equivalent convolution kernel size, k is the dilated convolution kernel size, and r is the dilation rate. According to Formula (2), the equivalent kernel size of the 3 × 3 dilated convolution in the DC Block is 5, so it approximates the effect of a 5 × 5 convolution kernel with fewer parameters. This solves the problem of a single convolution kernel type in the original DenseNet model, improves the richness of feature extraction, and allows image features to be read from different perspectives and fully exploited for analysis.
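A minimal sketch of the DC Block as described (BN, ReLU, a 3 × 3 convolution with dilation rate 2, then BN; channel counts are assumptions):

```python
# DC Block sketch: BN -> ReLU -> 3x3 dilated conv (dilation 2) -> BN.
import torch.nn as nn

class DCBlock(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            # dilation=2 gives an equivalent 5x5 receptive field
            # (Formula (2)) with only 3x3 weights per channel pair;
            # padding=2 preserves the spatial size.
            nn.Conv2d(in_channels, out_channels, 3,
                      padding=2, dilation=2, bias=False),
            nn.BatchNorm2d(out_channels),
        )

    def forward(self, x):
        return self.block(x)
```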

3.3. Model Structure

Based on the improved DenseBlock module and DC Block module, the Dense Pyramidal Net network model structure is designed as shown in Figure 9.
Firstly, after the data is input, a 7 × 7 convolution layer is used to increase the dimension while reducing the size of the feature map; the result is then copied and fed, respectively, into the improved DenseBlock and the DC Block for parallel channel calculation. The DC Block increases the receptive field, uses a small number of parameters to achieve the effect of a large-kernel convolution, enriches the feature extraction results, and to a certain extent fills the gap left by the original model, which uses only a single 3 × 3 convolution kernel. Finally, the results of the parallel computation are added, and the Cat splicing operation is performed with the features from the previous layers. The output feature vector of the i-th layer is calculated by Formula (3).
x_i = Cat(x_1, x_2, …, x_{i−1}, H_i(x_{i−1}) + D_i(x_{i−1}))        (3)
where the Cat() function denotes the concatenation operation, H_i() the nonlinear transformation of the i-th layer, and D_i() the dilated convolution transformation of the i-th layer.
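A minimal sketch of Formula (3), assuming the two branch modules H and D yield feature maps of matching shape so the element-wise sum is well defined:

```python
# Hypothetical fusion step for Formula (3): the improved Dense Block
# branch H_i and the DC Block branch D_i both transform x_{i-1}; their
# sum is concatenated with all earlier features along the channel axis.
import torch

def fuse_step(xs, H, D):
    # xs = [x_1, ..., x_{i-1}]: feature maps accumulated so far (NCHW).
    new = H(xs[-1]) + D(xs[-1])              # H_i(x_{i-1}) + D_i(x_{i-1})
    return torch.cat(xs + [new], dim=1)      # Cat(x_1, ..., x_{i-1}, new)
```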
In order to further improve the compactness of the model, this model retains the Transition layer of the DenseNet, and adds the Transition layer after every certain number of DenseBlocks to reduce the number of feature map outputs. At the end of the model, a global average pooling layer is used to reduce dimensionality.
Global average pooling averages all values in each feature map. The last convolutional layer of the feature extractor produces k feature maps; after the global average pooling layer, k feature maps of size 1 × 1 are obtained. Finally, these are input into the fully connected layer to obtain confidence scores for the k categories, yielding the classification result.
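A minimal sketch of this classification head (the number of final feature maps k and the number of classes are placeholders):

```python
# Classifier head: global average pooling collapses each of the k final
# feature maps to a single value; a fully connected layer then maps the
# k values to per-class confidences.
import torch.nn as nn

class ClassifierHead(nn.Module):
    def __init__(self, k_feature_maps, num_classes):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)       # k maps -> k 1x1 maps
        self.fc = nn.Linear(k_feature_maps, num_classes)

    def forward(self, x):
        x = self.gap(x).flatten(1)               # (N, k, 1, 1) -> (N, k)
        return self.fc(x)                        # class confidences
```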

4. Experiment Results and Analysis

4.1. Experimental Setup

The experimental environment is described as follows. All network models in this experiment are implemented with a deep learning framework, and each model experiment is completed on a computer with an Intel i7-11800H 8-core CPU, the Windows 10 operating system, 16 GB of memory, and an NVIDIA GeForce RTX 3060 Laptop GPU.
Parameter settings are given as follows. The networks in this experiment use the same parameter settings. The learning rate is set to 0.001, the batch size to 100, and the number of training epochs to 100; a stochastic gradient descent optimizer with a momentum of 0.9 is used to optimize the networks. All models use the ReLU activation function and the cross-entropy loss function to calculate the loss.
The datasets are introduced as follows. The datasets used in this experiment are the public Cifar10 [43] and Cifar100 [44] datasets. The Cifar10 dataset contains a total of 60,000 RGB three-channel color images with a resolution of 32 × 32 pixels, divided into 50,000 training images and 10,000 test images across 10 categories. The Cifar100 dataset is similar, but has 100 categories; each category contains 600 images, with 500 training images and 100 test images.
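The stated setup can be sketched as follows with torchvision's CIFAR10 loader; the DenseNet121 here is only a stand-in for the Dense Pyramidal Net, whose full definition is not reproduced:

```python
# Training-loop sketch for the stated configuration: lr 0.001, batch
# size 100, 100 epochs, SGD with momentum 0.9, cross-entropy loss.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())
loader = torch.utils.data.DataLoader(train_set, batch_size=100, shuffle=True)

model = torchvision.models.densenet121(num_classes=10)  # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
criterion = nn.CrossEntropyLoss()

for epoch in range(100):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```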

4.2. Ablation Experiment

In order to further explore the role of each improved part of the Dense Pyramidal Net model, an ablation experiment was designed. Each improvement is added sequentially from DenseNet to determine the impact of each part on the overall model. The results are shown in Table 1.
It can be seen from Table 1 that the improved residual block brings a modest improvement over the original network, the pyramid structure increases the recognition accuracy, and the parallel computation of the DC Block significantly enhances the model's ability to recognize images.

4.3. Experimental Results and Analysis

In this paper, Densenet121 and Resnet101 are selected as comparison networks to verify Dense Pyramidal Net. The three networks are trained and tested on the cifar10 and cifar100 datasets. Accuracy and loss are the main comparison metrics, and the model size is also used as a basis for comparison. The test accuracies of the three networks on the cifar10 and cifar100 datasets are shown in Table 2, and the parameters of the three models in Table 3.
It can be seen from Table 2 and Table 3 that the parameter count of Dense Pyramidal Net is somewhat larger than that of Densenet121 and much smaller than that of Resnet101, while the accuracy of Dense Pyramidal Net is ahead of both. The accuracy curves of each model on the cifar10 and cifar100 test datasets are shown in Figure 10 and Figure 11. As can be seen from the figures, the accuracy growth of the Dense Pyramidal Net network gradually slowed after about 20 epochs and finally reached a peak, and the network was always in the lead in terms of accuracy, showing a stronger learning ability than the other networks.
The training losses of the three algorithms on the cifar10 and cifar100 datasets are shown in Figure 12 and Figure 13.
It can be seen from Figure 12 and Figure 13 that the loss curve of Dense Pyramidal Net lies between those of the other two networks, gradually flattens, and finally approaches 0. Taken together, the four figures show that Dense Pyramidal Net not only has an accuracy advantage over the traditional algorithms but also a loss curve that sits between those of the two comparison algorithms. This shows that the model converges quickly, leaving the underfitting state earlier while preventing overfitting. In this respect, the algorithm proposed in this paper is superior to the traditional algorithms and has strong robustness.
At the same time, the newer algorithm ResNeSt50 is also compared; the experimental results are shown in Figure 14. It can be seen that, under the same conditions, the algorithm proposed in this paper outperforms ResNeSt50 over 100 rounds of training.
The experimental results show that the optimized model structure and the use of parallel channels with convolution kernels of different sizes make feature extraction more thorough, the network training accuracy higher, and the resistance to overfitting stronger. The proposed Dense Pyramidal Net network performs better than DenseNet, the classic ResNet101, and the newer ResNeSt50.

5. Conclusions

In DenseNet, there are problems of insufficient feature utilization, a single type of convolution kernel, and incomplete network optimization. In this regard, this paper improves DenseNet. First, by introducing the residual unit idea of DPRN, the ReLU and BN layers of the original module are adjusted to optimize the structure. Second, the bottleneck pyramid structure is adopted, in which the output dimension is larger than the input dimension, improving the training results. Then, a new module based on dilated convolution is designed to be computed in parallel with the original module, and the results are fused. Finally, experiments were performed on the CIFAR10 and CIFAR100 datasets, obtaining 83.98% and 51.19% accuracy, respectively. The experiments show that the proposed model can effectively reuse features and addresses some of the shortcomings of DenseNet, with certain advantages in accuracy and efficiency compared with traditional networks.
Due to the gridding effect of dilated convolution, that is, the discontinuity of the sampled kernel positions, some image features can be missed, so this model still has room for improvement in pixel-by-pixel classification tasks. Hybrid Dilated Convolution (HDC) [45] is a newer dilated convolution method that alleviates this shortcoming; in the future, combining HDC with this model will be considered to reduce information loss and make the use of feature maps more effective. In addition, the proposed model does not address security against backdoor attacks, on which many scholars are already working [46,47,48,49]; related work in the field of security will also be carried out in the future.

Author Contributions

Conceptualization, L.Y. and P.H.; Methodology, L.Y. and P.H.; Software, G.Z.; Validation, H.C. and P.H.; Resources, W.D.; Writing—original draft preparation, L.Y. and P.H.; Writing—review and editing, H.C.; Visualization, G.Z.; Funding acquisition, L.Y. and W.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Innovation Team Funds of China West Normal University (No. KCXTD2022-3) and the Research Foundation for Civil Aviation University of China (3122022PT02).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554. [Google Scholar] [CrossRef] [PubMed]
  2. Sun, Z.; Xue, L.; Xu, Y.; Wang, Z. A Survey of Deep Learning Research. Appl. Res. Comput. 2012, 29, 2806–2810. [Google Scholar]
  3. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  4. Peng, S. Research and Implementation of Neural Network Image Recognition Technology. Master's Thesis, Xidian University, Xi'an, China, January 2005. [Google Scholar]
  5. Deng, W.; Shang, S.; Cai, X.; Zhao, H.; Zhou, Y.; Chen, H.; Deng, W. Quantum differential evolution with cooperative coevolution framework and hybrid mutation strategy for large scale optimization. Knowl. Based Syst. 2021, 224, 107080. [Google Scholar] [CrossRef]
  6. Deng, W.; Shang, S.; Cai, X.; Zhao, H.; Song, Y.; Xu, J. An improved differential evolution algorithm and its application in optimization problem. Soft Comput. 2021, 7, 5277–5298. [Google Scholar] [CrossRef]
  7. Deng, W.; Xu, J.; Gao, X.; Zhao, H. An enhanced MSIQDE algorithm with novel multiple strategies for global optimization problems. IEEE Trans. Syst. Man Cybern. Syst. 2020, 52, 1578–1587. [Google Scholar] [CrossRef]
  8. Zhou, F.; Jin, L.; Dong, J. A Survey of Convolutional Neural Network Research. Chin. J. Comput. 2017, 40, 1229–1251. [Google Scholar]
  9. Lecun, Y.; Bengio, Y. Convolutional networks for images, speech, and time series. In The Handbook of Brain Theory and Neural Networks; A Bradford Book: Cambridge, MA, USA, 1995; Volume 3361. [Google Scholar]
  10. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef] [Green Version]
  11. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2012, 60, 84–90. [Google Scholar] [CrossRef]
  12. Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional neural networks. arXiv 2013, arXiv:1311.2901. [Google Scholar]
  13. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
  14. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  15. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  16. Zagoruyko, S.; Komodakis, N. Wide residual networks. In Proceedings of the British Machine Vision Conference (BMVC), York, UK, 19–22 September 2016. [Google Scholar] [CrossRef] [Green Version]
  17. Han, D.; Kim, J.; Kim, J. Deep Pyramidal Residual Networks. arXiv 2016, arXiv:1610.02915. [Google Scholar]
  18. Paoletti, M.E.; Haut, J.M.; Fernandez-Beltran, R.; Plaza, J.; Plaza, A.J.; Pla, F. Deep Pyramidal Residual Networks for Spectral—Spatial Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 740–754. [Google Scholar] [CrossRef]
  19. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
  20. Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze and excitation networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  21. Xie, S.; Girshick, R. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500. [Google Scholar]
  22. Zhang, K.; Guo, Y.; Wang, X.; Yuan, J.; Ding, Q. Multiple feature reweight DenseNet for image classification. IEEE Access 2019, 7, 9872–9880. [Google Scholar] [CrossRef]
  23. Sabour, S.; Frosst, N.; Hinton, G.E. Dynamic routing between capsules. arXiv 2017, arXiv:1710.09829v2. [Google Scholar]
  24. Zhang, H.; Wu, C.; Zhang, Z.; Sun, Y.; He, T.; Mueller, J.; Manmatha, R.; Li, M.; Smola, A. ResNeSt: Split-attention networks. arXiv 2020, arXiv:2004.08955. [Google Scholar]
  25. Yu, J.; Zhang, W. Face mask wearing detection algorithm based on improved YOLO-v4. Sensors 2021, 21, 3263. [Google Scholar] [CrossRef]
  26. Roy, A.M.; Bhaduri, J. A deep learning enabled multi-class plant disease detection model based on computer vision. AI 2021, 2, 413–428. [Google Scholar] [CrossRef]
  27. Roy, A.M.; Bhaduri, J. Real-time growth stage detection model for high degree of occultation using DenseNet-fused YOLOv4. Comput. Electron. Agric. 2022, 193, 106694. [Google Scholar] [CrossRef]
  28. Roy, A.M.; Bose, R.; Bhaduri, J. A fast accurate fine-grain object detection model based on YOLOv4 deep neural network. Neural Comput. Appl. 2022, 34, 3895–3921. [Google Scholar] [CrossRef]
  29. Zhang, Z.H.; Min, F.; Chen, G.S.; Shen, S.P.; Wen, Z.C.; Zhou, X.B. Tri-partition state alphabet-based sequential pattern for multivariate time series. Cogn. Comput. 2021, 1–19. [Google Scholar] [CrossRef]
  30. Chen, H.; Zhang, Q.; Luo, J. An enhanced Bacterial Foraging Optimization and its application for training kernel extreme learning machine. Appl. Soft Comput. 2020, 86, 105884. [Google Scholar] [CrossRef]
  31. Cui, H.; Guan, Y.; Chen, H. Rolling element fault diagnosis based on VMD and sensitivity MCKD. IEEE Access 2021, 9, 120297–120308. [Google Scholar] [CrossRef]
  32. Li, T.Y.; Shi, J.Y.; Deng, W.; Hu, Z.D. Pyramid particle swarm optimization with novel strategies of competition and cooperation. Appl. Soft Comput. 2022, 121, 108731. [Google Scholar] [CrossRef]
  33. Deng, W.; Li, Z.; Li, X.; Chen, H.; Zhao, H. Compound fault diagnosis using optimized MCKD and sparse representation for rolling bearings. IEEE Trans. Instrum. Meas. 2022, 71, 3508509. [Google Scholar] [CrossRef]
  34. Shao, H.D.; Lin, J.; Zhang, L.W.; Galar, D.; Kumar, U. A novel approach of multisensory fusion to collaborative fault diagnosis in maintenance. Inf. Fusion 2021, 74, 65–76. [Google Scholar] [CrossRef]
  35. Ran, X.; Zhou, X.; Lei, M.; Tepsan, W.; Deng, W. A novel k-means clustering algorithm with a noise algorithm for capturing urban hotspots. Appl. Sci. 2021, 11, 11202. [Google Scholar] [CrossRef]
  36. Li, G.; Li, Y.; Chen, H.; Deng, W. Fractional-Order Controller for Course-Keeping of Underactuated Surface Vessels Based on Frequency Domain Specification and Improved Particle Swarm Optimization Algorithm. Appl. Sci. 2022, 12, 3139. [Google Scholar] [CrossRef]
  37. Zhang, X.; Wang, X.; Wang, H.; Du, C.; Fan, X.; Cui, L.; Chen, H.; Deng, F.; Tong, Q.; He, M.; et al. Custom-molded offloading footwear effectively prevents recurrence and amputation, and lowers mortality rates in high-risk diabetic foot patients: A multicenter, prospective observational study. Diabetes Metab. Syndr. Obes. Targets Ther. 2022, 15, 103–109. [Google Scholar] [CrossRef]
  38. He, Z.Y.; Shao, H.D.; Zhong, X.; Zhao, X.Z. Ensemble transfer CNNs driven by multi-channel signals for fault diagnosis of rotating machinery cross working conditions. Knowl. Based Syst. 2020, 207, 106396. [Google Scholar] [CrossRef]
  39. Wei, Y.Y.; Zhou, Y.Q.; Luo, Q.F.; Deng, W. Optimal reactive power dispatch using an improved slime mould algorithm. Energy Rep. 2021, 7, 8742–8759. [Google Scholar] [CrossRef]
  40. Deng, W.; Zhang, X.X.; Zhou, Y.Q.; Liu, Y.; Zhou, X.B.; Chen, H.L.; Zhao, H.M. An enhanced fast non-dominated solution sorting genetic algorithm for multi-objective problems. Inform. Sci. 2022, 585, 441–453. [Google Scholar] [CrossRef]
  41. Veit, A.; Wilber, M.; Belongie, S. Residual networks behave like ensembles of relatively shallow networks. arXiv 2016, arXiv:1605.06431. [Google Scholar]
  42. Yu, F.; Koltun, V. Multi-Scale Context Aggregation by Dilated Convolutions. In Proceedings of the 4th International Conference on Learning Representations (ICLR), San Juan, PR, USA, 2–4 May 2015. [Google Scholar]
  43. Li, H.; Liu, H.; Ji, X.; Li, G.; Shi, L. CIFAR10-DVS: An event-stream dataset for object classification. Front. Neurosci. 2017, 11, 309. [Google Scholar] [CrossRef] [PubMed]
  44. Mcclure, P.; Kriegeskorte, N. Representational distance learning for deep neural networks. Front. Comput. Neurosci. 2016, 10, 131. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  45. Wang, P.; Chen, P.; Yuan, Y.; Liu, D.; Huang, Z.; Hou, X.; Cottrell, G. Understanding Convolution for Semantic Segmentation. arXiv 2018, arXiv:1702.08502. [Google Scholar]
  46. Kwon, H.; Kim, Y. BlindNet backdoor: Attack on deep neural network using blind watermark. Multimed. Tools Appl. 2022, 81, 6217–6234. [Google Scholar] [CrossRef]
  47. Kwon, H. Multi-Model Selective Backdoor Attack with Different Trigger Positions. IEICE Trans. Inf. Syst. 2022, 105, 170–174. [Google Scholar] [CrossRef]
  48. Kwon, H.; Lee, S. Textual Backdoor Attack for the Text Classification System. Secur. Commun. Netw. 2021, 2021, 2938386. [Google Scholar] [CrossRef]
  49. Kwon, H. Defending Deep Neural Networks against Backdoor Attack by Using De-trigger Autoencoder. IEEE Access 2021, 4, 1–12. [Google Scholar] [CrossRef]
Figure 1. Residual Network.
Figure 2. Pyramidal Net.
Figure 3. Pyramidal Residual Block.
Figure 4. Transition architecture.
Figure 5. DenseNet structure.
Figure 6. The flow of the proposed image recognition method.
Figure 7. The improved Dense Block.
Figure 8. DC Block architecture.
Figure 9. Dense Pyramidal Net architecture.
Figure 10. Test results of three algorithms on the cifar10.
Figure 11. Test results of three algorithms on the cifar100.
Figure 12. Training loss of three algorithms on cifar10.
Figure 13. Training loss of three algorithms on cifar100.
Figure 14. Comparison with ResNeSt50 in cifar10 test accuracy.
Table 1. Comparison results of each part on Cifar10.

Part                                       Accuracy
DenseNet                                   77.10%
DenseNet with improved residual block      77.86%
Dense Pyramidal Net without DC block       79.78%
Dense Pyramidal Net                        83.16%
Table 2. Accuracy of the three algorithms on two datasets.

Dataset     Dense Pyramidal Net     Densenet121     Resnet101
Cifar10     83.98%                  77.48%          73.74%
Cifar100    51.19%                  46.51%          34.94%
Table 3. Three model parameters.

              Dense Pyramidal Net     Densenet121     Resnet101
Params (M)    18                      8               45
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
