1. Introduction
Litopenaeus vannamei, also known as whiteleg shrimp, is one of the most popular aquaculture species worldwide. Due to its high nutrient density,
Litopenaeus vannamei has become the most productive strain used in Chinese shrimp farming [
1]. However, changes in farming environments and other factors can easily lead to infections by various pathogenic bacteria, resulting in massive deaths and causing significant ecological and economic losses [
2]. In recent years, research on the isolation and identification of shrimp pathogens has continued to develop, but most studies still rely on traditional methods such as gene sequencing. These methods are costly, time-consuming, and require trained personnel [
3]. Therefore, a rapid and accurate method for classifying related pathogenic bacteria is needed.
Raman spectroscopy is a technique that uses scattered light to determine the vibrational modes of molecules, which can provide a structural fingerprint by which molecules can be identified [
4]. Due to its highly efficient, non-destructive, and easy-to-operate characteristics, it has been favored by researchers and is widely used in the chemistry [
5,
6,
7], materials science [
8,
9,
10], and biomedical fields [
11,
12,
13], among others. In the field of microbial classification, Raman spectroscopy has also made significant progress. For example, various related research has arisen in the detection of foodborne pathogens [
14], environmental microorganisms [
15], and human pathogens [
16], and a full system has gradually been established. However, despite Raman spectroscopy’s success in rapidly identifying microorganisms, it is an unavoidable fact that obtaining Raman spectroscopy data takes up the majority of the timespan of the entire microbial identification process, especially when surface enhancement techniques are required to obtain low-noise data, which significantly increases the duration of the experiment [
17].
With the rapid development of machine learning, microbiology research has increasingly been combined with machine learning, and even deep learning, with satisfactory results. For example, regarding microbial genes, Chen et al. used methods such as L0 + L1 regularization to computationally reconstruct haplotypes from mixed sequencing data, achieving the highest accuracy levels on multiple datasets [
18,
19]. Wang et al. developed a new deep learning prediction method called MDeep, which is based on CNN and phylogenetic trees and is more competitive than traditional methods [
20]. Regarding microbial classification, Maruthamutu et al. used CNN and attention mapping to classify 12 microorganisms, with an overall recognition rate of over 97%, a technique which helps to efficiently distinguish microbial pollutants [
21]. Wang et al. combined Raman spectroscopy and deep learning to classify 30 pathogenic bacteria collected clinically. In comparison to several traditional machine learning methods, the CNN used in the paper had higher accuracy and efficiency levels [
22]. The combination of microbial spectroscopy and deep learning is gradually becoming a trend. Due to the difficulty involved in obtaining a large amount of Raman spectroscopy data from microorganisms, the use of deep learning methods for Raman spectroscopy amplification is also one of the research directions.
In 2015, Goodfellow et al. first proposed the concept and implementation framework of generative adversarial networks (GANs) [
23], which involve designing a pair of networks: one generates data and the other discriminates it, with the two networks competing and learning from each other. After sufficient iterations, the two networks converge toward a Nash equilibrium, and the generated data becomes very close to the real data. Over the years, GANs have produced many variants and demonstrated strong capabilities in various fields. Although the diffusion model has recently gained popularity and outperformed GANs in image generation [
24], GANs remain a mature generative model whose standing will not be challenged in the short term.
The success of generative adversarial networks in image generation has increased researchers’ confidence in spectral data amplification. Researchers in the area have adapted GANs and properly set the network structure and parameters to make them suitable for Raman spectroscopic data. Du used GANs to considerably extend the dataset in an experiment employing Raman spectroscopy to identify three foodborne pathogenic bacteria, eventually obtaining a classification accuracy of 90% [
25]. Yu et al. used GANs to gather a large amount of eligible spectral data in a marine pathogen classification experiment, establishing the groundwork for accurate classification [
26]. Ma et al. proposed a spectrum recovery conditional generative adversarial network (SRGAN). An SRGAN can accelerate spectral collection and improve the throughput of Raman spectroscopy. The researchers used the SRGAN to process the spectral data of five foodborne bacteria and achieved a classification accuracy of 94.9% in the final classification task. In the comparative experiment, without using the SRGAN, the classification accuracy was only 60.5% [
27]. Liu et al. used a PGGAN to amplify spectral data in an experiment classifying five types of marine microorganisms. The results showed that only one-third of the original data needed to be substituted into the model for training to obtain ideal classification results [
28].
The above studies demonstrated the efficiency of generative adversarial networks in microbial Raman spectroscopy; however, they have significant limitations. For example, most of them do not compare classification accuracy before and after data amplification, making it impossible to determine whether the generative adversarial network actually contributes. Moreover, close examination of the plots of the generated data reveals residual noise. Furthermore, most studies on microbial Raman spectrum classification still use classic machine learning approaches, indicating that there is still room for improvement. To address the aforementioned issues, we propose a distributed deep learning network based on data enhancement for few-shot Raman spectral classification of shrimp pathogens. The research contributions include the following:
Propose a distributed deep learning network based on data enhancement for few-shot Raman spectral classification of shrimp pathogens. The network is made up of three modules: a Raman spectrum enhancement module (RSEM), a Raman spectrum denoising module (RSDM), and a distributed learning classification module (DLCM). The RSEM controls the enhancement of the spectral data. During the training process, the network in the module employs transfer learning to improve the efficiency and quality of the generated data.
Establish the first application of the UNET network in the denoising of microbial Raman spectroscopy. The RSDM module consists of an improved UNET network, which utilizes the unique structure of the UNET network to effectively eliminate irrelevant noise in the data generated by the RSEM.
As opposed to traditional machine learning methods, we design a distributed deep learning classification module (DLCM). The module consists of a server and multiple clients. The clients and server perform parallel training and interact according to the designed algorithm. This module achieves the accurate classification of high-dimensional Raman spectra and solves network degradation problems common in deep learning.
2. Materials and Methods
2.1. The Framework of the Proposed Network
The framework of our proposed network is shown in
Figure 1. The network consists of a Raman spectrum enhancement module (RSEM), a Raman spectrum denoising module (RSDM), and a distributed learning classification module (DLCM). First, we use a Raman spectrometer to obtain spectral data from the sample and use relevant algorithms to preprocess the data to eliminate the impact of the spectrometer itself or environmental factors on the data. Subsequently, the preprocessed data is input into the RSEM for training, thereby amplifying the sample dataset. The blue box represents the RSEM module, and the core of the module is generative–adversarial. As shown in the upper part of the blue box, the generative adversarial network achieves data generation by training the generator network and discriminator network. Specifically, the type of generative adversarial network is the WGAN; due to its excellent performance in generating images, it has been improved and utilized as the network for Raman spectroscopy data amplification in this experiment. At the same time, in order to enable the WGAN to train normally on datasets with small amounts of data, we used a large microbiological Raman dataset for pre-training and then froze and transferred the trained parameters. In addition, experiments have proven that using transfer learning in an RSEM can also speed up network training and improve the quality of generated data. After waiting for the RSEM to complete training, the data generated in the module will be input into an RSDM for further processing. In this module, we use the improved UNET network to denoise the generated data. As shown in the orange box, the network can be viewed as an encoder and decoder. The encoder compresses the dimensions of the data, while the decoder is used to restore the dimensions of the original data. The network uses a loss function to adjust network parameters by minimizing the error between input and decoded data, thereby achieving the goal of noise reduction. 
The network can reduce the noise generated by WGAN training, thereby improving the classification accuracy of the downstream classification model. The green box displays the general framework of the DLCM module, which includes a server and multiple clients. The server receives the parameters trained by the client, generates global parameters through algorithms, and sends them to the client. In the client, we use residual networks to avoid gradient vanishing during the iteration process, effectively improving the accuracy of classification.
2.2. Data Acquisition and Preprocessing
2.2.1. Acquisition of Raman Spectrum Data
Due to changes in the external environment and other circumstances, whiteleg shrimp are frequently susceptible to infections produced by pathogenic bacteria during the shrimp farming process, leading to disease and death. After reviewing the relevant literature, this study selected four common whiteleg shrimp pathogens:
Vibrio parahaemolyticus,
Escherichia coli,
Aeromonas hydrophila, and
Aeromonas veronii.
Vibrio parahaemolyticus is the most common pathogen in whiteleg shrimp farming, with infected shrimp turning white-grayish red and having a high mortality rate [
29].
Escherichia coli [
30] and
Aeromonas hydrophila are among the causes of bacterial shrimp enteritis, which can lead to slow growth, significant body weight loss, and even death in severe cases.
Aeromonas veronii has also been identified as a pathogen that is commonly isolated from infected and dead shrimp [
31].
This study selected the above four pathogens as the research objects for the classification task. We obtained pathogen samples from the National Pathogen Collection Center for Aquatic Animals (NPCCAA) at Shanghai Ocean University, all of which were isolated from diseased whiteleg shrimp. They were then inoculated into LB (Luria-Bertani) liquid medium and cultivated for 24 h at 37 °C before being rinsed with PBS and centrifuged. The resulting bacterial suspension was kept at 4 °C. To prepare in situ-coated silver nanoparticles, the bacterial suspension was centrifuged to remove the supernatant, and the resulting pellet was resuspended in nitric acid solution. After thorough shaking and mixing, sodium hydroxide solution was added and the mixture was shaken again, forming the in situ-coated silver nanoparticles. This process, known as surface enhancement, adheres the silver colloid to the bacterial surface, thereby boosting the bacteria's Raman signal and preventing low bacterial density from hindering the acquisition of Raman spectra.
Additionally, we used a LabRAM HR Evolution Confocal Raman Microscope (HORIBA, Kyoto, Japan; 532 nm excitation wavelength) to obtain the Raman spectra of the samples. Drops of an appropriate amount of the surface-enhanced bacterial suspension were placed on the center of a microscope slide, and spectral data were collected after the sample dried. Each sampling point was measured 3 times, and the average of the 3 measurements was used as the Raman spectrum for that point. This process was repeated to obtain a total of 160 Raman spectra for the four bacteria. Specifically, each microorganism is assigned 40 spectra, each covering the wavenumber range of 1289 cm−1 to 4000 cm−1, totaling 1600 spectral features. After obtaining the Raman intensity at each feature (wavenumber), these 1600 points can be connected to obtain the final Raman spectral curve.
2.2.2. Preprocessing of Raman Spectrum Data
When using Raman spectroscopy to analyze data, the initial data often need to be preprocessed due to interference from factors such as cosmic rays, instrument noise, and autofluorescence of the sample itself, in order to prevent noise from affecting the experimental results. Common preprocessing methods include smoothing, scattering correction, baseline correction, and normalization [
32]. In this experiment, S-G (Savitzky-Golay) smoothing and normalization were used to preprocess the raw data. S-G smoothing can maximize the preservation of data information while reducing noise and increasing the signal-to-noise ratio. Normalization can reduce the negative effects of large variations in spectral data, allowing the spectra to fall within a specific range. We preprocessed the obtained 40 Raman spectra, and the preprocessed spectral images are shown in
Figure 2. It can be seen that the spectral curve is very smooth, and the Raman intensity (y-axis) is concentrated between 0 and 1. This proves convenient for subsequent analysis and model training. Meanwhile, we can also observe that the spectral images of the four microorganisms are very similar, with their peak positions mostly overlapping. Due to the overlapping Raman spectral characteristics of the four microorganisms, special methods are needed to distinguish them.
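The preprocessing pipeline described above can be sketched in a few lines of Python using SciPy's Savitzky-Golay filter followed by min-max normalization. The window length and polynomial order below are illustrative choices, not the values used in the paper:

```python
import numpy as np
from scipy.signal import savgol_filter

def preprocess_spectrum(intensities, window_length=11, polyorder=3):
    """Smooth a raw Raman spectrum with a Savitzky-Golay filter,
    then min-max normalize the intensities into the [0, 1] range."""
    smoothed = savgol_filter(intensities, window_length, polyorder)
    lo, hi = smoothed.min(), smoothed.max()
    return (smoothed - lo) / (hi - lo)

# Example: a noisy synthetic spectrum with 1600 features
rng = np.random.default_rng(0)
raw = np.sin(np.linspace(0, 6, 1600)) + 0.05 * rng.standard_normal(1600)
spec = preprocess_spectrum(raw)
print(spec.min(), spec.max())  # 0.0 and 1.0 after normalization
```

After this step every spectrum lies in [0, 1], which matches the y-axis range seen in Figure 2 and simplifies subsequent model training.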
2.3. Raman Spectrum Enhancement Module (RSEM)
2.3.1. WGAN Network Structure
Generative adversarial networks (GANs) consist of a generator network and a discriminator network. The generator network is responsible for generating data similar to real samples, while the discriminator network is responsible for distinguishing generated data from real data. Since the introduction of GANs, hundreds of different types of GANs have emerged, and they have been proven to demonstrate excellent performance in specific domains.
In order to select the most suitable generative adversarial network, we conducted a series of comparative experiments on several types of generative adversarial networks and found that WGAN had the most outstanding performance in terms of the quality of generated data and the duration of model training. Therefore, we chose WGAN as the network for the data augmentation module. The comparative experimental results are presented in
Section 3.2.
Unlike the original GAN discriminator, which solves for the classification of 0 or 1, the WGAN discriminator employs the Wasserstein distance to distinguish between generated and real data [33]. The network provides a trustworthy training process indicator, avoiding the gradient vanishing problem that frequently happens in the original GAN. The Wasserstein distance formula is shown below:

$$ W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}\left[ \lVert x - y \rVert \right] $$

where $P_r$ represents the distribution of real data, $P_g$ represents the distribution of generated data, $\Pi(P_r, P_g)$ represents the set of all possible joint distributions combining $P_r$ and $P_g$, and $\mathbb{E}_{(x, y) \sim \gamma}[\lVert x - y \rVert]$ represents the expectation value of the distance between the real data $x$ and the generated data $y$ under the joint distribution $\gamma$.
After a series of derivations, the Wasserstein distance can be approximated as

$$ W(P_r, P_g) \approx \max_{\omega : \lVert f_\omega \rVert_L \le K} \; \mathbb{E}_{x \sim P_r}[f_\omega(x)] - \mathbb{E}_{x \sim P_g}[f_\omega(x)] $$

where $f_\omega$ represents a function containing parameter $\omega$. Specifically, in the WGAN network, the loss function expression can be written as

$$ L = \mathbb{E}_{x \sim P_r}[D(x)] - \mathbb{E}_{z \sim p_z}[D(G(z))] $$

where $D(x)$ represents the result judged by the discriminator on real data, $G(z)$ represents the generated data, and $D(G(z))$ represents the result judged by the discriminator on the generated data.
Therefore, the generator's loss function can be expressed as

$$ L_G = -\mathbb{E}_{z \sim p_z}[D(G(z))] $$

The loss function of the discriminator can be expressed as

$$ L_D = \mathbb{E}_{z \sim p_z}[D(G(z))] - \mathbb{E}_{x \sim P_r}[D(x)] $$
The generator and discriminator are trained towards the goal of minimizing the loss function, and the two constantly compete until reaching a Nash equilibrium.
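The two WGAN losses described above reduce to simple means of the critic's outputs. A minimal PyTorch sketch (the critic outputs below are toy values, not experimental data):

```python
import torch

def discriminator_loss(d_real, d_fake):
    # L_D = E[D(G(z))] - E[D(x)]: minimizing this widens the score gap
    # between real and generated samples.
    return d_fake.mean() - d_real.mean()

def generator_loss(d_fake):
    # L_G = -E[D(G(z))]: the generator pushes its samples' critic
    # scores upward.
    return -d_fake.mean()

# Toy critic outputs for a batch of 8 spectra
d_real = torch.tensor([0.9, 0.8, 1.1, 0.7, 1.0, 0.95, 0.85, 1.05])
d_fake = torch.tensor([0.1, 0.2, -0.1, 0.3, 0.0, 0.15, 0.05, 0.25])
print(discriminator_loss(d_real, d_fake).item(),
      generator_loss(d_fake).item())
```

Note there is no sigmoid or log term, unlike the original GAN: the critic's raw scores are used directly, which is what yields the smoother training signal.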
Figure 3 shows the improved WGAN structure used in our paper. The WGAN network is made up of a generator and a discriminator. The generator features a five-layer network structure; each layer has a deconvolution layer and a normalization layer and generates output using the LeakyReLU activation function. Similarly, the discriminator has a five-layer network structure, with each layer consisting of a convolution layer and a normalization layer and using the ReLU activation function to generate output. First, low-dimensional noise data Z is generated at random and used as the generator network's input, after which sample data is generated by the generator network's mapping. The discriminator then receives both the generated sample data and the real data as inputs. The generator network optimizes its gradients through the generator's loss function, making the generated data distribution closer to that of the real data. The discriminator network optimizes its gradients through the discriminator loss function, improving the discriminator's ability to identify fake data. After a sufficient number of confrontations between the generator and discriminator networks, the two approach a Nash equilibrium, at which point the data produced by the generator network is very close to the real data. In addition, the training parameters of the network are displayed in the lower right corner of the figure.
Compared with the original WGAN, we have made some improvements in order to better generate Raman spectrum data. The one dimensional convolution layer nn.Conv1d is used instead of the linear layer nn.Linear in the discriminator network, and the deconvolution layer nn.ConvTranspose1d is used instead of the linear layer nn.Linear in the generator network. The one-dimensional convolution layer can adjust the information interaction between channels, not only so that the model can have strong abstraction capabilities, but also so that computational efficiency is ensured. In addition, Spectral Normalization is used instead of Batch Normalization in the network. Spectral Normalization can make the parameter matrix of the network satisfy the Lipschitz continuity condition, thus making WGAN training more stable [
34].
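A minimal sketch of these two changes in PyTorch: a transposed 1-D convolution in the generator path and a spectrally normalized 1-D convolution in the discriminator path. The channel sizes, kernel sizes, and strides below are illustrative, not the paper's configuration:

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Generator side: nn.ConvTranspose1d upsamples the sequence (length doubles
# here because stride=2). Discriminator side: nn.Conv1d wrapped in
# spectral_norm, which constrains the layer's Lipschitz constant.
gen_layer = nn.Sequential(
    nn.ConvTranspose1d(64, 32, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.2),
)
disc_layer = nn.Sequential(
    spectral_norm(nn.Conv1d(1, 16, kernel_size=4, stride=2, padding=1)),
    nn.ReLU(),
)

z = torch.randn(8, 64, 100)                    # batch of latent sequences
upsampled = gen_layer(z)                       # length 100 -> 200
features = disc_layer(torch.randn(8, 1, 1600)) # length 1600 -> 800
print(upsampled.shape, features.shape)
```

Stacking several such layers, with the generator expanding the latent noise up to the 1600-point spectrum length and the discriminator compressing it back down, gives the five-layer structure of Figure 3.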
2.3.2. Transfer Learning for WGAN
The main idea of transfer learning is to use the rich knowledge information of the source domain dataset (usually with a large amount of data) to supplement the information missing in the target domain due to the small amount of data in the target domain, by finding the relationship between the source domain and the target domain [
35]. The more similar the data between the original domain and the target domain, the better the transfer learning effect. In this study, we choose a large-scale pre-trained microbial spectral dataset as the source domain data, which includes 60,000 Raman spectra of 30 types of microorganisms collected by Stanford Hospital from 2016 to 2017 [
16]. After sufficient iterations to stabilize the model, the training is stopped, and the model’s parameters are frozen. The frozen parameters are then transferred, and the target domain data (preprocessed spectral data) is directly trained using the transferred parameters. A schematic diagram is shown in
Figure 4.
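Freezing the pre-trained parameters and leaving only a small part of the network trainable on the target domain can be sketched as follows. The module names ("backbone", "head") and layer sizes are hypothetical, chosen only to illustrate the freeze-and-transfer step:

```python
import torch.nn as nn

def freeze_early_layers(model: nn.Module, trainable: tuple = ("head",)):
    """Freeze all parameters pre-trained on the source-domain dataset,
    leaving only the named submodules trainable on the target domain."""
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(t) for t in trainable)
    return model

# Hypothetical network: a pre-trained backbone plus a new output head
model = nn.Sequential()
model.add_module("backbone", nn.Linear(1600, 128))
model.add_module("head", nn.Linear(128, 1))
freeze_early_layers(model)
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # only the head's weight and bias remain trainable
```

During target-domain training, the optimizer is then given only the parameters with `requires_grad=True`, so the knowledge learned from the large source dataset is preserved.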
2.4. Raman Spectrum Denoising Module (RSDM)
Although GANs can generate a large amount of data similar to the original data, the training process is inherently unstable because the generator and the discriminator compete with each other. In order to reduce the exaggerated noise produced during training and make the generated data more similar to the real data, the UNet model is introduced to denoise the generated data in this experiment.
Unet is a classic network model that has been widely used in various image segmentation tasks due to its efficiency, simplicity, and ability to adapt to small datasets [
36]. The original intention of Unet was to solve the problem of medical image segmentation, as its encoder–decoder structure can extract complex features and restore the original resolution. Skip connections in the model reduce the feature loss caused by convolutions, helping the decoder extract important shallow information.
With the RSDM, we improve the original U-NET structure to improve the network’s ability to extract spectral-feature information. The model is shown in
Figure 5. It can be seen that the denoising model is perfectly symmetric: the left side of the network can be seen as the encoder, the right side as the decoder, and a BLOCK is used for the transition in the middle of the network. Specifically, the model consists of an encoding structure containing four convolutional layers and a decoding structure containing four deconvolutional layers. In the convolution operation of each layer, a 3 × 3 convolution kernel reduces the number of parameters of the convolution layer while maintaining a sufficient receptive field. The downsampling operation in the encoding structure and the upsampling operation in the decoding structure are implemented by 2 × 2 max pooling and 2 × 2 up-convolution operations, respectively.
In addition, compared to the original UNET, we have made some improvements to reduce the loss of data features in the network during upsampling and downsampling processes. Firstly, we improved the skip connection method of the U-Net network. In traditional UNet, in order to avoid losing a large number of precise spatial details in the decoder, the skip technique is used, which directly concatenates the map extracted from the encoder to the corresponding layer of the decoder. However, we believe that shallow feature information enters the decoder too early, which is not conducive to the process of the decoder extracting global feature information. Therefore, we connect the output of the encoder and the output of the decoder. The new connection method can integrate shallow feature information into deep spectral data details, which is more conducive to restoring clean spectral data. Secondly, we introduce an attention mechanism. The upsampling and downsampling in the UNET network are both based on convolution, and the receptive field kernel size of the convolution operation is small, so each convolution operation can only cover local features of the data. This results in the decoder losing data features during the data recovery process. The attention mechanism can better learn the dependency relationships between global features.
We use a module called Attention Gates to train attention mechanisms, in which coarse-grained features capture contextual information and highlight the categories and positions of foreground objects. This module can suppress the task-independent parts of the model learning, while emphasizing the learning of task-related features. Subsequently, feature maps extracted at multiple scales are merged through skip connections to combine dense predictions at both coarse-grained and fine-grained levels. Our improvement can avoid data loss while allowing each layer of the network to perform noise reduction.
Figure 6 shows the Attention Gates module we introduced, which can be trained in 5 steps.
Step 1: After a 1 × 1 convolution, add together the outputs of the encoding layer Oen and the output of the decoding layer Ode.
Step 2: Pass the added results through the Relu function.
Step 3: Convolve the result of step 2 and reduce the channel to 1.
Step 4: Apply the sigmoid function to the result of Step 3 so that the values fall within the (0, 1) interval, yielding the attention weights.
Step 5: Multiply the attention weights obtained in step 4 with the output Oen of the encoding layer, and assign the attention weights to the low level feature.
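The five steps above can be sketched as a small PyTorch module. This is an illustrative 1-D implementation under assumed channel sizes, not the exact configuration used in the paper:

```python
import torch
import torch.nn as nn

class AttentionGate1d(nn.Module):
    """1-D attention gate following the five steps above."""
    def __init__(self, enc_ch, dec_ch, inter_ch):
        super().__init__()
        self.w_enc = nn.Conv1d(enc_ch, inter_ch, kernel_size=1)  # step 1
        self.w_dec = nn.Conv1d(dec_ch, inter_ch, kernel_size=1)  # step 1
        self.psi = nn.Conv1d(inter_ch, 1, kernel_size=1)         # step 3

    def forward(self, o_en, o_de):
        x = torch.relu(self.w_enc(o_en) + self.w_dec(o_de))  # steps 1-2
        alpha = torch.sigmoid(self.psi(x))                   # steps 3-4
        return o_en * alpha                                  # step 5

gate = AttentionGate1d(enc_ch=32, dec_ch=32, inter_ch=16)
o_en = torch.randn(4, 32, 200)   # encoder features
o_de = torch.randn(4, 32, 200)   # decoder features at the same scale
gated = gate(o_en, o_de)
print(gated.shape)  # attention-weighted encoder features: (4, 32, 200)
```

The gated encoder output, rather than the raw encoder output, is what gets merged into the decoder, so task-irrelevant regions of the spectrum are suppressed before the skip connection.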
Additionally, during the downsampling process, we added residual blocks to the encoder [
37]. The residual network adds the inputs and outputs of a large number of convolutional layers to extract features from the data. For networks with deeper network layers, adding a residual network can avoid the model degradation problem caused by vanishing gradients during training. Combining the relevant improvements mentioned above, the improved U-NET network proposed in this article can denoise spectral data very well. When the spectral data with noise is input into the network, the encoder compresses the features of the data, and then the decoder amplifies the features to the original size. During the encoding and decoding process, redundant noise information in the spectral data is eliminated, and at the same time, important features of the data are preserved.
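The residual blocks added to the encoder follow the standard identity-shortcut pattern: the block's input is added to its convolutional output, so gradients can flow around the convolutions. A 1-D sketch with illustrative kernel and channel sizes:

```python
import torch
import torch.nn as nn

class ResidualBlock1d(nn.Module):
    """A 1-D residual block: the input is added to the convolutional
    output so that gradients can bypass the convolutions, mitigating
    degradation in deeper networks."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))  # identity shortcut

block = ResidualBlock1d(16)
x = torch.randn(2, 16, 400)
out = block(x)
print(out.shape)  # shape is preserved: (2, 16, 400)
```

Because the shortcut requires matching shapes, `padding=1` with a kernel size of 3 keeps the sequence length unchanged through the block.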
2.5. Distributed Learning Classification Module (DLCM)
Distributed learning is an emerging learning method that can allocate data and computing tasks to multiple computing nodes (clients) for parallel computing. During the training process, the computing node (client) can exchange relevant parameters with the server according to a specific algorithm to achieve the purpose of jointly training the model. Through parallel computing, distributed learning can significantly speed up model training and improve the efficiency of the algorithm. In addition, distributed learning can make full use of the information in the dataset and improve the accuracy of the model.
After training with the RSEM and RSDM, we obtained microbial Raman spectrum data with a large amount of data and high dimensions. If we want to accurately classify the data, we need a complex deep learning model. However, in order to achieve the desired effect, the deep learning model needs to be trained a sufficient number of times, and time is also a very important factor in the classification and identification of microorganisms. This is also the original intention of using spectra to classify microorganisms. In addition, deep learning inevitably suffers from training instability during the iterative process. In order to solve this problem, we introduced the idea of distributed learning and built a distributed learning module for the classification of microbial Raman spectrum data. The framework of this module is shown in
Figure 7.
Specifically, the module sets up “i” local clients, each using the same classification network and obtaining consistent initialization parameters from the server. The dataset is evenly divided according to the number of clients, so different clients obtain different sub-datasets. Each client's classification network is a RESNET, which helps to prevent network degradation caused by vanishing or exploding gradients. Subsequently, all clients begin training, with each client iterating “m” times in its own classification network, continuously updating gradients during the iteration process. Each client submits its updated model parameters to the server after finishing its respective training session. The server receives the model parameters from the clients and aggregates them to create the global model for this round. In this manner, a round of global model iteration is completed using distributed learning, and the preceding steps are repeated until the model iteration achieves the intended result and halts.
The overall process of the algorithm is described using Algorithm 1.
Algorithm 1. Model Parallel Training
Input: Clients Ci, each client's respective data, client initial parameters, training rounds T, fixed parameters
Output: Global model W
Initialize global model W0
For each client in clients:
    Train using the initialized parameters and generate local parameters
For round t in T:
    Server executes:
        Receive and aggregate the clients' parameters to generate the global model Wt
        Send Wt to the clients
    Client executes:
        Receive the global model Wt sent by the server
        Use the global model Wt for training and generate updated parameters
        Send the updated parameters to the server
Return WT
END
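The server's aggregation step can be sketched as a FedAvg-style parameter average. This is a minimal illustration of the server/client parameter exchange, not the paper's exact aggregation rule, and the tiny classifier used here is hypothetical:

```python
import copy
import torch
import torch.nn as nn

def aggregate(client_states):
    """Server step: average the parameter tensors uploaded by the
    clients to form the new global model state."""
    global_state = copy.deepcopy(client_states[0])
    for key in global_state:
        stacked = torch.stack([s[key].float() for s in client_states])
        global_state[key] = stacked.mean(dim=0)
    return global_state

# Two hypothetical clients holding the same tiny classifier architecture
clients = [nn.Linear(4, 2) for _ in range(2)]
global_state = aggregate([c.state_dict() for c in clients])
for c in clients:                  # broadcast the global model back
    c.load_state_dict(global_state)
w0 = clients[0].weight.detach()
w1 = clients[1].weight.detach()
print(torch.allclose(w0, w1))  # True: clients now share parameters
```

In a full round, each client would first run its "m" local training iterations before uploading its state dict; the aggregate-and-broadcast step shown here then closes the round.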
2.6. Evaluation Indicators
In order to evaluate the classification effect of the model, we use
Precision,
Recall,
F1
Score and
Accuracy as evaluation indicators. The calculation formulas of the four indicators are as follows:

$$ Precision = \frac{TP}{TP + FP} $$

$$ Recall = \frac{TP}{TP + FN} $$

$$ F1\ Score = \frac{2 \times Precision \times Recall}{Precision + Recall} $$

$$ Accuracy = \frac{TP + TN}{TP + TN + FP + FN} $$
where
TP indicates that the predicted result is microorganism A, and the actual result is microorganism A.
TN means that the predicted result is not microorganism A, and the actual result is not microorganism A.
FP means that the predicted result is microorganism A, but the actual result is not microorganism A.
FN indicates that the predicted result is not microorganism A, but the result is actually microorganism A.
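The four indicators follow directly from the confusion counts defined above. A small helper (the counts below are hypothetical, for illustration only):

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Precision, Recall, F1 Score, and Accuracy from the
    confusion counts for one microorganism class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy

# Hypothetical counts for microorganism A
p, r, f1, acc = classification_metrics(tp=36, tn=114, fp=4, fn=6)
print(round(p, 3), round(r, 3), round(f1, 3), round(acc, 3))
```

Precision and recall are computed per class; for the four-class task, the per-class values can then be macro-averaged to summarize overall performance.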