1. Introduction
Conventional spectrometers are bulky because of their complex optical paths and moving parts, which hinders their wider application in handheld, spaceborne, and airborne scenarios. In contrast, computational spectroscopy, based on the principle of compressed sensing (CS), has significant potential for portable applications such as remote sensing, healthcare, and astronomical observation [1,2]. These applications demand high spectral resolution, compactness, and low cost. A computational spectrometer can break free of hardware limitations and improve efficiency and accuracy through effective design techniques and efficient algorithms. In computational spectroscopy, an unknown signal is measured, according to CS theory, by projecting it onto a random basis. The filter structure, which acts as the encoding part, has a specific response function. The transmission spectrum exhibits a randomly distributed spectral signature owing to multiple reflections at the interfaces of thin-film filters or nanostructures [3]. Each configuration of the filter structure produces a unique response function, and these distinct response functions serve as the random measurement basis [4].
The first major problem is the technical complexity of fabricating optical filters with random transmittance spectra over a broadband spectral range; the second is computational efficiency. The performance of CS relies on the randomness of the measuring bases, namely the encoding part. The transmittance of the optical filters in a computational spectrometer should therefore exhibit diverse spectral features over the broadband range, so a sizable group of filters with minimal correlation is required. Low correlation or non-correlation of the transmittance between any two spectrometer pixels is essentially a prerequisite for applying CS [5,6]. This poses a challenge for the design and fabrication of nanostructures such as photonic crystal slabs, metasurfaces, and thin-film filters [7]. This problem also leads to poor performance of conventional iterative reconstruction algorithms such as gradient projection for sparse reconstruction (GPSR), orthogonal matching pursuit (OMP), and subspace pursuit (SP) [8,9,10]. Neural networks (NNs) have been introduced because of their various advantages. Bao [11] combined conventional algorithms with an NN, named the solver-informed NN, to achieve better alignment of the reconstructed spectra; however, the large training dataset it requires hinders its widespread application. Many researchers have made progress with NNs in computational spectroscopy. Zhang [12] proposed a broadband encoding stochastic camera featuring fully connected NN layers. Ding [13] proposed an encoding and reconstruction convolutional NN (CNN) named the wide-spectrum encoding and reconstruction network (WER-Net). Both Zhang's and Ding's networks are trained on approximately 1,650,000 spectral curves from the Columbia Imaging and Vision Laboratory (CAVE) [14] and the Interdisciplinary Computational Vision Laboratory (ICVL) [15] datasets. Kulkarni applied an NN to image compression and reconstruction [16], the first study to solve the CS problem with an NN. Subsequently, Song introduced deep-learned broadband encoding stochastic filters for computational spectroscopic instruments [12]; this NN is likewise trained on the heavy CAVE and ICVL datasets. Bao used an NN to improve the reconstruction accuracy of a conventional spectral reconstruction algorithm by fitting the original spectral curve to the reconstructed curve obtained from the conventional iterative algorithm [11]. The loss functions of almost all NNs used in computational spectrometers are based on the mean squared error (MSE) between the original and reconstructed spectral curves. This suggests that the compulsory demand for non-correlation properties in computational spectroscopy may be relaxed. To address the aforementioned problems, this study introduces an NN computational spectrometer with high-correlation optical filters, trained on a small dataset.
The remainder of this paper is organized as follows: Section 2 presents the NN architecture and methodology. Section 3 describes the training and simulation. Section 4 reports the experimental results. Finally, Section 5 concludes this study.
2. Theoretical Model and Design Methodology
For traditional computational spectrometers, iterative algorithms based on CS theory have been developed for many years. The encoding procedure is performed by random optical filters and focal plane detectors, which are similar across various spectrometers. Satisfying the incoherence criterion then requires minimal correlation between any two spectrometer filters, which poses great challenges for the design and fabrication of broadband filters. In contrast, the training of the filters in NNs attends only to the conformity between the reconstructed spectrum and the ground truth and does not require the incoherence criterion to be satisfied. The NN proposed in this study achieves very good reconstruction accuracy with highly correlated broadband optical filters. The decoding procedure, implemented by the NN reconstruction algorithm, often requires megabytes of storage, and reconstruction consumes computing resources; the architecture therefore needs to be as simple as possible.
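For clarity, the encoding step can be written as a standard CS-style linear measurement model (the notation below is ours and serves only as a sketch, not a reproduction of the original figures):

\[ \mathbf{y} = \mathbf{T}\,\mathbf{s}, \qquad \mathbf{T} \in \mathbb{R}^{M \times N},\; M \ll N, \]

where s is the unknown spectrum discretized into N bands, the rows of T are the M filter transmittance curves (M = 15 in this study), and y collects the detector readings. Conventional CS reconstruction additionally requires the rows of T to be mutually incoherent, whereas the NN decoder trained here does not.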
Neurons are the basic structural units of artificial NNs, the units of deep learning that simulate the working process of biological neural networks. The McCulloch–Pitts (M–P) neuron model, proposed by McCulloch and Pitts [17], is the most commonly used computational model of a neuron. Hopfield used NNs to solve NP-hard problems for the first time [18], helping NNs achieve rapid development. LeCun proposed LeNet-5, the standard CNN, which greatly contributed to the development of NNs [19]. Since then, deep learning has sprung up in various fields. Among these works, Kulkarni applied deep learning to image compression and reconstruction for solving CS problems [16], Zhang applied feedforward NNs with all fully connected layers to spectral reconstruction [12], and Ding proposed a lightweight CNN for computational spectroscopy [13].
Since the proposed neural network is characterized by training on a small training dataset, we refer to it as STD-Net. The architecture of STD-Net is as follows: the first layer, which serves as the encoding layer, is a matrix of known transmittance curves of 15 polymethyl methacrylate (PMMA) filters. The second to fifth layers are the reconstruction layers (referred to as decoding layers), and each is followed by a Leaky ReLU activation function. The overall reconstruction network structure is summarized as FC (encoding without bias) − (FC − LEAKY_RELU) × 4, as described in Figure 1.
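A minimal PyTorch sketch of this architecture is given below. The encoding layer holds the fixed 15 × N matrix of measured filter transmittances; the hidden layer widths are illustrative placeholders, since they are not specified here, and 151 bands correspond to the 400–700 nm range sampled at 2 nm (Section 3.1).

```python
import torch
import torch.nn as nn

N_BANDS = 151    # 400-700 nm at 2 nm resolution (Section 3.1)
N_FILTERS = 15   # PMMA filter transmittance curves

class STDNet(nn.Module):
    """FC (encoding, no bias) - (FC - LeakyReLU) x 4, following Figure 1.
    Hidden widths are placeholders, not values taken from the paper."""
    def __init__(self, transmittance: torch.Tensor, hidden: int = 500):
        super().__init__()
        # Encoding layer: fixed, bias-free projection by the known filters.
        self.encoder = nn.Linear(N_BANDS, N_FILTERS, bias=False)
        self.encoder.weight = nn.Parameter(transmittance, requires_grad=False)
        # Decoding layers: four fully connected layers, each with LeakyReLU.
        self.decoder = nn.Sequential(
            nn.Linear(N_FILTERS, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, N_BANDS), nn.LeakyReLU(),
        )

    def forward(self, spectrum: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(spectrum))
```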
2.1. CNN and All FC Network
The CNN and the all-FC network are first compared, through analysis and through the training process and reconstruction performance, before all FC layers are adopted in STD-Net. A CNN is more suitable for image segmentation because it uses two strategies, local connectivity and weight sharing, to reduce model complexity. A CNN also has the property that its extracted features are invariant, to a certain degree, to changes in the input data such as translation, rotation, and scaling. In more depth, local connectivity means that each node in a convolutional layer is connected to only part of its predecessor layer and learns only local features of the input data. This is because the correlation between image pixels is related to the distance between them: pixels that are closer together are more strongly correlated. However, in computational spectroscopy, the data input to the decoding layer all represent some feature of the measured spectrum in each band and are correlated with one another, so learning only local features is not sufficient to fully utilize the encoded data. In addition, weight sharing refers to the use of the same convolution kernel over different regions of the input matrix to detect the same feature. Together, these two strategies make the localized features of the input matrix independent of the position of the data composing them; when the data in the matrix are shifted, the convolutional layer still finds the same feature, only at a different position.
However, this output invariance of the CNN leads to a loss of reconstruction accuracy in the computed spectrum. In the encoding layer, the original spectral curve is compressed and sampled by the 15 filters to obtain 15 measurements, arranged in the same order as the filters. The same local data feature may therefore represent different spectral information depending on its position in the matrix, yet the CNN treats such features as identical because of its output invariance, which reduces reconstruction accuracy. Therefore, we abandoned scanning the input matrix with a convolution kernel and instead set the kernel size equal to the input matrix, which computes exactly as a fully connected layer; accordingly, we replaced the convolutional layers of the reconstruction algorithm with fully connected layers.
On the other hand, some studies suggest that convolutional layers can reduce the number of parameters and thereby improve computational efficiency [13,20]. However, compared with a fully connected layer, the kernel-scanning method performs a large number of additional operations through repeated convolutions over the input matrix, which reduces computational speed and demands higher-performance hardware.
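The point that a convolution whose kernel spans the whole input reduces to a fully connected layer can be checked directly; the following sketch (ours, not from the paper) verifies the equivalence numerically:

```python
import torch
import torch.nn as nn

L = 15  # length of the encoded measurement vector (one value per filter)

# A 1-D convolution whose kernel covers the entire input slides exactly once,
# so each output channel is a weighted sum of all inputs, i.e., an FC layer.
conv = nn.Conv1d(in_channels=1, out_channels=8, kernel_size=L, bias=True)
fc = nn.Linear(L, 8, bias=True)

# Copy the convolution weights into the FC layer to show the outputs match.
with torch.no_grad():
    fc.weight.copy_(conv.weight.view(8, L))
    fc.bias.copy_(conv.bias)

x = torch.randn(1, 1, L)                  # (batch, channels, length)
out_conv = conv(x).view(1, 8)             # single output position -> flatten
out_fc = fc(x.view(1, L))
print(torch.allclose(out_conv, out_fc, atol=1e-6))  # True
```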
2.2. Small Training Dataset
A presumption proposed in this study is that common large spectral training datasets such as CAVE and ICVL follow a certain distribution law, approximately or fully consistent with the distribution of natural spectra. Therefore, a small training dataset, randomly drawn from the large training dataset in proportion, should have a distribution similar to that of the entire training dataset; that is, the small dataset contains the main features of the whole training dataset. To analyze and find this distribution law, we introduce the Pearson product-moment correlation coefficient, whose mathematical expression is as follows:

r_{xy} = \frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}}\,\sqrt{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}}

where x and y each represent a spectral curve; x_i and y_i refer to the intensities at the i-th wavelength of the two curves, and \bar{x} and \bar{y} represent their average intensities.
The Pearson product-moment correlation coefficient takes values between −1 and 1. The closer its absolute value is to 1, the stronger the linear correlation between the variables x and y.
The whole dataset is a hybrid of CAVE and ICVL. The CAVE dataset covers the spectra of five scene types: objects, skin and hair, paint, food, and drink, and has been cited by more than 632 literature sources. The ICVL dataset covers the spectral information of nature, trees, and buildings, and has been cited by more than 415 literature sources. The mixture of the two, with a total of 1.65 million spectral curves, contains the spectral information of the most commonly encountered scenes and is widely recognized and used in the image processing community, satisfying the experimental requirements. Since the spectral resolution of the dataset is only 10 nm, which falls short of the ideal resolution, we use least squares fitting to increase the spectral resolution to 2 nm and thus improve the resolution of the spectral reconstruction algorithm. Then, based on the assumption above, the absolute average of the correlation coefficients between each spectral curve and all other curves in the dataset is taken as the distribution weight of that curve in the large training dataset. From these weights we derive the spectral distribution, divide the large dataset into several intervals, and sample from them to obtain the small datasets for training, testing, and simulation.
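A minimal NumPy sketch of this weighting step is given below, assuming the spectra are stored row-wise in an array; the function and variable names are ours. Note that the full 1.65-million-curve dataset would require chunked or approximate computation of the pairwise correlations, whereas this sketch assumes the data fit in memory.

```python
import numpy as np

def distribution_weights(spectra: np.ndarray) -> np.ndarray:
    """spectra: (n_curves, n_bands) array of spectral intensities.
    Returns, for each curve, the absolute average of its Pearson correlation
    coefficients with all other curves (the 'distribution weight' above)."""
    corr = np.corrcoef(spectra)        # pairwise Pearson coefficients
    np.fill_diagonal(corr, np.nan)     # exclude self-correlation
    return np.nanmean(np.abs(corr), axis=1)

# Divide the curves into intervals by weight (1000 intervals in the paper;
# 10 here only to keep the toy example small).
spectra = np.random.rand(200, 151)
w = distribution_weights(spectra)
edges = np.linspace(w.min(), w.max(), 10 + 1)
interval_idx = np.digitize(w, edges[1:-1])   # interval index of each curve
```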
2.3. Loss-Function and Training Methods
To assess the conformity between the reconstructed spectral curve and the ground truth, we adopt the MSE as the objective function:

\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(S_i - \hat{S}_i\right)^{2}

where S denotes the input actual spectral matrix, \hat{S} represents the output reconstructed spectral matrix, and N is the number of spectral sampling points.
Since this loss function focuses only on the agreement between the input and output spectral curves, it places no demand on the non-correlation property of the optical filters. Fortunately, the non-correlation of optical filters is a sufficient but not necessary condition for computational spectroscopy, which explains why the NN computational spectrometer in this study achieves good performance even with highly correlated optical filters.
A high learning rate can accelerate learning in the early stage of optimization, helping the model approach a local or global optimum of the loss function. However, it can also cause the model to fluctuate excessively in the later stages and prevent convergence to the optimum. We therefore apply an adaptive learning-rate decay mechanism throughout training, which takes the average reconstruction MSE of each training epoch as the index and reduces the learning rate when the MSE does not decrease to a noticeable extent. This mechanism effectively reduces model fluctuations in the middle and later stages of training, bringing the model closer to the optimal solution.
In addition, batch gradient descent uses all samples to update the gradient in each iteration, which makes each iteration stable but very slow for a sizable training dataset. In contrast, stochastic gradient descent updates once per training sample, which is fast per step but unstable. We used mini-batch stochastic gradient descent to balance model stability and training speed: the loss over a small batch is summed and the gradient is applied immediately, giving a stable optimization direction while maintaining training speed.
3. Training and Simulation
3.1. Small Training Dataset Establishment
The CAVE and ICVL datasets mentioned in Section 2.2 have a spectral resolution of only 10 nm, which falls short of the experimental accuracy requirements. Therefore, after extracting the spectral curve data in the visible band (400–700 nm), we increase the spectral resolution to 2 nm by average interpolation. Based on the assumptions in Section 2.2, the whole dataset of 1.65 million spectral curves is regarded as the large training dataset, and the absolute average of the correlation coefficients between each spectral curve and all other curves is taken as the distribution weight of that curve, measuring its contribution to the richness of the whole dataset; the lower the distribution weight, the higher the contribution. The large training dataset is then divided into 1000 intervals in order of distribution weight with fixed steps. The overall distribution is shown in Figure 2.
We extract a certain number of spectral curves from each interval, according to its distribution proportion, to establish the small training dataset. The rest of the large training dataset is divided into a testing dataset and an experimental dataset. The training dataset serves as the small training dataset for NN model training. The testing dataset is used after each training epoch to observe the current model accuracy and judge whether the model is overfitting. The experimental dataset is used to evaluate the reconstruction accuracy after the NN parameters are fixed. According to the different proportions of the training and testing datasets, four groups of experiments were conducted, as shown in Table 1.
The training datasets are extracted as 90%, 10%, 5%, and 3% of the entire dataset, based on the proportion of each interval, comprising 1,450,000, 164,540, 82,021, and 49,007 spectral curves, respectively. Most of the remaining data in each interval are selected as the testing dataset, accounting for 89%, 94%, 96%, and 9% of the intervals, with 1,469,467, 1,551,986, 1,585,000, and 149,007 spectral curves, respectively. The experimental datasets of the four groups are identical; all are randomly selected from 1% of each interval of the entire dataset, with 15,993 spectral curves.
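One way to realize this per-interval extraction is sketched below (our own helper, with the interval indices computed as in Section 2.2; the percentages are those of Table 1):

```python
import numpy as np

rng = np.random.default_rng(0)

def stratified_split(interval_idx: np.ndarray, train_frac: float, exp_frac: float = 0.01):
    """Split curve indices into train / test / experiment sets, drawing the
    requested fraction from every distribution-weight interval so the small
    training set keeps the distribution of the whole dataset."""
    train, test, exp = [], [], []
    for interval in np.unique(interval_idx):
        members = rng.permutation(np.flatnonzero(interval_idx == interval))
        n_train = int(round(train_frac * len(members)))
        n_exp = int(round(exp_frac * len(members)))
        train.extend(members[:n_train])
        exp.extend(members[n_train:n_train + n_exp])
        test.extend(members[n_train + n_exp:])
    return np.array(train), np.array(test), np.array(exp)

# Example: the 3% training group (interval_idx as computed in Section 2.2).
# train_idx, test_idx, exp_idx = stratified_split(interval_idx, train_frac=0.03)
```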
3.2. Encoding Layer
Figure 3 shows the 15 selected optical filter transmittance curves, and Table 2 gives their correlation coefficient matrix; the larger the correlation coefficient, the deeper the color in Table 2. The optical filter transmittance curves are smooth yet vary significantly and are highly diverse. Although the maximum correlation coefficient in Table 2 reaches 0.7, the reconstruction accuracy in the experimental results remains high. This is noteworthy because the NN breaks the constraint of demanding strict non-correlation between any two optical filter transmittance curves, which greatly facilitates the design and fabrication of optical filters for computational spectroscopy.
3.3. Training and Simulation
In the training process, we take the average reconstruction MSE of each training epoch as the index and multiply the learning rate by 0.5 when the MSE does not decrease for 2 consecutive epochs, so the model approaches the optimal solution quickly. In addition, the mini-batch stochastic gradient descent method is applied, randomly selecting 64 samples without replacement from the training data for each batch, to balance model stability and training speed.
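A compact PyTorch training loop reflecting these settings is sketched below; the optimizer choice and initial learning rate are placeholders, and torch's ReduceLROnPlateau scheduler is used as an approximation of the 2-consecutive-epoch rule.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data: input and target are the same spectral curves, since the
# encoder compresses them and the decoder reconstructs them.
spectra = torch.rand(1000, 151)
loader = DataLoader(TensorDataset(spectra, spectra), batch_size=64, shuffle=True)

model = STDNet(transmittance=torch.rand(15, 151))  # see the sketch in Section 2
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.01)  # lr is a placeholder
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=2)  # halve lr when MSE plateaus

for epoch in range(200):
    epoch_loss = 0.0
    for x, target in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), target)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item() * x.size(0)
    scheduler.step(epoch_loss / len(spectra))  # epoch-average MSE as the index
```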
The training of the decoding network lasted for 200 epochs. The MSE of training and testing is shown in Figure 4. By the 19th epoch, the training and testing MSE is already below 1 × 10⁻⁵. By the middle of training, the downward trend of the test loss has flattened to nearly zero, and subsequent training does little to reduce it further, so the test error at that point can be taken as equal to the training error at the end of training. At this point, the test MSE reaches 3 × 10⁻⁶, which the comparison in Table 3 shows to be a sufficiently high reconstruction accuracy.
Simulated spectra are reconstructed with NNs trained on each of the four datasets to investigate the effect of the training dataset size on spectral reconstruction accuracy. To simulate the error in measuring the filter transmittance profiles and spectra, we add random Gaussian noise with mean 0 and standard deviation (σ) of 10⁻³ or 10⁻² to the encoding network, namely the filter transmittance profile matrix. We then compare these networks with the noise-free STD-Net in terms of the average MSE, full width at half maximum (FWHM), peak amplitude error (PAE), peak wavelength position deviation (PWPD), and reconstruction speed of the reconstructed spectra relative to the spectra in the simulated dataset, and record the data in Table 3. Some important conclusions are as follows.
- (1) The larger the dataset, the better the performance in metrics such as MSE, FWHM, PAE, and PWPD, and the better the resistance to noise interference.
- (2) Small datasets also achieve good accuracy and require shorter training, provided a certain accuracy requirement is satisfied.
- (3) The reconstruction speed is approximately identical when the network architectures are the same and shows no correlation with the size of the training dataset.
These findings suggest that in new fields where it is difficult to build large datasets, or where the training cost is extremely high, the rational use of small datasets is important for saving cost.
5. Conclusions
This study proposes an NN computational spectrometer with highly correlated optical filters. It consists of high-correlation optical filters for encoding and a neural network called STD-Net for decoding, which is trained on a small training dataset. First, we propose the presumption that spectra follow a specific distribution law, so that a small training dataset composed of data randomly extracted from the entire dataset can cover its main features. Based on this, for the CAVE and ICVL spectral datasets, the correlation coefficients between each spectral curve and all other spectral curves are averaged in absolute value and taken as the distribution weight, and all data are divided into 1000 intervals. Then, 90%, 10%, 5%, and 3% of the data are randomly extracted for training, while portions of the remaining dataset serve as the test dataset to guard against overfitting during training. In addition, a tailored loss function and an adaptive learning rate mechanism are introduced to improve training efficiency and reconstruction accuracy. Since the loss focuses only on the MSE between the original and reconstructed curves, it imposes no compulsory requirement on the non-correlation property of the optical filters; non-correlation is a sufficient but not necessary condition. The highly correlated PMMA filters are therefore applied as the encoding layer. For the reconstruction network, we proposed a four-layer FC architecture. Multi-scale training datasets are established to train the constructed neural network, and four groups of comparison experiments with different data extraction ratios indicate that STD-Net achieves high reconstruction accuracy and robustness even when the amount of training data is severely limited. Finally, an experimental system was introduced, and the results indicate that the proposed NN computational spectrometer has good accuracy and efficiency. STD-Net frees computational spectrometry from the need to find highly non-correlated filters, which reduces the difficulty of filter design and fabrication, and it may provide a new method for the development of computational spectrometers.