Article

Temporal Subtraction Technique for Thoracic MDCT Based on Residual VoxelMorph

Noriaki Miyake, Huimin Lu, Tohru Kamiya, Takatoshi Aoki and Shoji Kido
1 Department of Mechanical and Control Engineering, Graduate School of Engineering, Kyushu Institute of Technology, Kitakyushu 804-8550, Japan
2 Department of Radiology, University of Occupational & Environmental Health, Kitakyushu 807-8555, Japan
3 Department of Artificial Intelligence Diagnostic Radiology, Osaka University Graduate School of Medicine, Suita 565-0871, Japan
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(17), 8542; https://doi.org/10.3390/app12178542
Submission received: 11 July 2022 / Revised: 22 August 2022 / Accepted: 24 August 2022 / Published: 26 August 2022

Abstract

The temporal subtraction technique is a useful tool for computer-aided diagnosis (CAD) in visual screening. The technique subtracts the previous image set from the current one for the same subject to emphasize temporal changes and/or newly appearing abnormalities. However, it is difficult to obtain a clear subtraction image free of subtraction artifacts. VoxelMorph, an unsupervised deep-learning registration method, is a useful basis for this task; however, preparing large training datasets is difficult in medical image analysis, raising concerns about incorrect learning, vanishing gradients, and overfitting. To overcome this problem, we propose a new method for generating temporal subtraction images of thoracic multi-detector row computed tomography (MDCT) images based on ResidualVoxelMorph, which introduces residual blocks into VoxelMorph to enable flexible registration at a low computational cost. The residual blocks also promise high learning efficiency even with a limited training set. We applied our method to 84 clinical image pairs and evaluated it using three-fold cross-validation. The results showed that the proposed method reduced subtraction artifacts, lowering the root mean square error (RMSE) by 11.3% (p < 0.01), and its effectiveness was verified. The proposed temporal subtraction method for thoracic MDCT can thus improve the observer's performance.

1. Introduction

Owing to the development of medical imaging equipment such as computed tomography (CT) and magnetic resonance imaging (MRI), highly accurate images can be obtained in a short time, and diagnostic imaging has become an indispensable part of medical care. CT and MRI provide three-dimensional tomographic images with detailed information on the inside of the body, contributing to more accurate diagnoses. However, the number of images per patient is huge, increasing the burden on physicians, who must read the images of many patients; this burden may increase the possibility of misdiagnosis, such as overlooking a lesion. Therefore, computer-aided diagnosis (CAD) systems have been developed to reduce the burden on physicians and improve diagnostic accuracy and efficiency. CAD systems use computers to perform various image analyses, such as segmentation of organ regions, image registration, detection of lesions, and differentiation of detected lesions as benign or malignant, and provide the analysis results to physicians as a "second opinion" [1]. Using these results, radiologists, who must evaluate many images to identify lesions and determine a diagnosis, can increase their decision-making resources, improve diagnostic accuracy, and reduce the time required for diagnosis; CAD has thus become a useful tool for visual screening.
In particular, one CAD technique that streamlines comparative reading is the temporal subtraction technique, in which a temporal subtraction image is obtained by subtracting the previous image from the current image. By removing most normal regions, it highlights temporal differences, such as newly appearing lesions or changes in abnormalities present in the previous medical image. In the actual interpretation of diagnostic images for lung cancer, the reader compares previously acquired images with the current images to detect new lesions and record temporal changes in existing lesions. Thus, the temporal subtraction technique can present visually effective information to the physician. Furthermore, lung cancer is the leading cause of cancer-related deaths worldwide, and early diagnosis and detection are critical for survival [2], but subtle lesions, such as early lung nodules, tend to have low contrast. Since diagnosticians face the difficult task of interpreting images in a limited amount of time, the development of a temporal subtraction method supporting the early diagnosis of lung cancer is desirable. Indeed, the usefulness of the temporal subtraction method for 2D X-ray images has been demonstrated in several clinical evaluation experiments, and its practical introduction is in progress [3]. Additionally, the National Lung Screening Trial reported a 20% reduction in the incidence of lung cancer-related deaths among smokers screened with low-dose CT [4], and CT is expected to be employed even more widely in lung screening in the future. Moreover, Aoki reported that the use of temporal subtraction images for chest CT improves physicians' lesion detection accuracy [5]. Hence, there is a need for the practical application of a temporal subtraction technique for chest CT images.
However, chest CT images often show misalignment between the current and previous images due to differences in the examination environment and imaging equipment. If the images are subtracted while misaligned, normal structures remain as artifacts on the temporal subtraction image because they cannot be removed, and these artifacts can affect interpretation. To reduce them, a subtraction image must be generated by deforming the previous image into a structure similar to the current image before subtraction. Developing highly efficient and accurate registration methods is therefore important, both for reducing artifacts and for putting CAD systems to practical clinical use as medical devices. Image registration techniques are divided into rigid and nonrigid methods. Rigid registration has a fast computation time and is ideal for aligning normally non-deformable objects such as bones; however, it is not suitable for matching objects with complicated deformations, such as organs, bronchial walls, and blood vessels. Nonrigid registration, on the other hand, allows flexible matching of such deformable objects, but its limitations are high computational cost and the difficulty of accurate matching. Given these circumstances, we are investigating a nonrigid registration method that can perform fast, highly flexible, and rigorous matching. Although many methods for nonrigid registration of medical images have been developed, various problems remain; for instance, they are not yet sufficient to deal with local changes caused by temporal differences and the imaging environment, and no established registration approach can be applied to a wide variety of images. Meanwhile, deep neural networks have attracted attention in recent years; they are expected to achieve highly accurate analysis in a short time by learning from a large training dataset of various images containing complex information. Among them is VoxelMorph, a registration method based on unsupervised deep learning [6] that integrates a fully convolutional U-net architecture with a spatial transformer and trains them jointly to align images. Although VoxelMorph is a useful method, preparing large training datasets is difficult in medical image analysis, as in our study, raising concerns about incorrect learning, vanishing gradients, and overfitting. However, the introduction of residual blocks in image recognition has reportedly been able to prevent the vanishing gradient problem as well as overfitting [7]. Residual blocks are realized by adding residual mappings and skip connections and can improve feature utilization when learning upper-layer features from lower-layer features in convolution. Hence, by introducing residual blocks into VoxelMorph, high learning efficiency can be expected even with a limited training set.
In this research, we propose a method for generating temporal subtraction images of thoracic CT images based on ResidualVoxelMorph, which introduces residual blocks into VoxelMorph to enable flexible registration at a low computational cost. In other words, this technique contributes to the construction of a 3D temporal subtraction technique that can rapidly generate subtraction images, can more clearly depict early lesions that are difficult to detect with 2D X-ray images, and can be used for lung cancer screening with thoracic CT. This technique can improve physicians' reading performance and contribute to early case detection and lesion treatment for patients.

2. Literature Review

In order to establish a temporal subtraction method, we must solve the problem of artifacts caused by misregistration, since these artifacts can affect the interpretation of temporal subtraction images. It is therefore important to develop highly efficient and accurate nonrigid registration methods to reduce them. This section outlines deformable registration methods, divided into signal processing-based and deep neural network-based methods, with a focus on recent advances. Based on this survey, the motivation for our own research is presented.

2.1. Signal Processing-Based Methods for Medical Image Registration

Signal processing-based image registration algorithms are built on three essential elements: a deformation model, an objective function, and an optimization method for deriving displacement vector fields in 3D space. The result of a registration algorithm is largely determined by the deformation model and the objective function [8]. As deformation models, elastic models [9], statistical parametric mapping [10], B-spline free-form deformations [11], and demons [12] have been proposed; as objective functions, cross-correlation, mutual information, and PCA-SIFT have been proposed. However, achieving high-precision registration with these algorithms tends to incur high computational costs, because all of these non-training approaches optimize an energy function for each image pair, which takes time. It has been reported that the execution time can be shortened to several minutes by implementing the computation on a GPU; however, a GPU is then required for every registration of unknown data, raising cost concerns for commercialization [13].
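To make this class of methods concrete, the following is a minimal sketch of one signal-processing-based approach using SimpleITK, combining a B-spline free-form deformation model [11] with a mutual information objective; the mesh size and optimizer settings are illustrative assumptions, not values from the cited works.

```python
import SimpleITK as sitk

def bspline_register(fixed, moving, mesh_size=(8, 8, 8)):
    # Metric evaluation requires floating-point images.
    fixed = sitk.Cast(fixed, sitk.sitkFloat32)
    moving = sitk.Cast(moving, sitk.sitkFloat32)

    # Deformation model: B-spline free-form deformation control grid.
    tx = sitk.BSplineTransformInitializer(fixed, list(mesh_size))

    reg = sitk.ImageRegistrationMethod()
    # Objective function: Mattes mutual information.
    reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
    # Optimizer: L-BFGS-B over the B-spline coefficients.
    reg.SetOptimizerAsLBFGSB(gradientConvergenceTolerance=1e-5,
                             numberOfIterations=100)
    reg.SetInitialTransform(tx, inPlace=True)
    reg.SetInterpolator(sitk.sitkLinear)
    transform = reg.Execute(fixed, moving)

    # Warp the previous (moving) image into the current (fixed) frame.
    return sitk.Resample(moving, fixed, transform, sitk.sitkLinear, 0.0)
```

As the paragraph above notes, this per-pair iterative optimization is exactly what makes such methods slow on unseen data: every new image pair re-runs the optimizer from scratch.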

2.2. Deep Neural Networks-Based Methods for Medical Image Registration

Recently, research on deep neural networks for medical image registration has been actively conducted. For example, Eppenhof proposed the use of 3D CNNs to achieve deformable registration of inhalation-exhalation 3D lung CT volumes [14]. This technique provides deformable registration that preserves realistic image appearance through a series of multiscale, random transformations of aligned image pairs, without requiring manual annotation of ground truth data. In addition, like other methods that generate ground truth data, CNNs can be trained in a supervised way using a relatively small number of medical images; the processing is real-time across applications, and it enables robust registration with supervised transformation estimation. However, as a limitation, the quality of registration using this framework depends on the quality of the ground truth, and hence on the practitioner's expertise in producing these labels. Producing such labels is also complicated by the relatively small number of individuals with the necessary expertise to perform such registrations. Although these limitations can be addressed by transforming training data and generating synthetic ground truth data, it is essential to ensure that the synthesized data closely resemble clinical data. This motivated several groups to explore unsupervised approaches [15,16]. One important innovation in these studies is the spatial transformer network (STN) [17]; several methods use an STN to perform the transformations relevant to their registration applications [18,19]. Kuang used a CNN- and STN-inspired framework to perform deformable registration of T1-weighted brain MR volumes [19]. The loss function consists of an NCC term and a regularization term. They compared their method to VoxelMorph [6] and uTIlzReg GeoShoot [20] using the LPBA40 and Mindboggle 101 datasets, and demonstrated superior performance compared to both methods. Furthermore, these registration methods do not explicitly enforce criteria that guarantee topology preservation, which often results in loss of structural information and inaccurate registration. To overcome the potential degeneracy of registration fields, Kim presented a deformable image registration method called CycleMorph [21]. CycleMorph forces the deformed image back to the original image, thereby providing cycle consistency between image pairs. Experiments on various image datasets confirmed that CycleMorph provides topology-preserving deformations for arbitrary image pairs, resulting in significant performance gains. However, existing deep neural network-based registration methods employ networks such as CNNs, which have a limited ability to extract global image features and thus cannot effectively capture long-range dependencies between moving and fixed images. Therefore, Wang proposed a network called Transformer-UNet (TUNet) to effectively capture such long-range dependencies [22]. This network introduces a vision transformer (ViT) into the UNet framework to extract global and local features from moving and fixed images, thereby generating a deformation field. To validate its performance, experiments were conducted using LPBA40 and OASIS-1, and qualitative and quantitative evaluations demonstrated high registration accuracy.

2.3. Conventional Temporal Subtraction Technology and the Positioning of This Research

Itai proposed a technique to generate temporal subtraction images for multi-detector row computed tomography (MDCT) volume images [23]. Although the image quality of the subtraction images was generally good, many subtraction artifacts appeared in these images due to misregistration. Such artifacts often exist on temporal subtraction images generated from thoracic MDCT images and can reduce their effectiveness in the detection of pulmonary nodules. We previously proposed a temporal subtraction technique based on nonrigid registration using the finite element method (FEM) in selected local regions [24]. In principle, the FEM can handle any geometry, but it becomes computationally expensive as accuracy is increased. By applying the FEM only in selected local regions, the computational cost can be significantly reduced, and artifacts can be reduced through flexible deformation with sufficient nodal placement in the volume of interest (VOI). However, the computational cost is still high, artifacts remain, and further improvements are expected.
Regarding our challenge, one limitation of these deep neural network-based methods is that the transformations estimated in different directions do not closely resemble actual lung motion; in particular, the loss of structural smoothness in the image promotes deformation irregularities. Solving this problem with deep neural networks requires huge amounts of training data. Outside medical imaging, such data are often already available, and in some situations transfer learning can be used to avoid the problem; however, transfer learning is not feasible for 3D medical images such as thoracic MDCT images [14]. Another limitation is that networks trained on entire 3D volumes require very large amounts of device memory [25]. In previous studies, images were scaled down to work around this limitation, but this leaves the problem of image blurring. In this research, we propose a high-precision 3D image registration method that deforms smoothly at low computational cost and can be trained efficiently even with a small amount of training data. With this method, we aim to establish an efficient and clear 3D temporal subtraction image generation method.

3. Research Method

Subtraction image artifacts arise from many differences between the current and previous images, including differences in the location, shape, and size of blood vessels, chest walls, ribs, nodules, and other anatomical structures such as the lungs and heart. Because each object in an image has different deformation properties, capturing these deformations in structural formulas is extremely difficult; thus, a registration method that allows fast and flexible deformation is required. Therefore, we introduce a temporal subtraction image generation method built on a deep neural network registration technique.

3.1. Temporal Subtraction Technique with Deep Neural Network

Figure 1 shows a flowchart of the proposed temporal subtraction image generation method. First, the lung field regions were extracted from the current and previous images by the processing shown in Figure 2, and clipping was performed according to the lung field regions. This allows more efficient registration of structures in the lung field, such as blood vessels, compared to applying deep neural network registration to the entire current and previous CT images. Next, to account for changes in the field of view (FOV), the previous lung field clip image was upscaled or downscaled so that its pixel size matched that of the current image. Furthermore, as shown in Figure 3, the current and previous lung field clip images were divided into patches, each patch pair was registered by a deep neural network to generate a deformed previous patch, and the deformed patches were placed back in their original positions to produce a deformed image of the lung field clip region. The patch size was set to 64 × 64 × 64 voxels in our experiments. This patch-wise processing addresses device-memory limitations when training deep neural networks on GPUs. Next, a subtraction image was generated by subtracting the signal values of the deformed image from those of the current CT image in the lung field clip region. Finally, the subtraction image was scaled so that it matched the FOV of the original current CT image, yielding the temporal subtraction image.
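The following is a simplified sketch of the patch-wise stage of this pipeline. The function register_patch is a hypothetical stand-in for the trained network, and the padding assumption is ours; the paper does not state how edge patches are handled.

```python
import numpy as np

PATCH = 64  # patch size used in the experiments (64 x 64 x 64 voxels)

def deform_by_patches(current, previous, register_patch):
    # Assumes the lung-field clips have been padded so that every
    # dimension is a multiple of PATCH; register_patch(f, m) returns
    # m deformed toward f and stands in for the trained network.
    assert current.shape == previous.shape
    assert all(s % PATCH == 0 for s in current.shape)
    deformed = np.zeros_like(previous)
    for z in range(0, current.shape[0], PATCH):
        for y in range(0, current.shape[1], PATCH):
            for x in range(0, current.shape[2], PATCH):
                sl = (slice(z, z + PATCH),
                      slice(y, y + PATCH),
                      slice(x, x + PATCH))
                # Register each previous patch to the corresponding
                # current patch, then write it back in place.
                deformed[sl] = register_patch(current[sl], previous[sl])
    # The temporal subtraction image is the voxel-wise difference.
    return current - deformed
```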

3.2. Deep Neural Network Model for Registration of Temporal Subtraction Technique: ResidualVoxelMorph

In this section, we propose ResidualVoxelMorph as the deep neural network for image registration in the temporal subtraction technique. Figure 4 shows the network model of ResidualVoxelMorph. The registration field generation is parametrized by a neural network architecture that introduces residual blocks into a U-net-like architecture [26], consisting of encoder and decoder sections with skip connections. The residual blocks in both the encoder and decoder stages consist of a 3D convolutional layer with a kernel size of three and a stride of two, as shown in Figure 5, followed by a leaky ReLU layer with parameter 0.2 and the addition of skip connections [7]. In the encoder, the residual-block layers halve the spatial dimensions at each layer; the encoder's successive layers therefore work with coarser representations of the input, analogous to the image pyramid used in traditional image registration. The decoding stage alternates between up-sampling, residual-block layers, and concatenating skip connections to propagate the features learned in the encoding stage directly to the layers that generate the registration. The registration field is generated by inputting a current and a previous CT image patch into the residual U-net. The registration field and the previous CT image patch are then input into the spatial transformer network [17] to generate the deformed image patch. The spatial transformer network uses the registration field φ to compute the (sub-pixel) voxel position p′ = p + u(p) in the previous CT image patch m for each voxel p. Because image values are defined only at integer positions, the values of the eight neighboring voxels are linearly interpolated:
$$ m \circ \varphi(p) = \sum_{q \in Z(p')} m(q) \prod_{d \in \{x, y, z\}} \left(1 - \left|p'_d - q_d\right|\right) $$
where Z(p′) is the set of voxel neighbors of p′, and d iterates over the dimensions of Ω. Gradients and sub-gradients can be computed, so errors can be backpropagated during optimization. This model replaces U-net, the registration field generation layer of the VoxelMorph proposed by Balakrishnan [6], with a residual U-net. In addition, because residual blocks improve feature utilization, high learning efficiency can be expected even with a limited training set.
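As an illustration, a minimal PyTorch sketch of the two building blocks described above follows: a strided 3D residual block with a LeakyReLU(0.2) activation, and a spatial-transformer warp implementing m∘φ by trilinear interpolation as in the formula above. The 1 × 1 skip projection and all layer sizes are our assumptions; the paper does not specify them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock3D(nn.Module):
    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3,
                              stride=stride, padding=1)
        # 1x1 projection so the skip path matches the conv path when
        # channels or resolution change (an assumption; the skip
        # projection is not detailed in the paper).
        self.skip = nn.Conv3d(in_ch, out_ch, kernel_size=1, stride=stride)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x):
        return self.act(self.conv(x) + self.skip(x))

def warp(moving, flow):
    """Trilinear warp: sample `moving` at p' = p + u(p).
    moving: (N, 1, D, H, W); flow: (N, 3, D, H, W) displacements in voxels."""
    N, _, D, H, W = moving.shape
    # Identity sampling grid in voxel coordinates, ordered (z, y, x).
    zz, yy, xx = torch.meshgrid(torch.arange(D), torch.arange(H),
                                torch.arange(W), indexing="ij")
    grid = torch.stack((zz, yy, xx)).float().to(moving.device)  # (3, D, H, W)
    coords = grid.unsqueeze(0) + flow                            # p + u(p)
    # Normalize each axis to [-1, 1] (align_corners=True convention).
    z = 2.0 * coords[:, 0] / (D - 1) - 1.0
    y = 2.0 * coords[:, 1] / (H - 1) - 1.0
    x = 2.0 * coords[:, 2] / (W - 1) - 1.0
    # grid_sample expects the last dimension ordered (x, y, z);
    # mode="bilinear" performs trilinear interpolation on 5D input.
    sample_grid = torch.stack((x, y, z), dim=-1)  # (N, D, H, W, 3)
    return F.grid_sample(moving, sample_grid, mode="bilinear",
                         align_corners=True)
```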

3.3. Training ResidualVoxelMorph

This section describes the loss function and optimization algorithm required to train ResidualVoxelMorph for the temporal subtraction technique. First, for the current CT image patch f, previous CT image patch m, and registration field φ, the loss function L(f, m, φ) is defined as:
$$ L(f, m, \varphi) = L_{sim}(f, m \circ \varphi) + \lambda L_{smooth}(\varphi) $$
$$ L_{sim}(f, m \circ \varphi) = \frac{1 - NCC(f, m \circ \varphi)}{\sigma^2} $$
$$ L_{smooth}(\varphi) = \frac{d}{2} \sum_{i} \sum_{j \in N(i)} \left\| \varphi_i - \varphi_j \right\|^2 $$
where λ = 10 and σ = 0.1 are empirically set regularization parameters, d = 3 is the number of dimensions, and N(i) is the neighborhood of voxel i. Furthermore, NCC is the normalized cross-correlation value given by
$$ NCC = \frac{\sum_{p_i \in \hat{\Omega}} \left(f(p_i) - \hat{f}\right)\left(m \circ \varphi(p_i) - \widehat{m \circ \varphi}\right)}{\sqrt{\sum_{p_i \in \hat{\Omega}} \left(f(p_i) - \hat{f}\right)^2 \sum_{p_i \in \hat{\Omega}} \left(m \circ \varphi(p_i) - \widehat{m \circ \varphi}\right)^2}} $$
where $\hat{f}$ is the average signal value of the current CT image patch, $\widehat{m \circ \varphi}$ the average signal value of the deformed image patch, and $\hat{\Omega}$ the region within the lung field only. By calculating the cross-correlation only within the lung field and ignoring the outside, the network can be expected to preferentially register structures within the lung field. Based on this loss function, each patch was trained using the Adam optimizer [27], defined as
$$ w_{t+1} = w_t - \alpha \frac{\hat{m}}{\sqrt{\hat{v}} + \epsilon} $$
$$ \hat{m} = \frac{m_{t+1}}{1 - \beta_1^t} $$
$$ \hat{v} = \frac{v_{t+1}}{1 - \beta_2^t} $$
$$ m_{t+1} = \beta_1 m_t + (1 - \beta_1) \nabla E(w_t) $$
$$ v_{t+1} = \beta_2 v_t + (1 - \beta_2) \left(\nabla E(w_t)\right)^2 $$
where E is the error function, m₀ = 0, v₀ = 0, α = 10⁻⁴ is the learning rate, and the hyperparameters are β₁ = 0.9, β₂ = 0.999, and ε = 10⁻⁸. Finally, the network was trained with 20% of the dataset held out for validation.
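Under our reading of the loss defined above, it can be sketched in PyTorch as follows: a lung-field-masked NCC scaled by 1/σ², plus a first-difference smoothness penalty on the registration field, optimized with Adam using the stated hyperparameters. The tensor shapes and the exact discretization of the smoothness term are assumptions.

```python
import torch

def masked_ncc(f, warped, mask, eps=1e-8):
    # Cross-correlation computed only over lung-field voxels (Omega-hat).
    # f, warped: (N, 1, D, H, W); mask: same-shape binary lung-field mask.
    m = mask.bool()
    a = f[m] - f[m].mean()
    b = warped[m] - warped[m].mean()
    return (a * b).sum() / torch.sqrt((a * a).sum() * (b * b).sum() + eps)

def smoothness(flow):
    # Sum of squared differences between neighboring field vectors
    # along each spatial axis (a discrete first-difference penalty).
    dz = flow[:, :, 1:] - flow[:, :, :-1]
    dy = flow[:, :, :, 1:] - flow[:, :, :, :-1]
    dx = flow[:, :, :, :, 1:] - flow[:, :, :, :, :-1]
    return (dz ** 2).sum() + (dy ** 2).sum() + (dx ** 2).sum()

def loss_fn(f, warped, flow, mask, lam=10.0, sigma=0.1):
    l_sim = (1.0 - masked_ncc(f, warped, mask)) / sigma ** 2
    return l_sim + lam * smoothness(flow)

# Optimizer with the hyperparameters stated above:
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,
#                              betas=(0.9, 0.999), eps=1e-8)
```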

4. Experiment

We experimented in the following PC environment: an Intel® Core™ i9-9900K 3.6 GHz CPU, 32 GB of RAM, Windows 10 Professional 64-bit, with PyCharm as the development environment. An NVIDIA GeForce RTX 2080 Ti GPU was used for deep neural network training.

4.1. Image Database

In the experiment, we applied the proposed temporal subtraction technique for volume data to images acquired with an MDCT scanner (Aquilion, Toshiba, Japan). The dataset consists of 84 image pairs: 60 abnormal cases including lesions and 24 normal cases, with an average imaging interval of approximately 8 months (range: 2 to 34 months). In this research, it is important to detect relatively small nodular shadows of about 5–20 mm, so the abnormal cases are those in which such shadows newly appear in the current CT images. We verified the method using three-fold cross-validation, dividing the data into groups A, B, and C. Group A contained 27 cases (19 abnormal, including lesions, and 8 normal), group B contained 29 cases (21 abnormal and 8 normal), and group C contained 28 cases (20 abnormal and 8 normal). During training, the dataset of current and previous images was augmented by a factor of 9 by preparing images rotated ±5° about the x axis and ±5° about the z axis.
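The nine-fold augmentation can be sketched as follows, assuming the nine variants are the combinations of {−5°, 0°, +5°} rotations about the x and z axes; the exact combination scheme is our assumption.

```python
import numpy as np
from scipy.ndimage import rotate

def augment(volume):
    # volume: (z, y, x) array; returns 9 variants, including the
    # (essentially) unrotated original.
    variants = []
    for ax in (-5, 0, 5):      # rotation about the x axis -> (z, y) plane
        for az in (-5, 0, 5):  # rotation about the z axis -> (y, x) plane
            v = rotate(volume, ax, axes=(0, 1), reshape=False, order=1)
            v = rotate(v, az, axes=(1, 2), reshape=False, order=1)
            variants.append(v)
    return variants
```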

4.2. Results and Evaluation

Figure 6 shows an example of applying the proposed method to a current and previous image pair. Figure 6a shows a section of the previous image, (b) a section of the current image, (c) a section of the temporal subtraction image, and (d) a volume-rendering capture of the temporal subtraction images. There is a nodular shadow with a diameter of 6.8 mm within the circular mark on the current image and the temporal subtraction image. As shown in the cross-section of the temporal subtraction image in (c), the nodular shadow newly appearing in the current image is emphasized. Additionally, as observed in (d), although some residual artifacts are still present, lesions with temporal changes are sufficiently emphasized that identifying their location is easy, which is effective for interpretation. Table 1 shows the range of the loss function values for the training sets, and Table 2 shows the range for the validation sets. These tables show that the loss function values are sufficiently small. However, the minimum loss value for the training set is smaller than that for the validation set, which raises concerns about overfitting; since the dataset used in this experiment is small, future validation with a large amount of data is needed. The average prediction processing time of the proposed method was 84.4 s using the GPU, and 277.7 s using only the CPU, which is faster than the conventional temporal subtraction technique [24], which requires several tens of minutes.
The performance of the proposed method can be evaluated by comparing the temporal subtraction images generated using ResidualVoxelMorph as the deep neural network model of the proposed method, those generated using VoxelMorph as the deep neural network model of the proposed method, and those generated by subtracting deformed images produced by conventional VoxelMorph [6]. Figure 7 shows (a) the previous image, (b) the current image, (c) the temporal subtraction image from ResidualVoxelMorph with the proposed method, (d) the temporal subtraction image from VoxelMorph with the proposed method, and (e) the temporal subtraction image from conventional VoxelMorph. The current image shows a nodular shadow with a diameter of 6 mm. As shown in Figure 7c–e, compared to conventional VoxelMorph, the proposed method suppressed the appearance of artifacts and highlighted the lesion. This is largely due to the difference in the region over which the NCC in the loss function is computed during training: conventional VoxelMorph uses the NCC of the entire image as the image similarity, whereas the proposed method calculates the NCC only within the lung field, allowing the network to preferentially register structures within the lung field. Furthermore, the temporal subtraction images from conventional VoxelMorph were blurred. This is because, owing to GPU device memory limits during training, images are reduced to 160 × 192 × 224 voxels in conventional VoxelMorph, whereas the proposed method divides the image into patches without reduction; the proposed method therefore suppresses blurring. Finally, comparing (c) and (d) in Figure 7, artifacts were further suppressed when ResidualVoxelMorph was used as the deep neural network model, because the residual blocks enable more efficient training.
However, parts of some lesions in the temporal subtraction images obtained with ResidualVoxelMorph and the proposed method were depicted as missing even though they were not present in the previous images, such as the lower part of the lesion in Figure 7c. This is because the deformation capability of ResidualVoxelMorph with the proposed method is too high, and the normal structures around the lesion were forcibly deformed and mapped onto it. To solve this problem, a deep neural network model that takes new lesions into account is necessary.
In addition, ground truth images were generated by manually retaining only the temporally different lesions, and the root mean square error (RMSE) between the ground truth images and the temporal subtraction images was calculated to evaluate the performance of the proposed method. The ground truth images were created with Attractive software (PixSpace, Fukuoka, Japan) by manually masking only the temporally different lesion areas and using the masked image output function. Table 3 lists the RMSEs of the temporal subtraction images generated using ResidualVoxelMorph with the proposed method, VoxelMorph with the proposed method, and conventional VoxelMorph [6]. The average RMSEs were 103.6, 112.6, and 116.8, respectively; thus, the average RMSE of the proposed method was 11.3% lower than that of the conventional method. To test the normality of the between-method RMSE differences, the Shapiro-Wilk test was conducted at a significance level of 0.01, and normality was rejected for every pair: ResidualVoxelMorph with the proposed method vs. VoxelMorph with the proposed method, p = 0.0039 (p < 0.01); ResidualVoxelMorph with the proposed method vs. conventional VoxelMorph, p = 1.1 × 10⁻⁸ (p < 0.01); and VoxelMorph with the proposed method vs. conventional VoxelMorph, p = 0.0015 (p < 0.01). Therefore, the Wilcoxon signed-rank test, which does not require the normality assumption, was applied at a significance level of 0.01 and indicated that the differences in RMSE between the methods were statistically significant: ResidualVoxelMorph with the proposed method vs. VoxelMorph with the proposed method, p = 2.1 × 10⁻¹⁵ (p < 0.01); ResidualVoxelMorph with the proposed method vs. conventional VoxelMorph, p = 4.4 × 10⁻¹³ (p < 0.01); and VoxelMorph with the proposed method vs. conventional VoxelMorph, p = 5.1 × 10⁻⁵ (p < 0.01). These results show that the proposed method outperforms the conventional method and that ResidualVoxelMorph is significantly superior to VoxelMorph. Figure 8 illustrates boxplots of the RMSEs, where the centerline of each box is the median, the top and bottom of the box are the 75th and 25th percentiles, the cross is the mean, and the points are statistical outliers. Comparing these plots, ResidualVoxelMorph with the proposed method is lower than the conventional method in all indicators: median, 25th percentile, 75th percentile, and mean. Furthermore, the RMSE of the proposed method varies little between cases. Clearly, ResidualVoxelMorph with the proposed method reduced artifacts more efficiently, and the proposed method performed better than the conventional method. Finally, Table 4 shows the average RMSE of each method for the abnormal cases, and Table 5 for the normal cases. As can be seen from Tables 4 and 5, these algorithms appear to perform better in the normal cases than in the abnormal cases.
A possible reason is that, in abnormal cases, the region around the lesion cannot be handled well, and image artifacts may appear due to misalignment of the normal structures surrounding the lesion.
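For reference, the evaluation protocol described above can be sketched as follows: per-case RMSE against the manual ground truth, a Shapiro-Wilk normality check on the paired differences, and a Wilcoxon signed-rank test. Function names and the data layout are illustrative.

```python
import numpy as np
from scipy.stats import shapiro, wilcoxon

def rmse(subtraction, ground_truth):
    # Per-case RMSE between a temporal subtraction image and its
    # manually created ground truth image.
    return float(np.sqrt(np.mean((subtraction - ground_truth) ** 2)))

def compare(rmse_a, rmse_b, alpha=0.01):
    # rmse_a, rmse_b: paired per-case RMSE arrays for two methods.
    diff = np.asarray(rmse_a) - np.asarray(rmse_b)
    _, p_norm = shapiro(diff)            # normality of the differences
    _, p_wil = wilcoxon(rmse_a, rmse_b)  # distribution-free paired test
    return p_norm < alpha, p_wil < alpha
```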

5. Conclusions

In this paper, we proposed a new temporal subtraction method with reduced subtraction artifacts using a nonrigid image registration method based on a deep neural network, ResidualVoxelMorph. On real thoracic MDCT images, the proposed method reduced the RMSE by 11.3% on average compared with the conventional method. The comparison also confirmed that high-speed processing of the temporal subtraction technique is possible and that the image quality of the temporal subtraction images improves because subtraction artifacts are removed. In the future, we plan to conduct semi-quantitative visual assessments by expert radiologists to assess the clinical relevance of this technique. Furthermore, as an improvement of the method, we plan to increase accuracy by devising a deep neural network model that can handle new lesions for which no registration correspondence exists. In addition, we would like to consider introducing deep neural networks to select local regions in signal processing-based registration, as this is expected to reduce costs and improve accuracy. Overall, our preliminary results indicate that temporal subtraction images generated by the proposed method are useful for radiologists in detecting interval changes on MDCT images.
In short, the proposed temporal subtraction method contributes to the construction of a 3D temporal subtraction technique that can be used for thoracic CT screening for lung cancer and that can rapidly generate subtraction images in which early lesions, difficult to detect on 2D X-ray images, can be interpreted more efficiently. The proposed technique can thus be expected to improve observer performance and contribute to the early detection and treatment of low-contrast lesions.

Author Contributions

Conceptualization, N.M.; methodology, N.M.; software, N.M.; validation, N.M.; formal analysis, N.M.; investigation, N.M.; resources, N.M.; data curation, N.M.; writing—original draft preparation, N.M.; writing—review and editing, H.L., T.K., T.A., S.K.; visualization, N.M.; supervision, T.K.; project administration, N.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been partially supported by the Kayamori Foundation of Informational Science Advancement and Grants-in-Aid for Scientific Research of JSPS KAKENHI (21H03840).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Doi, K. Computer-aided diagnosis in medical imaging: Historical review, current status and future potential. Comput. Med. Imaging Graph. 2007, 31, 198–211.
2. Ferlay, J.; Soerjomataram, I.; Dikshit, R.; Eser, S.; Mathers, C.; Rebelo, M.; Parkin, D.M.; Forman, D.; Bray, F. Cancer incidence and mortality worldwide: Sources, methods and major patterns in GLOBOCAN 2012. Int. J. Cancer 2015, 136, E359–E386.
3. Kakeda, S.; Kamada, K.; Hatakeyama, Y.; Aoki, T.; Korogi, Y.; Katsuragawa, S.; Doi, K. Effect of temporal subtraction technique on interpretation time and diagnostic accuracy of chest radiography. Am. J. Roentgenol. 2006, 187, 1253–1259.
4. National Lung Screening Trial Research Team. The national lung screening trial: Overview and study design. Radiology 2011, 258, 243–253.
5. Aoki, T.; Murakami, S.; Kim, H.; Fujii, M.; Takahashi, H.; Oki, H.; Hayashida, Y.; Katsuragawa, S.; Shiraishi, J.; Korogi, Y. Temporal Subtraction Method for Lung Nodule Detection on Successive Thoracic CT Soft-Copy Images. Radiology 2014, 271, 255–261.
6. Balakrishnan, G.; Zhao, A.; Sabuncu, M.R.; Guttag, J.; Dalca, A.V. VoxelMorph: A learning framework for deformable medical image registration. IEEE Trans. Med. Imaging 2019, 38, 1788–1800.
7. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
8. Sotiras, A.; Davatzikos, C.; Paragios, N. Deformable medical image registration: A survey. IEEE Trans. Med. Imaging 2013, 32, 1153–1190.
9. Shen, D.; Davatzikos, C. Hierarchical attribute matching mechanism for elastic registration. IEEE Trans. Med. Imaging 2002, 21, 1421–1439.
10. Ashburner, J.; Friston, K.J. Voxel-based morphometry—The methods. Neuroimage 2000, 11, 805–821.
11. Rueckert, D.; Sonoda, L.I.; Hayes, C.; Hill, D.L.G.; Leach, M.O.; Hawkes, D.J. Nonrigid registration using free-form deformation: Application to breast MR images. IEEE Trans. Med. Imaging 1999, 18, 712–721.
12. Thirion, J.P. Image matching as a diffusion process: An analogy with Maxwell's demons. Med. Image Anal. 1998, 2, 243–260.
13. Modat, M.; Ridgway, G.R.; Taylor, Z.A.; Lehmann, M.; Barnes, J.; Hawkes, D.J.; Fox, N.C.; Ourselin, S. Fast free-form deformation using graphics processing units. Comput. Methods Programs Biomed. 2010, 98, 278–284.
14. Haskins, G.; Kruger, U.; Yan, P. Deep learning in medical image registration: A survey. Mach. Vis. Appl. 2020, 31, 8.
15. De Vos, B.D.; Berendsen, F.F.; Viergever, M.A.; Staring, M.; Išgum, I. End-to-end unsupervised deformable image registration with a convolutional neural network. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support; Springer: Cham, Switzerland, 2017; pp. 204–212.
16. Li, H.; Fan, Y. Non-rigid image registration using self-supervised fully convolutional networks without training data. In Proceedings of the 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), Washington, DC, USA, 4–7 April 2018; pp. 1075–1078.
17. Jaderberg, M.; Simonyan, K.; Zisserman, A. Spatial transformer networks. Adv. Neural Inf. Process. Syst. 2015, 28, 2017–2025.
18. Ferrante, E.; Oktay, O.; Glocker, B.; Milone, D.H. On the adaptability of unsupervised CNN-based deformable image registration to unseen image domains. In International Workshop on Machine Learning in Medical Imaging; Springer: Cham, Switzerland, 2018; pp. 294–302.
19. Kuang, D.; Schmah, T. FAIM—A convnet method for unsupervised 3D medical image registration. In International Workshop on Machine Learning in Medical Imaging; Springer: Cham, Switzerland, 2019; pp. 646–654.
20. Vialard, F.-X.; Risser, L.; Rueckert, D.; Cotter, C.J. Diffeomorphic 3D image registration via geodesic shooting using an efficient adjoint calculation. Int. J. Comput. Vis. 2012, 97, 229–241.
21. Kim, B.; Kim, D.H.; Park, S.H.; Kim, J.; Lee, J.-G.; Ye, J.C. CycleMorph: Cycle consistent unsupervised deformable image registration. Med. Image Anal. 2021, 71, 102036.
22. Wang, Y.; Wen, Q.; Zhang, X. A Transformer-based Network for Deformable Medical Image Registration. arXiv 2022, arXiv:2202.12104.
23. Itai, Y.; Kim, H.; Ishikawa, S.; Katsuragawa, S.; Doi, K. Development of a Voxel Matching Technique for Substantial Reduction of Subtraction Artifacts in Temporal Subtraction Images Obtained from Thoracic MDCT. J. Digit. Imaging 2010, 23, 31–38.
24. Miyake, N.; Lu, H.; Kamiya, T.; Aoki, T.; Kido, S. Optimizing early cancer diagnosis and detection using a temporal subtraction technique. Technol. Forecast. Soc. Chang. 2021, 167, 120745.
25. Fu, Y.; Lei, Y.; Wang, T.; Curran, W.J.; Liu, T.; Yang, X. Deep learning in medical image registration: A review. Phys. Med. Biol. 2020, 65, 20TR01.
26. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Cham, Switzerland, 2015; pp. 234–241.
27. Kingma, D.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
Figure 1. Overview of the proposed CT temporal subtraction method.
Figure 2. Overall scheme of the lung field region extraction method.
Figure 3. Schematic of deep neural network registration for each patch.
Figure 4. ResidualVoxelMorph network architecture.
Figure 5. Residual block architecture.
Figure 6. Results of applying the proposed method to a current and previous CT image pair. (a) A section of the previous CT image, (b) a section of the current CT image, (c) a section of the temporal subtraction image, and (d) a volume-rendering capture of the temporal subtraction images. The lesion is circled.
Figure 7. Comparison of the results of applying ResidualVoxelMorph with the proposed method, VoxelMorph with the proposed method, and conventional VoxelMorph to a previous and current CT image pair. (a) The previous CT image, (b) the current CT image, (c) the temporal subtraction image from ResidualVoxelMorph with the proposed method, (d) the temporal subtraction image from VoxelMorph with the proposed method, and (e) the temporal subtraction image from conventional VoxelMorph. Lesions are circled.
Figure 8. Boxplots of RMSE for each method. The centerline of each box is the median, the top and bottom of the box are the 75th and 25th percentiles, the cross is the mean, and the points are statistical outliers.
Table 1. The range of the loss function values for the training sets.

| Method | Group (A + B) | Group (B + C) | Group (A + C) | Mean |
|---|---|---|---|---|
| ResidualVoxelMorph + Proposed Method | [16.27, 56.81] | [18.22, 60.18] | [17.84, 59.61] | [17.4, 58.9] |

Table 2. The range of the loss function values for the validation sets.

| Method | Group (A + B) | Group (B + C) | Group (A + C) | Mean |
|---|---|---|---|---|
| ResidualVoxelMorph + Proposed Method | [18.87, 45.97] | [19.33, 47.78] | [19.19, 49.91] | [19.13, 47.9] |

Table 3. RMSE average values of all evaluated methods. Values in parentheses are the standard deviation of the RMSE.

| Method | Group A | Group B | Group C | Mean |
|---|---|---|---|---|
| ResidualVoxelMorph + Proposed Method | 99.3 (20.9) | 110.9 (26.6) | 100.6 (22.3) | 103.6 (24.1) |
| VoxelMorph + Proposed Method | 107.7 (21.3) | 119.3 (31.1) | 110.7 (23) | 112.6 (26.1) |
| VoxelMorph + Conventional Method | 110.7 (22.5) | 124 (27.3) | 115.7 (24.5) | 116.8 (26) |

Table 4. RMSE average values in the abnormal cases of all evaluated methods. Values in parentheses are the standard deviation of the RMSE.

| Method | Group A | Group B | Group C | Mean |
|---|---|---|---|---|
| ResidualVoxelMorph + Proposed Method | 102.2 (23.3) | 110.9 (29) | 105 (24.3) | 106 (25.5) |
| VoxelMorph + Proposed Method | 110.3 (23.7) | 120.1 (34.8) | 114 (25.3) | 114.8 (27.9) |
| VoxelMorph + Conventional Method | 111.7 (24.9) | 121.7 (30.9) | 120 (25.4) | 117.8 (27.1) |

Table 5. RMSE average values in the normal cases of all evaluated methods. Values in parentheses are the standard deviation of the RMSE.

| Method | Group A | Group B | Group C | Mean |
|---|---|---|---|---|
| ResidualVoxelMorph + Proposed Method | 92.2 (10.4) | 111 (20.3) | 89.3 (9.2) | 97.5 (13.3) |
| VoxelMorph + Proposed Method | 101.4 (12) | 117.1 (18.7) | 101.8 (11.2) | 106.8 (14) |
| VoxelMorph + Conventional Method | 108.4 (15) | 130.1 (20.7) | 104.1 (16.2) | 114.2 (17.3) |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
