Pancreatic cystic lesions (PCLs) have a prevalence rate of 2.4% to 24.3% in the asymptomatic adult population [1
]. Common PCLs consist of five main subtypes, and each presents different disease courses and aggressiveness: (1) intraductal papillary mucinous neoplasm (IPMN), (2) mucinous cystic neoplasm (MCN), (3) serous cystic neoplasm (SCN), (4) cystic neuroendocrine tumor (NET), and (5) pseudocysts [3
]. Differentiation among subtypes of PCLs is critical, as mucinous PCLs have a higher cancer risk. Advanced neoplasia was reported in 100% of main-duct-type IPMN, in 39% of resected MCN, in 30% of branch-type IPMN, and in 10% of resected NET [4
]. Subtyping will impact the clinical decision on the surgical management and non-surgical surveillance, as SCN and pseudocysts have a low cancer risk, for which costly surveillance could be avoided. Abdominal ultrasound, computed tomography, magnetic resonance imaging, and endoscopic ultrasound (EUS) are utilized to evaluate PCLs [5
]. However, accurate preoperative diagnosis of subtypes of PCLs poses practical challenges to clinicians.
Confocal laser endomicroscopy (CLE), a novel endoscopy technology with real-time 1,000-fold magnification, enables in vivo optical pathology for various diseases in multiple organ systems [6
]. CLE enables the direct visualization of elastin fibers of bronchus and structural changes of alveoli in bronchial asthma and interstitial lung disease, respectively [7
]. In addition, CLE could enable the direct visualization of lung cancer cells and, potentially, the monitoring of post-chemotherapy responses by directly observing the apoptosis of cancer cells.
EUS-guided needle-based confocal laser endomicroscopy (nCLE) enables in vivo optical pathology to examine PCLs [6
]. In a systematic review and international Delphi report with fifteen nCLE experts, twelve clinical studies were reviewed. Characteristic nCLE features enabled differentiation of mucinous versus non-mucinous PCLs with an accuracy of 71–93% and serous cystadenoma versus non-serous PCLs with an accuracy of 87–99% [8
In addition to differentiating subtypes of PCLs, nCLE was also shown to provide risk stratification for the malignant potential of IPMNs. Krishna et al. investigated characteristic nCLE findings in 26 IPMNs (including 16 cases of high-grade dysplasia and cancer) and found that quantification of papillary epithelial width and darkness on nCLE identified cases of high-grade dysplasia and cancer with high accuracy [9
Krishna et al. investigated 29 nCLE videos on PCLs with six expert endosonographers and showed that the diagnostic accuracy and interobserver agreement (IOA) were 95%, k = 0.81 for mucinous PCLs, and 98%, k = 0.83 for SCN, respectively [10
]. Machicado et al. utilized 76 nCLE videos from three prospective studies and invited 13 expert endosonographers to test the diagnostic accuracy and IOA [11
]. The diagnostic accuracies for IPMN, MCN, SCN, cystic NET, and pseudocyst were 86%, 84%, 98%, 96%, and 96%, with IOA of k = 0.72, 0.47, 0.85, 0.73, and 0.57, respectively. Nevertheless, for non-experts, real-time interpretation of nCLE can be time-consuming and requires specialized operator training [12
Traditional machine learning methods encompass linear discriminants, Bayesian networks, random forests, and support vector machines, while modern machine learning methods consist of artificial neural networks and convolutional neural networks (CNNs) [13
]. Applications of deep learning CNN technologies in endoscopy have created an exciting new era of computer-aided diagnosis (CAD) in endoscopy [14
]. For example, Rashid et al. combined radiomics-based feature extraction with featureless CNN methods to improve the detection accuracy of breast lesions [15
], and appropriate optimization methods have been shown to further enhance the results of CNN [16
Two recent studies utilizing CNN-enabled CAD systems tried to address issues related to nCLE, such as long learning curve, low kappa value in readings, and time consumed by endoscopists [17
For CAD systems on nCLE videos, designation of ROIs in each frame of the video is the first practical problem to solve, especially when the 0.85 mm miniprobe examines the inside of the PCLs.
We compared three different region of interest (ROI) designations and used VGG19 as the classifier: (1) CNN1: manually designated ROIs, (2) CNN2: maximal rectangular ROIs, and (3) CNN3: U-Net algorithm-designated ROIs.
In this study, we aimed to develop a CAD classification system to differentiate subtypes of PCLs on nCLE and to investigate three different ways of designating ROIs. Our work will contribute to solving the daily dilemma for endoscopists in classifying subtypes of PCLs on nCLE, which is the very first step in detecting mucinous PCLs of high malignant potential.
2. Materials and Methods
We retrospectively collected 68 de-identified nCLE videos (IPMN: 31, MCN: 10, SCN: 18, NET: 8, and pseudocyst: 12) on PCLs with histologically and/or clinically confirmed diagnosis from King Chulalongkorn Memorial Hospital, Bangkok, Thailand. All the nCLE procedures were performed by an experienced endoscopist (P.K.) with an AQ-Flex nCLE mini-probe (Cellvizio, Mauna Kea Technologies, Paris, France) after intravenous fluorescein. We also collected IPMN, MCN, and pseudocyst videos from publicly available sources [10
]. The research protocol was approved by the Institutional Review Board of the Faculty of Medicine, Chulalongkorn University (960/64, Dec 2021; 0127/66, Feb 2023), and the study was conducted in accordance with the Declaration of Helsinki.
For the training and validation sets, we randomly selected nCLE videos within each subtype, for a total of 50 videos. The resulting 21,937 images were randomly divided, per subtype of PCLs, into training and validation sets at a 70–30% ratio. The training set consisted of the following: IPMN (26 videos, 3122 images), MCN (5 videos, 1249 images), SCN (9 videos, 3239 images), NET (4 videos, 4220 images), and pseudocyst (6 videos, 3526 images). The validation set consisted of 6581 images.
The remaining 18 nCLE videos were used for the test set, including five IPMN videos (collectively 2537 images), three MCN videos (1557 images), four SCN videos (3693 images), four NET videos (2482 images), and two pseudocyst videos (778 images).
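The reported split can be cross-checked with a few lines of arithmetic. The sketch below only re-adds the per-subtype counts quoted above; the dictionary layout and the helper name `totals` are illustrative choices, not part of the original pipeline.

```python
# Per-subtype training-set counts as reported in the text: (videos, images).
TRAIN = {
    "IPMN": (26, 3122),
    "MCN": (5, 1249),
    "SCN": (9, 3239),
    "NET": (4, 4220),
    "pseudocyst": (6, 3526),
}
# Per-subtype test-set counts as reported in the text: (videos, images).
TEST = {
    "IPMN": (5, 2537),
    "MCN": (3, 1557),
    "SCN": (4, 3693),
    "NET": (4, 2482),
    "pseudocyst": (2, 778),
}
VALIDATION_IMAGES = 6581  # reported as a single total


def totals(split):
    """Return (total videos, total images) for one split."""
    videos = sum(v for v, _ in split.values())
    images = sum(n for _, n in split.values())
    return videos, images


train_videos, train_images = totals(TRAIN)
test_videos, test_images = totals(TEST)
# Share of the training + validation images that went to training.
train_fraction = train_images / (train_images + VALIDATION_IMAGES)

print(train_videos, train_images)
print(test_videos, test_images)
print(round(train_fraction, 3))
```

The training and validation images add up to the 21,937 images stated above, and the training share comes out at roughly 70%, consistent with the stated ratio.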
2.2. CAD System Overview
The proposed method was divided into the training and test stages shown in Figure 1
and Figure 2
, respectively. In the training stage (Figure 1
), we first performed image preprocessing (Gaussian pyramid application and local ternary pattern feature extraction) [20
] (Supplementary Materials
) and data augmentation. For regions of interest (ROI), we attempted three methods: (1) manual designation of ROIs, (2) maximal-sized rectangular ROIs, and (3) automatic designation of ROIs by another deep learning algorithm, U-Net. Finally, we utilized the deep learning algorithm, VGG19, as our classifier of subtypes.
In the test stage (Figure 2
), we applied the contrast-limited adaptive histogram equalization (CLAHE) preprocessing on the test video frames to enhance image contrast [21
]. The trained VGG19 algorithm was used to classify the PCL subtype of each frame in the test set videos. The final classification of the PCL subtype for a whole test video was then determined by the most frequently predicted subtype among its frames.
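The per-video decision rule described above amounts to a majority vote over frame-level predictions. The following is a minimal sketch of that rule, assuming the frame labels are already available as a list of strings; the function name `classify_video` and the tie-breaking behaviour are assumptions for illustration, since the text does not specify how ties are handled.

```python
from collections import Counter


def classify_video(frame_labels):
    """Return the most frequent frame-level label and its share of frames.

    Ties are broken in favour of the label that appears first in the
    sequence (an assumption; the text does not describe tie handling).
    """
    if not frame_labels:
        raise ValueError("video has no classified frames")
    counts = Counter(frame_labels)
    label, n = counts.most_common(1)[0]
    return label, n / len(frame_labels)


# Hypothetical frame-level predictions for one short video.
frames = ["IPMN", "SCN", "IPMN", "MCN"]
print(classify_video(frames))
```

Returning the winning share alongside the label makes it easy to see how narrow a majority was, which is relevant when frame-level sensitivity for a subtype is low.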
2.3. CNN Architecture of VGG19
In this study, we utilized the VGG19 deep learning network of 19 layers to classify PCLs. VGG19 was proposed by the Visual Geometry Group (VGG) at Oxford University [22
]. The VGG19 network architecture consists of 19 weight layers (16 convolutional layers with 3 × 3 kernels and 3 fully connected layers), 5 max-pooling layers with 2 × 2 pool sizes, and a Softmax activation function on the final output layer. Softmax was chosen for its ability to perform multiclass classification. In the training stage, we used the Adam optimizer for 100 epochs, and input images were fixed at a size of 224 × 224 pixels. The sensitivity analysis of the parameters in the algorithms is described in the Supplementary Materials.
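To make the layer bookkeeping concrete, the sketch below counts the layers and trainable parameters of the standard VGG19 layout (configuration E in the original VGG report) for a 224 × 224 RGB input. It assumes the unmodified classifier head with two 4096-unit fully connected layers followed by an output layer; whether the head used in this study was altered beyond the number of output classes is not stated, so the five-class figure is an illustration only.

```python
# Standard VGG19 feature extractor: numbers are output channels of
# 3x3 convolutions, "M" marks a 2x2 max-pooling layer.
VGG19_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]


def vgg19_summary(num_classes, input_size=224, in_channels=3):
    """Count layers and trainable parameters (weights plus biases)."""
    conv_layers = pool_layers = params = 0
    channels, size = in_channels, input_size
    for item in VGG19_CONV:
        if item == "M":
            pool_layers += 1
            size //= 2  # each pooling layer halves height and width
        else:
            conv_layers += 1
            params += (3 * 3 * channels + 1) * item
            channels = item
    fc_layers = 0
    features = channels * size * size  # flattened feature map
    for width in (4096, 4096, num_classes):  # assumed standard head
        params += (features + 1) * width
        features = width
        fc_layers += 1
    return {
        "conv_layers": conv_layers,
        "fc_layers": fc_layers,
        "pool_layers": pool_layers,
        "weight_layers": conv_layers + fc_layers,
        "parameters": params,
    }


print(vgg19_summary(1000))  # ImageNet-style head
print(vgg19_summary(5))     # five PCL subtypes (illustrative)
```

With 1000 output classes the count is about 144 million parameters; with five output classes only the last layer shrinks, so the total remains close to 140 million.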
2.4. Designation of ROIs: (1) Manually Designated ROI, (2) Maximal Rectangular ROI, and (3) U-Net Algorithm-Designated ROI
Manual designation of ROIs in CNN1 was performed by selection of the most prominent image features in each frame (as shown in Table 1
), while the maximal size of rectangle in each frame was designated as the ROI in CNN2 (as shown in Figure 3
In CNN3, we trained another deep learning algorithm, U-Net, to automatically segment the ROI in each frame (as shown in Table 2
While VGG is designed for classification tasks (output: categories) rather than segmentation tasks (output: ROI within a given image), U-Net has been used for segmentation of medical images with good performance [23
]. We created a dataset of PCL images with corresponding labels (ground truth) that separated the image features from the image background. The U-Net was then trained using 1200 training images and their labels. Figure 4
illustrates the U-Net model training process. The learning rate was 0.001, and the optimizer was adaptive moment estimation (Adam). For the loss function, we used binary cross-entropy. The training batch size was 8, and the number of epochs was 100.
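For reference, binary cross-entropy over a segmentation mask is the mean of the per-pixel log losses between the ground-truth label (feature versus background) and the predicted foreground probability. The sketch below is a plain-Python illustration of that formula together with the hyperparameters quoted above; the clipping constant `eps` and the dictionary name `UNET_TRAINING` are assumptions, not values taken from the study.

```python
import math

# Training settings as reported in the text.
UNET_TRAINING = {
    "learning_rate": 0.001,
    "optimizer": "Adam",
    "loss": "binary cross-entropy",
    "batch_size": 8,
    "epochs": 100,
}


def binary_cross_entropy(targets, probs, eps=1e-7):
    """Mean binary cross-entropy over flattened pixels.

    targets: iterable of 0/1 ground-truth labels (1 = image feature).
    probs:   iterable of predicted foreground probabilities in [0, 1].
    eps:     clipping constant to avoid log(0) (an assumed value).
    """
    targets, probs = list(targets), list(probs)
    if not targets or len(targets) != len(probs):
        raise ValueError("targets and probs must be non-empty and equal length")
    total = 0.0
    for y, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(targets)


# An uninformative prediction of 0.5 everywhere gives a loss of ln 2.
print(binary_cross_entropy([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))
```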
Figure 5 illustrates the ROI designation process for a given test image. First, the trained U-Net output a region with an irregular boundary. Then, the smallest square (shown in blue) covering the entire region was automatically taken as the boundary of the ROI. Thus, the ROI could be designated automatically.
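The step from an irregular segmentation output to a square ROI can be sketched as follows. This is a minimal illustration that assumes the U-Net output has already been thresholded into a binary mask stored as a list of rows; how the square is positioned when the region is not itself square (here: centred on the bounding box, then shifted to stay inside the frame) is an assumption, as the text does not describe it.

```python
def bounding_square(mask):
    """Return (top, left, side) of the smallest axis-aligned square that
    covers every foreground pixel of a binary mask, or None if empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for row in mask for c, v in enumerate(row) if v]
    top, bottom = rows[0], rows[-1]
    left, right = min(cols), max(cols)
    height, width = bottom - top + 1, right - left + 1
    side = max(height, width)
    # Centre the square on the tight bounding box (assumed placement).
    top -= (side - height) // 2
    left -= (side - width) // 2
    # Shift the square back inside the frame where possible.
    top = max(0, min(top, len(mask) - side))
    left = max(0, min(left, len(mask[0]) - side))
    return top, left, side


# Hypothetical thresholded U-Net output for a 6 x 6 frame.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(bounding_square(mask))
```

The returned triple can be used to crop the frame before it is resized to the classifier's input size.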
2.5. Data Augmentation
We performed data augmentation to expand the numbers of images for training (Figure 6
). Because the original nCLE video images were circular, we rotated each original image in 30-degree increments to obtain images at 12 different angles.
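A rotation-based augmentation of this kind can be sketched in plain Python. The example below rotates a small square image about its centre using nearest-neighbour sampling; the interpolation method, the fill value for pixels that fall outside the source image, and the function names are all assumptions for illustration, since the text specifies only the 30-degree step.

```python
import math

ANGLES = list(range(0, 360, 30))  # 0, 30, ..., 330 degrees: 12 angles


def rotate_nearest(image, angle_deg, fill=0):
    """Rotate a square 2-D list about its centre.

    Uses inverse mapping with nearest-neighbour sampling (an assumed,
    simple choice); pixels mapped from outside the source get `fill`.
    """
    n = len(image)
    c = (n - 1) / 2.0
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    out = [[fill] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            dx, dy = x - c, y - c
            # Source coordinates for this output pixel.
            sx = round(cos_a * dx + sin_a * dy + c)
            sy = round(-sin_a * dx + cos_a * dy + c)
            if 0 <= sx < n and 0 <= sy < n:
                out[y][x] = image[sy][sx]
    return out


def augment(image):
    """Return the 12 rotated copies of one image (including 0 degrees)."""
    return [rotate_nearest(image, a) for a in ANGLES]


sample = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(len(augment(sample)))
```

Because the nCLE field of view is circular, the image content stays inside the frame at every angle; only the empty corners are affected by the fill value.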
2.6. Hardware and Software Specifications
The system ran on a PC with Windows 10 Professional Edition (64-bit), an Intel Core i7-9700 CPU at 3.2 GHz, 64 GB of DDR4 memory, and an NVIDIA GeForce RTX 2080 Ti graphics card. The software platform was Visual Studio 2017 with the OpenCV library, and the programming language was Python 3.6.
Descriptive data are reported as proportions for categorical variables and means ± SD for continuous variables. For statistical analysis, the chi-square test was performed for categorical variables and the t-test for continuous variables. Statistical significance was defined as p < 0.05. The classification performances of the deep learning algorithms with three different ROI selections were determined against the ground truth (histologically and/or clinically confirmed diagnosis). On a per-frame basis, the sensitivity, specificity, positive predictive value, negative predictive value, and accuracy were calculated and are shown as percentages with 95% confidence intervals (CI). All analyses were performed using SAS software version 9.4 (Cary, NC, USA).
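For a five-class problem, the per-frame metrics for a given subtype are obtained by treating that subtype as positive and all other subtypes as negative. The sketch below illustrates this one-vs-rest calculation in plain Python; it is not the SAS code used in the study, the function name is hypothetical, and confidence intervals are omitted because the interval method is not stated.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Per-frame diagnostic metrics for one subtype versus all others."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


# Hypothetical ground truth and predictions for five frames.
truth = ["IPMN", "IPMN", "MCN", "SCN", "SCN"]
preds = ["IPMN", "SCN", "IPMN", "SCN", "SCN"]
print(one_vs_rest_metrics(truth, preds, "IPMN"))
```

Because true negatives from the other four subtypes enter both specificity and accuracy, a subtype can show high per-frame accuracy even when its sensitivity is low.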
After preprocessing with Gaussian pyramid, LTP extraction, and CLAHE, we utilized the CNN VGG19 as the classifier for five subtypes of PCLs in the 18 test nCLE videos.
In the training stage, training the VGG19 network on all images took six hours, or about 216 s per epoch. We also measured the computation time for processing the test nCLE videos. The average processing time was 0.05 s per frame, as shown in Table 3
We evaluated the performance of our three CAD systems of CNN1 (manually designated ROIs), CNN2 (maximal rectangular ROIs), and CNN3 (U-Net algorithm-designated ROIs) on the per-video and per-frame bases.
3.1. Performance of Three CNNs on the Per-Video Basis
Table 1, Figure 3, and Table 2 show exemplary views of the designated ROIs in CNN1, CNN2, and CNN3, respectively. Table 4, Table 5, and Table 6 present the subtype classification results on a per-video basis for CNN1, CNN2, and CNN3, respectively. Manually designated ROIs in CNN1 achieved the highest accuracy rate of 100% (18/18 videos), followed by U-Net algorithm-designated ROIs in CNN3 at 66.7% (12/18 videos). In contrast, maximal rectangular ROIs in CNN2 had the lowest accuracy rate of 38.9% (7/18 videos).
3.2. Performance of Three CNNs on the Per-Frame Basis
Table 7 summarizes the performance of the three CNNs on the per-frame basis in relation to the five subtypes of PCLs. CNN1 achieved the highest average accuracy of 88.99%, followed by CNN3 at 76.12% and CNN2 at 73.94%.
Among the five subtypes, pseudocyst had the highest sensitivity (94.22%, 95% CI 92.34–95.75%), specificity (98.29%, 95% CI 95.75–98.02%), and accuracy (98%, 95% CI 97.72–98.25%) in CNN1.
MCN, among the five subtypes, had the lowest sensitivity (45.22% in CNN1, 7.11% in CNN2, and 8.54% in CNN3) but consistently high specificity (92.81% in CNN1, 99.4% in CNN2, and 95.97% in CNN3).
IPMN, as the main research focus of nCLE in published studies, exhibited consistently high specificity (94.31%, 96.35%, and 87.39%) and good accuracy (83.43%, 80.68%, and 76.62%), but relatively low sensitivity (45.95%, 28.14%, and 40.48%) in CNN1-3, respectively.
Our results also demonstrated the volatile performance of classifying SCN and NET across three different methods of ROI designations, which suggests the importance of ROI designation in diagnosing SCN and NET by nCLE.
To the best of our knowledge, our study is the first report utilizing deep learning CAD systems to classify the subtypes of PCLs on EUS-guided nCLE video frames. Our results demonstrate the feasibility of applying novel CNN technologies to classify common subtypes of PCLs on a per-video and per-frame basis.
Our exploration of three different methods of designating ROIs on a rapidly changing frame-by-frame nCLE video originated from practical clinical considerations. Although manual selection of ROIs might achieve high accuracy owing to selection bias, our results showed a promising automatic ROI designation by another deep learning algorithm, namely U-Net. In our results, CNN3 delivered a 66.7% accuracy rate per video and 76.12% rate per frame. Such automatic ROI designation could further assist the subtype classification of nCLE on PCLs, especially for non-expert endoscopists.
There have been attempts to adopt CNN in nCLE images for PCLs (Table 8
). Kuwahara et al. utilized the CNN algorithm ResNet50 to analyze 3,970 still images of linear EUS on pathology-confirmed IPMNs. The mean AI value (output value of the TensorFlow algorithm) was higher in malignant IPMNs than in non-malignant IPMNs (0.808 vs. 0.104, p < 0.001), which provided a tool for risk stratification of IPMNs [17
]. Machicado et al. utilized 15,027 image frames from 35 IPMN nCLE videos and applied the CNN algorithm of VGG16 [18
]. They developed two CAD models for IPMN malignant risk stratification: a guided segmentation-based model (SBM) targeting papillary epithelial thickness and darkness, and a holistic-based model (HBM), in which the model automatically extracted nCLE features. The study showed promising results when compared with clinical diagnosis guidelines: diagnostic accuracy SBM: 82.9%; HBM: 85.7%; and guidelines 68.6% and 74.3%, respectively. Of special note, Machicado et al. addressed the issue of automatic ROI designation by using a mask region-based CNN in their segmentation-based model and another VGG-16 network in their holistic-based model. In this regard, we tried an additional CNN, i.e., U-Net, to automatically segment the ROIs in CNN3, which performed less well than CNN1 (manually designated ROIs).
Kurita et al. applied a basic deep learning neural network to the analysis of pancreatic cystic fluid in the differential diagnosis of mucinous PCLs [29
]. Meanwhile, Liang et al. applied a support vector machine to the classification of mucinous versus non-mucinous PCLs. Both studies focused only on the differentiation of mucinous PCLs [30
Our study differed from the aforementioned studies in that we tried to solve the initial step of subtyping of nCLE on PCLs for practicing EUS endoscopists, for whom training to master skills has been a challenge [31
]. CAD might shorten the learning curve and have the potential for real-time assistance in the interpretation of nCLE on PCLs.
The other difference was the depth of the weight layers of VGG. Machicado et al. utilized VGG16, while we adopted VGG19; both share a similar framework except for the number of weight layers in the architecture: 16 layers in the former and 19 in the latter. Simonyan and Zisserman from the Visual Geometry Group (VGG) of the University of Oxford pioneered the VGG algorithm and demonstrated improved CNN performance by increasing the number of weight layers up to 19 [22
]. VGG16 contains 134–138 million parameters, and VGG19 contains 144 million parameters. In their original report, VGG19 performed better than VGG16 at a single test scale, but both performed similarly well at multiple test scales.
EUS-guided nCLE evaluation of PCLs provides the unique value of real-time optical pathology and demonstrated clinical utility (sensitivity 87% and specificity 91% for mucinous PCLs; 81% and 98% for malignant PCLs) in a recent systematic review investigating 40 studies and 3,641 patients [32
]. With a pooled success rate of 88% and an adverse event rate of 3%, nCLE will have an increasingly important role in the armamentarium of endoscopists caring for patients with PCLs. CAD systems such as the one in this study will hopefully provide unique value in this regard.
Our per-frame analysis on the performance of three CNNs among the five subtypes of PCLs revealed different profiles of sensitivity, specificity, and accuracy within the same CNN and across three different methods of designation of ROIs (CNN1-3). Although distinct nCLE features for five subtypes of PCLs have been characterized by experts [10
], our results suggested that classification performances, even by objective CNN algorithms, were not uniform among the five subtypes. On the same topic, Machicado et al. investigated the diagnostic accuracy of 76 nCLE videos by 13 endosonographers (6 experts with experience of > 50 nCLE cases) [11
]. Table 9
summarizes the comparison between this study and the work by Machicado et al., which implied that non-mucinous PCLs were associated with higher accuracy rates than mucinous PCLs. For the classification of mucinous PCLs (IPMN and MCN), our CNN demonstrated superior specificity compared to the state of the art, with high specificity (94.3% and 92.8%, respectively) but low sensitivity (46% and 45.2%, respectively), which suggests a complementary role of CNN-enabled CAD systems, especially for clinically suspected mucinous PCLs.
There are several limitations to this study. Firstly, the diagnostic performance still has considerable room for improvement. Due to the inherent frame-by-frame heterogeneity of image content in the fast-moving nCLE videos and the potential selection bias in locating ROIs, misclassification resulted in suboptimal performance of the CNNs, especially CNN2. Secondly, in the per-video analysis, we arbitrarily designated the most frequently predicted subtype as the final subtype of the entire nCLE video. Yet, certain image features in some frames might be more pathognomonic and deserve heavier weight than other frames. Moreover, in real-time practice, we might need to develop an “accumulated frequency score” to denote the final subtyping of the video. Thirdly, our nCLE videos were obtained retrospectively from a single center and were limited in number. Fourthly, we did not include solid pseudopapillary neoplasms, nor did we separate main-duct-type from side-branch-type IPMNs. Fifthly, our CNNs might not be applicable to diseases other than PCLs. Lastly and clinically most relevant, the current CAD system was developed and processed off-line on retrospectively collected nCLE video frames, and we have not yet tested it prospectively in real-time nCLE examinations. Real-time assistance is urgently needed to facilitate the clinical use of nCLE on PCLs. The short computation time of CAD on each frame (0.03–0.07 s) in Table 3
suggests great potential in real-time applications. Future clinical studies incorporating external validation of our CNN algorithms with adequate representation of all five subtypes of PCLs and prospective comparison studies between the CAD system and novice and expert endoscopists are warranted to confirm the performance and clinical utility.
5. Conclusions and Future Work
Incidentally detected PCLs are increasing in the asymptomatic general population. The accurate detection of PCLs with malignant potential is a clinical dilemma. We utilized the deep learning neural network VGG19 for CAD classification of subtypes of PCLs on EUS-guided nCLE video frames. Our work uniquely compared three different methods of designating ROIs (manual designation, maximal rectangular ROI, and U-Net algorithm designation) and validated the use of automatic ROI designation in future CAD systems in nCLE. Our per-frame analysis suggested differential levels of diagnostic accuracy among the five subtypes of PCLs, where non-mucinous PCLs (SCN: 93.11%, NET: 84.31%, and pseudocyst: 98%) had higher diagnostic accuracy than mucinous PCLs (IPMN: 84.43% and MCN: 86.1%). For the classification of mucinous PCLs (IPMN and MCN), our CNN demonstrated superior specificity compared to the state of the art, with high specificity (94.3% and 92.8%, respectively) but low sensitivity (46% and 45.2%, respectively). Our results will contribute to improving the daily practice of differential diagnosis of PCLs with nCLE. Furthermore, our data revealed a high specificity of nCLE on mucinous PCLs, which carries high clinical importance because mucinous PCLs have higher malignant potential, and early treatment of mucinous PCLs will improve clinical outcomes.
In future work, we believe machine learning methodologies could solve more complicated problems, such as dissecting the cellular and tissue structures to improve the initial differential diagnosis, monitoring cellular morphological alterations during surveillance, and exploring molecular target-labeled fluorescent CLE for molecular imaging of PCLs. In the near future, an “integrative computational model” [13
] combining relevant clinical information, nCLE images, radiomics, and next-generation sequencing of pancreatic cystic fluids may collectively provide clinicians with sophisticated diagnosis of PCLs.