Article

The Promise of Semantic Segmentation in Detecting Actinic Keratosis Using Clinical Photography in the Wild

by Panagiotis Derekas 1, Panagiota Spyridonos 2,*, Aristidis Likas 1, Athanasia Zampeta 3, Georgios Gaitanis 3 and Ioannis Bassukas 3

1 Department of Computer Science & Engineering, School of Engineering, University of Ioannina, 45110 Ioannina, Greece
2 Department of Medical Physics, Faculty of Medicine, School of Health Sciences, University of Ioannina, 45110 Ioannina, Greece
3 Department of Skin and Venereal Diseases, Faculty of Medicine, School of Health Sciences, University of Ioannina, 45110 Ioannina, Greece
* Author to whom correspondence should be addressed.
Cancers 2023, 15(19), 4861; https://doi.org/10.3390/cancers15194861
Submission received: 18 August 2023 / Revised: 1 October 2023 / Accepted: 2 October 2023 / Published: 5 October 2023
(This article belongs to the Special Issue Skin Cancers as a Paradigm Shift: From Pathobiology to Treatment)


Simple Summary

Understanding the relationship between the skin cancerization field and actinic keratosis (AK) is crucial for identifying high-risk individuals, implementing early interventions, and preventing the progression to more aggressive forms of skin cancer. Currently, the clinical tools for grading field cancerization primarily involve assessing AK burden. In addition to their inherent subjectivity, these grading systems are limited by the high degree of AK lesions’ recurrence. The present study proposes a method based on deep learning and semantic segmentation to improve the monitoring of the AK burden in clinical settings with enhanced automation and precision. The experimental results highlight the effectiveness of the proposed method, paving the way for more effective and reliable evaluation, continuous monitoring of condition progression, and assessment of treatment responses.

Abstract

AK is a common precancerous skin condition that requires effective detection and treatment monitoring. To improve the monitoring of the AK burden in clinical settings with enhanced automation and precision, the present study evaluates the application of semantic segmentation based on the U-Net architecture (i.e., AKU-Net). AKU-Net employs transfer learning to compensate for the relatively small dataset of annotated images and integrates a recurrent process based on convLSTM to exploit contextual information and address the challenges related to the low contrast and ambiguous boundaries of AK-affected skin regions. We used an annotated dataset of 569 clinical photographs from 115 patients with actinic keratosis to train and evaluate the model. From each photograph, patches of 512 × 512 pixels were extracted using translation lesion boxes that encompassed lesions in different positions and captured different contexts of perilesional skin. In total, 16,488 translation-augmented crops were used for training the model, and 403 lesion center crops were used for testing. To demonstrate the improvements in AK detection, AKU-Net was compared with plain U-Net and U-Net++ architectures. The experimental results highlighted the effectiveness of AKU-Net, improving both automation and precision over existing approaches and paving the way for more effective and reliable evaluation of actinic keratosis in clinical settings.

1. Introduction

A cutaneous or skin cancerization field (SCF) refers to the medical condition wherein the chronic ultraviolet radiation (UVR)-damaged skin area around incident tumors or precancerous lesions harbors clonally expanding cell subpopulations with distinct pro-cancerous genetic alterations [1]. As a result, this skin area becomes more susceptible to the development of new similar malignant lesions or the recurrence of existing ones [2]. The identification of an SCF mainly relies on recognizing actinic keratosis (AK) [3]. AK, also known as keratosis solaris, is a common skin condition caused by long-term sun exposure. These pre-cancerous lesions not only serve as visible markers of chronic solar skin damage but, if left untreated, each of them can progress into a potentially fatal cutaneous squamous cell carcinoma (CSCC) [4,5]. Since there are no established prognostic factors to predict which individual AK will progress into CSCC, early recognition, treatment, and follow-up are vital and endorsed by international guidelines to minimize the risk of invasive CSCC [5,6,7]. On the other hand, AKs are more than the hallmark lesions of UVR-damaged skin at risk of developing skin cancers: because of their visual similarity to invasive neoplasia, they also constitute an important macromorphological differential diagnostic challenge in their own right in a skin cancer screening setting. In the everyday clinical setting, dermatologists employ a spectrum of established non-invasive diagnostic techniques to increase the sensitivity and specificity of the clinical diagnostic workup and to improve the discrimination between AK and CSCC [3,8]. Besides the widely available and routinely applied dermoscopy to discriminate between invasive CSCC and AK [9], recent findings show that the combination of minimally invasive histologic markers, like the basal layer proliferation score [10], with non-invasive imaging modalities, like LC-OCT, can be used to better stratify AKs according to their risk of progressing to CSCC [11].
Understanding the relationship between field cancerization and AK is crucial for identifying high-risk individuals, implementing early interventions, and preventing the progression to more aggressive forms of skin cancer [12]. Currently, the clinical tools for grading SCF primarily involve assessing AK burden using systems like AKASI [13] and AK-FAS [14]. Notably, the latter approach is based on the evaluation of standardized clinical photographs of preselected, target skin areas to reduce the considerable inter-observer subjectivity of the clinical examination. However, it is still based on subjective evaluation by trained physicians who localize, assess, and count “visible” AK lesions on the available photographs [14]. In particular, the high inter-observer variability in counting lesions seriously limits the reliability of the above approaches [15,16]. In addition to their inherent subjectivity, these grading systems are limited by the high spatial plasticity of AK lesions, including their tendency to recur. They are effectively snapshots of the AK burden of a certain skin site, unable to track whether the recorded lesions are relapses of incident AK at the same site or newly emerging lesions arising de novo in nearby areas [17].
Clinical photography is a low-cost, portable solution that efficiently stores “scanned” visual information of an entire skin region. Considering the pathobiology of AK, Criscione et al. applied an image analysis-based methodology to analyze clinical images of chronically UV-damaged skin areas in a milestone study [18]. With this approach, they could estimate the risk of progression of AKs to keratinocyte skin carcinomas (KSCs) and assess the natural history of AK evolution dynamics over approximately six years. Since then, the evolving efficiency of deep learning algorithms in image recognition and the availability of extensive image archives have greatly accelerated the development of advanced computer-aided systems for skin disease diagnosis [19,20,21,22]. Particularly for AK, given the multifaceted role of AK recognition in cutaneous oncology, the assignment of a distinct diagnostic label to this condition was also a core task in many studies that applied machine learning as a tool to differentiate skin diseases based on clinical images of isolated skin lesions [23,24,25,26,27,28,29,30,31].
However, despite significant progress in the image analysis of cropped skin lesions in recent years, the determination of the AK burden in larger, sun-affected skin areas remains a quite challenging diagnostic task. AKs present as pink, red, or brownish patches, papules, or flat plaques, or may even have the same color as the surrounding skin. They vary in size from a few millimeters to 1–2 cm and can be isolated or, more typically, numerous, sometimes even widely confluent in some patients. Moreover, the surrounding skin may show signs of chronic sun damage, including telangiectasias, dyschromia, and elastosis [6], as well as seborrheic keratoses and related lesions, notably lentigo solaris. These latter lesions are the most frequently encountered benign growths within an SCF and underlie most of the confusion related to the clinical differential diagnosis of AK lesions [32]. All these visual features add substantial complexity to the task of automated AK burden determination.
Earlier approaches to automated AK recognition from clinical images were limited either by treating AK as a skin erythema detection problem [33] or by restricting the detection to preselected smaller sub-regions of wider photographed skin areas, using a binary patch classifier to discriminate AK from healthy skin [34]. Moreover, these approaches carry a high risk of false-positive detections, since erythema is present in various unrelated skin conditions, and the contamination of the images by diverse concurrent and confluent benign growths [35,36] seriously affects the accuracy of binary classification schemes for estimating the AK burden in extended skin areas.
To tackle the demanding task of AK burden detection in UVR-chronically exposed skin areas, more recent work has proposed a superpixel-based convolutional neural network (CNN) for AK detection (i.e., AKCNN) [37]. The core engine of AKCNN is a patch classifier, a lightweight CNN, adequately trained to distinguish AK from healthy skin but also from seborrheic keratosis and solar lentigo. However, a main limitation of AKCNN that persists is the manual preselection of the area to scan, excluding image parts with “disturbing” visual features, such as hairs, the nose, lips, eyes, and ears, which were found to result in false-positive detections.
To improve the monitoring of AK burden in clinical settings with enhanced automation and precision, the present study evaluates the application of semantic segmentation using the U-Net architecture [38], with transfer learning to compensate for the relatively small dataset of annotated images, and a recurrent process to efficiently exploit contextual information and address the specific challenges related to the low contrast and ambiguous boundaries of AK-affected skin regions.
Although there has been considerable research on deep learning-based skin lesion segmentation, the primary focus has been delineating melanoma lesions in dermoscopic images [39,40]. The present study is a further development of our previous research on AK detection and, to the best of our knowledge, is the first study to employ semantic segmentation to clinical images for AK detection and SCF evaluation. Our main aim herewith is to contribute to the development of reliable instruments to assess AK, and, consequently, SCF burden, to be primarily used in the evaluation of therapeutic interventions.

2. Materials and Methods

2.1. Methods

2.1.1. Overview of Semantic Segmentation and U-Net Architecture

Today, with the advent of deep learning and convolutional neural networks, semantic segmentation goes beyond traditional segmentation by associating a meaningful label to each pixel in an image [41]. Semantic segmentation plays a crucial role in medical imaging, and it has numerous applications, including image analysis and computer-assisted diagnosis, monitoring the evolution of conditions over time, and assessing treatment responses.
A groundbreaking application in the field of semantic segmentation is the U-Net architecture, which emerged as a specialized variant of the fully convolutional network architecture. This network architecture excels in capturing fine-grained details from images and demonstrates effectiveness even with limited training data, making it well suited for the challenges posed by the field of biomedical research [38]. The U-Net architecture is characterized by its “U” shape, with symmetric encoder and decoder sections (Figure 1).
The encoder section consists of a series of convolutional layers with 3 × 3 filters, followed by rectified linear unit (ReLU) activations. After each pair of convolutional layers, a 2 × 2 max pooling operation is applied, halving the spatial dimensions, while the subsequent convolutions increase the number of feature channels. As the encoder progresses, the spatial resolution decreases while the number of feature channels increases. This allows the model to capture increasingly abstract and high-level features from the input image.
The decoder section of the U-Net starts with an upsampling operation to increase the spatial dimensions of the feature maps. The upsampling is typically performed using transposed convolutions. At each decoder step, skip connections are introduced. These connections directly connect the feature maps from the corresponding encoder step to the decoder, preserving fine-grained details and spatial information. The skip connections are achieved by concatenating the feature maps from the encoder with the upsampled feature maps in the decoder.
Following the concatenation, the combined feature maps go through convolutional layers with 3 × 3 filters and ReLU activations to refine the segmentation predictions. Each decoder step typically reduces the number of feature channels to match the desired output shape.
Finally, a 1 × 1 convolutional layer is applied to produce the final segmentation map, with each pixel representing the predicted class label.
Since its introduction, U-Net has served as a foundation for numerous advancements in medical image analysis and has inspired the development of various modified architectures and variants [42,43].
Transfer learning has significantly contributed to the success of U-Net in various medical image segmentation tasks [44,45,46,47]. Initializing the encoder part of U-Net with weights learned from a pretrained model enables U-Net to start with a strong foundation of learned representations, accelerating convergence, and improving segmentation performance, especially when the availability of annotated medical data is limited.
In this study, we used the VGG16 model [48], pretrained on ImageNet [49], as the backbone for the encoder in the U-Net architecture. Both U-Net and VGG16 are based on the concepts of deep convolutional neural networks, which are designed to encode hierarchical representations from images enabling them to capture increasingly complex features. Both architectures utilize convolutional layers with 3 × 3 filters followed by rectified linear unit activations and a 2 × 2 max pooling operation for downsampling as their primary building blocks.
Figure 2 illustrates our transfer learning scheme, where four convolution blocks of pretrained VGG16 were used for the U-Net encoder.
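For illustration, the following Keras sketch shows how such a VGG16 backbone can be wired as a U-Net encoder. The specific layers used for the skip connections, the freezing policy, and the input size are assumptions made for the example and are not taken from the published AKU-Net implementation.

```python
# Minimal sketch (Keras): reusing the first four convolutional blocks of an
# ImageNet-pretrained VGG16 as a U-Net encoder. Layer names follow the
# standard keras.applications.VGG16 naming; the skip-connection layers and
# freezing policy here are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.applications import VGG16

def build_vgg16_encoder(input_shape=(256, 256, 3), freeze=True):
    backbone = VGG16(include_top=False, weights="imagenet", input_shape=input_shape)
    if freeze:
        backbone.trainable = False  # keep the pretrained features fixed initially
    # Outputs of the first three blocks serve as skip connections; the fourth
    # block's output acts as the bottleneck fed to the decoder.
    skip_names = ["block1_conv2", "block2_conv2", "block3_conv3"]
    skips = [backbone.get_layer(name).output for name in skip_names]
    bottleneck = backbone.get_layer("block4_conv3").output
    return Model(backbone.input, [bottleneck] + skips, name="vgg16_encoder")

encoder = build_vgg16_encoder()
encoder.summary()
```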

2.1.2. Batch Normalization

Batch normalization (BN), proposed by Sergey Ioffe and Christian Szegedy [50], is a technique commonly used in neural networks to normalize the activations of a layer by adjusting and scaling them. It helps to stabilize and speed up the training process by reducing the internal covariate shift, which refers to the change in the distribution of the layer’s inputs during training. The BN operation is typically performed by computing the mean and variance of the activations within a mini-batch during training. These statistics are then used to normalize the activations by subtracting the mean and dividing by the square root of the variance. Additionally, batch normalization introduces learnable parameters, known as scale and shift parameters, which allow the network to adaptively scale and shift the normalized activations [50].
Since each mini-batch has a different mean and variance, this introduces some random variation or noise to the activations. As a result, the model becomes more robust to specific patterns or instances present in individual mini-batches and learns to generalize better to unseen examples.
In U-Net, the decoding layers are responsible for reconstructing the output image or segmentation map from the upsampled feature maps. By applying BN to the decoding layers in our U-Net, the regularization effect is targeted to the reconstruction process, improving the model’s generalization ability and reducing the risk of overfitting in the output reconstruction.
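As a concrete illustration, a decoder block with BN placed after each convolution might look like the following sketch; the filter counts and exact layer ordering are assumptions rather than the published AKU-Net configuration.

```python
# Illustrative decoder block: transposed-convolution upsampling, concatenation
# with the encoder skip feature map, then 3x3 convolutions each followed by
# batch normalization (BN in the decoding path, as described above) and ReLU.
from tensorflow.keras import layers

def decoder_block(x, skip, filters):
    x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
    x = layers.Concatenate()([x, skip])
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
        x = layers.BatchNormalization()(x)  # regularizes the reconstruction path
        x = layers.Activation("relu")(x)
    return x
```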

2.1.3. ConvLSTM: Spatial Recurrent Module in U-Net Architecture

Recurrent neural networks (RNNs) are designed to work with sequential data, making decisions based on current and previous inputs. There are different kinds of recurrent units based on how the current and past inputs are combined, such as gated recurrent units and long short-term memory (LSTM) units [51]. The LSTM module introduces three gating mechanisms that control the flow of information through the network: the input gate, the forget gate, and the output gate. These gates allow the LSTM network to selectively remember or forget information from the input sequence, which makes it more effective at capturing long-term dependencies. For text, speech, and signal processing, plain RNNs are used directly, while for 2D or 3D data (e.g., images), RNNs have been extended using convolutional structures. Convolutional LSTM (convLSTM) is the LSTM counterpart for long-term spatiotemporal predictions [52].
In semantic segmentation, recurrent modules have been incorporated in various ways into the U-Net architecture. Alom et al. [53] introduced recurrent convolutional blocks at the backbone of the U-Net architecture to enhance the ability of the model to integrate contextual information and improve feature representations for medical image segmentation tasks. In recent work, Arbelle et al. [54] proposed the integration of convLSTM blocks at every scale of the encoder section of the U-Net architecture, enabling multi-scale spatiotemporal feature extraction and facilitating cell instance segmentation in time-lapse microscopy image sequences.
In the U-Net architecture, max pooling is commonly used in the encoding path to downsample the feature maps and capture high-level semantic information. However, max pooling can result in a loss of spatial information and details between neighboring pixels. Several researchers in the field of skin lesion segmentation using dermoscopic images have exploited recurrent layers as a mechanism to refine the skip connection process of U-Net [55,56,57]. In the present study, to address the specific challenges related to the low contrast and ambiguous boundaries of AK-affected skin regions, we employed convLSTM layers to bridge the semantic gap between the feature map extracted from the encoding path, $X_e^l$, and the output feature map after upsampling in the decoding path, $X_d^{l,up}$. We assume $X^l \in \mathbb{R}^{F_l \times h_l \times w_l}$ is the concatenation of $X_e^l$ and $X_d^{l,up}$, where $F_l$ is the number of filters and $h_l$, $w_l$ are the height and width of the feature map. $X^l$ is split into $n \times m$ patches $P_{i,j} \in \mathbb{R}^{F_l \times h_p \times w_p}$, where

$$h_p = h_l / n \quad \text{and} \quad w_p = w_l / m.$$
In the following, we refer to $P_{i,j}$ as $P_t$, replacing the subscripts $i,j$ with the notation of processing step $t$. The input $P_t$ is passed through the convolutional operation to compute three gates that regulate the spatial information flow: the input gate $i_t$, the forget gate $f_t$, and the output gate $o_t$, as follows:
$$i_t = \sigma\left(W_{xi} * P_t + W_{hi} * H_{t-1} + W_{ci} \odot c_{t-1} + b_i\right)$$
$$f_t = \sigma\left(W_{xf} * P_t + W_{hf} * H_{t-1} + W_{cf} \odot c_{t-1} + b_f\right)$$
$$o_t = \sigma\left(W_{xo} * P_t + W_{ho} * H_{t-1} + W_{co} \odot c_t + b_o\right)$$

ConvLSTM also maintains the cell state $c_t$ and the hidden state $H_t$:

$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh\left(W_{xc} * P_t + W_{hc} * H_{t-1} + b_c\right)$$
$$H_t = o_t \odot \tanh(c_t)$$

where $W_{x\cdot}$ and $W_{h\cdot}$ correspond to the 2D convolutional kernels of the input and hidden states, respectively; $*$ denotes the convolutional operation and $\odot$ the Hadamard product (element-wise multiplication); $b_i$, $b_f$, $b_c$, and $b_o$ are the bias terms; and $\sigma$ is the sigmoid function.
Figure 3 depicts the model for AK detection based on U-Net architecture (AKU-Net) that incorporates transfer learning in the encoding path, the recurrent process in skip connections, and the BN in the decoding path.
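To make the recurrent skip-connection fusion more tangible, the sketch below splits the concatenated feature map into a grid of patches, processes them as a sequence with a ConvLSTM2D layer, and stitches the result back together. The grid size, filter count, and scan order of the patches are assumptions made for illustration and may differ from the AKU-Net implementation.

```python
# Hedged sketch of the recurrent skip-connection fusion: the encoder feature
# map and the upsampled decoder feature map are concatenated, split into an
# n x m grid of patches, processed as a sequence by ConvLSTM2D, and then
# reassembled. Grid size, filters, and scan order are illustrative choices.
import tensorflow as tf
from tensorflow.keras import layers

def convlstm_skip_fusion(x_enc, x_dec_up, filters, grid=(2, 2)):
    x = layers.Concatenate()([x_enc, x_dec_up])            # (B, H, W, C)
    n, m = grid
    rows = tf.split(x, n, axis=1)                          # split along height
    patches = [p for row in rows for p in tf.split(row, m, axis=2)]
    seq = tf.stack(patches, axis=1)                        # (B, n*m, h_p, w_p, C)
    seq = layers.ConvLSTM2D(filters, 3, padding="same",
                            return_sequences=True)(seq)    # recurrent processing
    patches = tf.unstack(seq, num=n * m, axis=1)
    rows = [tf.concat(patches[i * m:(i + 1) * m], axis=2) for i in range(n)]
    return tf.concat(rows, axis=1)                         # (B, H, W, filters)

# Example with random feature maps standing in for X_e and X_d_up:
x_enc = tf.random.normal((1, 64, 64, 128))
x_dec_up = tf.random.normal((1, 64, 64, 128))
print(convlstm_skip_fusion(x_enc, x_dec_up, filters=64).shape)  # (1, 64, 64, 64)
```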

2.1.4. Loss Function

In this study, we approached AK detection as a binary semantic segmentation task, where each pixel in the input image was classified as either being in the foreground (belonging to the skin areas affected by AK) or background. For this purpose, we utilized the binary cross-entropy loss function, commonly used in U-Net, also known as the log loss or sigmoid loss.
Mathematically, the binary cross-entropy loss for a single pixel can be defined as the following:
$$L(p, y) = -\,y \log(p) - (1 - y)\log(1 - p)$$
where p is the predicted probability of the pixel belonging to the foreground class, obtained by applying a sigmoid activation function to the model’s output, and y is the ground truth label (0 for background, 1 for foreground) indicating the true class of the pixel.
The overall loss for the entire image was then computed as the average of the individual pixel losses. During training, the network aimed to minimize this loss function by adjusting the model’s parameters through backpropagation and gradient descent optimization algorithms.
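The per-pixel loss and its image-level average can be illustrated with a few lines of numpy; the probability values below are made up for the example.

```python
# Worked example of the per-pixel binary cross-entropy averaged over an image
# (numpy only; the predicted probabilities and mask below are made up).
import numpy as np

def bce_loss(p, y, eps=1e-7):
    p = np.clip(p, eps, 1 - eps)                  # avoid log(0)
    return np.mean(-y * np.log(p) - (1 - y) * np.log(1 - p))

p = np.array([[0.9, 0.2], [0.7, 0.1]])            # predicted foreground probabilities
y = np.array([[1.0, 0.0], [1.0, 0.0]])            # ground truth mask
print(bce_loss(p, y))                             # ~0.20
```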

2.1.5. Evaluation

To demonstrate the expected improvements in AK detection, AKU-Net was compared with plain U-Net and U-Net++ [58]. U-Net++ comprises an encoder and decoder connected through a series of nested dense convolutional blocks. The main concept behind U-Net++ is to narrow the semantic gap between the encoder and decoder feature maps before fusion. The authors of U-Net++ reported superior performance compared to the original U-Net in various medical image segmentation tasks, including electron microscopy cells, nuclei, brain tumors, liver, and lung nodules.
To ensure comparable results, all trained networks utilized VGG16 as the backbone and BN in the decoding path. The segmentation models were assessed by means of the Dice coefficient and Intersection over Union (IoU).
The Dice coefficient measures the similarity or overlap between the predicted segmentation mask and the ground truth mask. The formula for the Dice coefficient is the following:
$$\mathrm{Dice} = \frac{2\,|A \cap B|}{|A| + |B|} = \frac{2\,TP}{2\,TP + FN + FP}$$

The IoU measures the intersection between the predicted and ground truth masks relative to their union. The formula for the IoU is the following:

$$IoU = \frac{|A \cap B|}{|A \cup B|} = \frac{TP}{TP + FN + FP}$$

The notation $|\cdot|$ represents the total number of pixels in a set. $TP$, $FN$, and $FP$ are the true-positive, false-negative, and false-positive predictions at the pixel level, respectively.
To provide comparison results with a recent work on AK detection, we also employed the adapted region-based F1 score, $aF_1$ [37]:

$$aF_1 = \frac{2\, aRec \cdot aPrec}{aRec + aPrec}$$

The $aF_1$ score was introduced by the authors to compensate for the fact that AK lesions often lack sharply demarcated borders, and experienced clinicians can provide only rough, approximate AK annotations in clinical images. The adapted estimators of Recall ($aRec$) and Precision ($aPrec$) are estimated as follows.
Assuming a ground truth set of $N$ annotated (labeled) areas, $AK_{area} = \{AK_1, AK_2, \ldots, AK_N\}$, and the set of pixels predicted as AK by the system, $AK_{pred}$, we define

$$TPC_i = \begin{cases} 0 & \text{if } AK_i \cap AK_{pred} = \emptyset \\ 1 & \text{if } AK_i \cap AK_{pred} \neq \emptyset \end{cases}$$

and the true-positive counts ($TPC$):

$$TPC = \sum_{i=1}^{N} TPC_i$$

The adapted estimators $aRec$ and $aPrec$ are given as the following:

$$aRec = \frac{\text{true-positive counts}}{\text{actual positive counts}} = \frac{TPC}{N}$$

$$aPrec = \frac{\text{true-positive area}}{\text{total predicted positive area}} = \frac{|AK_{area} \cap AK_{pred}|}{|AK_{pred}|}$$
All coefficients, the Dice, the IoU, and the $aF_1$, range from 0 to 1. Higher values of these metrics indicate greater similarity between the predicted and ground truth masks and, consequently, better performance.
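For reference, a minimal numpy sketch of the three coefficients on binary masks is given below; the aPrec denominator follows the formulation above (total predicted positive area), and the per-lesion ground truth regions are assumed to be available as a list of binary masks.

```python
# Minimal numpy sketch of the overlap metrics on binary masks. Dice and IoU
# follow the formulas above; aF1 additionally needs the individual annotated
# lesion regions, assumed here to be given as a list of binary masks.
import numpy as np

def dice_iou(pred, gt):
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    dice = 2 * tp / (2 * tp + fn + fp) if (2 * tp + fn + fp) else 1.0
    iou = tp / (tp + fn + fp) if (tp + fn + fp) else 1.0
    return dice, iou

def adapted_f1(pred, lesion_masks):
    """aF1 from a predicted mask and a list of per-lesion ground truth masks."""
    pred = pred.astype(bool)
    tpc = sum(np.logical_and(pred, m.astype(bool)).any() for m in lesion_masks)
    a_rec = tpc / len(lesion_masks)
    gt_union = np.any(np.stack(lesion_masks).astype(bool), axis=0)
    a_prec = np.logical_and(pred, gt_union).sum() / max(pred.sum(), 1)
    return 2 * a_prec * a_rec / (a_prec + a_rec) if (a_prec + a_rec) else 0.0
```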
Figure 4 provides a qualitative example of the error tolerance in estimating $aF_1$, which compensates for rough annotations.

2.1.6. Implementation Details

The segmentation models were implemented in Python 3, adopting an open-source framework in the Keras API with TensorFlow as the backend. The experimental environment was a Windows 10 workstation configured with an AMD Ryzen 9 3900X CPU @ 3.80 GHz, 64 GB of 3200 MHz DDR4 ECC RDIMM memory, and an NVIDIA RTX 3070 GPU with 8 GB of memory. The Adam optimizer (learning rate = 0.001, weight decay = 10⁻⁶) was employed to train the networks. Considering the constraints of our computational environment, we set the batch size equal to 32 and the number of iterations to 100.
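A minimal training-configuration sketch matching these reported settings is shown below; the model and data are trivial stand-ins, and the weight_decay argument of Adam assumes a recent Keras version (older versions would use AdamW instead).

```python
import numpy as np
import tensorflow as tf

# Stand-in model and data so the snippet runs end to end; in practice these
# would be the AKU-Net model and the 256x256 training crops with their masks.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(256, 256, 3)),
    tf.keras.layers.Conv2D(8, 3, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(1, 1, activation="sigmoid"),
])
x = np.random.rand(8, 256, 256, 3).astype("float32")
y = (np.random.rand(8, 256, 256, 1) > 0.5).astype("float32")

# Reported settings: Adam (learning rate 1e-3, weight decay 1e-6), batch size
# 32, binary cross-entropy loss.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, weight_decay=1e-6)
model.compile(optimizer=optimizer, loss="binary_crossentropy")
model.fit(x, y, batch_size=32, epochs=1)  # epochs kept at 1 for this demo
```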

2.2. Materials

The use of archival photographic materials for this study was approved by the Human Investigation Committee (IRB) of the University Hospital of Ioannina (Approval No.: 3/17-2-2015(θ.17)). The study included a total of 115 patients (60 males, 55 females; age range: 45-85 years) with facial AK who attended the specialized Dermato-oncology Clinic of the Dermatology Department.
Facial photographs were acquired with the camera axis perpendicular to the photographed regions. The distance was adjusted to include the whole face from the chin to the top of the hair/scalp border.
Digital photographs with a 4016 × 6016 pixel spatial resolution were acquired according to a procedure adapted from Muccini et al. [43], using a Nikon D610 (Nikon, Tokyo, Japan) camera with a Nikon NIKKOR© 60 mm 1:2.8G ED micro lens mounted on it. The camera controls were set at f18, with a shutter speed of 1/80 s, ISO 400, autofocus, and white balance at auto-adjustment mode. A Sigma ring flash (Sigma, Fukushima, Japan) in TTL mode was mounted to the camera. In front of the lens and the flashlight, linear polarizing filters were appropriately adapted to ensure a 90° rotation of the ring flash polarization axis relative to the lens-mounted polarizing filter. For the AK annotation, two physicians (GG and AZ) jointly discussed and reached an agreement on the affected skin regions. Notably, cross-polarized photography was employed, which enhanced the visibility of the vascular plexus of the skin (redness) and removed unwanted glare from the epidermis, allowing for the detailed evaluation of the assessed area at larger magnifications, as required. For the purposes of the present study, photographs were eligible only from patients without a history of CSCC and from areas without ambiguous lesions, including pigmented lesions clinically suspicious for cutaneous malignancy. Assessments of the clinical grades of individual lesions are not reported, as they are outside the scope of the present study. However, we incorporated lesions irrespective of clinical AK severity grading [59], provided that the lesions could be suspected on the selected clinical photographs, as well as lesions with variable degrees of pigmentation. This also applied to AKs in extra-facial anatomical regions.
Multiple photographs were taken per patient to capture the presence of lesions across the entire face. Additionally, these photographs were intended to provide different views of the same lesions, resulting in 569 annotated clinical photographs (Figure 5).
Patches of 512 × 512 pixels were extracted from each photograph using translation lesion boxes. These boxes encompassed lesions in various positions and captured different contexts of the perilesional skin. In total, 16,891 lesion center and translation-augmented patches were extracted (Figure 6).

Experimental Settings

Since multiple samples (skin patches) from the same patient were used, we performed dataset splitting at the patient level to prevent data leakage across the training, validation, and test datasets and to ensure an unbiased evaluation of the models’ generalization. We used 510 photographs obtained from 98 patients for training, extracting 16,488 translation-augmented image patches. Among them, approximately 20% of the crops (3298 patches), from 5 patients, were reserved for validation. An independent set consisting of 17 patients (59 photographs) was used to assess the model’s performance, yielding 403 central lesion patches (Table 1).
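One way to realize such a patient-level split is with scikit-learn's GroupShuffleSplit, as in the hedged sketch below; the patient identifiers and split fraction are illustrative, not the study's actual assignment.

```python
# Hedged illustration of a patient-level split to avoid data leakage: all
# crops from the same patient end up in the same subset. GroupShuffleSplit is
# one way to do this; the actual assignment used in the study is not
# reproduced here.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_crops = 1000
patient_ids = rng.integers(0, 115, size=n_crops)      # one group id per crop
crop_indices = np.arange(n_crops)

splitter = GroupShuffleSplit(n_splits=1, test_size=0.15, random_state=0)
train_idx, test_idx = next(splitter.split(crop_indices, groups=patient_ids))

# No patient appears in both subsets:
assert not set(patient_ids[train_idx]) & set(patient_ids[test_idx])
```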
It is important to note that since the available images had a high spatial resolution, we had to train the model using square crops. The size of these crops was selected to include sufficient contextual information. To achieve this, the images were first rescaled by a factor of 0.5, and crops of size 512 × 512 pixels were extracted. However, due to computational limitations, the patches were further rescaled by a factor of 0.5 (256 × 256 pixels). Using an internal fiducial marker [13], we estimated that our model was ultimately trained with cropped images at a scale of approximately 5 pixels/mm.
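The two-stage rescaling can be sketched as follows; the lesion-box centre is an arbitrary example, and bilinear resampling is an assumption.

```python
# Sketch of the two-stage rescaling described above: the full photograph is
# first downscaled by 0.5, a 512x512 crop is taken around a lesion box, and
# the crop is downscaled again by 0.5 to 256x256 for training.
import numpy as np
from PIL import Image

photo = Image.fromarray(np.zeros((6016, 4016, 3), dtype=np.uint8))  # dummy photo
half = photo.resize((photo.width // 2, photo.height // 2), Image.BILINEAR)

cx, cy = 1200, 1500                                    # example lesion centre (pixels)
crop = half.crop((cx - 256, cy - 256, cx + 256, cy + 256))   # 512 x 512 crop
patch = crop.resize((256, 256), Image.BILINEAR)        # final training patch
print(patch.size)                                      # (256, 256)
```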

3. Results

Dice and IoU coefficients were utilized to evaluate the segmentation accuracy of the model. Table 2 summarizes the comparison results employing the standard U-Net and U-Net++ architectures.
The AKU-Net model demonstrated a statistically significant improvement in AK detection compared to U-Net++ (p < 0.05; Wilcoxon signed-rank test). It is worth noting that the standard U-Net model had limitations in detecting AK areas, as it could not detect AK in 257 out of the total 403 testing crops, corresponding to approximately 63% of the testing cases. An exemplary qualitative comparison of the goodness of the segmentation of incident AK with the different model architectures is illustrated in Figure 7.
In a recent study, we implemented the AKCNN model for efficient AK detection [37] in manually preselected wide skin areas. To evaluate the performance of the AKU-Net model in broad skin areas and compare it with the AKCNN, we used the same evaluation set as the one employed in the previous study [37].
The AKU-Net model was trained using patches of 256 × 256 pixels. However, to allow the present model to be evaluated and compared with the AKCNN, we performed the following steps:
  • Zero Padding: Appropriate zero-padding was applied to the input image to make it larger and suitable for subsequent cropping.
  • Image Cropping: The zero-padded image was divided into crops of 256 × 256 pixels. These crops were used as individual inputs to the AKU-Net model for AK detection.
  • Aggregating Results: The obtained segmentation results for each 256 × 256-pixel crop were combined to obtain the overall AK detection for the entire broad skin area.
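A hedged sketch of this padding–tiling–aggregation procedure is given below; the stand-in model, tile size handling, and stitching without overlap are illustrative assumptions.

```python
# Hedged sketch of the wide-area scanning procedure listed above: zero-pad the
# image so its sides are multiples of 256, run the model on each 256x256 tile,
# and stitch the predicted masks back together. `model` is a stand-in segmenter.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(256, 256, 3)),
    tf.keras.layers.Conv2D(1, 1, activation="sigmoid"),  # stand-in for AKU-Net
])

def scan_wide_area(image, tile=256):
    h, w = image.shape[:2]
    ph, pw = -h % tile, -w % tile                        # padding to full tiles
    padded = np.pad(image, ((0, ph), (0, pw), (0, 0)))
    mask = np.zeros(padded.shape[:2], dtype=np.float32)
    for y in range(0, padded.shape[0], tile):
        for x in range(0, padded.shape[1], tile):
            crop = padded[y:y + tile, x:x + tile][None].astype("float32")
            mask[y:y + tile, x:x + tile] = model.predict(crop, verbose=0)[0, ..., 0]
    return mask[:h, :w]                                  # drop the padding again

wide = np.random.rand(600, 900, 3).astype("float32")
print(scan_wide_area(wide).shape)                        # (600, 900)
```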
Table 3 compares the performances (accuracy measures) of the AKU-Net and AKCNN model architectures (for n = 10 random frames). At a similar image scale of approximately 7 pixels/mm, there were no significant differences in the levels of model performance in terms of $aPrec$ (p = 0.6), $aRec$ (p = 0.4), and $aF_1$ (p = 0.6; Wilcoxon signed-rank test).
Although AKU-Net exhibited AK detection accuracy at the same level as that of AKCNN, the substantial advantage of AKU-Net is that the latter does not require the manual preselection of scanning areas. Explanative examples are given in Figure 8 and Figure 9.

4. Discussion

In this study, we utilized deep learning, specifically in the domain of semantic segmentation, to enhance the detection of actinic keratosis (AK) on clinical photographs. In our previous approach, we introduced a CNN patch classifier (i.e., AKCNN) to assess the burden of AK in large skin areas [37]. Detecting AK in regions with field cancerization presents challenges, and a binary patch (regional) classifier, specifically “AK versus all”, had serious limitations. AKCNN was implemented to effectively distinguish AK from healthy skin and differentiate it from seborrheic keratosis and solar lentigo, thereby reducing false-positive detections. However, AKCNN relies on manually predefined scanning areas, which are necessary to exclude skin regions prone to false diagnosis as AK on clinical images (Figure 8 and Figure 9).
To enhance the monitoring of the AK burden in real clinical settings with improved automation and precision, we utilized a semantic segmentation approach based on an adequately adapted U-Net architecture. Despite using a relatively small dataset of weakly annotated clinical images, the AKU-Net exhibited a remarkably improved performance, particularly in challenging skin areas. The AKU-Net model outperformed the corresponding baseline models, the U-Net and U-Net++, highlighting the efficiency boost achieved through the spatial recurrent layers added in skip connections.
Considering its scanning performance, AKU-Net was evaluated by aggregating its scanning results from 256 × 256-pixel crops and comparing them with those from the AKCNN (Table 3). Both approaches exhibited a similar level of recall (true-positive detection). However, AKU-Net is favorably tolerant of the selection of the scanning area, which can simply be a boxed area that includes the target region, providing false-positive rates at least comparable to those obtained with the AKCNN on manually predefined scanning regions.
It is important to note that due to computational constraints, the AKU-Net was trained on image crops of 256 × 256 pixels, at a scale of about 5 pixels/mm. For this, the original image was subsampled twice. This subsampling process resulted in a degradation of spatial resolution, which imposed limitations on the system’s ability to detect AK lesions (recall level). This limitation is also supported by evidence from our previous studies: at a similar scale of ~7 pixels/mm, AKCNN experienced a drop in recall [37]. We expect a significantly improved AK detection accuracy by subsampling the original image only once and training the network with crops of 512 × 512 pixels.
The palpation of the lesional skin to confirm the characteristic “sandpaper” sign is crucial for the diagnosis of barely visible, flat, grade I AKs. However, the fact that the present approach relies on photographic materials, which might theoretically lead to an underestimation of the AK burden, does not represent a serious limitation of this method. It is worth noting that the “sandpaper” sign was ranked as a less reliable feature of sun damage in SCF skin areas compared to “visible” features (telangiectasia, atrophy, and pigmentation disorders) in an expert panel study [3]. Moreover, the proposed approach, like AK-FAS [14], primarily aims to quantify the AK burden in selected skin areas to assist with the evaluation of therapeutic interventions. Accordingly, the burden measurements are based on the evaluation of index lesions in the preselected target area. If required, the latter can be planned to include “hidden” areas, like the retroauricular area or skin regions covered by hair.
Future efforts will explore the optimal trade-off between image scale and the size of the input crop used to train the network, which could lead to superior detection of AK lesions. Moreover, deploying the system with known restrictions, that is, a specified range of acceptable image scales, is essential for future studies that complement and validate the system’s generalizability using multicenter datasets acquired with various cameras.
In future studies, it is also essential to evaluate the system’s accuracy in relation to the variability among experts when recognizing AK using clinical photographs. This assessment is crucial as it will gauge the system’s performance compared to the varying detections of human experts.

5. Conclusions

Understanding the relationship between the biology of a skin cancerization field and the burden of incident AK is crucial to identifying high-risk individuals, implementing early interventions, and preventing the development of more aggressive forms of skin cancer.
The present study evaluated the application of semantic segmentation based on the U-Net architecture to improve the monitoring of the AK burden in clinical settings with enhanced automation and precision. Deep learning algorithms for semantic image segmentation are continuously evolving. However, the choice between different network architectures depends on the specific requirements of the segmentation task, the available computational resources, and the size and quality of the training dataset. Overall, the results from the present study indicated that the AKU-Net model is an efficient approach for AK detection, paving the way for more effective and reliable evaluation, continuous monitoring of condition progression, and assessment of treatment responses.

Author Contributions

Conceptualization, P.S., A.L. and I.B.; methodology, P.S. and P.D.; software, P.D.; validation, P.S., G.G. and I.B.; formal analysis, P.S. and P.D.; investigation, P.D. and P.S.; data curation, G.G., A.Z. and I.B.; writing—original draft preparation, P.S.; writing—review and editing, P.D., G.G., A.L. and I.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of the University Hospital of Ioannina (Approval No.: 3/17-2-2015(θ.17), Approval date: 17 February 2015).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The developed code is available at the following link: https://github.com/PanagiotisDerekas/Skin-lesions/tree/PanDerek/Actinic%20Keratosis (accessed on 1 October 2023).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Willenbrink, T.J.; Ruiz, E.S.; Cornejo, C.M.; Schmults, C.D.; Arron, S.T.; Jambusaria-Pahlajani, A. Field cancerization: Definition, epidemiology, risk factors, and outcomes. J. Am. Acad. Dermatol. 2020, 83, 709–717. [Google Scholar] [CrossRef] [PubMed]
  2. Werner, R.N.; Sammain, A.; Erdmann, R.; Hartmann, V.; Stockfleth, E.; Nast, A. The natural history of actinic keratosis: A systematic review. Br. J. Dermatol. 2013, 169, 502–518. [Google Scholar] [CrossRef] [PubMed]
  3. Nart, I.F.; Cerio, R.; Dirschka, T.; Dréno, B.; Lear, J.; Pellacani, G.; Peris, K.; de Casas, A.R.; Progressing Evidence in AK (PEAK) Working Group. Defining the actinic keratosis field: A literature review and discussion. J. Eur. Acad. Dermatol. Venereol. 2018, 32, 544–563. [Google Scholar] [CrossRef] [PubMed]
  4. Gutzmer, R.; Wiegand, S.; Kölbl, O.; Wermker, K.; Heppt, M.; Berking, C. Actinic Keratosis and Cutaneous Squamous Cell Carcinoma. Dtsch. Arztebl. Int. 2019, 116, 616–626. [Google Scholar] [CrossRef] [PubMed]
  5. De Berker, D.; McGregor, J.M.; Mustapa, M.F.M.; Exton, L.S.; Hughes, B.R. British Association of Dermatologists’ guidelines for the care of patients with actinic keratosis 2017. Br. J. Dermatol. 2017, 176, 20–43. [Google Scholar] [CrossRef]
  6. Werner, R.N.; Stockfleth, E.; Connolly, S.; Correia, O.; Erdmann, R.; Foley, P.; Gupta, A.; Jacobs, A.; Kerl, H.; Lim, H.; et al. Evidence- and consensus-based (S3) Guidelines for the Treatment of Actinic Keratosis—International League of Dermatological Societies in cooperation with the European Dermatology Forum—Short version. J. Eur. Acad. Dermatol. Venereol. 2015, 29, 2069–2079. [Google Scholar] [CrossRef]
  7. Eisen, D.B.; Asgari, M.M.; Bennett, D.D.; Connolly, S.M.; Dellavalle, R.P.; Freeman, E.E.; Goldenberg, G.; Leffell, D.J.; Peschin, S.; Sligh, J.E.; et al. Guidelines of care for the management of actinic keratosis. J. Am. Acad. Dermatol. 2021, 85, e209–e233. [Google Scholar] [CrossRef]
  8. Casari, A.; Chester, J.; Pellacani, G. Actinic Keratosis and Non-Invasive Diagnostic Techniques: An Update. Biomedicines 2018, 6, 8. [Google Scholar] [CrossRef]
  9. Peris, K.; Micantonio, T.; Piccolo, D.; Concetta, M. Dermoscopic features of actinic keratosis. JDDG J. Dtsch. Dermatol. Gese. 2007, 5, 970–975. [Google Scholar] [CrossRef]
  10. Schmitz, L.; Gupta, G.; Stücker, M.; Doerler, M.; Gambichler, T.; Welzel, J.; Szeimies, R.; Bierhoff, E.; Stockfleth, E.; Dirschka, T. Evaluation of two histological classifications for actinic keratoses—PRO classification scored highest inter-rater reliability. J. Eur. Acad. Dermatol. Venereol. 2019, 33, 1092–1097. [Google Scholar] [CrossRef]
  11. Daxenberger, F.; Deußing, M.; Eijkenboom, Q.; Gust, C.; Thamm, J.; Hartmann, D.; French, L.E.; Welzel, J.; Schuh, S.; Sattler, E.C. Innovation in Actinic Keratosis Assessment: Artificial Intelligence-Based Approach to LC-OCT PRO Score Evaluation. Cancers 2023, 15, 4457. [Google Scholar] [CrossRef] [PubMed]
  12. Rigel, D.S.; Gold, L.F.S.; Zografos, P. The importance of early diagnosis and treatment of actinic keratosis. J. Am. Acad. Dermatol. 2013, 68 (Suppl. S1), S20–S27. [Google Scholar] [CrossRef] [PubMed]
  13. Dirschka, T.; Pellacani, G.; Micali, G.; Malvehy, J.; Stratigos, A.J.; Casari, A.; Schmitz, L.; Gupta, G.; Athens AK Study Group. A proposed scoring system for assessing the severity of actinic keratosis on the head: Actinic keratosis area and severity index. J. Eur. Acad. Dermatol. Venereol. 2017, 31, 1295–1302. [Google Scholar] [CrossRef]
  14. Dreno, B.; Cerio, R.; Dirschka, T.; Nart, I.F.; Lear, J.T.; Peris, K.; de Casas, A.R.; Kaleci, S.; Pellacani, G. A novel actinic keratosis field assessment scale for grading actinic keratosis disease severity. Acta Derm. Venereol. 2017, 97, 1108–1113. [Google Scholar] [CrossRef]
  15. Schmitz, L.; Broganelli, P.; Boada, A. Classifying Actinic Keratosis: What the Reality of Everyday Clinical Practice Shows Us. J. Drugs Dermatol. 2022, 21, 845–849. [Google Scholar] [CrossRef]
  16. Epstein, E. Quantifying actinic keratosis: Assessing the evidence. Am. J. Clin. Dermatol. 2004, 5, 141–144. [Google Scholar] [CrossRef]
  17. Steeb, T.; Wessely, A.; Petzold, A.; Schmitz, L.; Dirschka, T.; Berking, C.; Heppt, M.V. How to Assess the Efficacy of Interventions for Actinic Keratosis? A Review with a Focus on Long-Term Results. J. Clin. Med. 2021, 10, 4736. [Google Scholar] [CrossRef] [PubMed]
  18. Criscione, V.D.; Weinstock, M.A.; Naylor, M.F.; Luque, C.; Eide, M.J.; Bingham, S.F. Actinic keratoses: Natural history and risk of malignant transformation in the Veterans Affairs Topical Tretinoin Chemoprevention Trial. Cancer 2009, 115, 2523–2530. [Google Scholar] [CrossRef]
  19. Adegun, A.; Viriri, S. Deep learning techniques for skin lesion analysis and melanoma cancer detection: A survey of state-of-the-art. Artif. Intell. Rev. 2020, 54, 811–841. [Google Scholar] [CrossRef]
  20. Jeong, H.K.; Park, C.; Henao, R.; Kheterpal, M. Deep Learning in Dermatology: A Systematic Review of Current Approaches, Outcomes, and Limitations. JID Innov. 2023, 3, 100150. [Google Scholar] [CrossRef]
  21. Li, L.-F.; Wang, X.; Hu, W.-J.; Xiong, N.N.; Du, Y.-X.; Li, B.-S. Deep Learning in Skin Disease Image Recognition: A Review. IEEE Access 2020, 8, 208264–208280. [Google Scholar] [CrossRef]
  22. Kassem, M.A.; Hosny, K.M.; Damaševičius, R.; Eltoukhy, M.M. Machine Learning and Deep Learning Methods for Skin Lesion Classification and Diagnosis: A Systematic Review. Diagnostics 2021, 11, 1390. [Google Scholar] [CrossRef]
  23. Wang, L.; Chen, A.; Zhang, Y.; Wang, X.; Zhang, Y.; Shen, Q.; Xue, Y. AK-DL: A shallow neural network model for diagnosing actinic keratosis with better performance than deep neural networks. Diagnostics 2020, 10, 217. [Google Scholar] [CrossRef] [PubMed]
  24. Maron, R.C.; Weichenthal, M.; Utikal, J.S.; Hekler, A.; Berking, C.; Hauschild, A.; Enk, A.H.; Haferkamp, S.; Klode, J.; Schadendorf, D.; et al. Systematic outperformance of 112 dermatologists in multiclass skin cancer image classification by convolutional neural networks. Eur. J. Cancer 2019, 119, 57–65. [Google Scholar] [CrossRef]
  25. Tschandl, P.; Rosendahl, C.; Akay, B.N.; Argenziano, G.; Blum, A.; Braun, R.P.; Cabo, H.; Gourhant, J.-Y.; Kreusch, J.; Lallas, A.; et al. Expert-Level Diagnosis of Nonpigmented Skin Cancer by Combined Convolutional Neural Networks. JAMA Dermatol. 2019, 155, 58–65. [Google Scholar] [CrossRef] [PubMed]
  26. Pacheco, A.G.C.; Krohling, R.A. The impact of patient clinical information on automated skin cancer detection. Comput. Biol. Med. 2020, 116, 103545. [Google Scholar] [CrossRef]
  27. Liu, Y.; Jain, A.; Eng, C.; Way, D.H.; Lee, K.; Bui, P.; Kanada, K.; de Oliveira Marinho, G.; Gallegos, J.; Gabriele, S.; et al. A deep learning system for differential diagnosis of skin diseases. Nat. Med. 2020, 26, 900–908. [Google Scholar] [CrossRef] [PubMed]
  28. Karthik, R.; Vaichole, T.S.; Kulkarni, S.K.; Yadav, O.; Khan, F. Eff2Net: An efficient channel attention-based convolutional neural network for skin disease classification. Biomed. Signal Process. Control 2021, 73, 103406. [Google Scholar] [CrossRef]
  29. Han, S.S.; Kim, M.S.; Lim, W.; Park, G.H.; Park, I.; Chang, S.E. Classification of the Clinical Images for Benign and Malignant Cutaneous Tumors Using a Deep Learning Algorithm. J. Investig. Dermatol. 2018, 138, 1529–1538. [Google Scholar] [CrossRef]
  30. Fujisawa, Y.; Otomo, Y.; Ogata, Y.; Nakamura, Y.; Fujita, R.; Ishitsuka, Y.; Watanabe, R.; Okiyama, N.; Ohara, K.; Fujimoto, M. Deep-learning-based, computer-aided classifier developed with a small dataset of clinical images surpasses board-certified dermatologists in skin tumour diagnosis. Br. J. Dermatol. 2019, 180, 373–381. [Google Scholar] [CrossRef]
  31. Han, S.S.; Moon, I.J.; Lim, W.; Suh, I.S.; Lee, S.Y.; Na, J.-I.; Kim, S.H.; Chang, S.E. Keratinocytic Skin Cancer Detection on the Face Using Region-Based Convolutional Neural Network. JAMA Dermatol. 2020, 156, 29–37. [Google Scholar] [CrossRef] [PubMed]
  32. Kato, S.; Lippman, S.M.; Flaherty, K.T.; Kurzrock, R. The conundrum of genetic ‘Drivers’ in benign conditions. J. Natl. Cancer Inst. 2016, 108, djw036. [Google Scholar] [CrossRef] [PubMed]
  33. Hames, S.C.; Sinnya, S.; Tan, J.-M.; Morze, C.; Sahebian, A.; Soyer, H.P.; Prow, T.W. Automated detection of actinic keratoses in clinical photographs. PLoS ONE 2015, 10, e0112447. [Google Scholar] [CrossRef] [PubMed]
  34. Spyridonos, P.; Gaitanis, G.; Likas, A.; Bassukas, I.D. Automatic discrimination of actinic keratoses from clinical photographs. Comput. Biol. Med. 2017, 88, 50–59. [Google Scholar] [CrossRef]
  35. South, A.P.; Purdie, K.J.; Watt, S.A.; Haldenby, S.; Breems, N.Y.D.; Dimon, M.; Arron, S.T.; Kluk, M.J.; Aster, J.C.; McHugh, A.; et al. NOTCH1 mutations occur early during cutaneous squamous cell carcinogenesis. J. Investig. Dermatol. 2014, 134, 2630–2638. [Google Scholar] [CrossRef]
  36. Durinck, S.; Ho, C.; Wang, N.J.; Liao, W.; Jakkula, L.R.; Collisson, E.A.; Pons, J.; Chan, S.-W.; Lam, E.T.; Chu, C.; et al. Temporal dissection of tumorigenesis in primary cancers. Cancer Discov. 2011, 1, 137–143. [Google Scholar] [CrossRef] [PubMed]
  37. Spyridonos, P.; Gaitanis, G.; Likas, A.; Bassukas, I.D. A convolutional neural network based system for detection of actinic keratosis in clinical images of cutaneous field cancerization. Biomed. Signal Process. Control 2023, 79, 104059. [Google Scholar] [CrossRef]
  38. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Proceedings of the 18th International Conference, Munich, Germany, 5–9 October 2015; Part of the Lecture Notes in Computer Science book series; Springer: Berlin/Heidelberg, Germany, 2015; Volume 9351, pp. 234–241. [Google Scholar] [CrossRef]
  39. Mirikharaji, Z.; Abhishek, K.; Bissoto, A.; Barata, C.; Avila, S.; Valle, E.; Celebi, M.E.; Hamarneh, G. A survey on deep learning for skin lesion segmentation. Med. Image Anal. 2023, 88, 102863. [Google Scholar] [CrossRef]
  40. Hasan, M.K.; Ahamad, M.A.; Yap, C.H.; Yang, G. A survey, review, and future trends of skin lesion segmentation and classification. Comput. Biol. Med. 2023, 155, 106624. [Google Scholar] [CrossRef]
  41. Aljabri, M.; AlGhamdi, M. A review on the use of deep learning for medical images segmentation. Neurocomputing 2022, 506, 311–335. [Google Scholar] [CrossRef]
  42. Siddique, N.; Paheding, S.; Elkin, C.P.; Devabhaktuni, V. U-net and its variants for medical image segmentation: A review of theory and applications. IEEE Access 2021, 9, 82031–82057. [Google Scholar] [CrossRef]
  43. Azad, R.; Aghdam, E.K.; Rauland, A.; Jia, Y.; Avval, A.H.; Bozorgpour, A.; Karimijafarbigloo, S.; Cohen, J.P.; Adeli, E.; Merhof, D. Medical Image Segmentation Review: The success of U-Net. arXiv 2022, arXiv:2211.14830. [Google Scholar] [CrossRef]
  44. Ghafoorian, M.; Mehrtash, A.; Kapur, T.; Karssemeijer, N.; Marchiori, E.; Pesteie, M.; Guttmann, C.R.G.; de Leeuw, F.-E.; Tempany, C.M.; van Ginneken, B.; et al. Transfer Learning for Domain Adaptation in MRI: Application in Brain Lesion Segmentation. In Medical Image Computing and Computer Assisted Intervention—MICCAI 2017, Proceedings of the 20th International Conference, Quebec City, QC, Canada, 11–13 September 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 516–524. [Google Scholar]
  45. Feng, R.; Liu, X.; Chen, J.; Chen, D.Z.; Gao, H.; Wu, J. A Deep Learning Approach for Colonoscopy Pathology WSI Analysis: Accurate Segmentation and Classification. IEEE J. Biomed. Health Inform. 2021, 25, 3700–3708. [Google Scholar] [CrossRef] [PubMed]
  46. Huang, A.; Jiang, L.; Zhang, J.; Wang, Q. Attention-VGG16-UNet: A novel deep learning approach for automatic segmentation of the median nerve in ultrasound images. Quant. Imaging Med. Surg. 2022, 12, 3138–3150. [Google Scholar] [CrossRef]
  47. Sharma, N.; Gupta, S.; Koundal, D.; Alyami, S.; Alshahrani, H.; Asiri, Y.; Shaikh, A. U-Net Model with Transfer Learning Model as a Backbone for Segmentation of Gastrointestinal Tract. Bioengineering 2023, 10, 119. [Google Scholar] [CrossRef]
  48. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015; pp. 1–14. [Google Scholar]
  49. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar] [CrossRef]
  50. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 32nd International Conference on International Conference on Machine Learning (ICML’15), Lille, France, 6–11 July 2015; Volume 1, pp. 448–456. Available online: https://arxiv.org/abs/1502.03167v3 (accessed on 10 July 2023).
  51. Yu, Y.; Si, X.; Hu, C.; Zhang, J. A review of recurrent neural networks: Lstm cells and network architectures. Neural Comput. 2019, 31, 1235–1270. [Google Scholar] [CrossRef]
  52. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. In Proceedings of the Advances in Neural Information Processing Systems 28 (NIPS 2015), Montreal, QC, Canada, 7–12 December 2015; Volume 2015, pp. 802–810. Available online: https://arxiv.org/abs/1506.04214v2 (accessed on 4 July 2023).
  53. Alom, M.Z.; Yakopcic, C.; Hasan, M.; Taha, T.M.; Asari, V.K. Recurrent residual U-Net for medical image segmentation. J. Med. Imaging 2019, 6, 1. [Google Scholar] [CrossRef]
  54. Arbelle, A.; Cohen, S.; Raviv, T.R. Dual-Task ConvLSTM-UNet for Instance Segmentation of Weakly Annotated Microscopy Videos. IEEE Trans. Med. Imaging 2022, 41, 1948–1960. [Google Scholar] [CrossRef]
  55. Attia, M.; Hossny, M.; Nahavandi, S.; Yazdabadi, A. Skin melanoma segmentation using recurrent and convolutional neural networks. In Proceedings of the 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), Melbourne, VIC, Australia, 18–21 April 2017; pp. 292–296. [Google Scholar] [CrossRef]
  56. Azad, R.; Asadi-Aghbolaghi, M.; Fathy, M.; Escalera, S. Bi-Directional ConvLSTM U-Net with Densley Connected Convolutions. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; pp. 406–415. [Google Scholar] [CrossRef]
  57. Jiang, X.; Jiang, J.; Wang, B.; Yu, J.; Wang, J. SEACU-Net: Attentive ConvLSTM U-Net with squeeze-and-excitation layer for skin lesion segmentation. Comput. Methods Programs Biomed. 2022, 225, 107076. [Google Scholar] [CrossRef]
  58. Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Proceedings of the 4th International Workshop, DLMIA 2018 and 8th International Workshop, ML-CDS 2018 Held in Conjunction with MICCAI 2018, Granada, Spain, 20 September 2018; Springer: Berlin/Heidelberg, Germany, 2018; Volume 11045, pp. 3–11. [Google Scholar] [CrossRef]
  59. Olsen, E.A.; Abernethy, M.L.; Kulp-Shorten, C.; Callen, J.P.; Glazer, S.D.; Huntley, A.; McCray, M.; Monroe, A.B.; Tschen, E.; Wolf, J.E. A double-blind, vehicle-controlled study evaluating masoprocol cream in the treatment of actinic keratoses on the head and neck. J. Am. Acad. Dermatol. 1991, 24 Pt 1, 738–743. [Google Scholar] [CrossRef]
Figure 1. U-Net architecture proposed by Ronneberger et al. (2015) [38].
Figure 2. VGG16 backbone transfer learning scheme for the U-Net encoder.
Figure 3. Proposed model for AK detection based on the U-Net architecture (AKU-Net), with three key modifications: utilizing the pretrained VGG16 as the encoder, incorporating convLSTM processing units in the skip connections, and integrating BN in the decoding layers.
Figure 4. A patch from a clinical photograph with AK (left) and the predicted area (right). The AK labeling from the system is highlighted in red, and the yellow line represents the expert’s annotation. The estimated Dice, IoU, and $aF_1$ coefficients were 0.76, 0.62, and 0.97, respectively.
Figure 5. Photographs were taken to capture the presence of multiple lesions across the entire face and to provide different views of the same lesions. The AK lesions annotated by the experts are shown in green. The white circular sticker is a fiducial marker with a diameter of ¼ inch.
Figure 6. From left to right: 512 × 512 lesion center crop and the corresponding translation-augmented lesion crops.
Figure 7. A visual demonstration of the efficiency of the three trained models in AK detection in challenging skin areas. Skin folds, hairs, and small vessels all constituted sources of severe false positives for AKCNN (a–d). AK lesions with a low contrast and ambiguous boundaries (e,f) were successfully detected by AKU-Net. Note the inclusion of lesions from almost the whole spectrum of clinical AK grades. The experts’ annotations are in yellow, and the models’ predictions in red.
Figure 8. Exemplary visualization of AK detection of the same frame (Table 3; Frame 3) with two model architectures. (Left) The performance of AKCNN, where the “scanning” area (black line) was manually predefined to exclude areas covered by hairs and the anatomical structure of eyes: blue lines are the expert-annotated AK lesions and scanty colored areas correspond to the detected AK. (Right) Detection of AK using AKU-Net in the entire frame region (blue box). Note the aggregation of the AK-affected skin area in four distinct patches (red color).
Figure 9. Exemplary visualization of AK detection of the same frame (Table 3; Frame 6) with two model architectures. (Left) AKCNN detection results with the highest false-positive rate ($aPrec$ = 0.26). (Right) The AKU-Net was favorably tolerant of the selection of the scanning area, which was simply either a boxed area (blue box; $aPrec$ = 0.71) or a wider frame ($aPrec$ = 0.69).
Table 1. Dataset splitting into train, validation, and test sets.

              Patients   Images   Crops    Augmentation
  Train       93         410      13,190   Yes
  Validation  5          100      3298     Yes
  Test        17         59       403      None
  Total       115        569      16,891
Table 2. Segmentation accuracy of the utilized model architectures.

  Architecture   Dice (Mean)   IoU (Mean)
  U-Net          0.14          0.48
  U-Net++        0.39          0.55
  AKU-Net        0.50          0.63
Table 3. Comparison of the accuracy of AKU-Net and AKCNN (n = 10 random frames).

            AKCNN                       AKU-Net
  Frame     aPrec   aRec    aF1         aPrec   aRec    aF1
  1         0.96    0.67    0.79        0.81    0.67    0.73
  2         0.77    0.6     0.67        0.56    0.50    0.53
  3         0.77    0.56    0.65        0.94    0.56    0.70
  4         0.5     1       0.67        0.88    0.80    0.84
  5         1       0.25    0.4         1.00    0.50    0.67
  6         0.26    1       0.41        0.69    1.00    0.81
  7         0.7     0.67    0.68        0.33    0.67    0.44
  8         0.86    1       0.93        0.73    1.00    0.84
  9         0.99    0.27    0.43        0.74    0.45    0.56
  10        0.96    0.6     0.74        0.60    0.60    0.60
  Median    0.82    0.64    0.67        0.73    0.63    0.68