RETRACTED: Modern Subtype Classification and Outlier Detection Using the Attention Embedder to Transform Ovarian Cancer Diagnosis

Nobel, S. M. Nuruzzaman; Swapno, S M Masfequier Rahman; Hossain, Md. Ashraful; Safran, Mejdl; Alfarhood, Sultan; Kabir, Md. Mohsin; Mridha, M. F.

doi:10.3390/tomography10010010


by S. M. Nuruzzaman Nobel ¹, S M Masfequier Rahman Swapno ¹, Md. Ashraful Hossain ¹, Mejdl Safran ²,*, Sultan Alfarhood ², Md. Mohsin Kabir ³ and M. F. Mridha ⁴

¹ Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka 1216, Bangladesh
² Department of Computer Science, College of Computer and Information Sciences, King Saud University, P.O. Box 51178, Riyadh 11543, Saudi Arabia
³ Superior Polytechnic School, University of Girona, 17071 Girona, Spain
⁴ Department of Computer Science, American International University-Bangladesh, Dhaka 1229, Bangladesh
* Author to whom correspondence should be addressed.

Tomography 2024, 10(1), 105-132; https://doi.org/10.3390/tomography10010010

Submission received: 30 October 2023 / Revised: 10 December 2023 / Accepted: 11 December 2023 / Published: 15 January 2024 / Retracted: 3 April 2024


Abstract:

Ovarian cancer is among the deadliest diseases of the female reproductive system and remains a significant challenge in medical research. This research addresses ovarian cancer subtype classification and the related task of outlier detection using an automated system. The study uses a dataset collected from 20 medical institutes. The dataset includes tissue microarray (TMA) images at 40× magnification and whole-slide images (WSI) at 20× magnification. Beyond classifying subtypes of ovarian cancer, the research also aims to identify abnormalities within the data. We propose a new model, the Attention Embedder, for ovarian cancer subtype classification and outlier detection. On WSI images at 20× magnification, the model reached 96.42% training accuracy and 95.10% validation accuracy. On TMA images at 40× magnification, it reached a training accuracy of 93.45% and a validation accuracy of 94.90%. After hyperparameter fine-tuning, testing on independent images gave an accuracy of 93.56% at 20× magnification and 91.37% at 40× magnification. These results suggest that machine learning can improve the medical field’s ability to classify ovarian cancer subtypes and identify outliers, giving doctors a valuable tool to lessen the severe effects of the disease. Adopting this method may improve clinical practice and offer hope to people living with ovarian cancer worldwide.

Keywords:

ovarian cancer; attention embedder; transfer learning; cancer subtype; computer vision; outlier detection; medical image; K-fold; hyperparameter tuning

1. Introduction

Ovarian cancer is widely recognized as the deadliest cancer of the female reproductive system. It has the highest fatality rate among gynecologic cancers and is the sixth leading cause of cancer-related deaths in women [1]. More women die from ovarian cancer than from any other reproductive system malignancy. A woman’s lifetime risk of developing ovarian cancer is 1 in 78, and her lifetime risk of dying from it is 1 in 108. This malignancy mostly affects older women: about half of ovarian cancer patients are 63 or older. It is diagnosed more frequently in white women than in black women [2,3]. The pursuit of better patient outcomes has increased the recognition and use of subtype-specific therapeutic techniques. Such customized therapy, however, depends on accurately identifying subtypes. Currently, the detection of ovarian cancer relies primarily on the expertise of pathologists. This dependence presents notable obstacles, such as disagreement among observers, concerns over the reproducibility of diagnoses, and the limited availability of pathologists with specialized expertise, particularly in disadvantaged regions. Ovarian cancer presents a major threat to public health, reflected in an incidence percentage of 3.4% and a mortality percentage of 4.7%. The annual burden of this illness is substantial, affecting more than 300,000 women and resulting in nearly 152,000 deaths. These statistics underscore the grave implications that ovarian cancer has for women’s health and overall survival [4,5]. The prognosis for individuals diagnosed with ovarian cancer is unfavorable, as shown by a survival rate of just 30 percent [6]. Platinum-based chemotherapy combined with cytoreductive surgery is the current first-line treatment for ovarian cancer [7]. 
The 2020 World Health Organization classification of ovarian carcinomas recognizes at least five primary types, distinguished by factors such as immune profile, histology, and molecular analysis. The most common type is High-Grade Serous Carcinoma (HGSC), which constitutes around 70–80% of cases. Less common subtypes include Endometrioid Carcinoma (EC, 10%), Low-Grade Serous Carcinoma (LGSC, 5%), Clear-Cell Carcinoma (CC, 6–10%), and Mucinous Carcinoma (MC, 3–4%) [8,9,10]. The current strategy for subclassifying ovarian carcinomas is hierarchical, and its first phase is categorization into histotypes based on classical histological characteristics. Histotypes are treated as distinct illnesses according to several factors, including the cell of origin, molecular changes, clinical characteristics, and treatment approaches [11,12]. Ovarian carcinoma is a diverse condition comprising a range of cancers, each with a distinct precursor lesion, genesis, pattern of metastasis, responsiveness to therapy, and prognosis [13,14]. We now outline how our study addresses these challenges. This research investigates the complex differences among forms of ovarian cancer, which are difficult to resolve with existing diagnostic techniques. In our investigation of cancer subtype classification and outlier detection, we focused on ovarian cancer. The dataset used in this study consisted of a wide range of whole-slide images (WSI) and tissue microarray (TMA) images. The collection contains 1051 samples in total; 538 of those samples are WSI and the remaining 513 are TMA. 
This introduced many issues related to variances in size, source origin, image quality, and staining processes. This highlights the significant importance of implementing a strong error-handling mechanism. We aim to precisely categorize CC, EC, HGSC, LGSC, and MC. This research aims to enhance personalized treatment techniques by using the capabilities of the Attention Embedder model. Figure 1 depicts an essential idea, showcasing WSI, bags, instances (patches), and scales. A complete study necessitates the extraction of several picture patches from lengthy WSIs at a resolution of 128 × 128 pixels, emphasizing the need to consider different scales. In the problem formulation, the class labels of instances are not observed, and instead, class labels are given to groupings of instances referred to as bags. The bag contains patches acquired at different scales, facilitating the recognition of distinct locations of interest at numerous scales. This strategic methodology improves our capacity to identify and evaluate critical characteristics within the complex domain of ovarian cancer.
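The bag-and-instance formulation described above can be made concrete with a small data structure. The following is a minimal sketch, not the authors’ implementation: the class names `Instance` and `Bag`, the `scale` field, and the helper `make_bag` are hypothetical and chosen only to show that the subtype label is attached to the bag, while individual patches carry none.

```python
from dataclasses import dataclass
from typing import Dict, List

import numpy as np


@dataclass
class Instance:
    patch: np.ndarray   # 128 x 128 x 3 pixel block; carries no label of its own
    scale: float        # scale the patch was sampled at (hypothetical field)


@dataclass
class Bag:
    slide_id: str
    label: str          # subtype label, observed only at the bag (slide) level
    instances: List[Instance]


def make_bag(slide_id: str, label: str,
             patches_by_scale: Dict[float, List[np.ndarray]]) -> Bag:
    """Pool patches from every scale into one bag that shares a single label."""
    instances = [Instance(patch, scale)
                 for scale, patches in sorted(patches_by_scale.items())
                 for patch in patches]
    return Bag(slide_id, label, instances)


# Usage with placeholder data: three blank patches taken at two scales.
blank = np.zeros((128, 128, 3), dtype=np.uint8)
bag = make_bag("slide_0", "HGSC", {20.0: [blank, blank], 5.0: [blank]})
```

A classifier trained in this setting receives only `bag.label` as supervision and has to infer which instances matter.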

By leveraging the Attention Embedder model, our approach enhances diagnostic accuracy and contributes to personalized treatment strategies, filling a critical gap in the current knowledge. Our methodology used the Attention Embedder model for subtype categorization and outlier identification, and our results showed high accuracy, precision, recall, and F1-scores for 20×- and 40×-magnification images. While analyzing the dataset, we encountered an imbalance in the EC data and deliberately excluded this subtype to improve the system’s efficiency and efficacy. The model was trained for 20 epochs. This number was chosen as a trade-off between model convergence and computing efficiency: more epochs can improve performance, but they also raise the risk of overfitting, particularly given the intricate nature of the Attention Embedder model. This follows the established practice in deep learning of selecting the number of epochs to balance underfitting against overfitting. Stating this choice openly supports a dependable training procedure and increases trust in the results produced by our model. For the 20×-magnification images, the training accuracy was 96.42% and the validation accuracy was 95.10%. For the 40×-magnification images, the model attained a training accuracy of 93.45% and a validation accuracy of 94.90%. Testing with fine-tuned hyperparameters on independent images gave an accuracy of 93.56% at 20× magnification. 
Our testing accuracy remains high, at 91.37%, at 40× magnification. The precision, recall, and F1 scores were evaluated for each subtype, giving a detailed picture of the model’s robustness in categorizing ovarian cancer subtypes. Beyond these quantitative measures, our model may also influence cancer treatment. The precise categorization of ovarian cancer subtypes is highly significant for developing individualized treatment approaches. By identifying the individual subtype, our model offers doctors insights into the malignancy’s distinct features, enabling customized therapy strategies. This can increase the effectiveness of therapy, reduce adverse effects, and improve overall patient outcomes. Incorporating such techniques represents an advancement in precision medicine, with the potential to enhance both the precision of diagnoses and the efficacy of treatment for ovarian cancer. Our contributions are the following:

We aim to determine the most practical strategy for precisely classifying ovarian cancer subtypes and identifying outliers within them.
We used a challenging dataset from a Kaggle competition and established the correctness of our proposed model on it.
We apply several benchmark machine learning algorithms to clinical criteria for ovarian cancer subtype classification and outlier detection, and we use a separate algorithm that provides better classification and outlier detection for ovarian cancer.
We developed a new model, the Attention Embedder, with a particular emphasis on classification and outlier detection in ovarian cancer subtypes.

This study is organized into multiple sections, each with a specific function. An overview of relevant works on the topic is given in Section 2, which serves as a basis for the investigation. A thorough analysis of the dataset utilized in the study is covered in Section 3. Section 4 delineates the methodology utilized in the study, providing insight into the research strategy. The analyses are presented in Section 5, which provides an understanding of the study’s conclusions. Section 6 delves into the subject, offering a critical analysis of the findings and their consequences. Last but not least, Section 7 presents findings from the study and suggests future paths of inquiry for the field.

2. Related Works

Ovarian cancer subtype classification divides patients into categories based on distinctive genetic, molecular, and clinical characteristics so that care can be customized. This division facilitates the personalization of therapy, prognosis classification, and the advancement of research. Outlier detection identifies exceptional or rare cases with distinctive traits or therapeutic responses, which improves understanding of the illness and may reveal new subtypes or therapeutic strategies. Many researchers have investigated ovarian cancer, defined its phases and forms, and identified and categorized outliers within the condition. Diagnosing the subtype is critical in patient care because different ovarian cancer histological subtypes have different genetic and molecular profiles, treatment choices, and patient outcomes. Jack et al. [15] introduced Discriminative Region Active Sampling for Multiple Instance Learning (DRAS-MIL), a computationally efficient slide classification method that uses attention scores to concentrate sampling on highly discriminative regions. The method was evaluated on a set of 714 WSIs gathered from 147 epithelial ovarian cancer patients at Leeds Teaching Hospitals NHS Trust, distinguishing between the subtypes of epithelial ovarian cancer (with low-grade serous, endometrioid, clear-cell, and mucinous carcinomas combined). 
The authors demonstrated that DRAS-MIL could reach classification performance comparable to exhaustive slide analysis, with a threefold cross-validated AUC of 0.8679 compared with 0.8781 for typical attention-based MIL classification. DRAS-MIL used no more than 18% of the memory of the conventional approach and took only 33% of the time when evaluated on a GPU and 14% of the time when evaluated on a CPU alone. The reduced classification time and memory needs may facilitate clinical implementation and democratization of AI and lessen the degree to which end-user adoption is constrained by computing hardware. Anwar et al. [16] conducted a hypothesis-free phenome-wide association study (PheWAS) to discover traits that share a genetic architecture with ovarian cancer (OC) and its comorbidities. Using data from 181,203 white British female UK Biobank participants, they investigated the relationship between OC and OC subtype-specific genetic risk scores (OC-GRS) and 889 diseases and 43 other traits. PheWAS and colocalization analyses were performed for individual variants to find evidence of shared genomic architecture. Ten diseases were associated with the OC-GRS, while five were linked to the clear-cell OC-GRS at the FDR threshold (p = 5.6 × 10⁻⁴). Mendelian randomization (MR) analysis provided strong evidence for a relationship between OC and a higher risk of secondary malignant neoplasm of the digestive system (OR 1.64, 95% CI 1.33, 2.02), ascites (1.48, 95% CI 1.17, 1.86), chronic airway obstruction (1.17, 95% CI 1.07, 1.29), and abnormal findings upon examination of the lung. Analyses of lung spirometry measures provided additional support for decreased respiratory function. PheWAS on individual OC variants discovered five genetic variants connected to various diseases and seven variants linked to biomarkers (all, p = 4.5 × 10⁻⁸). 
Colocalization analysis identified rs4449583 as the shared causal variant between seborrheic keratosis and OC. Tang et al. [17] identified immune subtypes of ovarian cancer (OV). The authors obtained 379 OV samples from the UCSC website and examined 29 immune gene sets using single-sample gene set enrichment analysis to identify the immunological subtypes of OV. Gene set variation analysis was used to examine the distinguishing characteristics, and the Kyoto Encyclopaedia of Genes and Genomes provided details on the pathways of the immune types. The single-sample gene set enrichment analysis distinguished between the Immunity_H and Immunity_L subtypes. Weighted gene co-expression networks were constructed, and four hub IRGs (CCR5, IL10RA, ITGAL, and PTPRC) were identified. When the team investigated mutations in the four hub IRGs, they discovered an amplification of the PTPRC gene of about 7%. Additionally [18], eight immune-checkpoint genes (all but CD276) had increased expression in the Immunity_H group compared with the Immunity_L group. The relationship between PD-1/PD-L1 and the four hub IRGs was investigated, and gene set enrichment analysis was performed to explore the underlying mechanisms of PTPRC in OV. According to Western blotting data, PTPRC may control PD-L1 expression by triggering the JAK-STAT signaling pathway. Together, these investigations pinpointed two immunological subtypes of OV and four hub IRGs. Mohamed et al. [19] created a technique for identifying ovarian cancer (OC) that uses data produced by the Internet of Medical Things (IoMT). Self-organizing maps (SOM) and optimal recurrent neural networks (ORNN) were used to categorize OC. 
The SOM algorithm was used to achieve better feature subset selection and to separate useful, intelligible, and interesting data from enormous amounts of medical data. An optimal classifier, the Optimal Recurrent Neural Network (ORNN), was also used: the weights of the Recurrent Neural Network (RNN) structure were adjusted using the Adaptive Harmony Search Optimization (AHSO) method, which increased the classification rate of OC detection. A series of trials was performed using information gathered from women at high risk of OC because of a personal or family history of cancer. Compared with other techniques such as RNN and FeedForward Neural Networks (FFNN), their method had a maximum accuracy of 96.27%, a sensitivity of 85.2%, and a specificity of 85.2%. The authors confirmed that the model can detect cancer early with excellent accuracy, sensitivity, and specificity and a low Root Mean Square Error. Simonas et al. [20] studied the deep proteome of well-defined groups of ovarian tumors, analyzing nine cases of early-stage benign serous and ovarian cancer, including Type 1 and Type 2, using TMT-LC-MS/MS. The study included the expression analysis of Type 1 (low-grade serous, mucinous, endometrioid), Type 2 (high-grade serous), and Type 3 (benign serous) at FIGO stage I. The data are available through ProteomeXchange with the ID PXD010939. The discovery phase included an examination of new bioinformatics tools. The new selection approach combined various normalizations, univariate statistics, a logistic model tree, and a naive Bayes tree classifier. This combined method identified 142 proteins. 
Nine distinct proteins and one biomarker panel were confirmed in cyst fluid and serum: transaldolase-1, fructose-bisphosphate aldolase A (ALDOA), transketolase, ceruloplasmin, mesothelin, clusterin, tenascin-XB, laminin subunit gamma-1, and mucin-16. Six of the proteins were found to be significant in cyst fluid (p = 0.05), whereas ALDOA was the only significant protein in serum. The ROC AUC values for the biomarker panel were 0.96 and 0.57. The research concluded that classification algorithms augmented traditional statistical approaches by identifying combinations that traditional univariate tests would have missed. Maxence et al. [21] claimed that HGSC originated from fallopian tube epithelial (FTE) cells, specifically those in the region of the tubal-peritoneal junction. The Sectioning and Extensively Examining the Fimbriated End protocol focused on three essential lesions: STILs, STICs, and p53 signatures. These lesions were detected based on cellular abnormalities and the immunohistochemistry (IHC) pattern of the markers p53 and Ki67. A complete proteome assessment of these preneoplastic epithelial lesions was conducted using IHC and mass spectrometry imaging, and the specific markers of each preneoplastic lesion were studied. CAVIN1, Emilin2, and FBLN5 were identified as specific lesion markers. Additionally, the authors used SpiderMass technology to undertake a lipidomic analysis and found that the lesions had a specific lipid signature, including dietary fatty acid precursors. This revealed molecular pathways of ovarian cancer and supported the fimbrial origin of HGSC. In light of the threat of epithelial ovarian cancer (EOC), Mariola et al. [22] studied EOC subtypes, including clear-cell, mucinous, and endometrioid carcinomas. The researchers also examined how prognostic factors depend on EOC outcomes, which was difficult because the condition was frequently detected after it had spread. 
The researchers presented an analytical workflow based on solid-phase microextraction (SPME) and three orthogonal LC/MS acquisition modes that made it possible to comprehensively map a variety of analytes in serum samples from EOC patients. PLS-DA multivariate analysis of the metabolomic data clearly distinguished the four main EOC subtypes, and the significance of discriminative metabolites and lipids was confirmed using multivariate receiver operating characteristic (ROC) analysis (AUC value > 88% with 20 features). Further pathway analysis using the top 57 dysregulated metabolic features showed that the four EOC subtypes had distinct abnormalities in the metabolism of amino acids, lipids, and steroids. According to the authors, metabolomic profiling could be a potent approach to support histology in classifying EOC subgroups. Samridhi et al. [23] addressed the identification of ovarian cancer in its initial phases. They employed a thorough strategy that exploited the dataset to increase the likelihood of accurate categorization. The dataset was enhanced using thorough pre-processing and data augmentation of images available on the internet, which increased its size and diversity. The aim was to capture various malignant appearances and reduce biases. The augmented images were categorized using a set of six classifiers implemented in MATLAB, and a holdout method for cross-validation was used to evaluate their effectiveness. The experiment reported a 99% accuracy rate, highlighting the efficiency of the approach in spotting ovarian cancer in its early stages. The authors observed that early detection of ovarian cancer has enormous potential for better prognoses and treatment results. 
The authors added to the growing body of knowledge on combating ovarian cancer by highlighting the significance of extending and diversifying datasets and utilizing advanced classification techniques. They also stressed the need for early intervention to reduce mortality from this insidious disease. The inability to effectively identify the ovarian cancer subtype remains a critical issue, and existing approaches leave room for improvement. This challenge has consistently drawn researchers to the area, and the proposed Attention Embedder model is intended to address it.

3. Dataset Analysis and Discussion

We acquired a new dataset from Kaggle, where it is being used in an ongoing competition [24]. The dataset comprises high-resolution images collected through a collaborative effort involving more than 20 medical centers, and it contains 1051 images in total. It includes two distinct categories of images: whole-slide images (WSI) and tissue microarray (TMA) images. The WSIs were captured at 20× magnification, which makes them considerably large and intricate. In contrast, the TMA images are smaller, measuring approximately 4000 by 4000 pixels, but were captured at 40× magnification. Although this dataset offers useful information, its 550 GB size is difficult to handle, and importing this amount of data takes a long time. Nevertheless, the information and insights that can be derived from this varied and substantial dataset promise to be extremely valuable for our ongoing study and contest participation. Several ovarian cancer subtypes are represented in our dataset, including high-grade serous carcinoma (HGSC), clear-cell carcinoma (CC), endometrioid carcinoma (EC), low-grade serous carcinoma (LGSC), and mucinous carcinoma (MC). In Figure 2, we display the percentage of each of the five subtypes of ovarian cancer in our dataset.

The presence of differences in source origin, picture quality, size, slide staining processes, and other related elements highlights the need to implement a dependable and adaptable error-handling approach. Regarding the revised training data, a total of 112,609 images were used for 20× magnification. The data were divided using an 80-20 split ratio, with 89,068 images allocated for training and 22,521 images for validation. At a magnification of 40×, the total number of images obtained was 17,851. These images were divided into two sets: a training set consisting of 13,281 images (about 80% of the total) and a validation set consisting of 3570 images (approximately 20% of the total). We conducted testing using a set of 2000 independent images. Specifically, 1000 images were selected for testing at 20× magnification, and an additional 1000 images were chosen for testing at 40× magnification, as shown in Table 1. During the study of the dataset, it was observed that there was a notable data imbalance related to EC, which is considered an abnormality. We removed this subtype to handle this issue and optimize the system’s efficacy. Instead, our approach will concentrate on identifying the other four ovarian cancer subtypes, providing more reliable and resilient outcomes.

3.1. Outlier Expression of Five Subtypes

Outliers are seen when looking at the height dimension in Figure 3. There is an outlier for CC in the first position; similarly, an outlier for MC is also present in the first position. On the other hand, the outlier for EC is precisely located at 100,000 pixels in width, while the outlier for MC is positioned at the beginning. A scatter plot displaying the five kinds of ovarian cancer is shown in Figure 4. Notably, the endometrioid, mucinous, and clear-cell carcinoma dataset contains three outliers designated as O1, O2, and O3. Here, the highest outlier for MC was discovered, along with the lowest outlier for CC.

Figure 4 shows a scatter plot of these three outliers, connected to the subtypes above, for a more in-depth visualization. In a two-dimensional space, the horizontal axis is denoted as the “x” dimension and the vertical axis as the “y” dimension. The plot provides empirical support for the existence of clusters: discrete points in the two-dimensional area aggregate into groups, which makes it possible to identify the points that belong to a particular cluster and their positions within it. After the clusters were observed, several of them were recognized as outliers, namely O1, O2, and O3. The x and y dimensions were chosen because they make these clusters easy to locate.
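The text does not state which rule was used to flag O1, O2, and O3. As one common possibility, the sketch below applies Tukey’s interquartile-range fences to image widths. The function name `iqr_outliers`, the factor `k = 1.5`, and the sample widths are all assumptions made for illustration and are not values from the dataset.

```python
import numpy as np


def iqr_outliers(values, k=1.5):
    """Return a boolean mask marking values outside the Tukey fences."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Hypothetical slide widths in pixels; the last one is far from the rest.
widths = [48000, 52000, 50000, 51000, 49000, 100000]
mask = iqr_outliers(widths)
```

The same function can be applied to the height dimension, or to any other per-image measurement plotted on the x and y axes.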

3.2. Data Preprocessing Processes

We employed data augmentation to address the data imbalance issue, cropping images to enlarge the dataset; this also mitigated overfitting. Through “patch size division,” we divided each large image into 128 × 128 pixel patches, creating a dataset of 130,460 unique patches. This strategy offered valuable resources for cancer research and helped with comprehending the subtleties of ovarian cancer, and the resulting dataset opens up new possibilities for ovarian cancer research and treatment. To guarantee the stability and applicability of our model, we handled the dataset with great care. Normalization, a critical stage in this process, rescales the pixel values of the images to a common range. Our deep learning model for Ovarian Cancer Subtype Classification and Outlier Detection depends critically on the stability and reliability of the normalized training dataset. There are several techniques for normalizing images. One method is scaling pixel values to fit into the [0, 1] or [−1, 1] range; for example, scaling an 8-bit image to [0, 1] requires dividing its pixel values by 255. Another method is standardization, which transforms each image channel to have a mean of 0 and a variance of 1, where μ is the sample mean and σ is the sample standard deviation, as shown in Equation (1).

\tilde{x} = \frac{x - \mu}{\sigma} \qquad (1)
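Both normalization options can be written in a few lines of NumPy. This is a minimal sketch rather than the pipeline used in the study: the function names `scale_unit` and `standardize` are hypothetical, statistics are computed per image and per channel, and the small `eps` term is an added assumption to avoid dividing by zero on flat channels.

```python
import numpy as np


def scale_unit(img):
    """Map 8-bit pixel values to [0, 1] by dividing by 255."""
    return img.astype(np.float32) / 255.0


def standardize(img, eps=1e-8):
    """Per-channel standardization (x - mu) / sigma, as in Equation (1)."""
    x = img.astype(np.float32)
    mu = x.mean(axis=(0, 1), keepdims=True)      # one mean per channel
    sigma = x.std(axis=(0, 1), keepdims=True)    # one standard deviation per channel
    return (x - mu) / (sigma + eps)              # eps guards flat channels (assumption)


# Usage on a random stand-in for a 128 x 128 RGB patch.
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(128, 128, 3), dtype=np.uint8)
unit = scale_unit(patch)
z = standardize(patch)
```

In practice the mean and standard deviation are often computed once over the training set and reused for validation and test images, which is a design choice the text leaves open.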

We also used picture encoding, which transforms visual input into a digital format required for transmission, storage, and computational analysis. This encoding technique converts the relationships between color and intensity within each pixel into a NumPy array, allowing computers to comprehend and alter visual content. This facilitates various applications, including data processing and visual communication, which are essential to our data processing workflow.

In our data processing pipeline, image leveling follows image encoding. Leveling improves image quality by adjusting brightness and contrast. The main goal is to distribute visual elements evenly so that the images can be interpreted more easily. Where lighting, exposure, or contrast fluctuate, image leveling balances the visuals with small brightness and contrast modifications, which improves the overall image quality of our dataset. This technique aids our analysis, improves aesthetics, and reveals previously hidden details. In short, image leveling is an essential phase that enhances and standardizes the images, laying the groundwork for subsequent analysis and research.
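The exact leveling operation is not specified in the text. As a stand-in, the sketch below uses percentile-based contrast stretching, one common way of adjusting brightness and contrast so that intensities fill the available range. The function name `level_image` and the 1st and 99th percentile cut-offs are assumptions.

```python
import numpy as np


def level_image(img, low_pct=1.0, high_pct=99.0):
    """Stretch intensities so the chosen percentiles map to 0 and 255."""
    x = img.astype(np.float32)
    lo, hi = np.percentile(x, [low_pct, high_pct])
    if hi <= lo:                       # flat image: nothing to stretch
        return img.copy()
    y = (x - lo) / (hi - lo)           # rescale so lo -> 0 and hi -> 1
    return (np.clip(y, 0.0, 1.0) * 255).astype(np.uint8)


# Usage: a low-contrast gradient confined to the range 100..150.
dull = np.linspace(100, 150, 128 * 128).reshape(128, 128).astype(np.uint8)
leveled = level_image(dull)
```

After leveling, the gradient spans the full 8-bit range, which is the kind of evening-out of brightness and contrast the paragraph describes.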

Figure 5 shows the divided image from the original images containing the ovarian cancer subtype.
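The division of a large image into 128 × 128 pixel patches can be sketched as follows. This is an illustration under stated assumptions, not the authors’ code: the function name `tile_patches` is hypothetical, tiles do not overlap, and edge regions that do not fill a whole patch are discarded.

```python
import numpy as np


def tile_patches(img, size=128):
    """Split an H x W x C image into non-overlapping size x size patches.

    Edge remainders that do not fill a whole patch are dropped (assumption).
    """
    h, w, c = img.shape
    rows, cols = h // size, w // size
    cropped = img[:rows * size, :cols * size]
    return (cropped
            .reshape(rows, size, cols, size, c)
            .swapaxes(1, 2)                        # -> rows, cols, size, size, c
            .reshape(rows * cols, size, size, c))  # patches in row-major order


# Usage: a 300 x 400 image yields a 2 x 3 grid of patches.
image = np.arange(300 * 400 * 3, dtype=np.uint32).reshape(300, 400, 3)
patches = tile_patches(image)
```

For slides as large as those described here, the image would normally be read region by region instead of being loaded whole; the tiling logic stays the same.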

In this part, we summarize the dataset, noting how its subtypes are distributed and outlining the preprocessing methods used to get a leveled dataset. We also explore how outliers are produced and our methods to deal with them. The two different forms of data and the techniques utilized to improve the classification of and outlier detection in ovarian cancer subtypes are covered in the last section.

4. Research Methodology and Implementation

Our implementation centers on a methodology for ovarian cancer subtype classification and outlier detection built on an Attention Embedder model. The model is made up of several blocks that function together, which enables it to process and categorize ovarian cancer subtypes quickly while also spotting outliers in the dataset. In this way, a single model carries out the dual duties of ovarian cancer classification and outlier detection. Figure 6 illustrates both the architecture and operation of the Attention Embedder model.

4.1. Execution of Attention Embedder Model

A deep learning architecture that incorporates the idea of attention is known as an attention embedding model, an attention-based model, or an attention mechanism. Many machine learning and deep learning applications, including computer vision and natural language processing, depend on attention and embeddings. There are two parts to this model’s working process.

Attention mechanisms: When making classifications or choices, the model can concentrate on pertinent details by using attention mechanisms to weigh various parts of an image. For instance, while creating captions for images, the model can use attention to concentrate on the most crucial elements of the image while generating a description.
Image embeddings: Image embeddings are compact, lower-dimensional representations of images that capture their content. These embeddings are often learned through convolutional neural networks (CNNs), a class of neural networks for image processing, and are useful for various tasks, including image retrieval and similarity analysis.

An attention embedding model combines attention mechanisms to concentrate on pertinent areas of the image and image embeddings to provide condensed, insightful image data representations. Combining these two factors enables the model to process images efficiently and produce output based on attended and embedded data.

4.2. Encoder-Decoder Process for Attention Layer

The encoder in the provided architecture has two encoder layers and accepts 128 × 128 images as input. The first encoder uses a 3 × 3 kernel with 128 filters, whereas the second encoder uses a 3 × 3 kernel with 256 filters. This layout takes into account images of various resolutions. The outputs of the first and second encoders are then passed to a main encoder block, which applies Conv2D (convolutional layer), Batch Normalization, and ReLU activation to extract and process features from the input image. The main encoder block usually contains two separable convolutional layers, followed by batch normalization and ReLU activation. The filter counts and kernel sizes of these convolutional layers can be adjusted when invoking the block. The encoder then returns the output of the main encoder block, which can be used by later network components, such as encoder 3 or other layers in our architecture.

The encoder network is tasked with transforming an input signal $x \in X \subset \mathbb{R}^{d_{0}}$ into a feature space, denoted as $z \in Z \subset \mathbb{R}^{d_{κ}}$. Conversely, the decoder takes this feature map as its input, processes it, and generates an output referred to as $y \in Y \subset \mathbb{R}^{d_{L}}$. In this scenario, a symmetric configuration is employed, in which the encoder and decoder share an equal number of layers, denoted as $κ$. Furthermore, to maintain symmetry, the input dimension of the encoder layer $ξ^{l}$ aligns with the output dimension of the decoder layer $D^{l}$. This symmetrical design ensures a balance between the encoding and decoding processes, contributing to the overall coherence and performance of the network:

ξ^{l} : \mathbb{R}^{d_{l-1}} \mapsto \mathbb{R}^{d_{l}}, \quad D^{l} : \mathbb{R}^{d_{l}} \mapsto \mathbb{R}^{d_{l-1}}

(2)

Here, $l \in [κ]$, where $[n]$ denotes the set $\{1, \ldots, n\}$. The input to the $l$-th encoder layer is drawn from $q_{l-1}$ input channels and is written as

ξ^{l-1} = [ (ξ_{1}^{l-1})^{T} \cdots (ξ_{q_{l-1}}^{l-1})^{T} ]^{T} \in \mathbb{R}^{d_{l-1}},

(3)

where $^{T}$ signifies a transpose operation and $ξ_{j}^{l-1} \in \mathbb{R}^{m_{l-1}}$ represents the input from the $j$-th channel, with a dimension of $m_{l-1}$. As a result, the combined input dimension is $d_{l-1} := m_{l-1} q_{l-1}$. In the $l$-th layer of the encoder, the convolution operation

ξ_{j}^{l} = σ ( Φ^{lT} \sum_{k=1}^{q_{l-1}} ( ξ_{k}^{l-1} ⊛ \bar{ψ}_{j,k}^{l} ) ), \quad j \in [q_{l}]

(4)

is utilized to produce an output with $q_{l}$ channels. In this context, $ξ_{j}^{l} \in \mathbb{R}^{m_{l}}$ is the output of the $j$-th channel after convolutional filtering with the $r$-tap filters $\bar{ψ}_{j,k}^{l} \in \mathbb{R}^{r}$ and the pooling operation $Φ^{lT} \in \mathbb{R}^{m_{l} \times m_{l-1}}$, while $σ(\cdot)$ denotes the element-wise Rectified Linear Unit (ReLU) operation. To provide further clarity, $\bar{ψ}_{j,k}^{l}$ is the $r$-tap convolutional kernel that is convolved with the $k$-th input channel to contribute to the output of the $j$-th channel, and $⊛$ denotes circular convolution. Circular convolution uses periodic boundary conditions, which avoids the need for specific handling of the convolution at the boundary. In this context, $\bar{v}$ denotes the flipped version of the vector $v$.

In matrix form, the output of the $l$-th encoder layer can be written as

ξ^{l} := σ ( E^{lT} ξ^{l-1} ) = [ (ξ_{1}^{l})^{T} \cdots (ξ_{q_{l}}^{l})^{T} ]^{T}

(5)

Here, $E^{l} \in \mathbb{R}^{d_{l-1} \times d_{l}}$ is computed as

E^{l} = \begin{bmatrix} Φ^{l} ⊛ ψ_{1,1}^{l} & \cdots & Φ^{l} ⊛ ψ_{q_{l},1}^{l} \\ \vdots & \ddots & \vdots \\ Φ^{l} ⊛ ψ_{1,q_{l-1}}^{l} & \cdots & Φ^{l} ⊛ ψ_{q_{l},q_{l-1}}^{l} \end{bmatrix}

(6)

with

Φ^{l} ⊛ ψ_{i,j}^{l} := [ ϕ_{1}^{l} ⊛ ψ_{i,j}^{l} \cdots ϕ_{m_{l}}^{l} ⊛ ψ_{i,j}^{l} ]

(7)

In contrast, the input signal for the $l$-th layer of the decoder is sourced from $q_{l}$ input channels, and the layer output is given by

ξ_{j}^{l-1} = σ ( \sum_{k=1}^{q_{l}} ( ( Φ^{l} ξ_{k}^{l} + χ_{k}^{l} ) ⊛ ψ_{j,k}^{l} ) )

(8)

while the filter matrix for the decoder branch is symbolized as $D^{l}$, which is further represented by

D^{l} = \begin{bmatrix} I_{m_{l-1}} ⊛ ψ_{1,1}^{l} & \cdots & I_{m_{l-1}} ⊛ ψ_{q_{l},1}^{l} \\ \vdots & \ddots & \vdots \\ I_{m_{l-1}} ⊛ ψ_{1,q_{l-1}}^{l} & \cdots & I_{m_{l-1}} ⊛ ψ_{q_{l},q_{l-1}}^{l} \end{bmatrix}

(9)

In the overall process, the encoder takes the input first and encodes the input data. The decoder then uses the encoded representation to recreate the original input inside the model.
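To make the encoder layer of Equation (4) concrete, the following one-dimensional NumPy sketch implements circular convolution with periodic boundaries, a pooling matrix, and ReLU. It is a simplified illustration under several assumptions: signals are 1-D rather than 2-D images, the pooling matrix is plain average pooling, the flipped filter is taken to be the reversed tap order, and all names and sizes are hypothetical.

```python
import numpy as np

def circular_conv(x, h):
    """Circular convolution of a length-m signal x with an r-tap filter h (r <= m)."""
    m = len(x)
    out = np.zeros(m)
    for n in range(m):
        for i, tap in enumerate(h):
            out[n] += tap * x[(n - i) % m]  # periodic boundary condition
    return out

def encoder_layer(inputs, filters, pool_T):
    """One encoder layer in the spirit of Equation (4).

    inputs : (q_prev, m_prev) array, one row per input channel
    filters: (q_out, q_prev, r) array of r-tap filters psi[j, k]
    pool_T : (m_out, m_prev) pooling matrix, playing the role of Phi^T
    """
    outputs = []
    for j in range(filters.shape[0]):
        acc = np.zeros(inputs.shape[1])
        for k, x_k in enumerate(inputs):
            # Flipped filter; reversing the taps is an assumed index convention.
            acc += circular_conv(x_k, filters[j, k][::-1])
        outputs.append(np.maximum(pool_T @ acc, 0.0))  # pooling, then ReLU
    return np.stack(outputs)

rng = np.random.default_rng(0)
inputs = rng.normal(size=(2, 8))       # q_{l-1} = 2 channels of length m_{l-1} = 8
filters = rng.normal(size=(3, 2, 3))   # q_l = 3 output channels, r = 3 taps
pool_T = np.kron(np.eye(4), np.array([[0.5, 0.5]]))  # average pooling, 8 -> 4
outputs = encoder_layer(inputs, filters, pool_T)      # shape (q_l, m_l) = (3, 4)
```

Because the indices wrap around modulo the signal length, no padding or special boundary handling is needed, which is the property the text attributes to circular convolution.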

4.3. Attention Layer Work for Model

The scaled dot-product attention mechanism, often used in transformer designs, is crucial in many deep learning models, particularly in natural language processing. The model formula for the attention mechanism is defined as follows. Assume we have a collection of input vectors or sequences:

A collection of query vectors, Q, with dimensions (batch_size, seq_length, d_model).
A collection of key vectors, K, with dimensions (batch_size, seq_length, d_model).
A collection of value vectors, V, with dimensions (batch_size, seq_length, d_model).

The self-attention mechanism computes the attention scores for each pair of the query and key vectors, which are then used to weight the value vectors and produce the output. The mathematical formula for the scaled dot-product attention is as follows:

1.: Compute the scaled attention scores, A:

$A = \frac{Q K^{T}}{\sqrt{d_k}}$

(10)

Here, $K^{T}$ denotes the transpose of K, and $\sqrt{d_k}$ is a scaling factor, where $d_k$ stands for the key vector’s dimension. This scaling is used to prevent the dot products from getting too large, which can result in very small gradients during training.
2.: Apply a softmax function to normalize the attention scores across the sequence length, producing the attention weights, W:

$W = \mathrm{softmax}(A)$

(11)
3.: Use the attention weights to weight the value vectors, V, and obtain the output, O:

$O = W V$

(12)

An attention layer assigns relevance scores to the elements of a sequence by multiplying input vectors by weight matrices. A normalizing function such as softmax turns these scores into a probability distribution that reflects the relevance of each element, and this distribution is used to form a weighted sum of the encoder’s output. In essence, attention computes relevance weights for each input element by combining the input with the encoder output. These steps are applied to every query in the batch and produce an output tensor O with the same shape as the value tensor V; because the queries are independent, they can be processed concurrently. Thanks to the attention mechanism, the model can learn which input sequence elements are most pertinent to each query, making it a potent tool for various sequence-based tasks such as machine translation and text synthesis.
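Equations (10) to (12) can be sketched in a few lines of NumPy. This is a generic illustration of scaled dot-product attention rather than the authors’ layer: the function names, the random inputs, and the tensor sizes (batch of 2, sequence length 5, model dimension 16) are assumptions.

```python
import numpy as np

def softmax(a, axis=-1):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V have shape (batch_size, seq_length, d_model)."""
    d_k = K.shape[-1]
    A = Q @ np.swapaxes(K, -1, -2) / np.sqrt(d_k)  # Equation (10)
    W = softmax(A, axis=-1)                        # Equation (11)
    O = W @ V                                      # Equation (12)
    return O, W

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 5, 16))
K = rng.normal(size=(2, 5, 16))
V = rng.normal(size=(2, 5, 16))
O, W = scaled_dot_product_attention(Q, K, V)
```

Each row of `W` sums to 1, so every output vector in `O` is a convex combination of the value vectors, and `O` has the same shape as `V`.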

4.4. Classification of Ovarian Cancer Subtypes

The classification stage involves Encoder 2, Encoder 3, and a Decoder, coupled by an attention mechanism. The attention function combines the outputs of Encoders 2 and 3, and a fully connected layer processes the combined value. The process can be summarized as follows:

Encoder 2 and Encoder 3 outputs are combined using an attention mechanism.
The combined value from the attention mechanism is passed through a fully connected layer.

Assume there are “n” neurons in the previous layer and “m” neurons in the current (fully connected) layer. Each connection between a neuron in the previous layer and a neuron in the current layer has an associated weight, and each neuron in the current layer has a bias. For each neuron i in the current layer, the output $z_{i}$ can be computed as

z_{i} = \sum_{j=1}^{n} w_{ij} x_{j} + b_{i}

(13)

where the following are true:

i indexes the neurons in the current layer, from 1 to m.
j indexes the neurons in the previous layer, from 1 to n.
$x_{j}$ is the output of the $j$-th neuron in the previous layer, $w_{ij}$ is the weight of the connection between the $j$-th neuron in the previous layer and the $i$-th neuron in the current layer, and $b_{i}$ is the bias of the $i$-th neuron in the current layer.

This computation is repeated for each neuron in the current layer to obtain the outputs for the entire layer. The subsequent layer in the neural network can then take the output of this fully connected layer as its input. Then, dropout is used. Dropout is a regularization method frequently employed in neural networks to avoid overfitting. During training, it randomly deactivates (drops out) a certain percentage of neurons. Dropout’s main goal is to keep the network from depending too much on any one neuron, which encourages more robust feature learning and lessens overfitting.

For each layer output, generate a binary mask d of the same shape as the output. This mask d is typically drawn from a Bernoulli distribution with parameter p, where p is the probability of keeping a neuron (so the fraction of neurons dropped is 1 − p).
Compute the dropout output during training by multiplying the input x element-wise by the dropout mask d:

$\mathrm{output}_{\mathrm{training}} = x \odot d$

(14)
During inference or testing, all neurons are used, so dropout isn’t applied. The output is unchanged from the input:

$\mathrm{output}_{\mathrm{inference}} = x$

(15)
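The fully connected layer of Equation (13) and the dropout behavior of Equations (14) and (15) can be sketched as follows. This is a minimal NumPy illustration with hypothetical weights and a hypothetical `keep_prob` parameter, not the trained model.

```python
import numpy as np

def dense(x, W, b):
    """Equation (13): z_i = sum_j w_ij * x_j + b_i, with W of shape (m, n)."""
    return W @ x + b

def dropout(x, keep_prob, rng, training=True):
    """Equations (14) and (15): mask during training, identity at inference."""
    if not training:
        return x
    d = rng.random(x.shape) < keep_prob  # Bernoulli(keep_prob) mask
    return x * d

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0])                             # n = 2 inputs
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # m = 3 neurons
b = np.array([0.0, 1.0, -1.0])

z = dense(x, W, b)
train_out = dropout(z, keep_prob=0.8, rng=rng)        # some entries may be zeroed
infer_out = dropout(z, keep_prob=0.8, rng=rng, training=False)
```

Note that common frameworks such as Keras and PyTorch use “inverted” dropout, which additionally rescales the kept activations during training so that their expected value matches the unmasked output at inference; the sketch above follows Equations (14) and (15) literally and omits that rescaling.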

4.5. Model Training, Evaluation, and Selection

In our quest for Ovarian Cancer Subtype Classification and Outlier Detection, we systematically evaluated seven prominent CNN models: VGG16, VGG19, ResNet50, ResNet101, InceptionResNetV2, InceptionV3, and DenseNet121. Remarkably, the Attention Embedder model outperformed all others. We meticulously explored different training batch sizes, including 8 and 16, settling on 16 as the most effective choice. To enhance model efficiency, we employed five different optimizers, including SGD, Adam, RMSprop, Nadam, and Adamax, with Nadam yielding the highest efficiency. We compared Categorical Cross-Entropy and Kullback–Leibler Divergence for loss function selection, with Categorical Cross-Entropy proving the most efficient. Our experimentation extended to kernel sizes, considering both 3 × 3 and 5 × 5 kernels. The 3 × 3 kernel size emerged as the optimal choice for our purposes. The learning rate played a pivotal role in our model’s performance. We tested values of 0.1, 0.001, 0.0001, 0.00001, and 0.002, with 0.0001 delivering the best results. Lastly, we fine-tuned our model with dropout values of 0.1, 0.2, 0.3, and 0.5, finding that a dropout value of 0.2 produced the most favorable outcome. All of these essential components improve the efficiency and viability of our proposed Attention Embedder model. Table 2 lists the parameters of our experimental model.

A single Attention Embedder model is used for both the classification of ovarian cancer subtypes and the identification of outliers.

5. Result Analysis

This section summarizes the results for ovarian cancer subtype classification and outlier detection. We report the model’s accuracy for 20×- and 40×-magnification images and show the ROC curves that our Attention Embedder model produced at both magnifications. We examine the model’s loss and, for clarity, present it in two separate figures. Additionally, we include precision, recall, F1-score, support metrics, and confusion matrices for both 20×- and 40×-magnification images, together with the results of the other models that were evaluated. Finally, we give illustrations of the ovarian cancer subtypes and their classifications to aid the reader.

5.1. Accuracy and Loss of Ovarian Cancer Subtype for 20×-Magnification Images

We gauge the accuracy of our Ovarian Cancer Subtype Classification and Outlier Detection model using four quantities: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). Together, these values describe how accurately the model categorizes ovarian cancer subtypes and detects outliers.

\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

(16)

True Positive (TP): These are instances where the model correctly identifies and classifies an ovarian cancer subtype as positive.

True Negative (TN): These instances indicate the model’s success in accurately recognizing and categorizing an ovarian cancer subtype as negative.

False Positive (FP): In this scenario, the model wrongly identifies the ovarian cancer subtype as positive when it should have been categorized as negative.

False Negative (FN): Conversely, this happens when the model incorrectly identifies an ovarian cancer subtype as negative instead of categorizing it as positive.

We acquire a thorough knowledge of the model’s accuracy and effectiveness in ovarian cancer classification by examining these metrics.

For 20×-magnification images, the Attention Embedder model reaches a training accuracy of 96.42% and a validation accuracy of 95.10% after training for 20 epochs. Figure 7 and Figure 8 give an overview of these results for the five ovarian cancer subtypes at 20× magnification, summarizing the training accuracy, validation accuracy, training loss, validation loss, and ROC curve.

5.2. Accuracy and Loss of Ovarian Cancer Subtype for 40× Images

For 40×-magnification images, the Attention Embedder model reaches a training accuracy of 93.45% and a validation accuracy of 94.90% after training for 20 epochs. We also calculate the loss. These outcomes show how well our model classifies ovarian cancer subtypes and identifies outliers in the dataset. Figure 9 and Figure 10 give an overview of these results for the five ovarian cancer subtypes at 40× magnification, summarizing the training accuracy, validation accuracy, training loss, validation loss, and ROC curve.

5.3. Precision, Recall, F1-Score, and Support for 20×-Magnification Images

These metrics offer a more complete picture of the model’s performance in ovarian cancer subtype classification and outlier detection than accuracy alone.

Precision, recall, F1-score, and support are determined as follows; support is the number of actual instances of each class.

Precision: Precision is the proportion of instances that were correctly identified as positive among all of the instances that the model classified as positive. It indicates how reliable the model’s positive classifications are.

\mathrm{Precision} = \frac{TP}{TP + FP}

(17)

Recall: Recall, sometimes called sensitivity or true positive rate, is the proportion of actual positive instances in the dataset that the model correctly identifies.

\mathrm{Recall} = \frac{TP}{TP + FN}

(18)

F1-score: The F1-score, also called the F1-measure, is the harmonic mean of precision and recall. It provides a fair assessment of a model’s performance, particularly in circumstances where precision and recall need to be balanced.

\mathrm{F1} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}

(19)
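Equations (17) to (19) translate directly into code. The sketch below uses plain Python; the function names and the TP/FP/FN counts are hypothetical and serve only to show the arithmetic. As a quick sanity check on reported values, a precision of 0.96 and a recall of 0.88 combine to an F1-score of about 0.92.

```python
def precision(tp, fp):
    """Equation (17); returns 0.0 when nothing was classified as positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Equation (18); returns 0.0 when there are no actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(p, r):
    """Equation (19): harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Hypothetical counts for one class (not taken from the study).
tp, fp, fn = 88, 4, 12
p = precision(tp, fp)
r = recall(tp, fn)
f1 = f1_score(p, r)
support = tp + fn  # number of actual instances of the class
```

The zero-denominator guards are a design choice for the sketch; libraries differ in how they report undefined precision or recall.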

The precision, recall, and F1-score of each ovarian cancer subtype were assessed as follows:

For the ovarian cancer subtype “Clear-Cell Carcinoma (CC)” (index 0), the precision stands at 0.94, recall at 0.77, and F1-score at 0.76.

Regarding the ovarian cancer subtype “Endometrioid Carcinoma (EC)” (index 1), we observe a precision of 0.80, recall of 0.79, and F1-score of 0.90.

The precision, recall, and F1-score for the ovarian cancer subtype “High-Grade Serous Carcinoma (HGSC)” (index 2) are all 0.93.

For the ovarian cancer subtype “Low-Grade Serous Carcinoma (LGSC)” (index 3), the precision is 0.81, the recall is 0.83, and the F1-score is 0.82.

Lastly, for the ovarian cancer subtype “Mucinous Carcinoma (MC)” (index 4), the precision is 0.92, the recall is 0.78, and the F1-score is 0.82.

Table 3 displays the precision, recall, and F1-score of each ovarian cancer subtype for 20×-magnification images, as observed during the model testing phase.

5.4. Precision, Recall, and F1-Scores for 40× Images

Additionally, we assess the model’s precision, recall, F1-score, and support using the 40×-magnification image dataset. The precision, recall, and F1-score for the ovarian cancer subtype “Clear-Cell Carcinoma (CC)” (index 0) are 0.96, 0.88, and 0.92, respectively.

Regarding the ovarian cancer subtype “Endometrioid Carcinoma (EC)” (index 1), we observe a precision of 0.99, recall of 0.83, and F1-score of 0.90.

The precision, recall, and F1-score values for the ovarian cancer subtype “High-Grade Serous Carcinoma (HGSC)” (index 2) are 0.94, 0.99, and 0.97, respectively.

For the ovarian cancer subtype “Low-Grade Serous Carcinoma (LGSC)” (index 3), the precision is 0.93, the recall is 0.94, and the F1-score is 0.93.

Lastly, for the ovarian cancer subtype “Mucinous Carcinoma (MC)” (index 4), the precision is 0.94, the recall is 0.87, and the F1-score is 0.91.

Table 4 displays the precision, recall, and F1-score of each ovarian cancer subtype for 40×-magnification images, as observed during the model testing phase.

5.5. Confusion Matrix of 20×- and 40×-Magnification Images

A confusion matrix is a crucial tool for evaluating the effectiveness of a classification model. It describes the model’s classifications and how well they correspond to the actual class labels. The matrix is typically organized as a table whose rows and columns represent the classified and actual classes. The blue box marks the class most likely to be confused, while the green box marks the class least likely to be confused.

There are four main parts to the confusion matrix:

1.: True Positives (TP): Instances that were correctly categorized as positive.
2.: True Negatives (TN): Instances that were correctly categorized as negative.
3.: False Positives (FP): Instances that are falsely categorized as positive when they are negative.
4.: False Negatives (FN): Instances that are falsely categorized as negative when they are positive.

Figure 11 displays the model’s confusion matrix of 20×- and 40×-magnification images. The confusion matrix helps calculate various metrics like precision, recall, F1-score, and accuracy. It is an essential tool to gain insights into a model’s strengths and weaknesses, particularly in identifying where the model might be making errors and which classes are more challenging to classify accurately.
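For a multi-class problem such as subtype classification, the four quantities above are obtained per class in a one-vs-rest fashion. The following NumPy sketch shows one way to derive them, assuming the convention that rows hold actual classes and columns hold classified classes; the 3 × 3 matrix is hypothetical and is not taken from Figure 11.

```python
import numpy as np

def per_class_counts(cm):
    """Return (TP, FP, FN, TN) arrays, one entry per class.

    Assumes cm[i, j] counts samples of actual class i classified as class j.
    """
    cm = np.asarray(cm)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp       # classified as the class, but actually another
    fn = cm.sum(axis=1) - tp       # actually the class, but classified as another
    tn = cm.sum() - tp - fp - fn   # everything else
    return tp, fp, fn, tn

# Hypothetical 3-class confusion matrix (illustrative values only).
cm = np.array([[50, 2, 3],
               [4, 45, 1],
               [6, 0, 39]])
tp, fp, fn, tn = per_class_counts(cm)
accuracy = np.trace(cm) / cm.sum()  # overall fraction of correct classifications
```

If the matrix is stored with classified classes in rows instead, the roles of `fp` and `fn` simply swap.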

5.6. Performance Ratings of Different Models That We Examined for 20×-Magnification Images

In our study, we investigated various deep learning and machine learning models to categorize subtypes of ovarian cancer using 20×-magnification images. In particular, we evaluated VGG16, VGG19, ResNet50, InceptionV3, ResNet101, InceptionResNetV2, and DenseNet121. Our primary goal in this analysis was to assess their training accuracy and validation accuracy. InceptionV3 had the lowest accuracy of the models we evaluated, whereas our proposed model, the Attention Embedder, had the best accuracy, highlighting its potential as a promising solution for this crucial task. Table 5 displays the models we implemented for classifying ovarian cancer subtypes and identifying outliers for 20×-magnification images, together with the proposed model.

5.7. Performance Ratings of Different Models That We Examined for 40× Images

We expanded our analysis to 40×-magnification images as part of our ongoing investigation into classifying ovarian cancer subtypes. In this step, we re-evaluated the same set of models that we had previously evaluated using 20×-magnification images, namely VGG16, VGG19, ResNet50, ResNet101, InceptionResNetV2, InceptionV3, and DenseNet121. Our findings agreed with the earlier ones: our proposed model, the Attention Embedder, again achieved the highest accuracy, while InceptionV3 again displayed the lowest. This confirms that our proposed model handles this task effectively even at higher magnification. Table 6 displays the models we implemented for classifying ovarian cancer subtypes and identifying outliers for 40×-magnification images, as well as the proposed model.

5.8. K-Fold Cross-Validation

This study used a rigorous k-fold cross-validation approach to assess the models constructed using 20× and 40× images comprehensively. Specifically, we employed a ten-fold cross-validation methodology to evaluate the performance of the models presented in Table 7.

For the 20× images, fold-9 stood out with an accuracy of 99.75% and a closely matching validation accuracy of 99.40%. In contrast, fold-2 had the least favorable performance, with an accuracy of 87.95% and a slightly higher validation accuracy of 88.40%.

For the 40× images, fold-8 reached an accuracy of 99.26% and a validation accuracy of 98.60%. Fold-2 again had the least satisfactory performance, with an accuracy of 84.11% and a corresponding validation accuracy of 81.59%.

Accuracy and validation accuracy were used to measure the effectiveness of the models. The close agreement between the two in these folds indicates that the models did not overfit.
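The ten-fold procedure can be sketched as an index split in which every sample is used for validation exactly once. This NumPy illustration is an assumption about the mechanics only: the function name, the shuffling seed, and the sample count of 103 are hypothetical, and the study may have used a library routine or stratified folds instead.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)   # shuffle once before splitting
    folds = np.array_split(order, k)     # k nearly equal folds
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# Each fold in turn serves as the validation set; the rest is used for training.
splits = list(k_fold_indices(n_samples=103, k=10))
fold_sizes = [len(val_idx) for _, val_idx in splits]
```

In practice, a model would be trained from scratch on each `train_idx` and evaluated on the matching `val_idx`, giving one accuracy pair per fold as in Table 7.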

5.9. Fine-Tuned Hyperparameter Implementation

In addition to k-fold cross-validation, we tuned the model’s hyperparameters to further guard against overfitting. The number of units (“unit”) ranges from a minimum of 32 to a maximum of 256 in steps of 32. The “Dropout” value ranges from a minimum of 0.1 to a maximum of 0.5 in steps of 0.1. The “Learning Rate” ranges from a minimum of 0.0001 to a maximum of 0.01 and is sampled on a logarithmic scale. Lastly, the “Dense” layer size ranges from a minimum of 64 to a maximum of 256 in steps of 32. With the tuned hyperparameters, our model achieves testing accuracies of 93.56% at 20× magnification and 91.37% at 40× magnification. These results suggest that our Attention Embedder model is resilient and does not overfit, and they emphasize its adaptability and dependability at various scales. The best parameters found by hyperparameter tuning for 20× and 40× images are displayed in Table 8.
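The search space described above can be written down as a small sampling routine. The sketch below uses random sampling with the Python standard library; this is an assumption, since the text does not name the tuning tool or search strategy, and the function and key names are hypothetical.

```python
import math
import random

def sample_stepped(rng, low, high, step):
    """Pick one value from {low, low + step, ..., high}."""
    n_steps = round((high - low) / step)
    return low + step * rng.randint(0, n_steps)

def sample_log_uniform(rng, low, high):
    """Sample so that log10(value) is uniform between log10(low) and log10(high)."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))

def sample_config(rng):
    """Draw one configuration from the ranges stated in the text."""
    return {
        "units": sample_stepped(rng, 32, 256, 32),
        "dropout": round(sample_stepped(rng, 0.1, 0.5, 0.1), 1),
        "learning_rate": sample_log_uniform(rng, 1e-4, 1e-2),
        "dense": sample_stepped(rng, 64, 256, 32),
    }

rng = random.Random(0)
trials = [sample_config(rng) for _ in range(5)]  # five candidate configurations
```

Sampling the learning rate on a logarithmic scale gives each order of magnitude between 0.0001 and 0.01 equal weight, which a linear scale would not.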

5.10. Comparison of Existing and Proposed Model Results

Through our extensive investigation, we comprehensively compared various established models designed to classify subtypes of ovarian cancer and detect anomalies. Our ultimate objective in the competition for categorizing ovarian cancer subtypes and identifying outliers was to achieve precise and accurate results. We examined various innovative models designed expressly for classifying ovarian cancer subtypes, consulting a wealth of relevant literature. Each of these models provided a unique and different method for achieving ovarian cancer subtype classification and outlier detection, encompassing a variety of architectural styles and methodological approaches. In this regard, we propose a comparative analysis that contrasts the performance and results of our novel Attention Embedder model with those of existing methods for ovarian cancer diagnosis. This thorough analysis enables us to evaluate the advantages and disadvantages of our model in contrast with the existing ones, finally illuminating its suitability for tackling the complex problem of ovarian cancer subtype classification and outlier detection. Table 9 displays the many existing methods for categorizing ovarian cancer and the model and accuracy of the suggested system.

5.11. Classification of Ovarian Cancer Subtype Cells

We followed a structured classification procedure for the subtypes of ovarian cancer. At the center of this procedure is our trained Attention Embedder model. Our dataset consists of images taken at two distinct magnifications, 20× and 40×, and this extensive dataset is the foundation of our analysis. We use both magnifications to gather the findings that guide the study. The classification of ovarian cancer subtypes, an objective with significant medical implications, is at the core of our mission. In addition, our model has a second, equally important feature: outlier detection. By enabling us to recognize and report uncommon and potentially abnormal data points, this feature increases the overall reliability of our predictions. Figure 12 shows the classification of ovarian cancer cells into the five subtypes produced by our model. This result demonstrates the strength of our model and shows significant promise for ovarian cancer research and diagnostics.

6. Discussion

Ovarian carcinoma is the most deadly malignancy of the female reproductive system. The five most prevalent ovarian cancer subtypes are high-grade serous, clear-cell, endometrioid, low-grade serous, and mucinous carcinoma; there are also rare subtypes, or outliers. Each has different cellular morphologies, aetiologies, molecular and genetic profiles, and clinical features. Subtype-specific treatment is becoming increasingly common, and data science may help identify subtypes. Currently, pathologists must assess subtypes to diagnose ovarian cancer, which leads to problems such as inter-observer differences and limited diagnostic reproducibility. Both well-developed and disadvantaged areas lack gynecologic cancer pathologists, and specialist pathologists are especially few in underdeveloped areas. Our team initiated a novel study on Ovarian Cancer Subtype Classification and Outlier Detection. Our subtype categories included HGSC, CC, EC, LGSC, and MC. Using the Attention Embedder model, which combines Encoder and Decoder blocks with an Attention Layer, we classified ovarian cancer subtypes and identified outliers. After comprehensive testing, we found that the training accuracy was 96.42% and the validation accuracy was 95.10% for 20×-magnification images, and 93.45% and 94.90%, respectively, for 40×-magnification images. Using our dataset, we also compared many Kaggle competition models, each using different techniques. The results show how these models perform on our particular dataset. The model results and accuracies on our dataset are shown in Table 10.

Addressing the class imbalance in our dataset was crucial to reducing overfitting. We used data augmentation, introducing variants of the existing images to diversify the dataset and improve the balance between classes. Image cropping was part of this approach: cropping and modifying the images expands the quantity and diversity of the training data and yields a more balanced dataset, which in turn gives a more reliable and more broadly applicable model. Together, these steps improved the model’s ability to resist overfitting and its overall predictive performance.

We used the Nadam optimizer, which is known to train deep learning models efficiently, with a learning rate of 0.0001. This learning rate was chosen so that training converges efficiently, avoiding both slow convergence and overshooting. Our research was limited by the use of high-resolution images, which require substantial computational resources; more extensive experimentation exceeded our available budget. We tested batch sizes of 8 and 16. Graphics Processing Unit (GPU) limitations prevented us from going higher, although the model might perform better with a larger batch size. Batch size 16 produced the best results, balancing model performance against computational efficiency within our resource constraints. Our Ovarian Cancer Subtype Classification and Outlier Detection method achieved high accuracy, categorizing the five ovarian cancer subtypes and identifying outliers among them. Our work shows the rising importance of linguistic variety and medical sector sensitivity in ovarian cancer subtype classification and outlier identification, as well as our commitment to leading deep learning research. We hope that this approach will inform future research and relevant applications in the medical community.
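The cropping step of the augmentation described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function name `random_crop_boxes`, the random placement strategy, the seed, and the example image size are all assumptions, and the 128-pixel patch size is borrowed from the patch size mentioned in the caption of Figure 1.

```python
import random


def random_crop_boxes(width, height, patch=128, n=4, seed=0):
    """Return n random (left, top, right, bottom) boxes of size patch x patch.

    Hypothetical helper: every box lies fully inside a width x height image.
    """
    if patch > width or patch > height:
        raise ValueError("patch must fit inside the image")
    rng = random.Random(seed)  # fixed seed so the crops are reproducible
    boxes = []
    for _ in range(n):
        left = rng.randint(0, width - patch)  # randint includes both endpoints
        top = rng.randint(0, height - patch)
        boxes.append((left, top, left + patch, top + patch))
    return boxes


# Example: four 128 x 128 crops from a hypothetical 1024 x 768 image.
boxes = random_crop_boxes(width=1024, height=768, patch=128, n=4)
```

Each box can then be passed to an image library's crop routine. Drawing more crops from under-represented classes is one way such a step could help balance the classes.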

7. Conclusions and Future Research

This study introduced the Attention Embedder model for Ovarian Cancer Subtype Classification and Outlier Detection. Our main objective was to identify the five subtypes of ovarian cancer: CC, EC, HGSC, LGSC, and MC. The model was evaluated on a varied dataset with two image categories, TMA images at 40× magnification and WSI images at 20× magnification. For WSI (20×) images, it obtained a training accuracy of 96.42% and a validation accuracy of 95.10%; for TMA (40×) images, it obtained a training accuracy of 93.45% and a validation accuracy of 94.90%. These outcomes indicate that the approach is dependable and robust. We report accuracy and loss curves, which show the training dynamics of the model, together with an extensive comparison of the models that we employed. Confusion matrices give more information about the model’s performance than accuracy and loss measures alone. By automating ovarian cancer subtype classification and outlier detection, this research could support medical practice and improve patient outcomes. In future work, we plan to modify the model’s structure for better performance and faster learning, to expand our computing resources so that we can increase the batch size, and to work with real-time ovarian cancer data.

Author Contributions

Methodology, S.M.N.N. and S.M.M.R.S.; Software, S.M.N.N. and S.M.M.R.S.; Validation, S.M.N.N., S.M.M.R.S., M.S. and S.A.; Formal analysis, S.M.M.R.S. and M.A.H.; Investigation, S.M.N.N., S.M.M.R.S. and M.A.H.; Resources, S.M.N.N. and S.M.M.R.S.; Data curation, S.M.M.R.S. and M.A.H.; Writing—original draft, S.M.N.N., S.M.M.R.S. and M.A.H.; Writing—review & editing, S.M.N.N., S.M.M.R.S., M.A.H., M.M.K., M.S. and S.A.; Visualization, S.M.N.N. and S.M.M.R.S.; Supervision, M.F.M. and M.M.K.; Project administration, M.S. and S.A.; Funding acquisition, M.S. and S.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research is funded by the Researchers Supporting Project Number (RSPD2024R1027), King Saud University, Riyadh, Saudi Arabia.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. The data are held in a publicly accessible repository that does not issue DOIs and can be found here: https://www.kaggle.com/competitions/UBC-OCEAN/data (accessed on 19 November 2023).

Acknowledgments

The authors extend their appreciation to King Saud University for funding this research through Researchers Supporting Project Number (RSPD2024R1027), King Saud University, Riyadh, Saudi Arabia.

Conflicts of Interest

There is no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
OC	Ovarian Cancer
ROC	Receiver Operating Characteristic
WSI	Whole-Slide Images
TMA	Tissue Microarray
HGSC	High-Grade Serous Carcinoma
EC	Endometrioid Carcinoma
LGSC	Low-Grade Serous Carcinoma
CC	Clear-Cell Carcinoma
MC	Mucinous Carcinoma
DRAS-MIL	Discriminative Region Active Sampling for Multiple Instance Learning
PheWAS	Phenome-Wide Association Study
OC-GRS	Ovarian Cancer genetic risk scores
AHSO	Adaptive Harmony Search Optimization
FFNN	FeedForward Neural Networks
EOCL	Emergency Operations Center
HGSOC	High-Grade Serous Ovarian Cancer
OS	Overall Survival
SPME	Solid-Phase Microextraction
ALDOA	Aldolase A
TP	True Positives
TN	True Negatives
FP	False Positives
FN	False Negatives
CCC	Clear-Cell Carcinoma
AUC	Area Under Curve

References

1. Takahashi, A.; Hong, L.; Chefetz, I. How to win the ovarian cancer stem cell battle: Destroying the roots. Cancer Drug Resist. 2020, 3, 1021.
2. American Cancer Society. Key Statistics for Ovarian Cancer. 2023. Available online: www.bbc.com/news/business-58769351 (accessed on 13 September 2023).
3. Yang, J.P.; Ullah, A.; Su, Y.N.; Otoo, A.; Adu-Gyamfi, E.A.; Feng, Q.; Wang, Y.X.; Wang, M.J.; Ding, Y.B. Glycyrrhizin ameliorates impaired glucose metabolism and ovarian dysfunction in a polycystic ovary syndrome mouse model. Biol. Reprod. 2023, 109, ioad048.
4. Sung, H.; Ferlay, J.; Siegel, R.L.; Laversanne, M.; Soerjomataram, I.; Jemal, A.; Bray, F. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2021, 71, 209–249.
5. Su, Y.N.; Wang, M.J.; Yang, J.P.; Wu, X.L.; Xia, M.; Bao, M.H.; Ding, Y.B.; Feng, Q.; Fu, L.J. Effects of Yulin Tong Bu formula on modulating gut microbiota and fecal metabolite interactions in mice with polycystic ovary syndrome. Front. Endocrinol. 2023, 14, 1122709.
6. Jayson, G.C.; Kohn, E.C.; Kitchener, H.C.; Ledermann, J.A. Ovarian cancer. Lancet 2014, 384, 1376–1388.
7. Hinchcliff, E.; Westin, S.N.; Herzog, T.J. State of the science: Contemporary front-line treatment of advanced ovarian cancer. Gynecol. Oncol. 2022, 166, 18–24.
8. Cree, I.A.; White, V.A.; Indave, B.I.; Lokuhetty, D. Revising the WHO classification: Female genital tract tumours. Histopathology 2020, 76, 151–156.
9. Prat, J.; FIGO Committee on Gynecologic Oncology. Staging classification for cancer of the ovary, fallopian tube, and peritoneum. Int. J. Gynecol. Obstet. 2014, 124, 1–5.
10. Chen, S.; Chen, Y.; Yu, L.; Hu, X. Overexpression of SOCS4 inhibits proliferation and migration of cervical cancer cells by regulating JAK1/STAT3 signaling pathway. Eur. J. Gynaecol. Oncol. 2021, 42, 554–560.
11. Peres, L.C.; Cushing-Haugen, K.L.; Köbel, M.; Harris, H.R.; Berchuck, A.; Rossing, M.A.; Schildkraut, J.M.; Doherty, J.A. Invasive epithelial ovarian cancer survival by histotype and disease stage. JNCI J. Natl. Cancer Inst. 2019, 111, 60–68.
12. Prat, J.; Mutch, D.G. Pathology of cancers of the female genital tract including molecular pathology. Int. J. Gynecol. Obstet. 2018, 143, 93–108.
13. Prat, J.; D’Angelo, E.; Espinosa, I. Ovarian carcinomas: At least five different diseases with distinct histological features and molecular genetics. Hum. Pathol. 2018, 80, 11–27.
14. Köbel, M.; Kalloger, S.E.; Boyd, N.; McKinney, S.; Mehl, E.; Palmer, C.; Leung, S.; Bowen, N.J.; Ionescu, D.N.; Rajput, A.; et al. Ovarian carcinoma subtypes are different diseases: Implications for biomarker studies. PLoS Med. 2008, 5, e232.
15. Breen, J.; Allen, K.; Zucker, K.; Hall, G.; Orsi, N.M.; Ravikumar, N. Efficient subtyping of ovarian cancer histopathology whole slide images using active sampling in multiple instance learning. arXiv 2023, arXiv:2302.08867.
16. Mulugeta, A.; Lumsden, A.L.; Madakkatel, I.; Stacey, D.; Lee, S.H.; Maenpaa, J.; Oehler, M.; Hypponen, E. Phenome-wide association study of ovarian cancer identifies common comorbidities and reveals shared genetics with complex diseases and biomarkers. medRxiv 2023.
17. Tang, Q.; Zhang, H.; Tang, R. Identification of two immune subtypes and four hub immune-related genes in ovarian cancer through multiple analysis. Medicine 2023, 102, e35246.
18. Lu, S.; Liu, S.; Hou, P.; Yang, B.; Liu, M.; Yin, L.; Zheng, W. Soft Tissue Feature Tracking Based on DeepMatching Network. CMES Comput. Model. Eng. Sci. 2023, 136, 363–379.
19. Elhoseny, M.; Bian, G.B.; Lakshmanaprabu, S.; Shankar, K.; Singh, A.K.; Wu, W. Effective features to classify ovarian cancer data in internet of medical things. Comput. Netw. 2019, 159, 147–156.
20. Marcišauskas, S.; Ulfenborg, B.; Kristjansdottir, B.; Waldemarson, S.; Sundfeldt, K. Univariate and classification analysis reveals potential diagnostic biomarkers for early stage ovarian cancer Type 1 and Type 2. J. Proteom. 2019, 196, 57–68.
21. Wisztorski, M.; Aboulouard, S.; Roussel, L.; Duhamel, M.; Saudemont, P.; Cardon, T.; Narducci, F.; Robin, Y.M.; Lemaire, A.S.; Bertin, D.; et al. Fallopian tube lesions as potential precursors of early ovarian cancer: A comprehensive proteomic analysis. Cell Death Dis. 2023, 14, 644.
22. Olkowicz, M.; Rosales-Solano, H.; Kulasingam, V.; Pawliszyn, J. SPME-LC/MS-based serum metabolomic phenotyping for distinguishing ovarian cancer histologic subtypes: A pilot study. Sci. Rep. 2021, 11, 22428.
23. Singh, S.; Maurya, M.K.; Singh, N.P. STRAMPN: Histopathological image dataset for ovarian cancer detection incorporating AI-based methods. Multimed. Tools Appl. 2023, 1–22.
24. UBC Ovarian Cancer Subtype Classification and Outlier Detection (UBC-OCEAN). Available online: https://www.kaggle.com/competitions/UBC-OCEAN/data (accessed on 19 November 2023).
25. Ahamad, M.M.; Aktar, S.; Uddin, M.J.; Rahman, T.; Alyami, S.A.; Al-Ashhab, S.; Akhdar, H.F.; Azad, A.; Moni, M.A. Early-Stage Detection of Ovarian Cancer Based on Clinical Data Using Machine Learning Approaches. J. Pers. Med. 2022, 12, 1211.
26. Akazawa, M.; Hashimoto, K. Artificial intelligence in ovarian cancer diagnosis. Anticancer. Res. 2020, 40, 4795–4800.
27. Grimley, P.M.; Liu, Z.; Darcy, K.M.; Hueman, M.T.; Wang, H.; Sheng, L.; Henson, D.E.; Chen, D. A prognostic system for epithelial ovarian carcinomas using machine learning. Acta Obstet. Gynecol. Scand. 2021, 100, 1511–1519.
28. Juwono, F.H.; Wong, W.; Pek, H.T.; Sivakumar, S.; Acula, D.D. Ovarian cancer detection using optimized machine learning models with adaptive differential evolution. Biomed. Signal Process. Control 2022, 77, 103785.
29. Hwangbo, S.; Kim, S.I.; Kim, J.H.; Eoh, K.J.; Lee, C.; Kim, Y.T.; Suh, D.S.; Park, T.; Song, Y.S. Development of machine learning models to predict platinum sensitivity of high-grade serous ovarian carcinoma. Cancers 2021, 13, 1875.
30. Ma, J.; Yang, J.; Jin, Y.; Cheng, S.; Huang, S.; Zhang, N.; Wang, Y. Artificial intelligence based on blood biomarkers including CTCs predicts outcomes in epithelial ovarian cancer: A prospective study. OncoTargets Ther. 2021, 14, 3267–3280.
31. Urase, Y.; Nishio, M.; Ueno, Y.; Kono, A.K.; Sofue, K.; Kanda, T.; Maeda, T.; Nogami, M.; Hori, M.; Murakami, T. Simulation study of low-dose sparse-sampling CT with deep learning-based reconstruction: Usefulness for evaluation of ovarian cancer metastasis. Appl. Sci. 2020, 10, 4446.
32. Yue, Z.; Sun, C.; Chen, F.; Zhang, Y.; Xu, W.; Shabbir, S.; Zou, L.; Lu, W.; Wang, W.; Xie, Z.; et al. Machine learning-based LIBS spectrum analysis of human blood plasma allows ovarian cancer diagnosis. Biomed. Opt. Express 2021, 12, 2559–2574.

Figure 1. Overview of WSIs, bags, instances (patches), and scales. We extract 128 × 128-pixel image patches from WSIs at different scales. In this problem formulation, class labels are not observed for individual instances but only for groups of instances, referred to as bags. Every bag contains patches extracted at different scales, which makes it easier to identify several regions of interest in differently scaled images.

Figure 2. Percentage of five types of ovarian cancer.

Figure 3. The box plot visually presents data for different cancer subtypes, providing insight into their distribution and highlighting the presence of outliers within the entire dataset. Outliers are data points that fall significantly outside the normal range of values and may warrant further investigation or consideration in the analysis.

Figure 4. Scatter plot illustrating the outliers among the five ovarian cancer subtypes.

Figure 5. The high-resolution image cropping process (20× and 40×), showing how detailed regions of interest are extracted from the original image to enable more thorough analysis and improved visual representation.

Figure 6. The Attention Embedder approach described in this paper, which uses an attention mechanism to improve feature extraction and the capture of relevant information. This leads to improved model performance and a better understanding of complex data patterns.

Figure 7. Training and validation accuracy and loss over different epochs for 20× WSI images.

Figure 8. ROC—Top four class AUC scores for performance evaluation of Attention Embedder.
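As a reading aid for the AUC scores in the ROC figures, the sketch below computes a one-vs-rest AUC directly from its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The function name `roc_auc` and the labels and scores are hypothetical illustrations, not values from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical one-vs-rest example: 1 = HGSC, 0 = any other subtype.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.6, 0.2]
auc = roc_auc(labels, scores)  # 7 of the 9 pairs are ranked correctly
```

For a five-class problem, the same computation would be repeated once per subtype, treating that subtype as the positive class.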

Figure 9. This figure presents the training and validation accuracy and the validation loss at different epochs for a dataset of 40× tissue microarrays (TMAs). Visualized data track the model’s performance and convergence over the training process, providing valuable insight into its learning dynamics.

Figure 10. Attention Embedder’s performance is evaluated using ROC—top four class AUC scores.

Figure 11. The confusion matrix for image classification at 20× WSI and 40× TMA magnification is shown in this figure. To help assess classification accuracy and potential areas for improvement, these matrices provide a visual representation of the model’s performance in classifying images.
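To make the structure of such a confusion matrix concrete, the following sketch builds one for the five subtype labels and derives the overall accuracy from its diagonal. The helper names and the true and predicted labels are hypothetical and chosen only for illustration; they are not the study's predictions.

```python
SUBTYPES = ["CC", "EC", "HGSC", "LGSC", "MC"]


def confusion_matrix(y_true, y_pred, classes=SUBTYPES):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Hypothetical labels for eight images.
y_true = ["CC", "EC", "HGSC", "HGSC", "LGSC", "MC", "MC", "EC"]
y_pred = ["CC", "EC", "HGSC", "LGSC", "LGSC", "MC", "EC", "EC"]
cm = confusion_matrix(y_true, y_pred)
overall = accuracy(cm)  # 6 of 8 predictions are correct
```

Off-diagonal entries show which subtypes are confused with each other, which a single accuracy figure cannot reveal.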

Figure 12. Classification of ovarian cancer subtype cell images (20× and 40×).

Table 1. Summary of the dataset split into training, validation, and testing sets.

Types of Image	Training	Validation	Testing	Total Images
20×	89,088	22,521	1,000	112,609
40×	13,281	3,570	1,000	17,851

Table 2. Hyperparameter variables are investigated to obtain the best network for each experimental model.

Hyperparameters	Optimization Space
Model	VGG16, VGG19, ResNet50, ResNet101, InceptionResNetV2, InceptionV3, DenseNet121
Batch Size	8, 16
Optimizer	SGD, Adam, RMSprop, Nadam, Adamax
Loss Functions	Categorical cross-entropy, Kullback–Leibler divergence
Kernel Size	3 × 3, 5 × 5
Learning Rate	0.1, 0.001, 0.0001, 0.00001, 0.002
Dropout	0.1, 0.2, 0.3, 0.5
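The optimization space in Table 2 can be written down as a plain dictionary and enumerated, as in the sketch below. The dictionary keys, the loss identifiers, and the helper `iter_configs` are illustrative assumptions; the source does not state whether the full grid or only a subset of it was explored.

```python
from itertools import product

# Values copied from Table 2; key names and loss identifiers are assumptions.
search_space = {
    "model": ["VGG16", "VGG19", "ResNet50", "ResNet101",
              "InceptionResNetV2", "InceptionV3", "DenseNet121"],
    "batch_size": [8, 16],
    "optimizer": ["SGD", "Adam", "RMSprop", "Nadam", "Adamax"],
    "loss": ["categorical_crossentropy", "kl_divergence"],
    "kernel_size": [(3, 3), (5, 5)],
    "learning_rate": [0.1, 0.001, 0.0001, 0.00001, 0.002],
    "dropout": [0.1, 0.2, 0.3, 0.5],
}


def iter_configs(space):
    """Yield one dict per combination of hyperparameter values."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


# 7 * 2 * 5 * 2 * 2 * 5 * 4 = 5600 combinations in the full grid.
n_configs = sum(1 for _ in iter_configs(search_space))
```

A grid of this size is expensive to train exhaustively on high-resolution images, which is why random or guided search over such a space is a common alternative.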

Table 3. Precision, recall, and F1-scores of the implemented model for the five ovarian cancer subtypes on 20×-magnification images.

Types of Cancer	Precision	Recall	F1-Score
Clear-Cell	0.94	0.77	0.76
Endometrioid Carcinoma	0.80	0.79	0.90
High-Grade Serous Carcinoma	0.93	0.92	0.93
Low-Grade Serous Carcinoma	0.81	0.83	0.82
Mucinous Carcinoma	0.92	0.78	0.82

Table 4. Precision, recall, and F1-scores of the implemented model for the five ovarian cancer subtypes on 40× images.

Types of Cancer	Precision	Recall	F1-Score
Clear-Cell	0.96	0.88	0.92
Endometrioid Carcinoma	0.99	0.83	0.90
High-Grade Serous Carcinoma	0.94	0.99	0.97
Low-Grade Serous Carcinoma	0.93	0.94	0.93
Mucinous Carcinoma	0.94	0.87	0.91
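For reference, the F1-score is the harmonic mean of precision and recall. The sketch below applies this standard definition to the Clear-Cell row of Table 4 (precision 0.96, recall 0.88); the function name `f1_score` is ours and not taken from the study's code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Clear-Cell row of Table 4: precision 0.96, recall 0.88.
f1_clear_cell_40x = f1_score(0.96, 0.88)  # rounds to 0.92, as in the table
```

Because the harmonic mean never exceeds the larger of its two inputs, an F1-score always lies between the precision and the recall.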

Table 5. Overview of the results obtained with different models on 20×-magnification images.

Model/Classifier	Accuracy	Val_Accuracy
VGG16	81.76%	79.54%
VGG19	84.72%	80.29%
ResNet50	74.42%	69.56%
ResNet101	69.81%	62.93%
InceptionResNetV2	69.35%	70.69%
InceptionV3	68.88%	65.21%
DenseNet121	86.75%	79.35%
Attention Embedder (Proposed)	96.42%	95.10%

Table 6. Overview of the results obtained with different models on 40×-magnification TMA images.

Model/Classifier	Accuracy	Val_Accuracy
VGG16	82.81%	80.89%
VGG19	82.29%	77.85%
ResNet50	68.51%	63.59%
ResNet101	62.17%	65.76%
InceptionResNetV2	68.98%	69.69%
InceptionV3	72.44%	74.74%
DenseNet121	84.02%	79.50%
Attention Embedder (Proposed)	93.45%	94.90%

Table 7. Comparative performance evaluation using tissue microarray images (TMA) at 40× and whole-slide images (WSI) at 20× magnification using K-fold cross-validation.

Fold	20×_Accuracy	20×_Val_Accuracy	40×_Accuracy	40×_Val_Accuracy
Fold 1	90.33%	92.00%	92.00%	93.40%
Fold 2	87.95%	88.40%	84.11%	81.59%
Fold 3	96.28%	96.60%	97.48%	97.60%
Fold 4	97.64%	96.60%	98.24%	96.39%
Fold 5	93.11%	91.00%	90.66%	91.60%
Fold 6	98.97%	97.00%	95.39%	93.99%
Fold 7	99.31%	99.40%	82.93%	83.60%
Fold 8	99.00%	99.40%	99.26%	98.60%
Fold 9	99.75%	99.40%	88.91%	88.80%
Fold 10	93.13%	92.00%	90.97%	92.59%
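The fold structure behind Table 7 can be illustrated with a minimal k-fold splitter. This is a sketch under assumptions rather than the study's procedure: it uses k = 10 to match the ten folds in the table, it does not shuffle or stratify by subtype (both are common in practice), and the sample count of 25 is arbitrary.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, val_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


# Example: ten folds over 25 hypothetical samples.
folds = list(k_fold_indices(n_samples=25, k=10))
```

Each sample appears in exactly one validation fold, so every fold's validation accuracy is measured on data the model did not train on in that round.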

Table 8. The table displays the ideal hyperparameters for 20× and 40× magnifications after hyperparameter tuning. It also presents the testing accuracy of the model.

20× Best Hyperparameters	40× Best Hyperparameters
Unit 1: 64	Unit 1: 224
Unit 2: 224	Unit 2: 64
Unit 3: 64	Unit 3: 32
Unit 4: 128	Unit 4: 256
Unit 5: 32	Unit 5: 192
Dense: 160	Dense: 128
Dropout: 0.4	Dropout: 0.5
Learning_rate: 0.000212916243290288	Learning_rate: 0.00015159396261453322
Testing Accuracy: 93.56%	Testing Accuracy: 91.37%

Table 9. Comparative analysis of the results of existing models and the proposed model.

Ref.	Dataset Type	Model or Method	Accuracy or Classification
[25]	Ovarian Cancer	LGBM	91%
[26]	Ovarian Cancer	XGBoost	80%
[27]	Ovarian Carcinomas	EACCD	76.05%
[28]	Ovarian Cancer	K-Nearest	97.24%
[29]	Ovarian Cancer	Five-Fold Cross-Validation	AUC of 74.10%
[30]	Ovarian Cancer	Random Forest	79.60%, 80.09%
[31]	Ovarian Cancer	U-Net	95%
[32]	Blood Plasma	ML	71.4%, 86.5%
[Proposed]	Ovarian Cancer Subtype	Attention Embedder	96.42%

Table 10. Comparison of Kaggle competition model results on our dataset.

Model	Accuracy
CNN	43.11%
VGG16	70.98%
MobileNet	43.36%
K-Fold	87.00%
EfficientNet	82.06%
Attention Embedder (Proposed)	96.42%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Nobel, S.M.N.; Swapno, S.M.M.R.; Hossain, M.A.; Safran, M.; Alfarhood, S.; Kabir, M.M.; Mridha, M.F. RETRACTED: Modern Subtype Classification and Outlier Detection Using the Attention Embedder to Transform Ovarian Cancer Diagnosis. Tomography 2024, 10, 105-132. https://doi.org/10.3390/tomography10010010

