Article
Peer-Review Record

Counteracting Data Bias and Class Imbalance—Towards a Useful and Reliable Retinal Disease Recognition System

Diagnostics 2023, 13(11), 1904; https://doi.org/10.3390/diagnostics13111904
by Adam R. Chłopowiec 1,†, Konrad Karanowski 1,*,†, Tomasz Skrzypczak 2, Mateusz Grzesiuk 1, Adrian B. Chłopowiec 1 and Martin Tabakov 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 27 March 2023 / Revised: 22 May 2023 / Accepted: 25 May 2023 / Published: 29 May 2023
(This article belongs to the Special Issue Deep Learning Applications in Ophthalmology)

Round 1

Reviewer 1 Report

To ensure real life clinical environment and the problem of biased medical image data, the authors merged 22 publicly available datasets. And they utilize SOTA popular models, such as ConvNext, RegNet and ResNet, to recognize most of the examined eye diseases. This problem is promising, especially how to deal with the real-life clinical environment. 

 

(1) However, this article does not address the core issues that arise from the adoption of merging multiple datasets. Instead, the authors only directly focus on the problem of bias and class imbalance. The core problem should not be ignored. 

(2) But as the author stated in Page2 “the utilized data was less biased.” But the authors also say that they will solve the bias problem. This is not very clearly. I think the writing is confusing.

 

(3) Class imbalance is a routine problem in the field of medical image analysis. 

This paper does not show the specific problem for the scenario of merging multiple different datasets. 

 

(4) Especially for class imbalance, conventional solutions are reweighting and resampling etc. The author does not use reweighting and resampling methods here, but uses transfer learning methods. However, transfer learning is mainly used to solve small data problems. Here, the technical solution and rationality of how to use the transfer learning to solve the class imbalance problem need to be further elaborated.

 

(5) Another thing, I think the domain shift should been consider because of multiple different datasets with various imaging protocols, different scanners, different goals, and so on. Maybe the core problems originating from merging multiple datasets is necessary and could not be ignored completely. 

 

(6) The writing is not very clearly. For example, in the Introduction section, there is a lot of introductory knowledge, but less explanation of the core issues, resulting in unclear writing lines. According to the title, data bias and class imbalance should be emphasized. 

 

Overall, the problem is interesting and appealing. However, the model designs, experimental designs and solutions lack feasibility and innovation. 

Currently, this article is not suitable for acceptance. 

Author Response

Comments from Reviewer 1

  • Comment 1: However, this article does not address the core issues that arise from the adoption of merging multiple datasets. Instead, the authors only directly focus on the problem of bias and class imbalance. The core problem should not be ignored.


Response: Thank you for this comment. The core data-related idea of our paper is to address data bias by gathering a large and diverse dataset, and to handle the class imbalance that arises from merging multiple datasets. Could you please elaborate on what you mean by the core problem?

 

The basic premise of our research was to merge large and diverse datasets. This is exactly how we wanted to address the problem of biased data: data obtained from individual laboratories are biased by the image acquisition hardware used, the protocols, and so on. By merging them, we minimize this bias and maximize the generality of the data, which is critical in practice. This is the only way to obtain objective and representative information about the deep models’ classification results for the pathologies considered.

The second issue is imbalanced data. Merging the data does not solve this problem; on the contrary, it amplifies the imbalance. With respect to the “core problems” of data merging, could you please elaborate on what exactly you mean?

 

 

  • Comment 2: But as the author stated in Page2 “the utilized data was less biased.” But the authors also say that they will solve the bias problem. This is not very clearly. I think the writing is confusing.


Response: Thank you for this comment. We agree that the statement “the utilized data was less biased” is not very clear. Gathering a large and diverse dataset is a way of mitigating data bias. The sentence in question was changed to reflect this better: “Utilized data mitigated problem of data bias...”. Additionally, a relevant reference was added.

 

  • Comment 3: Class imbalance is a routine problem in the field of medical image analysis. This paper does not show the specific problem for the scenario of merging multiple different datasets.


Response: Thank you for pointing this out. Merging multiple datasets naturally amplifies the class imbalance problem, because each dataset contains a different number of images per class; these counts are shown in Table 2. The scenario presented in this paper uses a two-stage learning procedure, with the data selection criteria described in the section “Model training”.
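
To make the effect described in this response concrete, the following minimal sketch sums per-class image counts over several merged datasets and reports the ratio between the largest and the smallest class. The dataset names and counts are invented for illustration only; they are not the values from Table 2, and the helper names are hypothetical.

```python
from collections import Counter


def merge_class_counts(datasets):
    """Sum per-class image counts over all datasets."""
    total = Counter()
    for counts in datasets.values():
        total.update(counts)  # Counter.update adds counts, it does not overwrite
    return total


def imbalance_ratio(counts):
    """Ratio of the largest class size to the smallest class size."""
    return max(counts.values()) / min(counts.values())


# Hypothetical counts, NOT the values from Table 2 of the manuscript.
toy_datasets = {
    "dataset_a": {"normal": 500, "dr": 400},
    "dataset_b": {"normal": 300, "glaucoma": 100},
    "dataset_c": {"dr": 600, "amd": 50},
}

merged = merge_class_counts(toy_datasets)
print(dict(merged))
print(imbalance_ratio(merged))
```

Each toy dataset covers only some classes and in different quantities, so the merged class totals end up far apart even though no single source looks extreme on its own.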

 

  • Comment 4: Especially for class imbalance, conventional solutions are reweighting and resampling etc. The author does not use reweighting and resampling methods here, but uses transfer learning methods. However, transfer learning is mainly used to solve small data problems. Here, the technical solution and rationality of how to use the transfer learning to solve the class imbalance problem need to be further elaborated.


Response: Thank you for this comment. We adjusted the description of our procedure in the “Model Training” section to better reflect the application of both transfer learning and the two-stage learning procedure. We also extended the explanation of the two-stage learning procedure and added a comparison to two-phase learning. The procedure named “two-stage learning” in this paper is similar to the two-phase learning described in [Johnson, J.M., Khoshgoftaar, T.M., 2019. Survey on deep learning with class imbalance. Journal of Big Data 6, 1–54.], which falls under the category of resampling techniques. It differs in how it uses the data in the two stages, although it should still be considered a resampling technique. It combines the ideas of thresholding and pre-training on excess data. Reweighting was also used, as described in the sections “Pre-Training” and “Fine tuning”: “we decided to use weighted cross entropy loss with weights 1 and 2...”. Please see lines 270-275.
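
The two ingredients named in this response, thresholding with pre-training on the excess data and a weighted cross-entropy loss, can be sketched roughly as follows. This is one possible reading, not the authors' implementation: the function names, the threshold value, and the assignment of the weights 1 and 2 to particular classes are assumptions made for illustration, and the loss is normalised by the sum of the applied weights, which is one common convention.

```python
import math
from collections import defaultdict


def weighted_cross_entropy(probs, labels, class_weights):
    """Mean of -w[y] * log(p[y]) over a batch, normalised by the sum of weights."""
    total_loss = 0.0
    total_weight = 0.0
    for p, y in zip(probs, labels):
        w = class_weights[y]
        total_loss += -w * math.log(p[y])
        total_weight += w
    return total_loss / total_weight


def two_stage_split(labels, threshold):
    """Keep at most `threshold` samples per class for fine-tuning.

    Samples beyond the threshold (the "excess") are assigned to pre-training.
    Returns (pre_train_indices, fine_tune_indices).
    """
    seen = defaultdict(int)
    pre_train, fine_tune = [], []
    for idx, y in enumerate(labels):
        if seen[y] < threshold:
            fine_tune.append(idx)
            seen[y] += 1
        else:
            pre_train.append(idx)
    return pre_train, fine_tune


# Assumed mapping of the weights 1 and 2 to two classes (hypothetical).
weights = {0: 1.0, 1: 2.0}
loss = weighted_cross_entropy([[0.9, 0.1], [0.4, 0.6]], [0, 1], weights)
print(loss)

pre, fine = two_stage_split([0, 0, 0, 1, 1, 0], threshold=2)
print(pre, fine)
```

In the second call, class 0 has four samples and class 1 has two, so with a threshold of 2 the fine-tuning subset is balanced and the two surplus class-0 samples are left for pre-training.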

 

  • Comment 5: Another thing, I think the domain shift should been consider because of multiple different datasets with various imaging protocols, different scanners, different goals, and so on. Maybe the core problems originating from merging multiple datasets is necessary and could not be ignored completely.


Response: Thank you for pointing this out. We agree that experiments on the domain shift problem would be interesting in this area, but we decided to exclude them from the scope of our paper. A proper and thorough evaluation of domain shift would require precise per-image annotation of the cameras, scanners, ethnicities, and other metadata in our dataset, which unfortunately was unavailable. In our work we gathered the datasets as they are, to mitigate bias stemming from various imaging protocols, different scanners, and so on. This supports the development of a usable and reliable retinal disease recognition system that would work well in various clinics. The problem of domain shift would be a good topic for a new manuscript.

 

  • Comment 6: The writing is not very clearly. For example, in the Introduction section, there is a lot of introductory knowledge, but less explanation of the core issues, resulting in unclear writing lines. According to the title, data bias and class imbalance should be emphasized.


Response: We appreciate your comment, and we acknowledge that we did not place enough emphasis on the issues of data bias and class imbalance in our Introduction. To address this, we have included additional information on these topics. By merging multiple datasets, our models can be developed in a data-agnostic manner, meaning that they are independent of the type of camera used or the ethnicity of the patient.


Comment 7: Overall, the problem is interesting and appealing. However, the model designs, experimental designs and solutions lack feasibility and innovation.

Response: Thank you for this comment. The two main innovations of this work, leading towards a useful and reliable retinal disease recognition system, are:
1. Mitigating data bias by merging various datasets.
2. Solving the class imbalance problem, which is amplified by the merging, via transfer learning and two-stage learning.
Our work presents classification results for the most distressing retinal diseases on the basis of multiple publicly available datasets, without evaluation on private datasets gathered in a controlled environment. We believe that this information is important to the academic community.

 

Reviewer 2 Report

In this study, the authors aim to address the inaccuracies and bias in the currently available studies for automated classification and recognition of retinal diseases using deep learning based analysis of fundus photographs. 

They've merged 22 publicly available datasets (making it more reproducible) to create a giant dataset which is more real-life, generalisable and diverse. Also, data augmentation has been done to normalise and standardise images from multiple ethnicities captured using multiple diverse cameras by various retinal experts. Data imbalance in the form of class imbalance has been minimised by separation of pre training and fine tuning of data. By excluding cataract (non-retinal disease), diseases like myopia and hypertensive retinopathy, and rare diseases like retinitis pigmentosa, the authors have tried to focus only on the most distressing diseases like DR, AMD and glaucoma. After the implementation of all these steps in their proposed model, the authors claim improved reliability of the automated retinal disease recognition system.

However, the study has certain flaws/demerits:

1. As admitted by the authors themselves, there was no validation of the model by ophthalmologist, hence there was no comparison of results (accuracy, AUC, etc.) with manual classification, which is a standard method of evaluating the robustness of such a model.

2. The authors have utilised only the classic modality of fundus photographs in their model of automated retinal disease diagnosis.

However, several recent studies have shown that structural OCT, quantitative OCTA metrics when coupled with fundus photographs achieve much higher accuracy and AUC than manual and other established methods of diagnosis. Hence, OCT and OCTA have become critically important in the automated evaluation of DR, AMD and glaucoma.

To the extent that the application of automated analysis of OCTA of the retina has been extended to diagnosis of systemic diseases. Various neurological diseases like Alzheimers and multiple sclerosis and even cardiovascular diseases are now being diagnosed using automated analysis of retinal OCTA.

In addition, newer techniques of automated quantitative retinal imaging like differential artery vein (AV) analysis and OCTA flow speed mapping technology are evolving. Their utility needs to be explored with the aim of making automated retinal disease recognition more robust and accurate. 

Author Response

Comments from Reviewer 2

  • Comment 1: As admitted by the authors themselves, there was no validation of the model by ophthalmologist, hence there was no comparison of results (accuracy, AUC, etc.) with manual classification, which is a standard method of evaluating the robustness of such a model.

 

Response: Thank you for pointing this out. We agree with this comment. In our study, only peer-reviewed, publicly available datasets were used for model development. According to the datasets' technical notes, we assumed that all images had a validated disease classification, so all labels could be treated as ground truth. No new unvalidated images were included. This assumption is comparable to the standard verification method. However, we agree that validation of the model by an independent ophthalmologist would be helpful. We will incorporate your valuable comment in future research.

 

  • Comment 2: The authors have utilised only the classic modality of fundus photographs in their model of automated retinal disease diagnosis. However, several recent studies have shown that structural OCT, quantitative OCTA metrics when coupled with fundus photographs achieve much higher accuracy and AUC than manual and other established methods of diagnosis. Hence, OCT and OCTA have become critically important in the automated evaluation of DR, AMD and glaucoma. To the extent that the application of automated analysis of OCTA of the retina has been extended to diagnosis of systemic diseases. Various neurological diseases like Alzheimers and multiple sclerosis and even cardiovascular diseases are now being diagnosed using automated analysis of retinal OCTA. In addition, newer techniques of automated quantitative retinal imaging like differential artery vein (AV) analysis and OCTA flow speed mapping technology are evolving. Their utility needs to be explored with the aim of making automated retinal disease recognition more robust and accurate.

 

Response: We agree with this comment. Thank you for pointing this out. This study was developed using only publicly accessible databases. Public databases of fundus photographs are more prevalent and are instantly accessible. Significantly fewer databases of retinal OCT images have been published; those datasets contain fewer images and are not always easily accessible (Khan, Saad M et al. The Lancet Digital Health, Volume 3, Issue 1, e51 - e66). We agree that several recent studies have shown that structural OCT and quantitative OCTA metrics, when coupled with fundus photographs, achieve much higher accuracy and AUC than manual and other established methods of diagnosis. However, this study aimed to develop a model that closely approximates a real-life environment. To ensure this generalizability, we gathered the largest possible number of images collected with the most prevalent retinal imaging modality. OCT was excluded because significantly fewer images were available. Moreover, combining OCT and plain fundus images requires a significantly different methodological approach, frameworks, and technological solutions. Although coupling OCT and plain fundus images is a potentially beneficial research topic, the changes needed to include OCT in the current paper are too substantial. It is a good topic for a new manuscript.

 

Reviewer 3 Report

The authors present the results of collecting and merging 22 publicly available fundus datasets from around the world, containing images of the most prevalent diseases that negatively affect quality of life (age-related macular degeneration, glaucoma, and diabetic retinopathy) and images of healthy subjects. They addressed the class imbalance problem and biased image data. ConvNextTiny achieved the best results (accuracy, specificity, sensitivity, F1-score, AUC), except that ResNet50 performed best for glaucoma detection. This cumulative dataset seems to be suitable as a very acceptable screening method. Excellent manuscript.

 

Line 187: 1 for normal eyes is okay, but what is the background to assign glaucoma to 0.9? less than DR and AMD?

Line 328: …Due to infinite possibilities….probably better: Due to almost infinite, …or: due to numerous…

Author Response

Comments from the Reviewer 3

  • Comment 1: Line 187: 1 for normal eyes is okay, but what is the background to assign glaucoma to 0.9? less than DR and AMD?

 

Response: Thank you for this comment. We chose the weights for each class experimentally. The weights described in the manuscript turned out to be the best within our search space.

 

  • Comment 2: Line 328: …Due to infinite possibilities….probably better: Due to almost infinite, …or: due to numerous…

 

Response: Thank you for this comment. The sentence in question was corrected accordingly: “Due to almost infinite possibilities…”

Round 2

Reviewer 1 Report

First of all, thank you very much for seeing the revised version and I have carefully read the author's reply.

(1) But there is a problem that I feel a little disappointing.  The  current revised version is mainly to modify the order of citations, and I think this is not a suitable modified version. Perhaps a more appropriate modified version which addresses each of the reviewers' concerns is needed. In other words, in addition to the author's reply, every concern should be addressed in the main manuscript. 

(2) Second, it is recommended to state the limitations of the research in the Discussion section.  

(3) Third, the author still hasn't solved my confusion. Transfer learning and two-stage methods cannot solve the class imbalance problem. If the authors think their methods lie in the resampling. So, It is recommended to add experimental results compared with other class imbalance methods. 

(4) Fourth, in the Introduction section, class imbalance and bias are still not fully and detailed elaborated. And how to solve bias problem in your paper are also should be emphasized. 

(5) The descriptions of main contributions are should be clearly stated in the section of Introduction. 

I am so sorry. I still hold my idea that this is not a proper version for publication. 

Author Response

Comment 1: But there is a problem that I feel a little disappointing.  The current revised version is mainly to modify the order of citations, and I think this is not a suitable modified version. Perhaps a more appropriate modified version which addresses each of the reviewers' concerns is needed. In other words, in addition to the author's reply, every concern should be addressed in the main manuscript. 


Response: In the version submitted after the first revision, we made several significant improvements to the manuscript. In the Materials and Methods section, we provided a more detailed explanation of why merging multiple datasets mitigates data bias, particularly in the context of medical data. Moreover, we took the reviewers' suggestions into account and expanded the Introduction section to describe the impact that collecting different datasets has on the experiments we performed. In the Model Training section, we elaborated on the type of resampling procedure applied in our experiments. We have therefore made an effort to address the concerns raised by the reviewers to the best of our ability, taking into consideration the limitations of our study design.

 

Comment 2: Second, it is recommended to state the limitations of the research in the Discussion section. 

Response: In the last paragraph of the Discussion section, we state the following limitations:

  • The lack of overlapping diseases which could appear in clinical scenarios,
  • The lack of validation of performance in comparison to an ophthalmologist,
  • An inability to validate whether the datasets had a consistent image classification guideline,
  • Limited access to datasets.

Therefore, to the best of our knowledge, limitations are well stated.

Comment 3: Third, the author still hasn't solved my confusion. Transfer learning and two-stage methods cannot solve the class imbalance problem. If the authors think their methods lie in the resampling. So, It is recommended to add experimental results compared with other class imbalance methods. 


Response: Thank you for this comment. It is indeed a valuable insight. We acknowledge that transfer learning, loss function weighting, and two-stage methods are not the only possible solutions to the class imbalance problem, although they are valuable tools for mitigating it. To explore alternative solutions further, we conducted additional experiments comparing these methods with other techniques for handling class imbalance. We added the relevant subsections to both the Materials and Methods section (“Verification of Other Resampling Methods”) and the Results section (“Comparison of Resampling Methods”). We hope this new information provides a more comprehensive perspective on the various methods and their effects on class imbalance problems.
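
The response does not name the techniques that were compared. Purely as an illustration of what common resampling baselines look like, the sketch below implements random oversampling (every class is drawn up to the majority size) and random undersampling (every class is cut down to the minority size) on a list of labels. The function names and the toy labels are hypothetical and are not taken from the manuscript.

```python
import random
from collections import Counter, defaultdict


def group_by_class(labels):
    """Map each class label to the list of sample indices carrying it."""
    groups = defaultdict(list)
    for idx, y in enumerate(labels):
        groups[y].append(idx)
    return groups


def random_oversample(labels, seed=0):
    """Return indices in which every class reaches the majority class size."""
    rng = random.Random(seed)
    groups = group_by_class(labels)
    target = max(len(idxs) for idxs in groups.values())
    out = []
    for idxs in groups.values():
        out.extend(idxs)
        # Draw the missing samples with replacement from the same class.
        out.extend(rng.choices(idxs, k=target - len(idxs)))
    return out


def random_undersample(labels, seed=0):
    """Return indices in which every class is cut to the minority class size."""
    rng = random.Random(seed)
    groups = group_by_class(labels)
    target = min(len(idxs) for idxs in groups.values())
    out = []
    for idxs in groups.values():
        # Draw without replacement so no sample is repeated.
        out.extend(rng.sample(idxs, target))
    return out


toy_labels = ["dr"] * 6 + ["amd"] * 2  # hypothetical labels
over = random_oversample(toy_labels)
under = random_undersample(toy_labels)
print(Counter(toy_labels[i] for i in over))
print(Counter(toy_labels[i] for i in under))
```

Both functions return index lists rather than copies of the data, so the same image store can be reused while only the sampling changes between experiments.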

 

Comment 4: Fourth, in the Introduction section, class imbalance and bias are still not fully and detailed elaborated. And how to solve bias problem in your paper are also should be emphasized. 

Response: Thank you for this comment. In the Introduction section, we have explained several problems concerning data bias in published studies. We describe why it is important to gather a diverse database and the reasons for data bias in different scenarios. We have mentioned class imbalance handling techniques and the motivations for using them, on which we elaborate in the following sections.

 

Comment 5: The descriptions of main contributions are should be clearly stated in the section of Introduction. I am so sorry. I still hold my idea that this is not a proper version for publication. 


Response: Thank you for this comment. We have described our main contributions in the last paragraph of the Introduction section:

  1. Creating an image recognition model for retinal disease screening in ageing, developed countries.
  2. Mitigating data bias by merging various datasets.
  3. Solving the class imbalance problem, which is amplified by the merging, via transfer learning and two-stage learning.

Author Response File: Author Response.docx

Reviewer 2 Report

The authors have made an effort to incorporate methodology changes recommended by one of the reviewers. They accept that OCT and OCTA can be utilised to increase robustness of the study and agree to include it in their future research projects. 

Author Response

Dear Reviewer, 

Thank you for your valuable comments and the time spent on our paper. It was a pleasure to receive your review.

Best wishes

Authors

Round 3

Reviewer 1 Report

No
