Article
Peer-Review Record

GCCSwin-UNet: Global Context and Cross-Shaped Windows Vision Transformer Network for Polyp Segmentation

Processes 2023, 11(4), 1035; https://doi.org/10.3390/pr11041035
by Jianbo Zhu 1,2, Mingfeng Ge 2,*, Zhimin Chang 2 and Wenfei Dong 1,2
Reviewer 1:
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 14 March 2023 / Revised: 21 March 2023 / Accepted: 24 March 2023 / Published: 29 March 2023

Round 1

Reviewer 1 Report

A great, well-written article.

Author Response

Special thanks to you for your good comments. We have revised the manuscript considering the comments of each reviewer and editor.

 

Reviewer 2 Report

The submitted manuscript addresses a current scientific problem. The proposed approach is attractive and may generate interest in the scientific community. Here are some comments on the manuscript:

The authors should specify the problem statement.

The Abstract, Introduction, and Related works should explicitly mention the image submitted as input, which is from a colonoscopy video. Other research methods that use medical images (CT, ultrasound, MRI, etc.) can also be used to identify the pathology under consideration, which may contradict this domain.

The IMRAD rules dictate that all research findings are detailed and illustrated in the results section. In the discussion section, the authors analyze their findings, compare them to related research on the article's subject, highlight the limitations of their conclusions, and propose future research directions. Thus, it is recommended that the comparison with other approaches be moved to the discussion section by restructuring Section 4.

The conclusion section should be expanded to include numerical results obtained in the study, limitations of the proposed method, and future research prospects.

Overall, the manuscript can be accepted with minor revisions based on the reviewer's feedback.

Author Response

  1. Other research methods that use medical images can also be used to identify the pathology under consideration, which may contradict this domain.

Response: By analysing previous studies and the drawbacks of traditional segmentation methods, we designed GCCSwin-UNet to address these problems. In Section 1, we list three shortcomings: 1) traditional CNN segmentation algorithms use only the global feature information from the last encoder block, which can lead to loss of local feature information in the intermediate layers; 2) traditional global self-attention mechanisms are computationally expensive, while local self-attention mechanisms limit the interaction of feature information and do not allow integrated global and local computation; 3) traditional segmentation algorithms focus only on the overall mass distribution of the lesion and ignore edge and shadow trends, producing segmentation results that are too ambiguous for effective diagnosis. We then designed GCM, the CSwin Transformer block and LPEM to address these three issues in a targeted manner. Other medical imaging methods, such as CT and MRI, can also be used to identify polyps, but they have not been specifically designed for polyps, and their accuracy and effectiveness are not as good as those of our method. We therefore consider that there is no contradiction in using GCCSwin-UNet to study polyp segmentation.

 

 

  2. Thus, it is recommended that the comparison with other approaches be moved to the discussion section by restructuring Section 4.

Response: Following the Reviewer's suggestion, we have restructured Section 4 by moving the methodological comparison results of the quantitative and qualitative analyses to the discussion section, and we have elaborated on the limitations of the model and on future research directions.

These are the corresponding changes in the Discussion section: Quantitative analysis showed that the Transformer-based segmentation structure performs better than the traditional CNN approach. Compared with Swin-UNet and PNS-Net, which also use the Transformer block as the backbone structure, our actual segmentation results for polyp lesion images are significantly better. Quantitative experimental results show that conventional methods suffer from poor segmentation with blurred and incomplete edges for large polyps, and from blurred boundaries, poor shape prediction and error noise for small polyps. In contrast, the GCCSwin-UNet segmentation is more precise, covering the lesion area more comprehensively and with well-defined edges. In summary, the GCCSwin-UNet model has a stronger ability to exchange semantic information globally and over long distances, and it extracts detail better than other methods, resulting in better segmentation results.

 

  3. The conclusion section should be expanded

Response: Considering the Reviewer's suggestion, we have modified the conclusion section to include numerical results obtained in the study and future research prospects.

These are the corresponding changes in Section 5: This paper presents a framework called GCCSwin-UNet for polyp segmentation based on the Vision Transformer. Unlike traditional CNN approaches, we incorporate the Transformer idea into the encoder-decoder structure and use the CSwin-Transformer block for representation learning, which not only enhances the information interaction between patches but also reduces the computational complexity. Two auxiliary modules, GCM and LPEM, are designed. GCM fuses multi-scale feature information at the encoder end to compensate for the loss of global information during downsampling and to improve the accuracy of polyp localisation; LPEM acts directly on the channel dimension to focus on the location of target details during feature extraction and thus improve the edge segmentation of the polyp region. In the experimental section, we conduct quantitative and qualitative comparison experiments and statistical tests of our method. The results show that our method achieves the best performance (Dice = 86.37, MIoU = 83.19 on Kvasir-SEG; Dice = 91.26, MIoU = 84.65 on CVC-ClinicDB) and that it is statistically significant (ρ>0.05). Finally, we visualize the segmentation results to demonstrate the effectiveness of our proposed method. We hope that this study will inspire future clinical polyp segmentation research and encourage the exploration of more powerful segmentation models.

 

Special thanks to you for your good comments. We have revised the manuscript considering the comments of each reviewer and editor

 

Reviewer 3 Report

Dear authors.

Thank you for the opportunity to review this paper and congratulations on your work.

The manuscript introduces a framework called GCCSwin-UNet, which aims to outperform previous methods in polyp segmentation.

The abstract is of adequate size, represents the manuscript and mentions the main results.  

The keywords give a clear lead about the research described in the paper.

The Introduction and Related Work sections are very well explained, with references that help to introduce and understand the research topic. The Related Work section also establishes the differences between previous approaches and the authors' proposal.

Methods section is very complete, with a good explanation of the steps and processes followed in the development of GCCSwin-UNet. 

The Results section presents the findings, accompanied by statistical analyses to demonstrate the performance of GCCSwin-UNet, and also a comparison with previous proposals using two databases. Here, it would be valuable if the authors commented on Table 2, specifically the ACC% column for the Kvasir-SEG database, since PraNet gives a higher value than GCCSwin-UNet on this metric.

The Discussion section is reasonable, shows some limitations (the use of realistic and valid clinical data) and describes future work.

The Conclusion section is a good summary of the findings and contributions.

Congratulations on your work.

Best regards.

Author Response

  1. It would be valuable if the authors gave an opinion on Table 2, specifically the ACC% column for the Kvasir-SEG database, since PraNet gives a higher value than GCCSwin-UNet on this metric.

Response: The quantitative results were obtained under the same experimental data and parameter settings. On Kvasir-SEG, the ACC% of PraNet was indeed 0.2% higher than that of GCCSwin-UNet, but both our Dice and MIoU were significantly higher than those of PraNet (by 2.57% and 1.99%, respectively). Segmentation models focus more on Dice and MIoU, two metrics that better highlight a model's lesion segmentation and recognition performance.
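The point that pixel accuracy can rank two models differently from Dice and IoU can be illustrated with a small sketch. The code below is not the authors' evaluation code; the function names (`confusion_counts`, `accuracy`, `dice`, `iou`) and the toy 10x10 masks are hypothetical, and the IoU here is computed for a single binary mask, whereas the exact averaging behind the manuscript's MIoU is not specified in this record.

```python
import numpy as np

def confusion_counts(pred, target):
    """Return (tp, fp, fn, tn) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = int(np.sum(pred & target))    # lesion pixels correctly predicted
    fp = int(np.sum(pred & ~target))   # background predicted as lesion
    fn = int(np.sum(~pred & target))   # lesion pixels that were missed
    tn = int(np.sum(~pred & ~target))  # background correctly predicted
    return tp, fp, fn, tn

def accuracy(pred, target):
    """Pixel accuracy: fraction of all pixels labelled correctly."""
    tp, fp, fn, tn = confusion_counts(pred, target)
    return (tp + tn) / (tp + fp + fn + tn)

def dice(pred, target):
    """Dice coefficient: 2*TP / (2*TP + FP + FN); 1.0 if both masks are empty."""
    tp, fp, fn, _ = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0

def iou(pred, target):
    """Intersection over union: TP / (TP + FP + FN); 1.0 if both masks are empty."""
    tp, fp, fn, _ = confusion_counts(pred, target)
    denom = tp + fp + fn
    return tp / denom if denom else 1.0

# Toy ground truth (assumed for illustration): a 2x2 "polyp" in a 10x10 image.
target = np.zeros((10, 10), dtype=bool)
target[4:6, 4:6] = True

# Prediction A finds only one of the four lesion pixels.
pred_a = np.zeros((10, 10), dtype=bool)
pred_a[4, 4] = True

# Prediction B covers the whole lesion but adds four false-positive pixels.
pred_b = np.zeros((10, 10), dtype=bool)
pred_b[4:6, 4:8] = True

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, accuracy(pred, target), dice(pred, target), iou(pred, target))
```

In this toy case, prediction A has the higher pixel accuracy (0.97 against 0.96) because background dominates the image, yet its Dice (0.40 against about 0.67) and IoU (0.25 against 0.50) are much lower because it misses most of the lesion. This is the general reason overlap-based metrics are usually preferred when the foreground region is small.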

 

Special thanks to you for your good comments. We have revised the manuscript considering the comments of each reviewer and editor

 
