Article
Peer-Review Record

An Automatic Generation of Heterogeneous Knowledge Graph for Global Disease Support: A Demonstration of a Cancer Use Case

Big Data Cogn. Comput. 2023, 7(1), 21; https://doi.org/10.3390/bdcc7010021
by Noura Maghawry 1,*, Samy Ghoniemy 1,*, Eman Shaaban 2 and Karim Emara 2
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3:
Submission received: 2 December 2022 / Revised: 17 January 2023 / Accepted: 18 January 2023 / Published: 24 January 2023
(This article belongs to the Special Issue Big Data System for Global Health)

Round 1

Reviewer 1 Report

This paper proposes a framework for automatic knowledge graph generation that integrates two standardized medical ontologies, the Human Disease Ontology (DO) and the Symptom Ontology (SYMP), using an online medical website and encyclopedia. The framework and methodologies adopted for automatically generating this knowledge graph integrate these two standardized ontologies, which had not been integrated before. A subgraph for cancer terms is extracted, creating a knowledge graph of cancer that represents cancer diseases, their symptoms, prevention factors, and risk factors. Overall, this paper is not well written and should be carefully revised. The major issues are:

1. Some figures lack clarity; they are vague and hard to follow.

2. The introduction section is disorganized and should be restructured to highlight the motivation and contributions.

3. The experimental analysis is difficult to understand. A more detailed analysis should be introduced. 

Author Response

  1. Some figures lack clarity; they are vague and hard to follow.
  • The figures have been updated with higher resolution.
  2. The introduction section is disorganized and should be restructured to highlight the motivation and contributions.
  • The introduction section has been reorganized; the motivation and contributions should be clearer now. The whole paper has been proofread.
  3. The experimental analysis is difficult to understand. A more detailed analysis should be introduced.
  • This part has been modified accordingly.

Reviewer 2 Report

The authors present a novel approach to linking diseases with symptoms using the UMLS ontology. I have several questions:

1. Why is the Mayo Clinic website chosen? The choice is not motivated in the paper, and there are other sources from which these interlinks could be mined, e.g., PubMed.

2. The authors use accuracy as the metric to evaluate their model. This does not seem sufficient, since Type I and Type II errors are important here. I suggest using the classic measures of precision, recall, and F1.

3. The topic seems to be pretty well studied, e.g. [1-4]. It would be nice to have a comparison with some of the already presented methodologies. 

In closing, I would like to say that the problem raised is interesting, but for now the paper lacks adequate evaluation and comparison.

References:

1. Okumura, T., & Tateisi, Y. (2012, April). A lightweight approach for extracting disease-symptom relation with metamap toward automated generation of disease knowledge base. In International Conference on Health Information Science (pp. 164-172). Springer, Berlin, Heidelberg.

2. Ruan, T., Wang, M., Sun, J., Wang, T., Zeng, L., Yin, Y., & Gao, J. (2016, December). An automatic approach for constructing a knowledge base of symptoms in Chinese. In 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (pp. 1657-1662). IEEE.

3. Oberkampf, H., Gojayev, T., Zillner, S., Zühlke, D., Auer, S., & Hammon, M. (2015, May). From Symptoms to Diseases–Creating the Missing Link. In European Semantic Web Conference (pp. 652-667). Springer, Cham.

4. Hassan, M., Makkaoui, O., Coulet, A., & Toussaint, Y. (2015, July). Extracting disease-symptom relationships by learning syntactic patterns from dependency graphs. In BioNLP 15 (p. 184).

Author Response

  1. Why is the Mayo Clinic website chosen? It is not motivated in the paper, and there are other sources to mine these interlinks, e.g., PubMed.
  • The motivation for using the Mayo Clinic website is now added in the “Entity Linking and Integration to Standardized Ontologies” subsection.
  2. The authors use accuracy as the metric to evaluate their model. It does not seem sufficient, since Type I and Type II errors are important here. I suggest using the classic measures of precision, recall, and F1.
  • A table showing the precision, recall, and F1-score values obtained with the different methodologies is now added in the Experimental Results, Evaluation and Discussion section (a short reference sketch of these metrics follows this list). Further discussion is also added to indicate the strengths of the generated knowledge graph.
  3. The topic seems to be pretty well studied, e.g., [1-4]. It would be nice to have a comparison with some of the already presented methodologies.
  • A table comparing the different studies is now added in the Related Work section, and we discuss how our framework differs from them both there and in the Results and Discussion section.
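For reference, the precision, recall, and F1 measures requested above can be computed from raw counts as in the minimal Python sketch below. The counts shown are purely illustrative and are not results from the paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from raw counts.

    tp: correctly linked disease-symptom pairs (true positives)
    fp: pairs linked by the system but absent from the gold standard
    fn: gold-standard pairs the system failed to link
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Illustrative counts only -- not results from the paper.
p, r, f = precision_recall_f1(tp=85, fp=10, fn=15)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```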

Reviewer 3 Report

The content of the article is well described in both scientific and technical terms. My suggestions to improve the article are as follows: 

1- The advantages and disadvantages of previous models/frameworks could be tabulated in the related work section to highlight the importance of the proposed framework.

2- Please enrich the results and discussion section and highlight the results of your work compared to previous studies.

Author Response

1) The advantages and disadvantages of previous models/frameworks could be tabulated in the related work section to highlight the importance of the proposed framework.

  • A table comparing the different studies is now added in the Related Work section, and we discuss how our framework differs from them both there and in the Results and Discussion section.

2) Please enrich the results and discussion section and highlight the results of your work compared to previous studies.

  • A table showing the precision, recall, and F1-score values obtained with the different methodologies is now added in the Results and Discussion section. Further discussion is also added to indicate the strengths of the generated knowledge graph.

Round 2

Reviewer 1 Report

I have read the response. The revised version has addressed most of my issues. 

Author Response

  • Section 5 is now added; it compares our approach with other approaches for generating knowledge graphs along 13 dimensions.
  • The Conclusion and Future Work section has been updated.
  • More relevant references have been added to the Introduction and Background sections to support the research.

Reviewer 2 Report

The revised version of the paper includes a table comparing existing approaches in the Related Work section and somewhat more information in the Results section. Unfortunately, this is not enough.

1) There is still no comparison with other approaches to knowledge graph linkage.

2) The authors report comparative results for the Pattern Matcher, NER, and BioSentVec. But as far as I understand from the method description, all of these models are parts of the framework itself. Thus Section 4 could be interpreted as an ablation study (although it could still be improved), and the main results are missing.

3) By main results I mean the quality of the overall framework in comparison to other approaches.

Author Response

1) There is still no comparison with other approaches to knowledge graph linkage.

Table 1, Table 4, and Table 5 present comparisons with other research work, focusing on whether each study generated a knowledge graph and whether its nodes are linked to the standardized ontologies under study. The methodology and datasets used by each study are also stated in Table 1. Statements with relevant references explaining why we were encouraged to use a BERT-based model for entity linking have been added in the “Entity Linking and Integration” subsection. A greater level of detail would require a separate research paper, which we are planning to write, as it would need comparisons at several levels, from data collection and the datasets used for KG linking to approaches both before and after the adoption of deep learning.
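As a rough illustration of the BERT-based entity-linking idea mentioned above (not the authors' exact pipeline), a disease or symptom mention can be linked to the closest ontology label by embedding similarity. The model name, mention, candidate labels, and threshold below are placeholders; a biomedical embedding model would normally be substituted for the general-purpose one shown.

```python
from sentence_transformers import SentenceTransformer, util  # pip install sentence-transformers

# Placeholder general-purpose model; a biomedical BERT / BioSentVec-style model
# would normally be used for disease-symptom linking.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical mention scraped from a disease page and candidate ontology labels.
mention = "persistent cough with blood"
ontology_labels = ["hemoptysis", "chronic cough", "dyspnea", "chest pain"]

mention_emb = model.encode(mention, convert_to_tensor=True)
label_embs = model.encode(ontology_labels, convert_to_tensor=True)

scores = util.cos_sim(mention_emb, label_embs)[0]  # cosine similarity to each label
best = int(scores.argmax())
THRESHOLD = 0.5  # link only if similarity clears a tunable threshold
if float(scores[best]) >= THRESHOLD:
    print(f"link '{mention}' -> '{ontology_labels[best]}' (score={float(scores[best]):.2f})")
else:
    print(f"no confident link for '{mention}'")
```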

2) The authors report comparative results for the Pattern Matcher, NER, and BioSentVec. But as far as I understand from the method description, all of these models are parts of the framework itself. Thus Section 4 could be interpreted as an ablation study (although it could still be improved), and the main results are missing.

Section 4 has been updated to focus on the experimental results, and Section 5 has been added for the overall evaluation of the main results.

3) By main results I mean the quality of the overall framework in comparison to other approaches.

Section 5 is now added; it compares our approach with other approaches for generating knowledge graphs along 13 dimensions.

Round 3

Reviewer 2 Report

The authors added more visualizations of their approach in Section 3 and a quantitative comparison with other approaches in Section 5.

I think the visualizations are helpful for understanding the approach itself.

I am not satisfied with the quantitative comparison. At least the overall system quality (not just the quality of the individual modules) should be presented. If a comparison with existing approaches is impossible, the authors could compare against variants of their own system, since these are already presented in Section 4.

Author Response

  • I am not satisfied with the quantitative comparison. At least the overall system quality (not just the quality of the individual modules) should be presented. If a comparison with existing approaches is impossible, the authors could compare against variants of their own system, since these are already presented in Section 4.

Further experimental results have been added to evaluate the percentages of diseases and symptoms interlinked in the overall knowledge graph using the dictionary-based model (PhraseMatcher) versus the BERT-based models at different thresholds and the BiLSTM-CRF-based models at different thresholds.

Figures 12 and 13 have been added to illustrate these results.
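For illustration, here is a minimal sketch of the dictionary-based baseline mentioned above, using spaCy's PhraseMatcher. The symptom terms and sentence are invented examples, not the actual Symptom Ontology (SYMP) term list used in the framework.

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")  # a blank pipeline is enough for exact phrase matching
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")  # case-insensitive matching

# Illustrative symptom terms; in the framework these would come from the
# Symptom Ontology (SYMP) term list.
symptom_terms = ["fatigue", "weight loss", "night sweats", "persistent cough"]
matcher.add("SYMPTOM", [nlp.make_doc(term) for term in symptom_terms])

text = "Patients often report persistent cough, unexplained weight loss, and fatigue."
doc = nlp(text)
for match_id, start, end in matcher(doc):
    print(doc[start:end].text)  # each matched symptom mention
```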
