Article
Peer-Review Record

Screening of Potential Indonesia Herbal Compounds Based on Multi-Label Classification for 2019 Coronavirus Disease

Big Data Cogn. Comput. 2021, 5(4), 75; https://doi.org/10.3390/bdcc5040075
by Aulia Fadli 1, Wisnu Ananta Kusuma 1,2,*, Annisa 1, Irmanida Batubara 2,3 and Rudi Heryanto 2,3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 7 November 2021 / Revised: 27 November 2021 / Accepted: 6 December 2021 / Published: 9 December 2021
(This article belongs to the Topic Machine and Deep Learning)

Round 1

Reviewer 1 Report

The article concerns applications of deep learning techniques in medicine-related problems of detecting the interactions of drug compounds with disease-related proteins.

My major concern about the present work is that although it deals with some sort of classification algorithms based on neural networks, the possible significance of the results lies in the field of medicine and, therefore, the work must be submitted to a journal related to medicine data.

More precisely:

1. The authors start (already in the abstract) with the notion of a database fingerprint. However, there is not even a simple explanation of what it is. Do the authors assume that the readers of the journal should be familiar with it?

2. Fig. 5 shows a notable difference of SAE-DNN and DNN models only for the case of MACCS fingerprint which the authors exclude a step later from the further consideration as one showing low performance in terms of metrics used. What is the significance of SAE-DNN versus DNN then?

3. My main objection relates to Table 7 and conclusions. Table 7 shows that, based on various fingerprints, the SAE-DNN model gives, in general, different results of proteins related to specific compounds. We can see that sometimes Circular and Daylight fingerprints give similar results. However, nothing is said on how this relates to reality? The model was able to predict something, but there is no confirmation that these predictions are any good for say treatment of the disease? Of course, such kinds of conclusions must be interpreted by specialists in medicine. But for now, I cannot see any significance of the results in terms of machine learning techniques either.

Conclusion: The presented work does not contain any significant scientific results, either in the field of big data or in the field of machine learning. It should be submitted to a journal related to medicine or medicine data.

Author Response

Dear Reviewers

Thank you for the constructive comments, which helped us improve our manuscript. Our responses to your comments are detailed below. We look forward to your positive response.

Responses for reviewer comments

Reviewer 1:

 

No

Reviewer comment

Author response

 

The article concerns applications of deep learning techniques in medicine-related problems of detecting the interactions of drug compounds with disease-related proteins.

My major concern about the present work is that although it deals with some sort of classification algorithms based on neural networks, the possible significance of the results lies in the field of medicine and, therefore, the work must be submitted to a journal related to medicine data

In our opinion, this paper is still relevant to this journal because the scope of this journal includes “Machine Learning and its Applications in Medicine, Biology, Industry, Manufacturing, Security, Education, etc.”

 

1

The authors start (already in the abstract) with the notion of a database fingerprint. However, there is not even a simple explanation of what it is. Do the authors assume that the readers of the journal should be familiar with it?

We have added an explanation of the database fingerprint to the Data Preprocessing subsection of the Materials and Methods section, page 4.

The explanation is:

“The first stage of data pre-processing is the feature extraction on the compound. Feature extraction aims to form a representation of compounds’ chemical structure. One of the most commonly used feature extraction processes to represent compounds’ chemical structure is molecular fingerprints. Molecular fingerprints simplify the chemical information in compounds through binary vectors [12].

The feature extraction process used four fingerprints with two different types, substructure-based fingerprint (PubChem fingerprint and MACCS fingerprint) and topological fingerprint (daylight fingerprint and circular fingerprint). In a substructure-based fingerprint, an array is formed to represent the chemical substructure of a compound, with each substructure assigned to a specific location in the array. For each substructure that occurs in the compound, the position of the corresponding substructure in the fingerprint vector is 1; otherwise, the position of the substructure is 0 [28]. Topology-based fingerprints are formed by analyzing the number of molecular fragments that emerge from a specified path or radius of a molecule, then each path or radius is encrypted with a hash. The bit value in topology-based fingerprint array is 1 if there is a molecular fragment at a certain path length or radius; otherwise, it is 0 [29].”
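The two fingerprint families described in the quoted passage can be illustrated with a minimal Python sketch. It is not the authors' pipeline, nor the algorithm of any particular database: the key list, the fragment strings, the 16-bit length, and the use of SHA-256 are all illustrative assumptions, and substructure detection and fragment enumeration are assumed to have been done already by a cheminformatics toolkit.

```python
import hashlib

# Hypothetical key list; real substructure key sets are far larger.
SUBSTRUCTURE_KEYS = ["carbonyl", "hydroxyl", "amine", "benzene_ring", "halogen"]


def substructure_fingerprint(present, keys=SUBSTRUCTURE_KEYS):
    """Each key owns a fixed position: 1 if the substructure occurs, else 0."""
    return [1 if key in present else 0 for key in keys]


def hashed_fingerprint(fragments, n_bits=16):
    """Hash every path or radius fragment to a bit position and set that bit."""
    bits = [0] * n_bits
    for fragment in fragments:
        digest = hashlib.sha256(fragment.encode("utf-8")).digest()
        position = int.from_bytes(digest[:4], "big") % n_bits
        bits[position] = 1
    return bits


# Assumed result of a substructure search on a made-up compound.
sub_fp = substructure_fingerprint({"carbonyl", "amine"})
# Assumed fragments enumerated along paths of the same made-up compound.
topo_fp = hashed_fingerprint(["C", "C-C", "C-N", "C=O", "C-C=O"])
print(sub_fp)
print(topo_fp)
```

Because several fragments can hash to the same position, a hashed fingerprint may set fewer bits than there are fragments, so the vector length is a trade-off between size and collisions.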

 

2

Fig. 5 shows a notable difference of SAE-DNN and DNN models only for the case of MACCS fingerprint which the authors exclude a step later from the further consideration as one showing low performance in terms of metrics used. What is the significance of SAE-DNN versus DNN then?

We have added an explanation of the database fingerprint to the Data Preprocessing subsection of the Materials and Methods section, pages 3-4.

We have added an explanation of the significance of using SAE-DNN versus DNN to the "Performance comparison between SAE-DNN and DNN only" subsection of the Results section, pages 9-10.

The explanation is as follows:

“Even though SAE-DNN produces slightly better performance than DNN without pre-training, the use of SAE for DNN pre-training has several advantages, including preventing layer activation outputs from exploding or vanishing during the training of the DL technique [38] and helping DNN achieve better convergence and better generalization power [30]. One way to analyze the generalization performance of learning algorithms is the stability of its prediction performance [39]. Figure 6 shows the standard deviation of SAE-DNN and DNN without pre-training process metrics.

In general, SAE-DNN produced a lower standard deviation for all metrics compared to DNN without pre-training. A lower standard deviation value indicates that the model's performance is more stable for each fold in the cross-validation process. These results imply that the SAE-DNN model has better generalization power than the DNN without pre-training.”
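The stability argument in the quoted passage amounts to comparing the spread of a metric across cross-validation folds. The sketch below shows that comparison in Python; the per-fold scores are invented for illustration, and the number of folds (five) and the use of the sample standard deviation are assumptions, since the response does not state them.

```python
from statistics import mean, stdev


def fold_stability(scores):
    """Return the mean and sample standard deviation of per-fold scores."""
    return mean(scores), stdev(scores)


# Made-up per-fold F-measure values for two hypothetical models.
pretrained_scores = [0.88, 0.89, 0.88, 0.89, 0.88]
baseline_scores = [0.84, 0.91, 0.86, 0.92, 0.85]

pre_mean, pre_std = fold_stability(pretrained_scores)
base_mean, base_std = fold_stability(baseline_scores)
print(f"pretrained: mean={pre_mean:.3f} std={pre_std:.4f}")
print(f"baseline:   mean={base_mean:.3f} std={base_std:.4f}")
```

In this invented example the two models have similar means, but the smaller standard deviation of the first indicates that its performance varies less from fold to fold.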

3

My main objection relates to Table 7 and conclusions. Table 7 shows that, based on various fingerprints, the SAE-DNN model gives, in general, different results of proteins related to specific compounds. We can see that sometimes Circular and Daylight fingerprints give similar results. However, nothing is said on how this relates to reality?

We have added an explanation of the similar prediction results obtained with the circular and daylight fingerprints to the Herbal Compound Prediction subsection of the Results section, pages 17-18.

The explanation is as follows:

“There are several similarities from the prediction results of the SAE-DNN model using daylight fingerprint and circular fingerprint. These results can occur because both fingerprints belong to topology-based fingerprints. The difference between these two fingerprints lies in creating array fingerprints where daylight builds an array of fingerprints based on a specified path of a molecule while circular builds an array based on the specified radius of a molecule [29].”

4

The model was able to predict something, but there is no confirmation that these predictions are any good for say treatment of the disease? Of course, such kinds of conclusions must be interpreted by specialists in medicine. But for now, I cannot see any significance of the results in terms of machine learning techniques either

We have added several references on the compounds that appear in the prediction results of all three SAE-DNN models (circular, daylight, PubChem), and on their effects, to the Herbal Compound Prediction subsection of the Results section, page 18.

The explanation is as follows:

“Regarding similar compounds that emerged from the prediction results of the three SAE-DNN models, these compounds were predicted to have interactions with COVID-19 proteins. This is supported by several literature studies showing that Hyperoside, Aloin, Rhamnetin, Laurotetanine, and Isoquercetin compounds have anti-inflammatory properties [40–45], which can help fight the hyperinflammation process in COVID-19 patients. For Garcimangosone d, its efficacy on COVID-19 or the inflammatory process in the body is not yet known. This compound is found in the Garcinia mangostana plant, which is commonly used to treat various diseases such as inflammation and fever in several Asian countries [46].”

 

Reviewer 2 Report

CoronaVirus Infectious Disease 2019 (COVID-19) is a highly contagious infectious disease caused by Severe Acute Respiratory Syndrome CoronaVirus 2 (Sars-CoV-2). COVID-19 has been declared a pandemic since March 2020 by WHO. Drug discovery process is crucial to fight COVID-19, as well as the algorithms and methods useful to predict Drug-Target Interaction (DTI). In recent years, Deep Learning algorithms have been generally used as the main method to extract knowledge from heterogenous resources (e.g., medical images, clinical data, ontologies).

In this paper, the authors present a multilabel DTI prediction using an algorithm based on Stack AutoEncoder Deep Neural Network (SAE-DNN). In my opinion, the Introduction provides a sufficient description of the issue of DTI prediction.

The algorithm builds its model by analyzing and extracting the relevant features from three datasets: a protein dataset obtained from GeneCards, a drug-target dataset from SuperTarget and DrugBank, and a herbal compound dataset retrieved from HerbalDB. I suggest citing GeneCards, DrugBank, SuperTarget, and HerbalDB, as well as all sources mentioned in this and other sections. In my opinion, the authors could clarify the choice of these datasets with respect to others available in the literature.

The DTI multilabel prediction presented by the authors is based on three main steps: data preprocessing, SAE-DNN modeling, and post-processing. Data preprocessing consists of a feature extraction process that also includes a data-integration step between the mentioned sources. According to the authors, the compound feature extraction process produces 881 attributes for the PubChem fingerprint, 1024 attributes for the daylight fingerprint, 166 attributes for the MACCS fingerprint, and 1024 attributes for the circular fingerprint.

Post-processing concerns the model evaluation. In this step the following metrics are used: accuracy, precision, recall, and f-measure.
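For readers unfamiliar with the four metrics, the following Python sketch computes them for a multilabel prediction by pooling every compound-protein cell into one confusion table (micro-averaging). How the paper itself aggregates over labels is not stated in this record, so the pooling strategy, the function name `micro_metrics`, and the toy labels are all assumptions.

```python
def micro_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F-measure pooled over all label cells."""
    tp = fp = fn = tn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
            else:
                tn += 1
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return accuracy, precision, recall, f_measure


# Made-up labels: rows are compounds, columns are target proteins.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
accuracy, precision, recall, f_measure = micro_metrics(y_true, y_pred)
print(accuracy, precision, recall, f_measure)
```

With these toy labels there are two true positives, one false positive, one false negative, and two true negatives, so all four metrics equal 2/3.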

The Results section provides a good description of the method presented in this paper, but it should be improved by providing a critical comparison between SAE-DNN and other neural-network-based methods to justify this choice. A comparison between the presented solution and others available in the literature may also be interesting.

Furthermore, I suggest extending the introduction; for instance, the authors might include more information about machine-learning and deep-learning approaches, e.g., in https://doi.org/10.1145/3216122.3216127 a framework for decomposition and feature extraction is presented.

Author Response

Dear Reviewers

Thank you for the constructive comments, which helped us improve our manuscript. Our responses to your comments are detailed below. We look forward to your positive response.

Responses for reviewer comments

Reviewer 2:

No

Reviewer comment

Author response

1

CoronaVirus Infectious Disease 2019 (COVID-19) is a highly contagious infectious disease caused by Severe Acute Respiratory Syndrome CoronaVirus 2 (Sars-CoV-2). COVID-19 has been declared a pandemic since March 2020 by WHO. Drug discovery process is crucial to fight COVID-19, as well as the algorithms and methods useful to predict Drug-Target Interaction (DTI). In recent years, Deep Learning algorithms have been generally used as the main method to extract knowledge from heterogenous resources (e.g., medical images, clinical data, ontologies)

Thank you for the positive comments that support the importance of this study.

2

In this paper, the authors present a multilabel DTI prediction using an algorithm based on Stack AutoEncoder Deep Neural Network (SAE-DNN). In my opinion, the Introduction provides a sufficient description of the issue of DTI prediction.

Thank you for the positive comments.

3

I suggest citing GeneCards, DrugBank, SuperTarget, and HerbalDB, as well as all sources mentioned in this and other sections. In my opinion, the authors could clarify the choice of these datasets with respect to others available in the literature.

We have added citations for GeneCards, DrugBank, SuperTarget, and HerbalDB to the Dataset subsection of the Materials and Methods section, page 3.

4

The Results section provides a good description of the method presented in this paper, but it should be improved by providing a critical comparison between SAE-DNN and other neural-network-based methods to justify this choice. A comparison between the presented solution and others available in the literature may also be interesting.

We have added several comparisons between the presented solution and other methods from the literature in the "Comparison with other approaches from the literature" subsection of the Results section, page 11.

The explanation is as follows:

“From a methodological point of view, some recent studies regarding DTI prediction commonly used a binary classification approach. In our proposed method, DTI prediction is done using the multilabel classification approach and takes several advantages over using the binary classification approach. First, the proposed method does not require a process of balancing data between positive data and negative data to achieve fair results, whereas the existing binary classification approach needs to randomly sample the negative DTI in order to balance the data, such as in research [3], which can result in false-negative rates and bias in the model results [15]. Second, the proposed method does not require to include a feature extraction process on protein data which can decrease data dimensions and speed up the training process.

From a machine learning performance point of view, we compare the DTI prediction performance between SAE-DNN and other deep learning models implemented in research [40] and [41]. Although these studies used a binary approach to predict DTI, comparisons can be made by looking at the model's performance in predicting positive classes. In terms of DTI, only the positive class is considered validated information, while the negative class cannot be validated due to the lack of experimental data on drug-target pairs [42]. Therefore, the comparison is done using recall and f-measure metrics. SAE-DNN outperforms other deep learning methods such as the standard artificial neural network (ANN) and deep belief network (DBN) methods from research [40], with the best f-measure of 0.89368 compared to standard ANN with an f-measure of 0.88 and DBN with an f-measure of 0.885. SAE-DNN also outperforms the proposed ComboNet method [36], with the best recall of 0.918 compared to the ComboNet recall of 0.8.”
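The multilabel framing described in the quoted passage can be sketched as a compound-by-protein label matrix built directly from known interactions. The Python example below is a hedged illustration only: the identifiers and the helper `build_label_matrix` are hypothetical and not taken from the paper's datasets or code.

```python
def build_label_matrix(interactions, compounds, proteins):
    """Rows are compounds, columns are proteins; 1 marks a known interaction."""
    column = {protein: j for j, protein in enumerate(proteins)}
    row = {compound: i for i, compound in enumerate(compounds)}
    matrix = [[0] * len(proteins) for _ in compounds]
    for compound, protein in interactions:
        matrix[row[compound]][column[protein]] = 1
    return matrix


# Hypothetical identifiers, not taken from the paper's datasets.
compounds = ["drug_A", "drug_B", "drug_C"]
proteins = ["protein_1", "protein_2", "protein_3", "protein_4"]
known = [("drug_A", "protein_1"), ("drug_A", "protein_3"), ("drug_B", "protein_2")]

labels = build_label_matrix(known, compounds, proteins)
for name, label_row in zip(compounds, labels):
    print(name, label_row)
```

Under this framing a model takes only the compound fingerprint as input and emits one score per protein column, which is why no protein features and no explicit sampling of negative compound-protein pairs are needed.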

5

Furthermore, I suggest extending the introduction; for instance, the authors might include more information about machine-learning and deep-learning approaches, e.g., in https://doi.org/10.1145/3216122.3216127 a framework for decomposition and feature extraction is presented.

Thank you for your suggestion to cite this paper (https://doi.org/10.1145/3216122.3216127), but in our opinion it is not sufficiently relevant to our topic. That paper concerns feature extraction from images, whereas the feature extraction process in our paper relies on a graph approach. However, we have added several references on machine learning and graph approaches in DTI prediction to the Introduction section, page 2.

The added references are as follows:

“One of the newest approaches in predicting DTI in drug repurposing is the feature-based chemogenomics approach. The feature-based chemogenomics approach utilizes feature information of drug compounds and diseases proteins represented in a set of descriptors to predict compound-protein interactions [9]. Research [10] used graph approach to create DTI features by combining several information such as drug-drug similarity, drug-disease association, protein-disease association and protein-protein interaction to construct a heterogenous graph and captures the topological properties of each graph node. However, this method cannot predict the interaction of new drugs or targets. Another method that can be used to create DTI features is using protein descriptors for protein features and molecular fingerprint for compound features. Protein descriptors created protein features by analyzing its amino acid sequences [11] while molecular fingerprint simplifies chemical information in compounds by analyzing the molecular structure into a graph and representing it through binary vectors [12]. Research [13] proposed a method called DeepConv-DTI that applies convolutional neural network (CNN) in predicting binary classification DTI using amino acid composition (AAC) as protein features and circular fingerprints as compound features by analyzing compound’s molecule as a graph. Choosing the right type of fingerprint to represent the features of the compound is important in the process of searching for potential drugs [14].”

 

 

 

Round 2

Reviewer 1 Report

In the current version, the authors addressed all my comments, and now the paper looks considerably better for a general reader. 

I have two minor comments for the current version:
1. When explaining how fingerprints are built, it would be nice to include, besides references to research papers, references to the technical documentation (probably, on the websites of the corresponding databases) of the algorithms that generate the mentioned fingerprints for certain chemical formulas.
2. A similar comment regarding the conclusions and usefulness of the results: it may be that, besides the mentioned studies (I mean the references in Table 7), there is some kind of open database that contains already collected knowledge on herbal compounds and their known interactions with proteins. If so, it could be a nice source for "checking the validity" of this kind of study.

Author Response

Reviewer comment

Author response

When explaining how fingerprints are built, it would be nice to include, besides references to research papers, references to the technical documentation (probably, on the websites of the corresponding databases) of the algorithms that generate the mentioned fingerprints for certain chemical formulas.

We have added references to the technical documentation of the fingerprints used in this study in the Dataset and Data Preprocessing subsections of the Materials and Methods section, pages 3-4 (lines 149-152 and 171-173).

 

The explanation is as follows:

“Feature extraction on the compound was carried out with four fingerprints of two different types: substructural fingerprints (PubChem fingerprint [28] and MACCS fingerprint, also referred to as MDL keys [29]) and topological fingerprints (daylight fingerprint [30] and circular fingerprint (ECFPs) [31]).”

 

The references mentioned in the explanation refer to the technical documentation of the fingerprints used in this study.

A similar comment regarding the conclusions and usefulness of the results: it may be that, besides the mentioned studies (I mean the references in Table 7), there is some kind of open database that contains already collected knowledge on herbal compounds and their known interactions with proteins. If so, it could be a nice source for "checking the validity" of this kind of study.

There are some open databases that contain collected knowledge on compounds and their protein targets, such as DrugBank and STITCH. However, these databases lack information on herbal compounds because few experiments on herbal compounds have been carried out. Thus, checking the validity of the interactions between herbal compounds and predicted proteins, in this study or in other studies on herbal compound DTI, remains difficult. We have added an explanation of this problem to the Herbal Compound Prediction subsection of the Results section, page 18, lines 539-547.

The explanation is as follows:

“Usually, certain databases such as DrugBank and STITCH [59] (http://stitch.embl.de/) can be used to verify compound-protein interactions based on the databases' collected knowledge of compounds and their known interactions with proteins. However, the herbal compounds from the prediction results of the three SAE-DNN models in this study have not been found to have interactions with predicted proteins in these databases, thus their compound-protein interactions still cannot be verified. This is due to the lack of experiments related to herbal compounds. Further research is needed to verify compound-protein interactions and determine the potential value of herbal compounds from SAE-DNN prediction results in this study.”

Reviewer 2 Report

Thanks to author for comments. In my opinion the paper can be approved. However, I suggest extending the background, discretionally. 

Author Response

Reviewer comment

Author response

Thanks to author for comments. In my opinion the paper can be approved. However, I suggest extending the background, discretionally.

Thank you for your approval of our paper.
