Chemoinformatics and Bioinformatics Tools in Structure-Activity Modelling in Molecular Sciences 2.0

A special issue of International Journal of Molecular Sciences (ISSN 1422-0067). This special issue belongs to the section "Molecular Informatics".

Deadline for manuscript submissions: 31 July 2024 | Viewed by 13655

Special Issue Editor


Dr. Bono Lučić
Guest Editor
NMR Centre, The Ruđer Bošković Institute, 10000 Zagreb, Croatia
Interests: chemoinformatics; structural bioinformatics; structure–activity modeling; QSAR; QSPR; molecular modeling; computational chemistry; molecular structural biophysics; development of model validation algorithms; variable selection algorithms; classification modeling; chance accuracy estimation; development of accuracy parameters; computational research in bioprospecting research; protein structure analysis and prediction

Special Issue Information

Dear Colleagues,

At a time of universal digitization of data in various fields of research, including molecular sciences, a growing number of studies model continuous or classification endpoints (activities/properties) of molecules. In these studies, endpoints are most often classified (digitized) into two classes, active or inactive, although grouping the data into three or more classes is also common.

Quantitative structure–activity/property relationships (QSAR/QSPR) are the most common, but not the only, forms of structure–endpoint models in molecular sciences. The accuracy of models is assessed through validation procedures, and many quality parameters are defined in the OECD guidance document on regulatory structure–activity models for health and environmental protection [1]. That document covers the accuracy parameters of classification models only sparsely. However, numerous accuracy parameters are in use today, and those used for classification models are calculated from the elements of the confusion matrix [2]. There is also a growing need for better-defined procedures for the validation of regulatory structure–activity models in the OECD document [1]. The application of such models in environmental and health protection (toxicity, bioavailability, sorption, biodegradability, etc.) is defined by the EU REACH regulation [3].
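As an illustration of how accuracy parameters follow from the confusion matrix elements, the minimal Python sketch below computes a few widely used ones for a two-class (active/inactive) model. The function name `classification_metrics` and the example counts are hypothetical, and the selection of parameters is illustrative only; it is not the list prescribed by the OECD document [1].

```python
from math import sqrt


def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy parameters for a two-class (active/inactive) model.

    Hypothetical helper for illustration. Arguments are the four confusion
    matrix elements: true positives, false negatives, false positives and
    true negatives.
    """
    total = tp + fn + fp + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Fractions of actives and inactives that are classified correctly.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")

    # Overall fraction of correct predictions.
    accuracy = (tp + tn) / total

    # Class-balance-aware parameters.
    balanced_accuracy = (sensitivity + specificity) / 2.0
    informedness = sensitivity + specificity - 1.0

    # Matthews correlation coefficient; set to 0.0 when undefined.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "informedness": informedness,
        "mcc": mcc,
    }


# Example with made-up counts: 50 actives and 50 inactives.
example = classification_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Informedness and the Matthews correlation coefficient are included because, unlike plain accuracy, they are not inflated when one class dominates the data set; both are discussed in [2].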

The development of structure–activity modeling of different types of endpoints of molecules (usually various types of biological activities) is accelerated using chemoinformatics and bioinformatics tools, servers, algorithms, and databases developed for small molecules and proteins.

Research on the development of novel chemoinformatics and bioinformatics tools is particularly important for this Special Issue, including the development of:

  • Valuable databases, servers, and data mining tools;
  • Drug or lead structure identification or dereplication approaches used in bioprospecting research;
  • Structure optimization tools;
  • Molecular descriptors;
  • Modeling and variable selection algorithms;
  • Computational model validation methods;
  • Multivariate linear and nonlinear methods;
  • Machine learning and deep learning algorithms;
  • Predictive or descriptive structure–activity models;
  • Different visualization tools;
  • Protein–ligand (target/small compound) interactions;
  • Protein–protein interactions;
  • Molecular docking, etc.

All these topics are of the highest importance for structure–activity modeling in molecular sciences.

This Special Issue aims to collect relevant contributions (papers) belonging to one or more of the topics listed above (and those related to them), which are important for the acceleration of structure–activity research in molecular sciences. Applications aimed at modeling a broad spectrum of chemical, biological, pharmaceutical, biochemical, and environmentally relevant activities and properties of molecules are also appreciated.

All forms of scientific articles covering the topics mentioned above or related topics are welcome, i.e., original papers, reviews, and communications.

[1] Guidance Document on the Validation of (Quantitative) Structure–Activity Relationship [(Q)SAR] Models. https://www.oecd.org/env/guidance-document-on-the-validation-of-quantitative-structure-activity-relationship-q-sar-models-9789264085442-en.htm

[2] Powers, D.M.W. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness & correlation. J. Mach. Learn. Technol. 2011, 2, 37–63.

[3] Regulation (EC) No 1907/2006: REACH (Registration, Evaluation, Authorisation and Restriction of Chemicals). http://ec.europa.eu/enterprise/sectors/chemicals/reach/index_en.htm

Dr. Bono Lučić
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. International Journal of Molecular Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. There is an Article Processing Charge (APC) for publication in this open access journal. For details about the APC please see here. Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • chemoinformatics tools
  • bioinformatics tools
  • structure–activity modeling
  • structure–property modeling
  • QSAR
  • QSPR
  • drug/structure identification in bioprospecting research
  • molecular docking
  • molecular interactions
  • protein–protein interactions
  • development of algorithms
  • databases and web servers
  • data mining
  • structure representation and optimization
  • molecular descriptors
  • modeling of health and environmentally relevant endpoints/activities/properties
  • toxicity
  • carcinogenicity
  • computational methods
  • model validation approaches
  • multivariate modeling
  • predictive modeling
  • descriptive modeling
  • classification modeling
  • machine learning
  • deep learning
  • structure visualization

Published Papers (7 papers)

Research

11 pages, 461 KiB  
Article
Merging Counter-Propagation and Back-Propagation Algorithms: Overcoming the Limitations of Counter-Propagation Neural Network Models
by Viktor Drgan, Katja Venko, Janja Sluga and Marjana Novič
Int. J. Mol. Sci. 2024, 25(8), 4156; https://doi.org/10.3390/ijms25084156 - 09 Apr 2024
Viewed by 271
Abstract
Artificial neural networks (ANNs) are nowadays applied as the most efficient methods in the majority of machine learning approaches, including data-driven modeling for assessment of the toxicity of chemicals. We developed a combined neural network methodology that can be used in the scope of new approach methodologies (NAMs) assessing chemical or drug toxicity. Here, we present QSAR models for predicting the physical and biochemical properties of molecules from three different datasets: aqueous solubility, acute fish toxicity toward fathead minnow, and bio-concentration factors. A novel neural network modeling method is developed by combining two neural network algorithms, namely, the counter-propagation modeling strategy (CP-ANN) with the back-propagation-of-errors algorithm (BPE-ANN). The advantage is a short training time, robustness, and good interpretability through the initial CP-ANN part, while the extension with BPE-ANN improves the precision of predictions in the range between minimal and maximal property values of the training data, regardless of the number of neurons in both neural networks, either CP-ANN or BPE-ANN.

28 pages, 6051 KiB  
Article
The Impact of Data on Structure-Based Binding Affinity Predictions Using Deep Neural Networks
by Pierre-Yves Libouban, Samia Aci-Sèche, Jose Carlos Gómez-Tamayo, Gary Tresadern and Pascal Bonnet
Int. J. Mol. Sci. 2023, 24(22), 16120; https://doi.org/10.3390/ijms242216120 - 09 Nov 2023
Cited by 1 | Viewed by 1413
Abstract
Artificial intelligence (AI) has gained significant traction in the field of drug discovery, with deep learning (DL) algorithms playing a crucial role in predicting protein–ligand binding affinities. Despite advancements in neural network architectures, system representation, and training techniques, the performance of DL affinity prediction has reached a plateau, prompting the question of whether it is truly solved or if the current performance is overly optimistic and reliant on biased, easily predictable data. Like other DL-related problems, this issue seems to stem from the training and test sets used when building the models. In this work, we investigate the impact of several parameters related to the input data on the performance of neural network affinity prediction models. Notably, we identify the size of the binding pocket as a critical factor influencing the performance of our statistical models; furthermore, it is more important to train a model with as much data as possible than to restrict the training to only high-quality datasets. Finally, we also confirm the bias in the typically used current test sets. Therefore, several types of evaluation and benchmarking are required to understand models’ decision-making processes and accurately compare the performance of models.

16 pages, 1330 KiB  
Article
Utilization of Supervised Machine Learning to Understand Kinase Inhibitor Toxophore Profiles
by Andrew A. Bieberich and Christopher R. M. Asquith
Int. J. Mol. Sci. 2023, 24(6), 5088; https://doi.org/10.3390/ijms24065088 - 07 Mar 2023
Cited by 1 | Viewed by 1554
Abstract
There have been more than 70 FDA-approved drugs to target the ATP binding site of kinases, mainly in the field of oncology. These compounds are usually developed to target specific kinases, but in practice, most of these drugs are multi-kinase inhibitors that leverage the conserved nature of the ATP pocket across multiple kinases to increase their clinical efficacy. To utilize kinase inhibitors in targeted therapy and outside of oncology, a narrower kinome profile and an understanding of the toxicity profile are imperative. This is essential when considering treating chronic diseases with kinase targets, including neurodegeneration and inflammation. This will require the exploration of inhibitor chemical space and an in-depth understanding of off-target interactions. We have developed an early pipeline toxicity screening platform that uses supervised machine learning (ML) to classify test compounds’ cell stress phenotypes relative to a training set of on-market and withdrawn drugs. Here, we apply it to better understand the toxophores of some literature kinase inhibitor scaffolds, looking specifically at a series of 4-anilinoquinoline and 4-anilinoquinazoline model libraries.

16 pages, 2323 KiB  
Article
MSEDDI: Multi-Scale Embedding for Predicting Drug–Drug Interaction Events
by Liyi Yu, Zhaochun Xu, Meiling Cheng, Weizhong Lin, Wangren Qiu and Xuan Xiao
Int. J. Mol. Sci. 2023, 24(5), 4500; https://doi.org/10.3390/ijms24054500 - 24 Feb 2023
Cited by 2 | Viewed by 1657
Abstract
A norm in modern medicine is to prescribe polypharmacy to treat disease. The core concern with the co-administration of drugs is that it may produce adverse drug–drug interaction (DDI), which can cause unexpected bodily injury. Therefore, it is essential to identify potential DDI. Most existing methods in silico only judge whether two drugs interact, ignoring the importance of interaction events to study the mechanism implied in combination drugs. In this work, we propose a deep learning framework named MSEDDI that comprehensively considers multi-scale embedding representations of the drug for predicting drug–drug interaction events. In MSEDDI, we design three-channel networks to process biomedical network-based knowledge graph embedding, SMILES sequence-based notation embedding, and molecular graph-based chemical structure embedding, respectively. Finally, we fuse three heterogeneous features from channel outputs through a self-attention mechanism and feed them to the linear layer predictor. In the experimental section, we evaluate the performance of all methods on two different prediction tasks on two datasets. The results show that MSEDDI outperforms other state-of-the-art baselines. Moreover, we also reveal the stable performance of our model in a broader sample set via case studies.

15 pages, 1614 KiB  
Article
Toward Quantitative Models in Safety Assessment: A Case Study to Show Impact of Dose–Response Inference on hERG Inhibition Models
by Fjodor Melnikov, Lennart T. Anger and Catrin Hasselgren
Int. J. Mol. Sci. 2023, 24(1), 635; https://doi.org/10.3390/ijms24010635 - 30 Dec 2022
Cited by 1 | Viewed by 1445
Abstract
Due to challenges with historical data and the diversity of assay formats, in silico models for safety-related endpoints are often based on discretized data instead of the data on a natural continuous scale. Models for discretized endpoints have limitations in usage and interpretation that can impact compound design. Here, we present a consistent data inference approach, exemplified on two data sets of Ether-à-go-go-Related Gene (hERG) K+ inhibition data, for dose–response and screening experiments that are generally applicable for in vitro assays. hERG inhibition has been associated with severe cardiac effects and is one of the more prominent safety targets assessed in drug development, using a wide array of in vitro and in silico screening methods. In this study, the IC50 for hERG inhibition is estimated from diverse historical proprietary data. The IC50 derived from a two-point proprietary screening data set demonstrated high correlation (R = 0.98, MAE = 0.08) with IC50s derived from six-point dose–response curves. Similar IC50 estimation accuracy was obtained on a public thallium flux assay data set (R = 0.90, MAE = 0.2). The IC50 data were used to develop a robust quantitative model. The model’s MAE (0.47) and R² (0.46) were on par with literature statistics and approached assay reproducibility. Using a continuous model has high value for pharmaceutical projects, as it enables rank ordering of compounds and evaluation of compounds against project-specific inhibition thresholds. This data inference approach can be widely applicable to assays with quantitative readouts and has the potential to impact experimental design and improve model performance, interpretation, and acceptance across many standard safety endpoints.

17 pages, 4850 KiB  
Article
Maximizing the Performance of Similarity-Based Virtual Screening Methods by Generating Synergy from the Integration of 2D and 3D Approaches
by Ningning Fan, Steffen Hirte and Johannes Kirchmair
Int. J. Mol. Sci. 2022, 23(14), 7747; https://doi.org/10.3390/ijms23147747 - 13 Jul 2022
Cited by 1 | Viewed by 1591
Abstract
Methods for the pairwise comparison of 2D and 3D molecular structures are established approaches in virtual screening. In this work, we explored three strategies for maximizing the virtual screening performance of these methods: (i) the merging of hit lists obtained from multi-compound screening using a single screening method, (ii) the merging of the hit lists obtained from 2D and 3D screening by parallel selection, and (iii) the combination of both of these strategies in an integrated approach. We found that any of these strategies led to a boost in virtual screening performance, with the clearest advantages observed for the integrated approach. On test sets for virtual screening, covering 50 pharmaceutically relevant proteins, the integrated approach, using sets of five query molecules, yielded, on average, an area under the receiver operating characteristic curve (AUC) of 0.84, an early enrichment among the top 1% of ranked compounds (EF1%) of 53.82 and a scaffold recovery rate among the top 1% of ranked compounds (SRR1%) of 0.50. In comparison, the 2D and 3D methods on their own (when using a single query molecule) yielded AUC values of 0.68 and 0.54, EF1% values of 19.96 and 17.52, and SRR1% values of 0.20 and 0.17, respectively. In conclusion, based on these results, the integration of 2D and 3D methods, via a (balanced) parallel selection strategy, is recommended, and, in particular, when combined with multi-query screening.

19 pages, 1445 KiB  
Article
Protein–Protein Interaction Prediction for Targeted Protein Degradation
by Oliver Orasch, Noah Weber, Michael Müller, Amir Amanzadi, Chiara Gasbarri and Christopher Trummer
Int. J. Mol. Sci. 2022, 23(13), 7033; https://doi.org/10.3390/ijms23137033 - 24 Jun 2022
Cited by 4 | Viewed by 4597
Abstract
Protein–protein interactions (PPIs) play a fundamental role in various biological functions; thus, detecting PPI sites is essential for understanding diseases and developing new drugs. PPI prediction is of particular relevance for the development of drugs employing targeted protein degradation, as their efficacy relies on the formation of a stable ternary complex involving two proteins. However, experimental methods to detect PPI sites are both costly and time-intensive. In recent years, machine learning-based methods have been developed as screening tools. While they are computationally more efficient than traditional docking methods and thus allow rapid execution, these tools have so far primarily been based on sequence information, and they are therefore limited in their ability to address spatial requirements. In addition, they have to date not been applied to targeted protein degradation. Here, we present a new deep learning architecture based on the concept of graph representation learning that can predict interaction sites and interactions of proteins based on their surface representations. We demonstrate that our model reaches state-of-the-art performance using AUROC scores on the established MaSIF dataset. We furthermore introduce a new dataset with more diverse protein interactions and show that our model generalizes well to this new data. These generalization capabilities allow our model to predict the PPIs relevant for targeted protein degradation, which we show by demonstrating the high accuracy of our model for PPI prediction on the available ternary complex data. Our results suggest that PPI prediction models can be a valuable tool for screening protein pairs while developing new drugs for targeted protein degradation.
