# P3CMQA: Single-Model Quality Assessment Using 3DCNN with Profile-Based Features


## Abstract


## 1. Introduction

## 2. Materials and Methods

#### 2.1. Featurization

#### 2.1.1. Making Residue-Level Bounding Box

#### 2.1.2. Atom-Type Features

#### 2.1.3. Evolutionary Information

#### 2.1.4. Predicted Local Structure

#### 2.2. 3DCNN Training

#### 2.2.1. Network Architecture

#### 2.2.2. Label and Score Integration

#### 2.2.3. Parameters

#### 2.2.4. Training Process

#### 2.3. Dataset

#### 2.4. Performance Evaluation

- The average Pearson correlation coefficient for each target
- The average Spearman correlation coefficient for each target
- The average GDT_TS loss for each target
- The average Z-score for each target
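
The four per-target metrics listed above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code. It assumes the definitions commonly used in model quality assessment: the GDT_TS loss is the gap between the best GDT_TS among a target's models and the GDT_TS of the model ranked first by the predictor, and the Z-score is the standardized GDT_TS of that top-ranked model within the target. All function names and the `targets` layout are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def top_index(pred):
    """Index of the model with the highest predicted score."""
    return max(range(len(pred)), key=lambda i: pred[i])

def gdt_ts_loss(pred, true):
    """Assumed definition: best true GDT_TS minus GDT_TS of the top-ranked model."""
    return max(true) - true[top_index(pred)]

def z_score(pred, true):
    """Assumed definition: standardized GDT_TS of the top-ranked model."""
    mean = sum(true) / len(true)
    std = math.sqrt(sum((t - mean) ** 2 for t in true) / len(true))
    return (true[top_index(pred)] - mean) / std

def evaluate(targets):
    """Average each metric over targets; `targets` maps name -> (pred, true)."""
    metrics = {"pearson": pearson, "spearman": spearman,
               "loss": gdt_ts_loss, "z_score": z_score}
    return {name: sum(f(p, t) for p, t in targets.values()) / len(targets)
            for name, f in metrics.items()}

# Toy data (made up for illustration): two targets with three models each.
example = {
    "T1": ([0.9, 0.5, 0.1], [80.0, 60.0, 40.0]),
    "T2": ([0.2, 0.8, 0.5], [30.0, 50.0, 70.0]),
}
summary = evaluate(example)
```

Computing each metric within a target and averaging afterwards keeps targets of different difficulty from dominating the summary, which is why all four items are stated per target.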

## 3. Results and Discussion

#### 3.1. Training Result for Each Feature

#### 3.2. Comparison with Other Methods on CASP Datasets

## 4. Web Tool

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Abbreviations

Abbreviation | Definition |
---|---|
MQA | Model Quality Assessment |
P3CMQA | Profile-based three-dimensional Convolutional neural network for protein structure Model Quality Assessment |
GDT_TS | Global Distance Test Total Score |
lDDT | Local Distance Difference Test |
PSSM | Position Specific Scoring Matrix |
SS | Secondary Structure |
RSA | Relative Solvent Accessibility |

## Appendix A

#### Appendix A.1

Type | Description | Residue:Atom |
---|---|---|
1 | Sulfur/selenium | CYS:SG, MET:SD, MSE:SE |
2 | Nitrogen (amide) | ASN:ND2, GLN:NE2, backbone N (including N-terminal) |
3 | Nitrogen (aromatic) | HIS:ND1/NE1, TRP:NE1 |
4 | Nitrogen (guanidinium) | ARG:NE/NH* |
5 | Nitrogen (ammonium) | LYS:NZ |
6 | Oxygen (carbonyl) | ASN:OD1, GLN:OE1, backbone O (except C-terminal) |
7 | Oxygen (hydroxyl) | SER:OG, THR:OG1, TYR:OH |
8 | Oxygen (carboxyl) | ASP:OD*, GLU:OE*, C-terminal O, C-terminal OXTc |
9 | Carbon (sp2) | ARG:CZ, ASN:CG, ASP:CG, GLN:CD, GLU:CD, backbone C |
10 | Carbon (aromatic) | HIS:CG/CD2/CE1, PHE:CG/CD*/CE*/CZ, TRP:CG/CD*/CE*/CZ*/CH2, TYR:CG/CD*/CE*/CZ |
11 | Carbon (sp3) | ALA:CB, ARG:CB/CG/CD, ASN:CB, ASP:CB, CYS:CB, GLN:CB/CG, GLU:CB/CG, HIS:CB, ILE:CB/CG*/CD1, LEU:CB/CG/CD*, LYS:CB/CG/CD/CE, MET:CB/CG/CE, MSE:CB/CG/CE, PHE:CB, PRO:CB/CG/CD, SER:CB, THR:CB/CG2, TRP:CB, TYR:CB, VAL:CB/CG*, backbone CA |
12 | Occupancy | *:* |
13 | Backbone | *:N, *:CA, *:C |
14 | CA | *:CA |
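
One way to read the table above is as a lookup from a (residue, atom) pair to one or more of the 14 channels. The sketch below is an illustration only, not the authors' featurization code. It assumes that types 12 to 14 are set in addition to the chemical type (types 1 to 11), copies only a handful of side-chain rows into a hypothetical `SIDE_CHAIN_TYPES` dictionary, and ignores the C-terminal special cases.

```python
# Hypothetical partial lookup: a few side-chain entries copied from the table.
SIDE_CHAIN_TYPES = {
    ("CYS", "SG"): 1,    # sulfur/selenium
    ("MET", "SD"): 1,
    ("ASN", "ND2"): 2,   # nitrogen (amide)
    ("LYS", "NZ"): 5,    # nitrogen (ammonium)
    ("ASN", "OD1"): 6,   # oxygen (carbonyl)
    ("SER", "OG"): 7,    # oxygen (hydroxyl)
    ("ALA", "CB"): 11,   # carbon (sp3)
}

# Backbone atoms share one type across all residues (C-terminal O/OXT ignored here).
BACKBONE_TYPES = {"N": 2, "O": 6, "C": 9, "CA": 11}

def atom_channels(residue, atom):
    """Return the sorted 1-based channel numbers assumed to be set for one atom."""
    channels = {12}  # occupancy applies to every atom (*:*)
    if (residue, atom) in SIDE_CHAIN_TYPES:
        channels.add(SIDE_CHAIN_TYPES[(residue, atom)])
    elif atom in BACKBONE_TYPES:
        channels.add(BACKBONE_TYPES[atom])
    if atom in ("N", "CA", "C"):
        channels.add(13)  # backbone channel
    if atom == "CA":
        channels.add(14)  # CA channel
    return sorted(channels)

ca_channels = atom_channels("ALA", "CA")
```

Under these assumptions a backbone CA atom contributes to four channels (sp3 carbon, occupancy, backbone, CA), while a side-chain atom such as CYS:SG contributes to two.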

#### Appendix A.2

Layer Name | Output Shape | Detail |
---|---|---|
Input | $14\times 28\times 28\times 28$ | |
Conv3D | $128\times 25\times 25\times 25$ | Batch Normalization, PReLU |
Conv3D | $256\times 22\times 22\times 22$ | Batch Normalization, PReLU |
Conv3D | $256\times 11\times 11\times 11$ | Batch Normalization, PReLU |
Conv3D | $512\times 8\times 8\times 8$ | Batch Normalization, PReLU |
Conv3D | $512\times 6\times 6\times 6$ | Batch Normalization, PReLU |
Conv3D | $1024\times 3\times 3\times 3$ | Batch Normalization, PReLU |
Global Average Pooling | 1024 | |
Linear | 1024 | Batch Normalization, PReLU |
Linear | 256 | Batch Normalization, PReLU |
Linear | 1 | |
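
The spatial sizes in the table above follow from the usual convolution output-size formula, `out = (in + 2 * padding - kernel) // stride + 1`. The table does not list kernel sizes or strides, so the values in `ASSUMED_LAYERS` below are hypothetical: they are one unpadded configuration that reproduces the listed shapes, and other combinations (for example, a different kernel with padding) would give the same sizes.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

# (out_channels, kernel, stride): assumed values, NOT taken from the table.
ASSUMED_LAYERS = [
    (128, 4, 1),
    (256, 4, 1),
    (256, 2, 2),
    (512, 4, 1),
    (512, 3, 1),
    (1024, 4, 1),
]

def trace_shapes(size=28, channels=14, layers=ASSUMED_LAYERS):
    """Return (channels, cubic side length) after the input and each Conv3D layer."""
    shapes = [(channels, size)]
    for out_channels, kernel, stride in layers:
        size = conv_out(size, kernel, stride)
        shapes.append((out_channels, size))
    return shapes

shapes = trace_shapes()
```

With these assumptions the side length goes 28, 25, 22, 11, 8, 6, 3, and global average pooling over the final $3\times 3\times 3$ volume leaves the 1024-dimensional vector fed to the linear layers.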

#### Appendix A.3

Atom-Type Features | Evolutionary Information | Predicted Local Structure | Pearson | Spearman | Loss | Z-Score | AUC |
---|---|---|---|---|---|---|---|
✓ | ✗ | ✗ | $0.757$ | $0.645$ | $8.518$ | $4.244$ | $0.878$ |
✓ | ✓ | ✗ | $0.834$ | $0.729$ | $9.860$ | $4.239$ | $0.923$ |
✓ | ✗ | ✓ | $0.847$ | $0.724$ | $3.883$ | $4.742$ | $0.944$ |
✗ | ✓ | ✓ | $0.858$ | $0.742$ | $4.818$ | $4.666$ | $0.948$ |
✓ | ✓ | ✓ | $\mathbf{0.865}$ | $\mathbf{0.751}$ | $\mathbf{2.519}$ | $\mathbf{4.866}$ | $\mathbf{0.956}$ |

#### Appendix A.4

**Table A4.** The average Pearson correlation coefficient for each category of targets on the CASP13 dataset.

Method | FM (12 Targets) | FM/TBM (15 Targets) | TBM (37 Targets) |
---|---|---|---|
Proposed | $\mathbf{0.757}$ (−) | $\mathbf{0.812}$ (−) | $\mathbf{0.822}$ (−) |
Sato-3DCNN (AMSGrad) | $0.663$ ($\mathbf{2.44\times 10^{-3}}$) | $0.730$ ($\mathbf{4.27\times 10^{-3}}$) | $0.797$ ($\mathbf{8.47\times 10^{-3}}$) |
ProQ3D | $0.626$ ($\mathbf{9.28\times 10^{-3}}$) | $0.689$ ($\mathbf{3.05\times 10^{-4}}$) | $0.712$ ($\mathbf{9.81\times 10^{-7}}$) |
SBROD | $0.633$ ($1.61\times 10^{-2}$) | $0.628$ ($\mathbf{4.27\times 10^{-4}}$) | $0.720$ ($\mathbf{6.23\times 10^{-6}}$) |
VoroMQA | $0.579$ ($\mathbf{1.46\times 10^{-3}}$) | $0.661$ ($\mathbf{8.54\times 10^{-4}}$) | $0.724$ ($\mathbf{2.32\times 10^{-5}}$) |

**Figure A1.** Swarm plot and box plot of the Pearson correlation coefficient for each target on CASP13. The x-axis represents the Pearson correlation coefficient, and the y-axis represents the method. A point represents a target, and the color of the point represents the category of the target.


**Figure 1.** Overall workflow of this work. First, a bounding box is generated for each residue from the coordinate information of the model structure. Then, 14-dimensional atom-type features are obtained from the model structure. In addition, 20-dimensional sequence profile features and 4-dimensional local structure features are generated from the sequences. These features are then input to the three-dimensional convolutional neural network to predict a local score for each residue. Finally, the local scores are averaged to obtain a global score for the entire model.
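
The final step of the workflow in Figure 1, averaging per-residue local scores into one global score, can be sketched as follows. The function name and the example scores are hypothetical, and an unweighted arithmetic mean is assumed because the caption says only that the local scores are averaged.

```python
def global_score(local_scores):
    """Unweighted mean of per-residue local scores (assumed averaging scheme)."""
    if not local_scores:
        raise ValueError("at least one local score is required")
    return sum(local_scores) / len(local_scores)

# Made-up local scores for a three-residue model.
score = global_score([0.8, 0.6, 0.7])
```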

**Figure 2.** The input page of the web tool. The email address and the model structure in PDB or mmCIF format are required inputs; the sequence in FASTA format is optional. The page also shows the numbers of running and waiting jobs.

**Figure 3.** The output page of the prediction results. The predicted score for the whole model, the predicted score for each residue, and the three-dimensional structure colored by the local score are shown. The parts colored in blue represent high local scores, and the parts colored in red represent low local scores. The results can be downloaded in multiple formats.

Dataset | | Number of Targets | Number of Model Structures per Target |
---|---|---|---|
Train | Train | 337 | $69.5$ |
Train | Validation | 85 | $271.7$ |
Test | CASP12 | 51 | $149.9$ |
Test | CASP13 | 66 | $149.9$ |

Atom-Type Features | Evolutionary Information | Predicted Local Structure | Pearson (Validation) |
---|---|---|---|
✓ | ✗ | ✗ | $0.757$ |
✓ | ✓ | ✗ | $0.834$ |
✓ | ✗ | ✓ | $0.847$ |
✗ | ✓ | ✓ | $0.858$ |
✓ | ✓ | ✓ | $\mathbf{0.865}$ |

Method | Pearson | Spearman | Loss | Z-Score |
---|---|---|---|---|
Proposed | $\mathbf{0.856}$ (−) | $\mathbf{0.782}$ (−) | $\mathbf{4.319}$ (−) | $\mathbf{1.240}$ (−) |
Sato-3DCNN (AMSGrad) | $0.746$ ($\mathbf{4.67\times 10^{-9}}$) | $0.675$ ($\mathbf{5.31\times 10^{-7}}$) | $5.530$ ($4.89\times 10^{-1}$) | $1.139$ ($4.99\times 10^{-1}$) |
ProQ3D | $0.750$ ($\mathbf{8.18\times 10^{-9}}$) | $0.672$ ($\mathbf{3.41\times 10^{-7}}$) | $7.989$ ($\mathbf{4.82\times 10^{-3}}$) | $0.922$ ($\mathbf{7.38\times 10^{-3}}$) |
SBROD | $0.682$ ($\mathbf{9.87\times 10^{-10}}$) | $0.612$ ($\mathbf{1.87\times 10^{-7}}$) | $7.063$ ($3.47\times 10^{-2}$) | $0.967$ ($4.23\times 10^{-2}$) |
VoroMQA | $0.671$ ($\mathbf{1.11\times 10^{-9}}$) | $0.592$ ($\mathbf{1.77\times 10^{-9}}$) | $7.649$ ($4.30\times 10^{-2}$) | $0.963$ ($4.30\times 10^{-2}$) |

Method | Pearson | Spearman | Loss | Z-Score |
---|---|---|---|---|
Proposed | $\mathbf{0.797}$ (−) | $\mathbf{0.757}$ (−) | $\mathbf{5.708}$ (−) | $\mathbf{1.264}$ (−) |
Sato-3DCNN (AMSGrad) | $0.748$ ($\mathbf{1.09\times 10^{-5}}$) | $0.703$ ($\mathbf{1.84\times 10^{-5}}$) | $6.527$ ($4.44\times 10^{-1}$) | $1.167$ ($3.93\times 10^{-1}$) |
ProQ3D | $0.686$ ($\mathbf{1.42\times 10^{-9}}$) | $0.638$ ($\mathbf{2.03\times 10^{-10}}$) | $9.482$ ($2.16\times 10^{-2}$) | $0.990$ ($2.29\times 10^{-2}$) |
SBROD | $0.674$ ($\mathbf{1.95\times 10^{-9}}$) | $0.637$ ($\mathbf{3.38\times 10^{-9}}$) | $10.014$ ($\mathbf{2.29\times 10^{-4}}$) | $0.930$ ($\mathbf{5.99\times 10^{-4}}$) |
VoroMQA | $0.676$ ($\mathbf{2.38\times 10^{-9}}$) | $0.624$ ($\mathbf{2.52\times 10^{-11}}$) | $12.105$ ($\mathbf{1.73\times 10^{-3}}$) | $0.786$ ($\mathbf{1.15\times 10^{-3}}$) |


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Takei, Y.; Ishida, T.
P3CMQA: Single-Model Quality Assessment Using 3DCNN with Profile-Based Features. *Bioengineering* **2021**, *8*, 40.
https://doi.org/10.3390/bioengineering8030040
