Methods for the Refinement of Protein Structure 3D Models

Adiyaman, Recep; McGuffin, Liam James

doi:10.3390/ijms20092301

Open Access Review

by Recep Adiyaman and Liam James McGuffin *

School of Biological Sciences, University of Reading, Reading RG6 6AS, UK

* Author to whom correspondence should be addressed.

Int. J. Mol. Sci. 2019, 20(9), 2301; https://doi.org/10.3390/ijms20092301

Submission received: 2 April 2019 / Revised: 24 April 2019 / Accepted: 7 May 2019 / Published: 9 May 2019

(This article belongs to the Special Issue Feature Annual Reviews in Molecular Sciences 2019)

Abstract: The refinement of predicted 3D protein models is crucial in bringing them closer towards experimental accuracy for further computational studies. Refinement approaches can be divided into two main stages: the sampling and scoring stages. Sampling strategies, such as the popular Molecular Dynamics (MD)-based protocols, aim to generate improved 3D models. However, generating 3D models that are closer to the native structure than the initial model remains challenging, as structural deviations from the native basin can be encountered due to force-field inaccuracies. Therefore, different restraint strategies have been applied in order to avoid deviations away from the native structure. For example, the accurate prediction of local errors and/or contacts in the initial models can be used to guide restraints. MD-based protocols, using physics-based force fields and smart restraints, have made significant progress towards a more consistent refinement of 3D models. In the scoring stage, energy functions and Model Quality Assessment Programs (MQAPs) are used to discriminate near-native conformations from non-native conformations. Nevertheless, there are often very small differences among the 3D models generated in refinement pipelines, which makes model discrimination and selection problematic. For this reason, the identification of the most native-like conformations remains a major challenge.

Keywords: protein model refinement; tertiary structure prediction; molecular dynamics simulations; energy functions; model quality estimates; Critical Assessment of techniques for Structure Prediction (CASP)

1. Introduction

The determination of three-dimensional protein structures at an atomic resolution is the key to unlocking an understanding of biological functions and the molecular mechanisms of diseases [1,2]. Although the established experimental methods, such as X-ray crystallography [3,4,5,6,7], Nuclear Magnetic Resonance (NMR) [8,9], and cryo-electron microscopy [9,10], may enable the determination of 3D atom coordinates at high accuracies, they are far from matching the pace of new genetic data, due to their high cost and laborious processes in the cloning, expression, and purification stages [11,12,13,14]. Accurate in silico protein modelling is comparatively cheaper and faster than experimental determination methods, and helps us to bridge the gap between the known sequences and available structures. Furthermore, in silico modelling is often able to provide detailed structure representations at an atomic level [1,2,15,16,17,18,19,20].

In silico prediction of protein structures consists of three main stages: (1) predicting 3D models by template-based modelling (TBM) and free modelling (FM); (2) the assessment of the predicted 3D models; and (3) the refinement of the predicted 3D models [16,21]. The prediction of 3D models from amino acid sequences has made significant progress towards the accurate determination of native structures, especially with the use of templates from known structures of homologous proteins, and the progress has been well-documented in the last 25 years of the CASP experiments [22,23,24,25,26,27]. In general, 3D modelling can be divided into two broad categories, according to whether or not a known template structure is used: TBM and FM [16]. TBM [26,28,29] methods are able to generate reliable 3D models, based on the available known structures, by copying the relative atom coordinates from the aligned residues through sequence-structure alignments; such approaches have been found to be by far the most successful for tertiary protein structure prediction [5,18,30,31,32]. If there is a high similarity between the target sequence and the template from the Protein Data Bank [33,34], then the predictions are likely to be highly accurate [18,21,30,35]. In addition, the increasing number of available structures determined by advanced experimental techniques allows for an increasingly higher coverage of protein structures [36,37,38,39].

In the cases where no suitable templates are available for generating predicted 3D models, then template-free modelling (FM), or ab initio modelling, is used to predict the models by relying on physical, chemical, and thermodynamic principles [16]. However, the accuracy of the 3D models produced by FM has often been much lower than those produced by TBM and, historically, FM methods have only been accurate in modelling small protein structures, up to 100 residues [16]. TBM and FM approaches may generate hundreds of 3D structure “decoys” in different alternative conformations [16,40]. Model Quality Assessment Programs (MQAPs) have been used to determine the most native-like 3D model among the decoys, by giving local and global scores, which can be used to estimate model accuracy [12,41,42,43].

The accuracy of the predicted 3D models is a critical factor for detailed mechanistic studies, such as drug design, protein docking, and the prediction of protein function. Furthermore, pharmaceutical applications often require structures close to experimental levels of accuracy [5,30,32,44,45,46,47,48,49,50,51]. Although the success of TBM and FM modelling has been observed in the CASP experiments, the predicted 3D models (particularly those from FM methods) are often not without flaws, and they may still have some local and global errors, including irregular contacts or hydrogen bonds, clashes, and unusual bond angles and lengths [26,42,52,53]. The errors in the predicted 3D structures also limit the usage of the models for further studies. The necessity for increasing the accuracy of the predicted tertiary structures and the correction of the errors described above has led to the development of methods for the refinement of 3D models [5,43,54].

The refinement of 3D models of proteins has emerged as the last milestone of the structure prediction journey to reach parity with experimental accuracy [55,56]. Refining 3D models often helps to bring them closer to native structures by modifying the secondary structure units and repacking sidechains [54]. However, ironically, refinement approaches can also lead to a degradation in the quality of models. Knowing whether a model has been improved or made worse remains a major challenge for developers of 3D model refinement methods [57,58]. Consistent beneficial refinement of predicted 3D models is necessary for many in silico studies, ranging from drug discovery to protein design [47,50,59,60,61,62,63].

Typically, the refinement of predicted 3D models involves two principal stages: sampling and scoring [5,53] (see Figure 1). For successful refinement, firstly, the sampling approaches have to be able to generate at least some alternative 3D models that are closer to the native structure than the initial model and, secondly, the generated 3D models must be accurately scored, in order to facilitate identification of those that are closest to the native structure [5]. The sampling and scoring approaches can also be applied in an iterative cycle, in order to find a pathway towards a more consistent refinement. However, both the sampling and the scoring of improved models remain elusive, and the consistent refinement of predicted 3D models has not yet been witnessed in the CASP experiments. The refinement category itself has seen more limited success in the CASP experiments, compared with the tertiary structure prediction and quality assessment categories [5,53,54]. However, it must be emphasised that the refinement of the typical predicted 3D models produced by standard prediction servers is often much more successful than the refinement of the models selected by the CASP assessors for the refinement category, as the CASP “refinement targets” may have already been refined during other modelling pipelines [54,58].
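The iterative sample-then-score cycle described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the names `refine`, `toy_sample`, and `toy_score` are hypothetical and do not correspond to any published protocol, and the toy score uses knowledge of the "native" value, which is of course unavailable in real refinement (where an energy function or an MQAP must stand in for it).

```python
import random
from typing import Callable, List, TypeVar

Model = TypeVar("Model")


def refine(start: Model,
           sample: Callable[[Model], List[Model]],
           score: Callable[[Model], float],
           n_cycles: int = 3) -> Model:
    """Alternate sampling and scoring; a higher score means 'predicted closer to native'."""
    best, best_score = start, score(start)
    for _ in range(n_cycles):
        # Sampling stage: generate alternative models around the current best.
        for candidate in sample(best):
            # Scoring stage: keep a candidate only if it scores better.
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
    return best


# Toy demonstration (hypothetical): a "model" is one number, the "native" value is 0.
_rng = random.Random(0)


def toy_sample(model: float) -> List[float]:
    """Perturb the model randomly, standing in for an MD or Monte Carlo sampler."""
    return [model + _rng.uniform(-1.0, 1.0) for _ in range(20)]


def toy_score(model: float) -> float:
    """Idealised score; real scoring functions cannot see the native structure."""
    return -abs(model)


refined = refine(5.0, toy_sample, toy_score, n_cycles=10)
```

Because the starting model is retained unless a sampled model scores higher, the returned model never scores worse than the start under the scoring function used; whether it is truly closer to the native structure depends entirely on how well that score tracks real accuracy.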

In the following sections, we will outline the alternative methods used for both sampling and scoring. We will describe, compare, and contrast the different strategies and discuss the merits and pitfalls of each approach.

2. Sampling Strategies

Two broad approaches are used in the sampling stage: the fully-automated server-based programs and the non-server-based, highly central processing unit (CPU)-intensive programs, such as Molecular Dynamics (MD) simulations (also known as manual/human refinement methods in CASP) [43,64,65]. The sampling approaches may include the various combinations of knowledge-based methods [32,41,47,52,64,66,67,68,69,70,71,72], Monte Carlo simulations [68,69,70,73,74,75,76,77], physics-based potentials [69,70,78,79,80,81,82,83], and MD simulations [32,43,48,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93], in order to sample near-native conformations.

Automated and rapid server-based refinement methods are generally based on side-chain optimisation and energy minimisation. Server-based approaches are practical, as they are often based on utilising the knowledge of protein structures, particularly specific interactions between residues and atoms, and they require less computational effort [43,56,57,58]. The generation of 3D models with automated server-based strategies is often more conservative and risk-averse, compared to the more computationally-intensive manual approaches, which often utilise MD-based approaches, as seen in the recent CASP experiments. Furthermore, the more conservative servers performed well in both CASP8 and CASP9, and the structural deviations among the generated sampled models were not as great as those observed in sampled models from the more computationally-intensive manual approaches [56,57,58,89]. On the other hand, these early conservative servers were not as successful as the non-server MD-based methods in the cases where the starting models were of poor quality, and where there was more room for improvement [5,53,64,65,83].

Since CASP10, the non-server-based, highly CPU-intensive methods, which have mainly relied upon MD simulations using physics-based force fields, parallel computing on graphics processing units (GPUs) and/or CPUs, and smart restraints, have become more widely used to generate sampled 3D models that are closer to the native structures [5,53,64,65,94]. MD simulations also provide important information about dynamic aspects of the structure [29,32,48,69,80].

A leading MD-based refinement approach, using a physics-based potential, was developed by the Shaw group [90,91,95] and tested in CASP9. However, they used a simulation time of 100 µs for each target, which was subsequently found to be unnecessarily long. Furthermore, structural deviations were also observed due to force-field inaccuracies and the lack of guidance towards the native basin during MD simulations [48,90,91,95].

In CASP10, the Feig group also developed a physics-based sampling approach using MD simulations, and managed to refine large proteins with shorter simulation times [32]. The MD-based protocol from the Feig group made significant progress towards a more consistent refinement with the usage of an improved force field, the application of C-alpha restraints, and an ensemble averaging stage under explicit solvent conditions [32,64]. However, the approach used by Feig was still extremely CPU intensive, requiring 75,000 core hours (12 days on 256 cores) to refine a single 3D model, and so it was not broadly applicable for the sort of large-scale analysis typically required by servers or proteomic pipelines [32].

With the growing availability of GPU/CPU computing [55,96], most of the top-performing groups in CASP12 also used MD-based sampling strategies [48,53,87,96,97,98,99,100,101]. Nevertheless, the sampling of alternative refinement models through MD simulations still brings about a high computational cost, particularly for large protein targets. Additionally, there remains a need for improved force fields to consistently increase the accuracy beyond that of the starting model, particularly where the starting model is already of high accuracy [5].

Force field accuracy is an important component of molecular simulations, as the chosen force field determines how the potential atomic interactions are modelled in molecular systems. The optimal parameters of force fields used in the simulations are determined from datasets of experimental structures [5,102]. Recently, popular force fields, such as the Chemistry at Harvard Macromolecular Mechanics (CHARMM) c22/CMAP [103] and c36 [97] versions and the AMBER ff14SB [99] and AMBER12SB [104,105] force fields, have been used in different sampling approaches, which included Monte Carlo and Molecular Dynamics simulations in the refinement pipeline [56,77,94,106]. However, all force fields are imperfect and cannot yet be relied upon to consistently generate models that are closer to experimental structures. There is plenty of room for improvement in force field development. Perhaps the main challenge is the further development of the parameter optimization strategies for the potential energy functions [32,48,69,78].

Due to the use of imperfect force fields, molecular dynamics simulations also suffer from a lack of guidance for producing sampled models that trend towards the native structure [69,78]. The usage of smart restraints has been a key factor in ensuring that the refinement models do not deviate away from the native structure [32,48]. However, there is a balance to be struck, as the application of restraints may limit the extent of the refinement sampling; very strong restraints may only allow sampling of conformations that are close to the starting model, instead of allowing a trend towards the native state [48]. Research has shown that the application of restraints is crucial, particularly where the initial model is highly accurate. It has also been observed that unrestrained MD simulations quickly drive the initial models away from the native structure [48,53,78,80,90,107]. Furthermore, the strength of the applied restraints has been found to be a significant parameter, in terms of increasing the quality of the sampled models, but it is interesting to note that weaker, rather than stronger, C-alpha restraints have often performed better [32,48,53,108].

In most cases, the restraints have generally been applied on all C-alphas, but different kinds of restraints, based on prior knowledge [5,81,109], specific regions [5,81,109,110], and local quality assessment [5,88,111], have also been applied by groups participating in CASP experiments. The application of partial restraints can also give the sampling approaches more “wiggle room” to improve the quality beyond that of the initial models. The determination of which specific parts of a model are in need of more refinement, based, for example, on local quality estimates, may provide more reliable guidance for MD simulations [88,111,112]. Based on this principle, our group (the McGuffin group) has developed a new local quality assessment guided restraint strategy, which we used in CASP13. The strategy depends on the predicted per-residue accuracy scores produced by ModFOLD7. The regions of the starting models that are predicted to be close to the native structure are used as restraints for the MD simulations (Figure 2). Flat-bottom potential widths of 2–4 Å were also applied by the Feig group in CASP13, as a new restraint strategy which performed better than weak harmonic positional restraints [94,113]. The new restraint strategies that were applied in CASP13 showed a promising step towards a more consistent refinement.
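To make the difference between these restraint strategies concrete, the following Python sketch compares a harmonic positional restraint with a flat-bottom restraint, and shows one way that per-residue quality estimates could be turned into a list of residues to restrain. It is an illustration only, not the implementation used by any of the groups mentioned: the function names, the force constant `k`, the interpretation of `width` as the radius of the penalty-free region, and the 3 Å `cutoff` on predicted per-residue error are all assumptions made for the example.

```python
import numpy as np


def harmonic_restraint(displacement, k=1.0):
    """Harmonic restraint energy, E = 0.5 * k * d**2: any displacement is penalised."""
    d = np.asarray(displacement, dtype=float)
    return 0.5 * k * d ** 2


def flat_bottom_restraint(displacement, width=2.0, k=1.0):
    """Flat-bottom restraint: no penalty within `width` (assumed radius, in Angstroms),
    harmonic penalty on the excess beyond it."""
    d = np.abs(np.asarray(displacement, dtype=float))
    excess = np.clip(d - width, 0.0, None)
    return 0.5 * k * excess ** 2


def select_restrained_residues(per_residue_error, cutoff=3.0):
    """Indices of residues predicted to be accurate (error below the hypothetical
    cutoff), which would be restrained; the remaining residues are left free to move."""
    err = np.asarray(per_residue_error, dtype=float)
    return np.flatnonzero(err < cutoff)


# A C-alpha atom that has moved 1.5 A is penalised by the harmonic form but not by
# the flat-bottom form, which gives the simulation room to move within the well.
print(harmonic_restraint(1.5), flat_bottom_restraint(1.5, width=2.0))

# Hypothetical predicted per-residue errors (in Angstroms) for a five-residue model.
print(select_restrained_residues([1.0, 5.0, 2.5, 8.0, 0.7]))
```

The design point is the one made in the text: a flat-bottom well, or restraints applied only to the regions predicted to be reliable, leaves the poorly modelled regions free to change while still preventing the whole structure from drifting.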

The predicted residue–residue contacts have also made significant improvements to protein structure prediction strategies, particularly during the CASP13 experiment [114,115]. This valuable information has helped to increase the accuracy of the predicted 3D models. Furthermore, accurate information regarding predicted pairwise distances might also provide very valuable guidance for a more consistent refinement.

Sampling Protocols

The refinement sampling strategies, described above, have been developed by expert groups participating in the CASP experiment, and most of the more intensive methods are not straightforward for general biologists to deploy. However, many of the groups have also developed web servers and/or stand-alone tools, many of which are freely available and easily accessible for life scientists who wish to apply 3D models to understand different molecular systems (see Table 1). Feig [5] has also provided a thorough review of the MD-based sampling strategies.

PREFMD is a refinement web server based on the successful MD-based strategy tested in CASP11 by the Feig group [85]. The locPREFMD web server, which was also developed by the Feig group, aims to improve the local quality of predicted 3D structures, as measured by the MolProbity score, rather than the overall quality, using molecular dynamics simulations with modified force fields [86].

The Rosetta hybridization refinement protocol, developed by the Baker group, was tested in CASP11 and CASP12 and performed well [77]. The refinement approach used is dependent on the accuracy of the starting models (high or low resolution) [77]. The high-resolution protocol consists of the refinement of the local regions, including the errors. If the starting models are predicted to be far away from the native state, then the whole structure is refined using the low-resolution protocol [77].

The Seok group has developed their GalaxyRefine method as a web server and its protocol is based on re-packing side chains and then repeated structural relaxation by short molecular-dynamics simulations [54,88]. The approach was tested in CASP8, CASP9, and CASP10, and it managed to improve the local and global quality of the starting models [54]. GalaxyRefineComplex was also developed in order to refine protein-protein interactions, based on the GalaxyRefine protocol [54,116].

The KoBaMIN refinement web server also employs an efficient protocol, based on the principle of energy minimisation using a knowledge-based force field [66]. The approach performed well in CASP8, CASP9, and CASP10, but mostly made conservative changes to the starting models [57,58,66,72].

The Floudas group developed the Princeton_TIGRESS server, which employs a combination of various restraint strategies: CYANA in the sampling stage [117], Rosetta Fast Relax relaxation [75], CHARMM in the short MD stage [84,102], and a machine learning approach in the selection step using dDFIRE [118], Banch [119], and Rosetta [75,120] energy functions, under implicit-solvent conditions [89]. The web server was subsequently upgraded (Princeton_TIGRESS2.0) with Support Vector Machine (SVM)-driven classification and enhanced MD stages [56]. The Floudas group methods were among the top five refinement programs in CASP10 and CASP11 [53,65,89].

The refinement of protein structure models is also possible using the ModRefiner algorithm, which is based on two main steps [67]: The first step is the refinement of the backbone topology, starting from C-alpha traces. This step is, then, followed by side-chain addition, using a physics- and knowledge-based force field [67].

3Drefine is based on the optimisation of the hydrogen-bond network with MESHI [121] and atomic-level energy minimisation using a composite physics- and knowledge-based force field [41,122]. The approach was tested in the CASP8 and CASP9 refinement categories, where it ranked among the top groups. The method uses a relatively conservative approach for sampling models, making very minor alterations to the backbone. i3Drefine is an iterative version of the 3Drefine refinement protocol, and is also presented as a web server [41,52,122].

The ReFOLD server, developed by our group, uses a unique hybrid approach consisting of three stages to refine 3D models and fix the errors identified by ModFOLD6 [112]. The first stage is based on the optimisation of hydrogen bonds and contacts using i3Drefine [43,52]. The second stage uses a scalable molecular dynamics simulation of the predicted 3D models with Nanoscale Molecular Dynamics (NAMD) [123]. In the final stage, ModFOLD6 is also used to evaluate and score the 3D models generated by the i3Drefine and NAMD protocols by giving predicted local and global errors [43,52,112,123]. The ReFOLD server was first tested in CASP12 and showed promising performance as a computationally efficient approach. The amino acid sequence and a 3D model (in Protein Data Bank (PDB) format) of the target are the only required inputs to refine protein structures and the method has recently been integrated with the IntFOLD server [124].

The original ReFOLD protocol was relatively novel, in that it used the model quality estimation method ModFOLD6 for scoring the sampled models, instead of energy functions. The protocol has now been further developed (ReFOLD2) with the guidance of the local quality assessment score produced by ModFOLD7 (see Figure 2). The developed approach was also tested in CASP13 and ranked among the top 10 refinement methods, according to its cumulative Global Distance Test Total Score (GDT-TS) [43,112]. The following section discusses the alternative strategies which have been deployed by groups for scoring sampled models.

3. Scoring Strategies

The MD-based and knowledge-based sampling approaches, described above, generate numerous 3D models in different alternative conformations [83,96]. Therefore, in the next stage of the refinement process, it is necessary to be able to reliably score the alternative 3D models, in order to select those that are closer to the native structure than the starting model. However, the generated alternative models are often very similar to one another, and this represents a challenge for developers of energy functions and/or quality assessment tools [5,48,54,71,83,88,108,110,125,126,127,128,129,130,131,132,133].

According to Anfinsen’s hypothesis, the native state lies at the lowest Gibbs free energy, and native-like conformations are found at lower energies [126,134,135]. In further analysis, the most native-like state was generally, but not always, found to have the lowest energy score compared to other states [94].

To score the 3D models sampled by the MD-based approaches utilising the CHARMM c36 [97] and AMBER ff14SB [99] force fields, several different energy functions have been tested to select native-like structures. Energy functions derived from the statistical analysis of known structures have typically been utilised to recognise native and native-like structures in the refinement; for example, the DFIRE [118], dDFIRE [118], RW+ [134], and Rosetta energy functions [5,48,108,126,136,137,138]. The energy scoring methods vary, depending on the choice of the reference state used to statistically analyse the atomic interactions based on known structures [48,83,108,126,139,140,141,142,143,144,145,146,147,148]. The lowest score produced by the scoring methods often correlates with the lowest Root Mean Square Deviation (RMSD) from the native structure, but the correlation is not yet clear enough to allow a consistent selection [55,94,106,134,149,150].
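The selection problem can be illustrated with a small Python sketch. The energies and RMSD values below are invented purely for illustration (they are not results from any method or CASP target), and the helper names are hypothetical. The example picks the lowest-energy model and measures how well energy tracks RMSD across the sampled models.

```python
import numpy as np


def select_lowest_energy(energies):
    """Index of the model with the lowest energy score."""
    return int(np.argmin(energies))


def pearson(x, y):
    """Pearson correlation coefficient between two sets of values."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])


# Hypothetical energy scores and RMSDs (in Angstroms, from the native structure)
# for four sampled models.
energies = [-120.0, -135.5, -128.0, -140.2]
rmsds = [3.1, 2.4, 2.9, 2.6]

picked = select_lowest_energy(energies)   # model chosen by the energy function
truly_best = int(np.argmin(rmsds))        # model actually closest to the native
correlation = pearson(energies, rmsds)    # positive: lower energy tends to mean lower RMSD
```

In this toy data the energy and the RMSD are positively correlated, yet the lowest-energy model is not the lowest-RMSD model, which is the situation described above: a general trend without reliable selection of the single best model.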

The distance-scaled, finite ideal-gas reference (DFIRE) state [118,151,152] underlies one of the knowledge-based statistical potentials used to score native-like structures; it uses a distance-dependent, pairwise statistical energy function to find the 3D models closer to the native state. The lowest DFIRE score is often used to select the most native-like structures from among the alternative 3D models generated by the MD-based protocols, but the model selected in this way was not found to be better than the final MD structure [48,118,151,152].
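The idea of a distance-dependent pairwise statistical potential can be sketched as follows. The code converts observed pair counts per distance bin into energies by comparing them with an expected count from a reference state, E = -RT ln(N_obs / N_exp). The reference state used here, N_exp(r) = (r / r_cut)^alpha (dr / dr_cut) N_obs(r_cut) with alpha = 1.61, is written from the general form reported for DFIRE and should be checked against the original publications [118,151,152]; the additional scaling constant of the published potential is omitted, and the function name, the counts, and the bin layout are hypothetical.

```python
import numpy as np


def dfire_like_potential(n_obs, r, dr, alpha=1.61, rt=0.593):
    """Energy per distance bin for one atom-type pair.

    n_obs : observed pair counts per bin (the last bin is the cutoff bin)
    r     : bin-centre distances in Angstroms
    dr    : bin widths in Angstroms
    alpha : distance-scaling exponent (assumed value, see lead-in)
    rt    : RT in kcal/mol at about 298 K
    """
    n_obs = np.asarray(n_obs, dtype=float)
    r = np.asarray(r, dtype=float)
    dr = np.asarray(dr, dtype=float)
    if np.any(n_obs <= 0):
        raise ValueError("all bins need a positive observed count")
    # Expected counts, scaled from the count observed in the cutoff bin.
    n_exp = (r / r[-1]) ** alpha * (dr / dr[-1]) * n_obs[-1]
    # Inverse Boltzmann relation: bins seen more often than expected are favourable.
    return -rt * np.log(n_obs / n_exp)


# Hypothetical counts for three distance bins centred at 4, 8, and 12 Angstroms.
energy = dfire_like_potential(n_obs=[5, 30, 40], r=[4.0, 8.0, 12.0], dr=[1.0, 1.0, 1.0])
```

By construction the energy in the cutoff bin is zero, and a bin with more observed pairs than the reference state predicts receives a negative (favourable) energy. Summing such terms over all atom pairs in a model gives the kind of score that is then used to rank sampled models.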

Random Walk reference state (RWplus) [134] scoring has also been used to score native-like structures. The RWplus score is based on a knowledge-based potential, including distance- and orientation-dependent potentials trained using databases of known structures [55,134]. The performance of the RWplus score was found to be better than the DFIRE score, in terms of the selection of native-like structures in refinement pipelines [55].

Rosetta energy functions [126] often identify the native-like states at a lower energy than the non-native structures [74,75,120,137,138,153,154,155]. Therefore, Rosetta energy function searches are often performed to discover the lowest energy conformation among the 3D models generated by the sampling approaches [156]. The Rosetta energy function was also used to score the 3D models by the Baker and Feig groups in CASP13 [113,126]. However, energy-based approaches for selecting native-like conformations have not shown considerable improvement in recent years [126,157].

More recently, MQAPs, such as ProQ [158], ProQ2 [159], SELECTpro [160], and ModFOLD6 [112], have also been used to identify the most native-like structures, following the sampling stages in the refinement pipeline [43,72,106]. The MQAP approaches have traditionally been used for selection of the best models from among those submitted by tertiary structure prediction servers in the CASP experiments. In this role, they have performed well, in terms of selection of the most native-like predicted 3D models; furthermore, they are improving in their consistency [42,161,162,163]. However, such tools have not reached consistent selection for 3D models generated by refinement pipelines, where there is often much less variation. The consistent and accurate identification of the most native-like refinement models is a much harder task for MQAP methods, given the very small differences between models and, traditionally, MQAPs have not been developed for this specific role.

4. CASP: The Critical Assessment of Techniques for Protein Structure Prediction

Evaluation of predicted protein structures from a wide range of prediction approaches requires objective blind tests, which are based on unreleased experimental structures [164]. The Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiment has fulfilled the need for such objective testing since 1994. For more than two decades, John Moult and his colleagues have organised blind prediction experiments, every other year, in order to evaluate different approaches for various aspects of predicting structures from amino acid sequences [17,25,165]. The assessment experiment is always carried out by independent assessors and no prediction groups have access to the experimentally-determined structures for targets, prior to their release into the PDB [23,26,165,166].

4.1. The Refinement Category in CASP Experiments

The refinement category was introduced as an additional prediction category in CASP8, in order to encourage further improvements to the accuracy of predicted 3D models. The CASP assessors have typically provided the best predicted 3D models as refinement targets, in order to evaluate whether or not they can be successfully further improved [57]. Only the refinement of the provided starting model is requested, and teams are discouraged from providing alternative models built from additional templates. The category aims to further increase the accuracy of the best-predicted 3D models, and the refinement methods have been able to add value to the prediction process [53,57,58,64,65].

It has been challenging for developers of refinement methods to improve the 3D models provided in the refinement category of CASP. This is primarily because the best-predicted models that are chosen as refinement targets may have already been refined once in their source pipelines. Therefore, any further improvements to the quality of the provided models are, perhaps, less obvious and so it becomes an exercise of diminishing returns [57,58]. Moreover, some parts of the provided starting 3D models may have been based on known structures, particularly for TBM predictions, and so the starting models might already be highly accurate and close to the native structures [57,58]. Therefore, any “refining” of the starting models may be more likely to lead to deterioration in model quality, instead. With regard to the above, it is far harder to improve the quality of the predicted 3D models generated by TBM, compared to FM targets, as TBM models are often already highly accurate. In other words, provided starting models that are far away from the native structure are much easier to improve, and they are more likely to improve in any refinement process, as there is more room for improvement to be made [5,57,58,64,83].

The selection of the CASP targets is also an important factor affecting the success of the refinement approaches. Small domains and domains that are free of crystal and oligomeric contacts have been preferred in previous CASP experiments [58]. Nevertheless, it is problematic to identify the target difficulties and compare performances across CASP refinement datasets [64]. For example, relatively bigger and oligomeric structures were selected as some of the refinement targets in CASP13, and such targets were far harder to refine than single small domains provided in previous CASPs.

The assessment criteria of CASP in the refinement category are mainly based on the comparison of the predicted 3D models with the native structure, utilising a wide range of measurements [64]. The alpha-carbon geometry and the backbone distance between the predicted models and the native structure are major components of the superposition-based measurements, particularly the Template Modeling (TM)-Score [167]. Short-range contacts, including side-chain interactions, van der Waals clashes, and different elements in the structure are also taken into account by using the Ramachandran map along with the backbone units [57,58]. CASP assessors measure the global quality of predicted and refined models using the Global Distance Test (GDT) [168,169] scores (GDT_TS and GDT_HA), and the Root Mean Square Deviation (RMSD) score, based on C-alpha atom superposition [57,58,167,168]. To measure the local quality of the models, the MolProbity [170] and SphereGrinder [171] (SphGr) scores have been used. The Local Distance Difference Test (LDDT) [172] score has also been used as a local and superposition-free measurement since CASP11 [65]. The global and local scores are combined into a weighted Z-score, in order to rank the models. The Z-score has been upgraded, using a machine learning algorithm, a Contact Area Difference Score [173] (CAD), and a Quality Control Score [174] (QCS), to compare performance in CASP12 [53].
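As a rough guide to how the global measures behave, the Python sketch below computes RMSD, GDT_TS, and GDT_HA from a list of per-residue C-alpha distances between a model and the native structure, and standardises a set of scores into Z-scores. It is a simplification under stated assumptions: the distances are taken as given from a single superposition, whereas the actual GDT procedure searches over many superpositions to maximise the number of residues within each cutoff; the function names and the example distances are hypothetical; and the weights used by CASP assessors to combine Z-scores are not reproduced here.

```python
import numpy as np


def rmsd(distances):
    """Root mean square deviation over per-residue C-alpha distances (Angstroms)."""
    d = np.asarray(distances, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))


def gdt(distances, cutoffs):
    """Mean percentage of residues within each distance cutoff."""
    d = np.asarray(distances, dtype=float)
    return float(np.mean([100.0 * np.mean(d <= c) for c in cutoffs]))


def gdt_ts(distances):
    """GDT total score: cutoffs of 1, 2, 4, and 8 Angstroms."""
    return gdt(distances, (1.0, 2.0, 4.0, 8.0))


def gdt_ha(distances):
    """GDT high accuracy: cutoffs of 0.5, 1, 2, and 4 Angstroms."""
    return gdt(distances, (0.5, 1.0, 2.0, 4.0))


def z_scores(values):
    """Standardise scores across models or groups: (value - mean) / standard deviation."""
    v = np.asarray(values, dtype=float)
    return (v - v.mean()) / v.std()


# Hypothetical per-residue distances for a six-residue model.
example = [0.4, 0.9, 1.5, 3.0, 6.0, 10.0]
ts, ha = gdt_ts(example), gdt_ha(example)
```

For the example distances, two, three, four, and five of the six residues fall within 1, 2, 4, and 8 Å, so GDT_TS is the mean of those four percentages; GDT_HA applies the same calculation with halved cutoffs and is therefore always the stricter of the two.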

It should be noted that the protein structures are flexible and can be observed in different conformations. The flexibility of the protein structures is a vital concept to consider, in terms of their functions; however, flexible regions are often not considered in CASP evaluations [58,175,176,177]. Although the experimental structures determined by NMR, X-ray crystallography, and cryo-electron microscopy represent an average conformation, average conformations are not perfect enough to justify their use in refinement approaches [58,178,179,180,181]. Therefore, non-native dependent measurements, such as the MolProbity score, could be considered more in the Z-score formula. Furthermore, the major CASP measurements, such as GDT-TS, GDT-HA, and RMS_CA, rely on backbone superposition, but the rate of the side-chain and local interactions could also be given more emphasis in the formula, depending on the interactions in the targets [53,57,58,64,65].

The refinement prediction groups in CASP are asked to submit up to five predicted or refined models, ranked from best to worst, under time constraints, and the first submitted model is assumed to be the one that each group considers its best [53,57,58,64,65]. Submitting five predictions also enables groups to test different sampling approaches. In the CASP9 experiment, it was noticed that the prediction groups often had difficulties in ranking their structures accurately, and only a couple of groups were able to rank their models better than a random selection [58]. Therefore, CASP assessors developed a new assessment method, called “cherry-picking”, as a second set of analyses [58]. Because the submitted models were not ordered accurately, the cherry-picked analysis considered the best-scoring of the submitted models, irrespective of the order in which a group had ranked them. However, accurate rank ordering of predictions is an important part of any 3D model selection process [58]. For example, MD-based approaches generate hundreds of models, so it is necessary to be able to order the models accurately for practical purposes. This issue highlights the importance of the scoring stage, but, presently, the CASP assessors do not evaluate the sampling and scoring methods independently in the refinement category. The need for identifying the best model was also emphasised in the following CASP experiments [53,58,64,65].
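The difference between the “model 1” and “cherry-picked” views of a submission can be made concrete with a short Python sketch. It assumes a single higher-is-better score per submitted model (for example, GDT_HA against the native structure), listed in the group's own rank order; the function name, the dictionary keys, and the toy scores are hypothetical, and the mean of the submitted scores is used as a simple stand-in for the expected outcome of a random selection.

```python
def selection_summary(scores_in_submitted_order):
    """Compare a group's first-ranked model with its best submitted model."""
    scores = list(scores_in_submitted_order)
    if not scores:
        raise ValueError("at least one submitted model is required")
    first = scores[0]                    # what the 'model 1' analysis sees
    best = max(scores)                   # what the cherry-picked analysis sees
    random_expectation = sum(scores) / len(scores)
    return {
        "model_1": first,
        "cherry_picked": best,
        "selection_loss": best - first,  # accuracy lost by imperfect ranking
        "beats_random": first > random_expectation,
    }


# Hypothetical GDT_HA scores for five submitted models, in submitted order.
summary = selection_summary([61.0, 65.5, 58.0, 63.0, 60.0])
print(summary)
```

Here the best model was sampled but ranked second, so the first-ranked model scores below the average of the five; sampling succeeded while selection did not.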

The sampling and scoring stages are different processes, and the best sampling or scoring groups have not been clearly distinguished in recent CASP experiments [57,58]. If prediction groups were able to submit more than their top five models, then refinement methods could perhaps be assessed separately in terms of their sampling and scoring aspects. Such a separation of evaluation may help to accelerate the improvement of refinement methods. The relationship between sampling and scoring is complicated, and a strong correlation has not been found between observed scores and the available scoring methods [58]. Nevertheless, submitting additional models would bring an additional workload for CASP predictors and assessors; thus, a more pragmatic strategy may need to be devised.

4.2. Progress with Refinement Strategies

It is noteworthy that significant progress has been made in the refinement category over the last 12 years, since it was introduced in CASP8 [57]. Initially, however, the top groups from CASP8 did not show any measurable improvement in performance in CASP9 [57,58]. It was also reported that the refinement approaches tested in CASP9 were conservative, in terms of improving the starting models, and were not successful at correctly ranking the five submitted models [57,58]. In CASP9, the assessors also shared some hints about accurate and problematic regions, along with the GDT_HA and GDT_TS scores of the starting models, with prediction groups during the experiment [58], although it is not known how many groups made good use of this information.

Although the cherry-picking approach was taken into consideration when analysing the performance of the refinement groups participating in CASP9, significant progress was not observed [58]. The overall score of the refined models was much lower than that of the starting models in CASP9 [58]. It was also observed that the conservative strategies were less likely to worsen the starting models than the more adventurous MD-based strategies. On the other hand, some of the MD-based approaches tested in CASP9 showed promising performance, in terms of sampling [58].

In CASP10, the leading groups managed to increase the accuracy of the backbone and side-chain interactions in most of the refinement targets [64]. However, the overall performance of most of the groups indicated that they were not able to consistently improve upon the starting models. The groups using MD-based approaches with access to advanced supercomputer facilities have opened a new epoch in the refinement of protein structures since CASP10, and they have generally performed much better than the knowledge-based approaches [64]. Significant energy changes were also observed among models generated by the more adventurous MD groups in CASP10, and energy scores appeared to provide more useful information for the scoring methods [64]. The top five groups also managed to improve their methods in CASP11 at the same pace as in CASP10 [64,65]. Furthermore, the majority of the groups improved more than half of the refinement targets in CASP11 [64,65].

While only modest improvement was seen in CASP8 and CASP9 compared with CASP10 [57,58,64], the progress in the MD-based approaches has led to successive gains in accuracy since CASP10 [65]. The growing trend in the consistency of the refinement of 3D models was consolidated in CASP11 and CASP12 [53,65]. Although the targets were difficult, the refinement approaches tested in CASP12 showed a considerable improvement over CASP11 [53]. The diversity of the refinement approaches in CASP12 is also promising for the future of refinement [53,166]. The numbers of targets and groups have increased dramatically since CASP8, from 12 to 42 targets and from 24 to 39 prediction groups in CASP12 [53,58,64,65]. In CASP13, many new hybrid refinement protocols emerged, using new restraint strategies and scoring functions, including energy functions and MQAPs [113]. These new methods performed well, in terms of increasing the accuracy of initial models, although the refinement targets were larger and more difficult than in previous CASPs.

One of the headline-grabbing groups from CASP13 was DeepMind, with their AlphaFold method for template-free modelling [182]; however, the group did not participate in the refinement category. The success of the group in the free modelling category was partly due to their accurate prediction of inter-residue distances. These more precise predictions could be used to enhance contact-based restraints in future refinement strategies.

5. Conclusions

The accuracy of predicted 3D models is a key factor for furthering in silico studies, particularly where experimental knowledge is scarce. Near-experimental accuracy is often required to properly understand the functional role of a protein, and the degree of accuracy required may vary, depending on the type of computational application. Building 3D models with TBM and FM methods may not always be adequate to meet the required accuracy level for some biological applications, due to the unavailability of a suitable template and modelling errors, including irregular bonds and angles. Therefore, the refinement of predicted 3D structures is crucial for increasing the accuracy of initial structures and correcting local errors. Unfortunately, it is still challenging to deliver consistent refinement of 3D protein models, especially at high resolutions, as there is less room for improving predicted structures that are already highly accurate. The refinement of predicted 3D models consists of two independent stages, the sampling and the scoring of refined models, and both should be the focus of future assessments, in order for us to gauge where progress is being made.

In the sampling stage, many different strategies, from rapid automated servers to highly computationally-intensive MD methods, have been suggested for improving initial structures towards the native basin. The MD-based sampling strategies have the potential to reach near-experimental accuracies with improvements in computing power and scoring methods. Unfortunately, the most successful approaches still require supercomputer-scale resources, which makes them less practical and may put them out of reach of general biologists.

Although the current force fields perform well, in terms of directing the initial structures towards the native structure, structural deviations are often encountered in MD simulations, due to imperfections in the force fields. A wide range of restraint strategies, based on knowledge of native structures, have been applied to avoid structural deviations. Partial restraints, particularly those based on known structures, may provide more reliable guidance for protein model refinement towards the native basin than restraining the whole structure, as the application of restraints on poorly-predicted regions may limit the scope for refinement. For instance, the local quality assessment scores produced by MQAPs can provide an alternative approach for determining poorly-predicted regions, which could lead to more focused refinement, instead of refining or restraining the whole structure [43,77,88,111,112,156,183,184].
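The idea of restraining only the reliably predicted regions can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the protocol of any published method: the per-residue predicted errors (in Å) are taken to come from some MQAP, the 3.0 Å cutoff is a hypothetical threshold, and the function names are invented for the example.

```python
def split_by_predicted_error(predicted_errors, cutoff=3.0):
    """Split residue indices into restrained (reliable) and free (unreliable).

    predicted_errors: per-residue predicted distance errors in Angstroms.
    cutoff: hypothetical threshold; residues at or below it are restrained.
    """
    restrained, free = [], []
    for index, error in enumerate(predicted_errors):
        (restrained if error <= cutoff else free).append(index)
    return restrained, free


def contiguous_segments(indices):
    """Group sorted residue indices into inclusive (start, end) segments."""
    segments = []
    for i in indices:
        if segments and i == segments[-1][1] + 1:
            segments[-1][1] = i          # extend the current segment
        else:
            segments.append([i, i])      # start a new segment
    return [tuple(segment) for segment in segments]


# Hypothetical per-residue errors for an eight-residue model (0-based indices).
errors = [0.8, 1.2, 1.9, 5.4, 7.1, 6.0, 2.5, 1.1]
restrained, free = split_by_predicted_error(errors)
print("restrained residues:", restrained)
print("segments left free to move:", contiguous_segments(free))
```

The restrained indices would then be passed to whatever positional or distance restraint mechanism the MD engine provides, while the free segments are left unrestrained so that sampling can concentrate on the regions most in need of correction.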

There are a few groups in CASP who start from sequences to build 3D models, assess the 3D models, and finally refine the best predictions. Our group (the McGuffin group) is one of the leading groups, in terms of producing local quality assessment scores, and our local quality assessment score is used to guide our short and fast MD-based refinement approach, which we tested in CASP13. The approach (ReFOLD2) is perhaps the first attempt at using local quality assessment scores to guide the MD simulation and assess the sampled 3D models. The aim of this approach is to more consistently refine the predicted 3D models with far less computational effort, by using the guidance of the predicted per-residue errors.

The accuracy of the scoring functions, including energy functions and MQAPs, is crucial for successful prediction and refinement. The 3D models generated by the sampling approaches are structurally very similar, so consistently distinguishing the most native-like states from non-native conformations, using either energy functions or MQAPs, remains an unsolved problem.

The prediction of protein structures is a step towards computational functional analyses, but interactions with ligands, ions, and proteins are also important for determining protein functions. Therefore, ideally, the refinement of 3D models should also include oligomeric states and protein–ligand complexes. In the real world, proteins are always interacting with various ligands, such as ions, inhibitors, and peptides. The refinement of protein models might therefore still be somewhat artificial if it does not also consider more complete molecular systems.

Author Contributions

R.A. drafted the manuscript, contributed text and figures, and carried out initial editing of the manuscript; L.J.M. conceived the idea and carried out final editing of the manuscript and figures. All authors read and approved the final manuscript.

Funding

This research was supported by the Republic of Turkey Ministry of National Education (to R.A.).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

NMR	Nuclear Magnetic Resonance
CPU	Central Processing Unit
GPU	Graphics Processing Unit
Cryo-EM	Cryo-Electron Microscopy
PDB	Protein Data Bank
CASP	Critical Assessment of techniques for Structure Prediction
CHARMM	Chemistry at Harvard Macromolecular Mechanics
SVM	Support Vector Machine
NAMD	Nanoscale Molecular Dynamics
LDDT	Local Distance Difference Test on All Atoms
TBM	Template-Based Modelling
FM	Free Modelling
MQAPs	Model Quality Assessment Programs
MD	Molecular Dynamics
DFIRE	Distance-Scaled, Finite-Ideal Gas Reference
DDFIRE	Dipolar Distance-Scaled, Ideal Gas Reference
RWplus	Random Walk reference state Plus
GDT_TS	Global Distance Test Total Score
GDT_HA	Global Distance Test High Accuracy
SphGr	SphereGrinder
RMSD	Root Mean Square Deviation
TM-Score	Template Modeling Score

References

McGuffin, L.J. Aligning Sequences to Structures. In Protein Structure Prediction; Humana Press: Totowa, NJ, USA, 2008; pp. 61–90. [Google Scholar]
McGuffin, L.J. Protein Fold Recognition and Threading. In Computational Structural Biology; World Scientific: Singapore, 2008; pp. 37–60. [Google Scholar]
Perutz, M.F.; Rossmann, M.G.; Cullis, A.F.; Muirhead, H.; Will, G.; North, A.C.T. Structure of Hæmoglobin: A Three-Dimensional Fourier Synthesis at 5.5-Å. Resolution, Obtained by X-Ray Analysis. Nature 1960, 185, 416–422. [Google Scholar] [CrossRef] [PubMed]
Kendrew, J.C.; Bodo, G.; Dintzis, H.M.; Parrish, R.G.; Wyckoff, H.; Phillips, D.C. A Three-Dimensional Model of the Myoglobin Molecule Obtained by X-Ray Analysis. Nature 1958, 181, 662–666. [Google Scholar] [CrossRef] [PubMed]
Feig, M. Computational protein structure refinement: Almost there, yet still so far to go. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2017, 7, e1307. [Google Scholar] [CrossRef]
Petsko, G.A.; Ringe, D. Protein Structure and Function; New Science Press: London, UK, 2004; ISBN 9781405119221. [Google Scholar]
Drenth, J. Principles of Protein X-Ray Crystallography; Springer: Berlin/Heidelberg, Germany, 1999; ISBN 0387985875. [Google Scholar]
Heinemann, U.; Frevert, J.; Hofman, K.-P.; Illing, G.; Oschkinat, H.; Saenger, W.; Zettl, R. Linking Structural Biology With Genome Research. In Genomics and Proteomics; Kluwer Academic Publishers: Boston, MA, USA, 2002; pp. 179–189. [Google Scholar]
Murata, K.; Wolf, M. Cryo-electron microscopy for structural analysis of dynamic biological macromolecules. Biochim. Biophys. Acta Gen. Subj. 2018, 1862, 324–334. [Google Scholar] [CrossRef] [PubMed]
Jonic, S.; Vénien-Bryan, C. Protein structure determination by electron cryo-microscopy. Curr. Opin. Pharmacol. 2009, 9, 636–642. [Google Scholar] [CrossRef] [PubMed]
Brocchieri, L.; Karlin, S. Protein length in eukaryotic and prokaryotic proteomes. Nucleic Acids Res. 2005, 33, 3390–3400. [Google Scholar] [CrossRef] [PubMed]
Rangwala, H.; Karypis, G. Introduction to Protein Structure Prediction: Methods and Algorithms; Wiley: New York, NY, USA, 2010; ISBN 9780470470596. [Google Scholar]
Roche, D.B.; Buenavista, M.T.; McGuffin, L.J. Protein Structure Prediction and Structural Annotation of Proteomes. In Encyclopedia of Biophysics; Roberts, G.C.K., Ed.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 2061–2068. [Google Scholar]
Stoker, H.S. Organic and Biological Chemistry, 6th ed.; White, A., Ed.; Cengage Learning, Brooks/Cole: Boston, MA, USA, 2013; ISBN 1133103952. [Google Scholar]
Roche, D.B.; Buenavista, M.T.; McGuffin, L.J. FunFOLDQA: A Quality Assessment Tool for Protein-Ligand Binding Site Residue Predictions. PLoS ONE 2012, 7, e38219. [Google Scholar] [CrossRef] [PubMed]
Pavlopoulou, A.; Michalopoulos, I. State-of-the-art bioinformatics protein structure prediction tools (Review). Int. J. Mol. Med. 2011, 28, 295–310. [Google Scholar]
Moult, J.; Fidelis, K.; Zemla, A.; Hubbard, T. Critical assessment of methods of protein structure prediction (CASP)-round V. Proteins Struct. Funct. Genet. 2003, 53, 334–339. [Google Scholar] [CrossRef]
Bradley, P.; Misura, K.M.S.; Baker, D. Toward High-Resolution de Novo Structure Prediction for Small Proteins. Science 2005, 309, 1868–1871. [Google Scholar] [CrossRef] [PubMed]
Zhang, C.; Liu, S.; Zhu, Q.; Zhou, Y. A Knowledge-Based Energy Function for Protein–Ligand, Protein–Protein, and Protein–DNA Complexes. J. Med. Chem. 2005, 48, 2325–2335. [Google Scholar] [CrossRef]
Ginalski, K.; Grishin, N.V.; Godzik, A.; Rychlewski, L. Practical lessons from protein structure prediction. Nucleic Acids Res. 2005, 33, 1874–1891. [Google Scholar] [CrossRef] [PubMed]
Lee, J.; Wu, S.; Zhang, Y. Ab Initio Protein Structure Prediction. In From Protein Structure to Function with Bioinformatics; Springer: Dordrecht, The Netherlands, 2009; pp. 3–25. [Google Scholar]
Moult, J.; Fidelis, K.; Kryshtafovych, A.; Rost, B.; Hubbard, T.; Tramontano, A. Critical assessment of methods of protein structure prediction—Round VII. Proteins Struct. Funct. Bioinform. 2007, 69, 3–9. [Google Scholar] [CrossRef]
Moult, J.; Fidelis, K.; Kryshtafovych, A.; Schwede, T.; Tramontano, A. Critical assessment of methods of protein structure prediction: Progress and new directions in round XI. Proteins Struct. Funct. Bioinform. 2016, 84, 4–14. [Google Scholar] [CrossRef]
Moult, J.; Fidelis, K.; Rost, B.; Hubbard, T.; Tramontano, A. Critical assessment of methods of protein structure prediction (CASP)—Round 6. Proteins Struct. Funct. Bioinform. 2005, 61, 3–7. [Google Scholar] [CrossRef]
Moult, J.; Pedersen, J.T.; Judson, R.; Fidelis, K. A large-scale experiment to assess protein structure prediction methods. Proteins Struct. Funct. Genet. 1995, 23, ii–iv. [Google Scholar] [CrossRef]
Moult, J. A decade of CASP: Progress, bottlenecks and prognosis in protein structure prediction. Curr. Opin. Struct. Biol. 2005, 15, 285–289. [Google Scholar] [CrossRef] [PubMed]
Tramontano, A.; Morea, V. Assessment of homology-based predictions in CASP5. Proteins Struct. Funct. Genet. 2003, 53, 352–368. [Google Scholar] [CrossRef] [PubMed]
Lance, B.K.; Deane, C.M.; Wood, G.R. Exploring the potential of template-based modelling. Bioinformatics 2010, 26, 1849–1856. [Google Scholar] [CrossRef]
Joo, K.; Lee, J.; Lee, S.; Seo, J.-H.; Lee, S.J.; Lee, J. High accuracy template based modeling by global optimization. Proteins Struct. Funct. Bioinform. 2007, 69, 83–89. [Google Scholar] [CrossRef]
Roy, A.; Kucukural, A.; Zhang, Y. I-TASSER: A unified platform for automated protein structure and function prediction. Nat. Protoc. 2010, 5, 725–738. [Google Scholar] [CrossRef]
Šali, A.; Blundell, T.L. Comparative Protein Modelling by Satisfaction of Spatial Restraints. J. Mol. Biol. 1993, 234, 779–815. [Google Scholar] [CrossRef] [PubMed]
Mirjalili, V.; Noyes, K.; Feig, M. Physics-based protein structure refinement through multiple molecular dynamics trajectories and structure averaging. Proteins Struct. Funct. Bioinform. 2014, 82, 196–207. [Google Scholar] [CrossRef] [PubMed]
Bernstein, F.C.; Koetzle, T.F.; Williams, G.J.B.; Meyer, E.F.; Brice, M.D.; Rodgers, J.R.; Kennard, O.; Shimanouchi, T.; Tasumi, M. The Protein Data Bank. A Computer-Based Archival File for Macromolecular Structures. Eur. J. Biochem. 1977, 80, 319–324. [Google Scholar] [CrossRef]
Berman, H.M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T.N.; Weissig, H.; Shindyalov, I.N.; Bourne, P.E. The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235–242. [Google Scholar] [CrossRef]
Fischer, D. 3D-SHOTGUN: A novel, cooperative, fold-recognition meta-predictor. Proteins Struct. Funct. Genet. 2003, 51, 434–441. [Google Scholar] [CrossRef] [PubMed]
Montelione, G.T. Structural genomics: An approach to the protein folding problem. Proc. Natl. Acad. Sci. USA 2001, 98, 13488–13489. [Google Scholar] [CrossRef]
Westbrook, J.; Feng, Z.; Chen, L.; Yang, H.; Berman, H.M. The Protein Data Bank and structural genomics. Nucleic Acids Res. 2003, 31, 489–491. [Google Scholar] [CrossRef] [PubMed]
Gerstein, M.; Edwards, A.; Arrowsmith, C.H.; Montelione, G.T. Structural genomics: Current progress. Science 2003, 299, 1663. [Google Scholar] [CrossRef]
Baker, D.; Sali, A. Protein structure prediction and structural genomics. Science 2001, 294, 93–96. [Google Scholar] [CrossRef]
Roche, D.B.; Buenavista, M.T.; Tetchner, S.J.; McGuffin, L.J. The IntFOLD server: An integrated web resource for protein fold recognition, 3D model quality assessment, intrinsic disorder prediction, domain prediction and ligand binding site prediction. Nucleic Acids Res. 2011, 39, 171–176. [Google Scholar] [CrossRef] [PubMed]
Bhattacharya, D.; Cheng, J. 3Drefine: Consistent protein structure refinement by optimizing hydrogen bonding network and atomic-level energy minimization. Proteins 2013, 81, 119–131. [Google Scholar] [CrossRef]
McGuffin, L.J.; Buenavista, M.T.; Roche, D.B. The ModFOLD4 server for the quality assessment of 3D protein models. Nucleic Acids Res. 2013, 41, 1–5. [Google Scholar] [CrossRef] [PubMed]
Shuid, A.N.; Kempster, R.; McGuffin, L.J. ReFOLD: A server for the refinement of 3D protein models guided by accurate quality estimates. Nucleic Acids Res. 2017, 45, W422–W428. [Google Scholar] [CrossRef] [PubMed]
Brylinski, M.; Skolnick, J. A threading-based method (FINDSITE) for ligand-binding site prediction and functional annotation. Proc. Natl. Acad. Sci. USA 2008, 105, 129–134. [Google Scholar] [CrossRef] [PubMed]
Bonneau, R.; Tsai, J.; Ruczinski, I.; Baker, D. Functional Inferences from Blind ab Initio Protein Structure Predictions. J. Struct. Biol. 2001, 134, 186–190. [Google Scholar] [CrossRef] [PubMed]
Wieman, H.; Tøndel, K.; Anderssen, E.; Drabløs, F. Homology-based modelling of targets for rational drug design. Mini Rev. Med. Chem. 2004, 4, 793–804. [Google Scholar] [CrossRef] [PubMed]
Zhang, Y. Protein structure prediction: When is it useful? Curr. Opin. Struct. Biol. 2009, 19, 145–155. [Google Scholar] [CrossRef]
Mirjalili, V.; Feig, M. Protein Structure Refinement through Structure Selection and Averaging from Molecular Dynamics Ensembles. J. Chem. Theory Comput. 2013, 9, 1294–1303. [Google Scholar] [CrossRef]
Laskowski, R.A.; Watson, J.D.; Thornton, J.M. ProFunc: A server for predicting protein function from 3D structure. Nucleic Acids Res. 2005, 33, W89–W93. [Google Scholar] [CrossRef]
Becker, O.M.; Dhanoa, D.S.; Marantz, Y.; Chen, D.; Shacham, S.; Cheruku, S.; Heifetz, A.; Mohanty, P.; Fichman, M.; Sharadendu, A.; et al. An Integrated in Silico 3D Model-Driven Discovery of a Novel, Potent, and Selective Amidosulfonamide 5-HT1A Agonist (PRX-00023) for the Treatment of Anxiety and Depression. J. Med. Chem. 2006, 49, 3116–3135. [Google Scholar] [CrossRef]
Ekins, S.; Mestres, J.; Testa, B. In silico pharmacology for drug discovery: Applications to targets and beyond. Br. J. Pharmacol. 2007, 152, 21–37. [Google Scholar] [CrossRef]
Bhattacharya, D.; Cheng, J. i3Drefine Software for Protein 3D Structure Refinement and Its Assessment in CASP10. PLoS ONE 2013, 8. [Google Scholar] [CrossRef] [PubMed]
Hovan, L.; Oleinikovas, V.; Yalinca, H.; Kryshtafovych, A.; Saladino, G.; Gervasio, F.L. Assessment of the model refinement category in CASP12. Proteins Struct. Funct. Bioinform. 2018, 86, 152–167. [Google Scholar] [CrossRef]
Heo, L.; Park, H.; Seok, C. GalaxyRefine: Protein structure refinement driven by side-chain repacking. Nucleic Acids Res. 2013, 41, 384–388. [Google Scholar] [CrossRef] [PubMed]
Heo, L.; Feig, M. What makes it difficult to refine protein models further via molecular dynamics simulations? Proteins Struct. Funct. Bioinform. 2018, 86, 177–188. [Google Scholar] [CrossRef]
Khoury, G.A.; Smadbeck, J.; Kieslich, C.A.; Koskosidis, A.J.; Guzman, Y.A.; Tamamis, P.; Floudas, C.A. Princeton_TIGRESS 2.0: High refinement consistency and net gains through support vector machines and molecular dynamics in double-blind predictions during the CASP11 experiment. Proteins Struct. Funct. Bioinform. 2017, 85, 1078–1098. [Google Scholar] [CrossRef] [PubMed]
MacCallum, J.L.; Hua, L.; Schnieders, M.J.; Pande, V.S.; Jacobson, M.P.; Dill, K.A. Assessment of the protein-structure refinement category in CASP8. Proteins Struct. Funct. Bioinform. 2009, 77, 66–80. [Google Scholar] [CrossRef]
MacCallum, J.L.; Pérez, A.; Schnieders, M.J.; Hua, L.; Jacobson, M.P.; Dill, K.A. Assessment of protein structure refinement in CASP9. Proteins Struct. Funct. Bioinform. 2011, 79, 74–90. [Google Scholar] [CrossRef]
Terashi, G.; Kihara, D. Protein structure model refinement in CASP12 using short and long molecular dynamics simulations in implicit solvent. Proteins Struct. Funct. Bioinform. 2018, 86, 189–201. [Google Scholar] [CrossRef]
Meiler, J.; Baker, D. Rapid protein fold determination using unassigned NMR data. Proc. Natl. Acad. Sci. USA 2003, 100, 15404–15409. [Google Scholar] [CrossRef]
Sliwoski, G.; Kothiwale, S.; Meiler, J.; Lowe, E.W. Computational methods in drug discovery. Pharmacol. Rev. 2014, 66, 334–395. [Google Scholar] [CrossRef] [PubMed]
Giorgetti, A.; Raimondo, D.; Miele, A.E.; Tramontano, A. Evaluating the usefulness of protein structure models for molecular replacement. Bioinformatics 2005, 21, ii72–ii76. [Google Scholar] [CrossRef] [PubMed]
Qian, B.; Raman, S.; Das, R.; Bradley, P.; McCoy, A.J.; Read, R.J.; Baker, D. High-resolution structure prediction and the crystallographic phase problem. Nature 2007, 450, 259–264. [Google Scholar] [CrossRef]
Nugent, T.; Cozzetto, D.; Jones, D.T. Evaluation of predictions in the CASP10 model refinement category. Proteins Struct. Funct. Bioinform. 2014, 82, 98–111. [Google Scholar] [CrossRef] [PubMed]
Modi, V.; Dunbrack, R.L. Assessment of refinement of template-based models in CASP11. Proteins 2016, 260–281. [Google Scholar] [CrossRef]
Rodrigues, J.P.G.L.M.; Levitt, M.; Chopra, G. KoBaMIN: A knowledge-based minimization web server for protein structure refinement. Nucleic Acids Res. 2012, 40, 323–328. [Google Scholar] [CrossRef] [PubMed]
Xu, D.; Zhang, Y. Improving the physical realism and structural accuracy of protein models by a two-step atomic-level energy minimization. Biophys. J. 2011, 101, 2525–2534. [Google Scholar] [CrossRef] [PubMed]
Misura, K.M.S.S.; Baker, D. Progress and challenges in high-resolution refinement of protein structure models. Proteins Struct. Funct. Genet. 2005, 59, 15–29. [Google Scholar] [CrossRef]
Jagielska, A.; Wroblewska, L.; Skolnick, J. Protein model refinement using an optimized physics-based all-atom force field. Proc. Natl. Acad. Sci. USA 2008, 105, 8268–8273. [Google Scholar] [CrossRef] [PubMed]
Lin, M.S.; Head-Gordon, T. Reliable protein structure refinement using a physical energy function. J. Comput. Chem. 2011, 32, 709–717. [Google Scholar] [CrossRef] [PubMed]
Lu, H.; Skolnick, J. Application of statistical potentials to protein structure refinement from low resolution ab initio models. Biopolymers 2003, 70, 575–584. [Google Scholar] [CrossRef] [PubMed]
Chopra, G.; Kalisman, N.; Levitt, M. Consistent refinement of submitted models at CASP using a knowledge-based potential. Proteins Struct. Funct. Bioinform. 2010, 78, 2668–2678. [Google Scholar] [CrossRef]
Han, R.; Leo-Macias, A.; Zerbino, D.; Bastolla, U.; Contreras-Moreira, B.; Ortiz, A.R. An efficient conformational sampling method for homology modeling. Proteins Struct. Funct. Bioinform. 2008, 71, 175–188. [Google Scholar] [CrossRef] [PubMed]
Kim, D.E.; Blum, B.; Bradley, P.; Baker, D. Sampling Bottlenecks in De novo Protein Structure Prediction. J. Mol. Biol. 2009, 393, 249–260. [Google Scholar] [CrossRef]
Leaver-Fay, A.; Tyka, M.; Lewis, S.M.; Lange, O.F.; Thompson, J.; Jacak, R.; Kaufman, K.W.; Renfrew, P.D.; Smith, C.A.; Sheffler, W.; et al. Rosetta3: An Object-Oriented Software Suite for the Simulation and Design of Macromolecules. Methods Enzymol. 2011, 487, 545–574. [Google Scholar]
Song, Y.; DiMaio, F.; Wang, R.Y.-R.; Kim, D.; Miles, C.; Brunette, T.; Thompson, J.; Baker, D. High-Resolution Comparative Modeling with RosettaCM. Structure 2013, 21, 1735–1742. [Google Scholar] [CrossRef] [PubMed]
Ovchinnikov, S.; Park, H.; Kim, D.E.; DiMaio, F.; Baker, D. Protein structure prediction using Rosetta in CASP12. Proteins Struct. Funct. Bioinform. 2018, 86, 113–121. [Google Scholar] [CrossRef]
Summa, C.M.; Levitt, M. Near-native structure refinement using in vacuo energy minimization. Proc. Natl. Acad. Sci. USA 2007, 104, 3177–3182. [Google Scholar] [CrossRef]
Fan, H.; Mark, A.E. Refinement of homology-based protein structures by molecular dynamics simulation techniques. Protein Sci. 2004, 13, 211–220. [Google Scholar] [CrossRef]
Chen, J.; Brooks, C.L. Can molecular dynamics simulations provide high-resolution refinement of protein structure? Proteins Struct. Funct. Bioinform. 2007, 67, 922–930. [Google Scholar] [CrossRef]
Ishitani, R.; Terada, T.; Shimizu, K. Refinement of comparative models of protein structure by using multicanonical molecular dynamics simulations. Mol. Simul. 2008, 34, 327–336. [Google Scholar] [CrossRef]
Kannan, S.; Zacharias, M. Application of biasing-potential replica-exchange simulations for loop modeling and refinement of proteins in explicit solvent. Proteins Struct. Funct. Bioinform. 2010, 78, 2809–2819. [Google Scholar] [CrossRef]
Gront, D.; Kmiecik, S.; Blaszczyk, M.; Ekonomiuk, D.; Koliński, A. Optimization of protein models. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2012, 2, 479–493. [Google Scholar] [CrossRef]
Chen, J.; Brooks, C.L.; Khandogin, J. Recent advances in implicit solvent-based methods for biomolecular simulations. Curr. Opin. Struct. Biol. 2008, 18, 140–148. [Google Scholar] [CrossRef]
Heo, L.; Feig, M. PREFMD: A web server for protein structure refinement via molecular dynamics simulations. Bioinformatics 2018, 34, 1063–1065. [Google Scholar] [CrossRef]
Feig, M. Local Protein Structure Refinement via Molecular Dynamics Simulations with locPREFMD. J. Chem. Inf. Model. 2016, 56, 1304–1312. [Google Scholar] [CrossRef]
Lindorff-Larsen, K.; Piana, S.; Palmo, K.; Maragakis, P.; Klepeis, J.L.; Dror, R.O.; Shaw, D.E. Improved side-chain torsion potentials for the Amber ff99SB protein force field. Proteins Struct. Funct. Bioinform. 2010, 78, 1950–1958. [Google Scholar] [CrossRef]
Lee, G.R.; Heo, L.; Seok, C. Effective protein model structure refinement by loop modeling and overall relaxation. Proteins Struct. Funct. Bioinform. 2016, 84, 293–301. [Google Scholar] [CrossRef] [PubMed]
Khoury, G.A.; Tamamis, P.; Pinnaduwage, N.; Smadbeck, J.; Kieslich, C.A.; Floudas, C.A. Princeton_TIGRESS: Protein geometry refinement using simulations and support vector machines. Proteins Struct. Funct. Bioinform. 2014, 82, 794–814. [Google Scholar] [CrossRef] [PubMed]
Raval, A.; Piana, S.; Eastwood, M.P.; Dror, R.O.; Shaw, D.E. Refinement of protein structure homology models via long, all-atom molecular dynamics simulations. Proteins Struct. Funct. Bioinform. 2012, 80, 2071–2079. [Google Scholar] [CrossRef] [PubMed]
Lindorff-Larsen, K.; Piana, S.; Dror, R.O.; Shaw, D.E. How Fast-Folding Proteins Fold. Science 2011, 334, 517–520. [Google Scholar] [CrossRef] [PubMed]
Zhu, J.; Fan, H.; Periole, X.; Honig, B.; Mark, A.E. Refining homology models by combining replica-exchange molecular dynamics and statistical potentials. Proteins Struct. Funct. Bioinform. 2008, 72, 1171–1188. [Google Scholar] [CrossRef] [PubMed]
Lee, M.R.; Tsai, J.; Baker, D.; Kollman, P.A. Molecular dynamics in the endgame of protein structure prediction. J. Mol. Biol. 2001, 313, 417–430. [Google Scholar] [CrossRef] [PubMed]
Heo, L.; Feig, M. Experimental accuracy in protein structure refinement via molecular dynamics simulations. Proc. Natl. Acad. Sci. USA 2018, 115, 13276–13281. [Google Scholar] [CrossRef]
Lindorff-Larsen, K.; Maragakis, P.; Piana, S.; Eastwood, M.P.; Dror, R.O.; Shaw, D.E. Systematic Validation of Protein Force Fields against Experimental Data. PLoS ONE 2012, 7, e32131. [Google Scholar] [CrossRef]
Huang, J.; Rauscher, S.; Nawrocki, G.; Ran, T.; Feig, M.; de Groot, B.L.; Grubmüller, H.; MacKerell, A.D. CHARMM36m: An improved force field for folded and intrinsically disordered proteins. Nat. Methods 2017, 14, 71–73. [Google Scholar] [CrossRef] [PubMed]
Best, R.B.; Zhu, X.; Shim, J.; Lopes, P.E.M.; Mittal, J.; Feig, M.; MacKerell, A.D. Optimization of the Additive CHARMM All-Atom Protein Force Field Targeting Improved Sampling of the Backbone ϕ, ψ and Side-Chain χ ₁ and χ ₂ Dihedral Angles. J. Chem. Theory Comput. 2012, 8, 3257–3273. [Google Scholar] [CrossRef] [PubMed]
Best, R.B.; Buchete, N.-V.; Hummer, G. Are Current Molecular Dynamics Force Fields too Helical? Biophys. J. 2008, 95, L07–L09. [Google Scholar] [CrossRef]
Maier, J.A.; Martinez, C.; Kasavajhala, K.; Wickstrom, L.; Hauser, K.E.; Simmerling, C. ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. J. Chem. Theory Comput. 2015, 11, 3696–3713. [Google Scholar] [CrossRef]
Robertson, M.J.; Tirado-Rives, J.; Jorgensen, W.L. Improved Peptide and Protein Torsional Energetics with the OPLS-AA Force Field. J. Chem. Theory Comput. 2015, 11, 3499–3509. [Google Scholar] [CrossRef]
MacPherson, S.; Larochelle, M.; Turcotte, B.; Link, C.; Kann, M.; Swinkels, J.W.; Kornegay, E.T.; Verstegen, M.W.; Tertiary, P.; Structures, Q.; et al. Computational Protein Design: A problem in combinatorial optimization. What is a protein? Nutr. Res. Rev. 2004, 11, 205–229. [Google Scholar]
MacKerell, A.D.; Banavali, N.; Foloppe, N. Development and current status of the CHARMM force field for nucleic acids. Biopolymers 2001, 56, 257–265. [Google Scholar] [CrossRef]
MacKerell, A.D.; Feig, M.; Brooks, C.L. Extending the treatment of backbone energetics in protein force fields. J. Comput. Chem. 2004, 25, 1400–1415. [Google Scholar] [CrossRef] [PubMed]
Ovchinnikov, S.; Kim, D.E.; Wang, R.Y.-R.; Liu, Y.; DiMaio, F.; Baker, D. Improved de novo structure prediction in CASP11 by incorporating coevolution information into Rosetta. Proteins Struct. Funct. Bioinform. 2016, 84, 67–75. [Google Scholar] [CrossRef]
Case, D.A.; Darden, T.; Cheatham, T.E.; Wang, J.; Duke, R.E.; Luo, R.; Walker, R.C.; Zhang, W.; Merz, K.M.; Roberts, B.P.; et al. Amber 12; University of California: San Francisco, CA, USA, 2012. [Google Scholar]
Cheng, Q.; Joung, I.; Lee, J. A Simple and Efficient Protein Structure Refinement Method. J. Chem. Theory Comput. 2017, 13, 5146–5162. [Google Scholar] [CrossRef] [PubMed]
Park, I.-H.; Gangupomu, V.; Wagner, J.; Jain, A.; Vaidehi, N. Structure Refinement of Protein Low Resolution Models Using the GNEIMO Constrained Dynamics Method. J. Phys. Chem. B 2012, 116, 2365–2375. [Google Scholar] [CrossRef]
Feig, M.; Mirjalili, V. Protein Structure Refinement via Molecular-Dynamics Simulations: What works and what does not? Proteins Struct. Funct. Bioinform. 2016, 84, 282–292. [Google Scholar] [CrossRef]
Cao, W.; Terada, T.; Nakamura, S.; Shimizu, K. Refinement of Comparative-Modeling Structures by Multicanonical Molecular Dynamics. Genome Inform. 2003, 14, 484–485. [Google Scholar]
Zhang, J.; Liang, Y.; Zhang, Y. Atomic-Level Protein Structure Refinement Using Fragment-Guided Molecular Dynamics Conformation Sampling. Structure 2011, 19, 1784–1795. [Google Scholar] [CrossRef]
Park, H.; Seok, C. Refinement of Unreliable Local Regions in Template-based Protein Models. Proteins Struct. Funct. Bioinform. 2012, 80, 1974–1986. [Google Scholar] [CrossRef]
Maghrabi, A.H.A.; McGuffin, L.J. ModFOLD6: An accurate web server for the global and local quality estimation of 3D protein models. Nucleic Acids Res. 2017. [Google Scholar] [CrossRef] [PubMed]
Critical Assessment of Techniques for Protein Structure Prediction. 13 Abstracts. Available online: http://predictioncenter.org/casp13/index.cgi (accessed on 2 April 2019).
Seemayer, S.; Gruber, M.; Söding, J. CCMpred—Fast and precise prediction of protein residue-residue contacts from correlated mutations. Bioinformatics 2014, 30, 3128–3130. [Google Scholar] [CrossRef]
Liu, Y.; Palmedo, P.; Ye, Q.; Berger, B.; Peng, J. Enhancing Evolutionary Couplings with Deep Convolutional Neural Networks. Cell Syst. 2018, 6, 65.e3–74.e3. [Google Scholar] [CrossRef] [PubMed]
Heo, L.; Lee, H.; Seok, C. GalaxyRefineComplex: Refinement of protein-protein complex model structures driven by interface repacking. Sci. Rep. 2016, 6, 32153. [Google Scholar] [CrossRef]
Güntert, P. Automated NMR Structure Calculation With CYANA. In Protein NMR Techniques; Humana Press: Totowa, NJ, USA, 2004; Volume 278, pp. 353–378. [Google Scholar]
Yang, Y.; Zhou, Y. Specific interactions for ab initio folding of protein terminal regions with secondary structures. Proteins Struct. Funct. Bioinform. 2008, 72, 793–803. [Google Scholar] [CrossRef]
Cossio, P.; Granata, D.; Laio, A.; Seno, F.; Trovato, A. A simple and efficient statistical potential for scoring ensembles of protein structures. Sci. Rep. 2012, 2, 351. [Google Scholar] [CrossRef]
Kuhlman, B.; Dantas, G.; Ireton, G.C.; Varani, G.; Stoddard, B.L.; Baker, D. Design of a novel globular protein fold with atomic-level accuracy. Science 2003, 302, 1364–1368. [Google Scholar] [CrossRef] [PubMed]
Kalisman, N.; Levi, A.; Maximova, T.; Reshef, D.; Zafriri-Lynn, S.; Gleyzer, Y.; Keasar, C. MESHI: A new library of Java classes for molecular modeling. Bioinformatics 2005, 21, 3931–3932. [Google Scholar] [CrossRef] [PubMed]
Bhattacharya, D.; Nowotny, J.; Cao, R.; Cheng, J. 3Drefine: An interactive web server for efficient protein structure refinement. Nucleic Acids Res. 2016, 44, W406–W409. [Google Scholar] [CrossRef] [PubMed]
Phillips, J.C.; Braun, R.; Wang, W.; Gumbart, J.; Tajkhorshid, E.; Villa, E.; Chipot, C.; Skeel, R.D.; Kalé, L.; Schulten, K. Scalable molecular dynamics with NAMD. J. Comput. Chem. 2005, 26, 1781–1802. [Google Scholar] [CrossRef]
McGuffin, L.J.; Atkins, J.D.; Salehe, B.R.; Shuid, A.N.; Roche, D.B. IntFOLD: An integrated server for modelling protein structures and functions from amino acid sequences. Nucleic Acids Res. 2015, 43, W169–W173. [Google Scholar] [CrossRef] [PubMed]
Rykunov, D.; Fiser, A. New statistical potential for quality assessment of protein models and a survey of energy functions. BMC Bioinform. 2010, 11, 128. [Google Scholar] [CrossRef]
Alford, R.F.; Leaver-Fay, A.; Jeliazkov, J.R.; O’Meara, M.J.; DiMaio, F.P.; Park, H.; Shapovalov, M.V.; Renfrew, P.D.; Mulligan, V.K.; Kappel, K.; et al. The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design. J. Chem. Theory Comput. 2017, 13, 3031–3048. [Google Scholar] [CrossRef] [PubMed]
Park, H.; DiMaio, F.; Baker, D. The Origin of Consistent Protein Structure Refinement from Structural Averaging. Structure 2015, 23, 1123–1128. [Google Scholar] [CrossRef] [PubMed]
Stumpff-Kane, A.W.; Maksimiak, K.; Lee, M.S.; Feig, M. Sampling of near-native protein conformations during protein structure refinement using a coarse-grained model, normal modes, and molecular dynamics simulations. Proteins Struct. Funct. Bioinform. 2007, 70, 1345–1356. [Google Scholar] [CrossRef] [PubMed]
Larsen, A.B.; Wagner, J.R.; Jain, A.; Vaidehi, N. Protein Structure Refinement of CASP Target Proteins Using GNEIMO Torsional Dynamics Method. J. Chem. Inf. Model. 2014, 54, 508–517. [Google Scholar] [CrossRef]
Olson, M.A.; Lee, M.S. Evaluation of Unrestrained Replica-Exchange Simulations Using Dynamic Walkers in Temperature Space for Protein Structure Refinement. PLoS ONE 2014, 9, e96638. [Google Scholar] [CrossRef]
Kumar, A.; Campitelli, P.; Thorpe, M.F.; Ozkan, S.B. Partial unfolding and refolding for structure refinement: A unified approach of geometric simulations and molecular dynamics. Proteins Struct. Funct. Bioinform. 2015, 83, 2279–2292. [Google Scholar] [CrossRef]
Zhang, Y.; Skolnick, J. SPICKER: A clustering approach to identify near-native protein folds. J. Comput. Chem. 2004, 25, 865–871. [Google Scholar] [CrossRef] [PubMed]
Zhang, Y.; Skolnick, J. Scoring function for automated assessment of protein structure template quality. Proteins Struct. Funct. Bioinform. 2004, 57, 702–710. [Google Scholar] [CrossRef]
Zhang, J.; Zhang, Y. A Novel Side-Chain Orientation Dependent Potential Derived from Random-Walk Reference State for Protein Fold Selection and Structure Prediction. PLoS ONE 2010, 5, e15386. [Google Scholar] [CrossRef]
Anfinsen, C.B. Principles that govern the folding of protein chains. Science 1973, 181, 223–230. [Google Scholar] [CrossRef]
Lu, M.; Dousis, A.D.; Ma, J. OPUS-PSP: An orientation-dependent statistical all-atom potential derived from side-chain packing. J. Mol. Biol. 2008, 376, 288–301. [Google Scholar] [CrossRef]
Tyka, M.D.; Keedy, D.A.; André, I.; DiMaio, F.; Song, Y.; Richardson, D.C.; Richardson, J.S.; Baker, D. Alternate states of proteins revealed by detailed energy landscape mapping. J. Mol. Biol. 2011, 405, 607–618. [Google Scholar] [CrossRef] [PubMed]
DiMaio, F.; Tyka, M.D.; Baker, M.L.; Chiu, W.; Baker, D. Refinement of Protein Structures into Low-Resolution Density Maps Using Rosetta. J. Mol. Biol. 2009, 392, 181–190. [Google Scholar] [CrossRef] [PubMed]
Gohlke, H.; Klebe, G. Statistical potentials and scoring functions applied to protein–ligand binding. Curr. Opin. Struct. Biol. 2001, 11, 231–235. [Google Scholar] [CrossRef]
Russ, W.P.; Ranganathan, R. Knowledge-based potential functions in protein design. Curr. Opin. Struct. Biol. 2002, 12, 447–452. [Google Scholar] [CrossRef]
Buchete, N.-V.; Straub, J.; Thirumalai, D. Development of novel statistical potentials for protein fold recognition. Curr. Opin. Struct. Biol. 2004, 14, 225–232. [Google Scholar] [CrossRef] [PubMed]
Zhou, Y.; Zhou, H.; Zhang, C.; Liu, S. What is a Desirable Statistical Energy Function for Proteins and How Can It Be Obtained? Cell Biochem. Biophys. 2006, 46, 165–174. [Google Scholar] [CrossRef]
Skolnick, J. In quest of an empirical potential for protein structure prediction. Curr. Opin. Struct. Biol. 2006, 16, 166–171. [Google Scholar] [CrossRef]
Bradley, P.; Malmström, L.; Qian, B.; Schonbrun, J.; Chivian, D.; Kim, D.E.; Meiler, J.; Misura, K.M.S.; Baker, D. Free modeling with Rosetta in CASP6. Proteins Struct. Funct. Bioinform. 2005, 61, 128–134. [Google Scholar] [CrossRef]
Sippl, M.J. Knowledge-based potentials for proteins. Curr. Opin. Struct. Biol. 1995, 5, 229–235. [Google Scholar] [CrossRef]
Jernigan, R.L.; Bahar, I. Structure-derived potentials and protein simulations. Curr. Opin. Struct. Biol. 1996, 6, 195–209. [Google Scholar] [CrossRef]
Moult, J. Comparison of database potentials and molecular mechanics force fields. Curr. Opin. Struct. Biol. 1997, 7, 194–199. [Google Scholar] [CrossRef]
Lazaridis, T.; Karplus, M. Effective energy functions for protein structure prediction. Curr. Opin. Struct. Biol. 2000, 10, 139–145. [Google Scholar] [CrossRef]
Dutagaci, B.; Heo, L.; Feig, M. Structure refinement of membrane proteins via molecular dynamics simulations. Proteins Struct. Funct. Bioinform. 2018, 86, 738–750. [Google Scholar] [CrossRef]
Olson, M.A.; Lee, M.S. Application of replica exchange umbrella sampling to protein structure refinement of nontemplate models. J. Comput. Chem. 2013, 34, 1785–1793. [Google Scholar] [CrossRef]
Zhou, H.; Zhou, Y. Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction. Protein Sci. 2002, 11, 2714–2726. [Google Scholar] [CrossRef] [PubMed]
Yang, Y.; Zhou, Y. Ab initio folding of terminal segments with secondary structures reveals the fine difference between two closely related all-atom statistical energy functions. Protein Sci. 2008, 17, 1212–1219. [Google Scholar] [CrossRef]
Leaver-Fay, A.; O’Meara, M.J.; Tyka, M.; Jacak, R.; Song, Y.; Kellogg, E.H.; Thompson, J.; Davis, I.W.; Pache, R.A.; Lyskov, S.; et al. Scientific Benchmarks for Guiding Macromolecular Energy Function Improvement. Methods Enzymol. 2013, 523, 109–143. [Google Scholar] [PubMed]
Rohl, C.A.; Strauss, C.E.M.; Misura, K.M.S.; Baker, D. Protein Structure Prediction Using Rosetta. Methods Enzymol. 2004, 383, 66–93. [Google Scholar]
Kuhlman, B.; Baker, D. Native protein sequences are close to optimal for their structures. Proc. Natl. Acad. Sci. USA 2000, 97, 10383–10388. [Google Scholar] [CrossRef]
Park, H.; DiMaio, F.; Baker, D. CASP11 refinement experiments with ROSETTA. Proteins Struct. Funct. Bioinform. 2016, 84, 314–322. [Google Scholar] [CrossRef]
Park, H.; Ovchinnikov, S.; Kim, D.E.; DiMaio, F.; Baker, D. Protein homology model refinement by large-scale energy optimization. Proc. Natl. Acad. Sci. USA 2018, 115, 3054–3059. [Google Scholar] [CrossRef] [PubMed]
Wallner, B.; Elofsson, A. Can correct protein models be identified? Protein Sci. 2003, 12, 1073–1086. [Google Scholar] [CrossRef] [PubMed]
Ray, A.; Lindahl, E.; Wallner, B. Improved model quality assessment using ProQ2. BMC Bioinform. 2012, 13, 1–12. [Google Scholar] [CrossRef]
Randall, A.; Baldi, P. SELECTpro: Effective protein model selection using a structure-based energy function resistant to BLUNDERs. BMC Struct. Biol. 2008, 8, 52. [Google Scholar] [CrossRef]
Kryshtafovych, A.; Barbato, A.; Fidelis, K.; Monastyrskyy, B.; Schwede, T.; Tramontano, A. Assessment of the assessment: Evaluation of the model quality estimates in CASP10. Proteins 2014, 82 Suppl. 2, 112–126. [Google Scholar] [CrossRef]
Larsson, P.; Skwark, M.J.; Wallner, B.; Elofsson, A. Assessment of global and local model quality in CASP8 using Pcons and ProQ. Proteins Struct. Funct. Bioinform. 2009, 77, 167–172. [Google Scholar] [CrossRef]
Cozzetto, D.; Kryshtafovych, A.; Tramontano, A. Evaluation of CASP8 model quality predictions. Proteins Struct. Funct. Bioinform. 2009, 77, 157–166. [Google Scholar] [CrossRef] [PubMed]
Kryshtafovych, A.; Venclovas, Č.; Fidelis, K.; Moult, J. Progress over the first decade of CASP experiments. Proteins Struct. Funct. Bioinform. 2005, 61, 225–236. [Google Scholar] [CrossRef] [PubMed]
Moult, J.; Fidelis, K.; Kryshtafovych, A.; Rost, B.; Tramontano, A. Critical assessment of methods of protein structure prediction-Round VIII. Proteins Struct. Funct. Bioinform. 2009, 77, 1–4. [Google Scholar] [CrossRef] [PubMed]
Moult, J.; Fidelis, K.; Kryshtafovych, A.; Schwede, T.; Tramontano, A. Critical assessment of methods of protein structure prediction (CASP)—Round X. Proteins Struct. Funct. Bioinform. 2014, 82, 1–6. [Google Scholar] [CrossRef]
Zhang, Y.; Skolnick, J. TM-align: A protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 2005, 33, 2302–2309. [Google Scholar] [CrossRef]
Zemla, A. LGA: A method for finding 3D similarities in protein structures. Nucleic Acids Res. 2003, 31, 3370–3374. [Google Scholar] [CrossRef]
Kryshtafovych, A.; Monastyrskyy, B.; Fidelis, K. CASP11 statistics and the prediction center evaluation system. Proteins Struct. Funct. Bioinform. 2016, 84, 15–19. [Google Scholar] [CrossRef] [PubMed]
Chen, V.B.; Arendall, W.B.; Headd, J.J.; Keedy, D.A.; Immormino, R.M.; Kapral, G.J.; Murray, L.W.; Richardson, J.S.; Richardson, D.C. MolProbity: All-atom structure validation for macromolecular crystallography. Acta Crystallogr. Sect. D Biol. Crystallogr. 2010, 66, 12–21. [Google Scholar] [CrossRef]
Kryshtafovych, A.; Monastyrskyy, B.; Fidelis, K. CASP prediction center infrastructure and evaluation measures in CASP10 and CASP ROLL. Proteins Struct. Funct. Bioinform. 2014, 82, 7–13. [Google Scholar] [CrossRef]
Mariani, V.; Biasini, M.; Barbato, A.; Schwede, T. lDDT: A local superposition-free score for comparing protein structures and models using distance difference tests. Bioinformatics 2013, 29, 2722–2728. [Google Scholar] [CrossRef]
Olechnovič, K.; Kulberkytė, E.; Venclovas, Č. CAD-score: A new contact area difference-based function for evaluation of protein structural models. Proteins Struct. Funct. Bioinform. 2013, 81, 149–162. [Google Scholar] [CrossRef] [PubMed]
Cong, Q.; Kinch, L.N.; Pei, J.; Shi, S.; Grishin, V.N.; Li, W.; Grishin, N.V. An automatic method for CASP9 free modeling structure prediction assessment. Bioinformatics 2011, 27, 3371–3378. [Google Scholar] [CrossRef] [PubMed]
Rasmussen, B.F.; Stock, A.M.; Ringe, D.; Petsko, G.A. Crystalline ribonuclease A loses function below the dynamical transition at 220 K. Nature 1992, 357, 423–424. [Google Scholar] [CrossRef]
Eisenmesser, E.Z.; Bosco, D.A.; Akke, M.; Kern, D. Enzyme dynamics during catalysis. Science 2002, 295, 1520–1523. [Google Scholar] [CrossRef]
Benkovic, S.J.; Hammes-Schiffer, S. A perspective on enzyme catalysis. Science 2003, 301, 1196–1202. [Google Scholar] [CrossRef]
McCoy, A.J.; Grosse-Kunstleve, R.W.; Storoni, L.C.; Read, R.J. Likelihood-enhanced fast translation functions. Acta Crystallogr. Sect. D Biol. Crystallogr. 2005, 61, 458–464. [Google Scholar] [CrossRef]
Mobley, D.L.; Chodera, J.D.; Dill, K.A. On the use of orientational restraints and symmetry corrections in alchemical free energy calculations. J. Chem. Phys. 2006, 125, 084902. [Google Scholar] [CrossRef]
Zagrovic, B.; van Gunsteren, W.F. Comparing atomistic simulation data with the NMR experiment: How much can NOEs actually tell us? Proteins Struct. Funct. Bioinform. 2006, 63, 210–218. [Google Scholar] [CrossRef]
Lindorff-Larsen, K.; Best, R.B.; DePristo, M.A.; Dobson, C.M.; Vendruscolo, M. Simultaneous determination of protein structure and dynamics. Nature 2005, 433, 128–132. [Google Scholar] [CrossRef]
Senior, A.; Jumper, J.; Hassabis, D. Deep Mind, AlphaFold: Using AI for scientific discovery. Available online: https://deepmind.com/blog/alphafold/ (accessed on 8 May 2019).
Wallner, B.; Elofsson, A. Prediction of global and local model quality in CASP7 using Pcons and ProQ. Proteins Struct. Funct. Bioinform. 2007, 69, 184–193. [Google Scholar] [CrossRef] [PubMed]
Uziela, K.; Wallner, B. ProQ2: Estimation of model accuracy implemented in Rosetta. Bioinformatics 2016, 32, 1411–1413. [Google Scholar] [CrossRef] [PubMed]

Figure 1. Flowchart outlining the generalized protocol for the refinement of tertiary structure models applied by groups during the Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments.

Figure 2. Example of the refinement of a CASP13 model by the McGuffin group. ModFOLD7 predicts the per-residue error, and a new restraint strategy based on that predicted error is then applied during the sampling stage: (A) CASP13 prediction target T0958; (B) top selected server model (BAKER-ROSETTASERVER_TS2), displayed using the B-factor scheme; (C) the top selected server model coloured by the occupancy column, where blue regions indicate residues that were restrained and red regions indicate residues that were unrestrained during the MD simulation; (D) superposition of the top selected server model (cyan), refined model (magenta), and native structure (green). For T0958, refinement of BAKER-ROSETTASERVER_TS2 to T0958_ReFOLD_8 improved the GDT_HA score from 0.419 to 0.4464.
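
The restraint idea in Figure 2 can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that a per-residue predicted error (in Å) is already available as a plain dictionary, and it uses a hypothetical cutoff of 3.0 Å. Neither the function name `restraint_flags` nor the cutoff value comes from the article. Residues predicted to be accurate are flagged for restraint, and the remaining residues are left free to move during sampling.

```python
from typing import Dict


def restraint_flags(predicted_error: Dict[int, float],
                    cutoff: float = 3.0) -> Dict[int, float]:
    """Map residue number -> 1.0 (restrain) or 0.0 (leave unrestrained).

    predicted_error: hypothetical per-residue error estimates in angstroms.
    cutoff: illustrative threshold (an assumption, not a published value).
    """
    return {
        residue: 1.0 if error <= cutoff else 0.0
        for residue, error in predicted_error.items()
    }


# Toy input: five residues with made-up predicted errors.
errors = {1: 0.8, 2: 1.5, 3: 4.2, 4: 7.9, 5: 2.9}
flags = restraint_flags(errors)

# Residues that would be restrained (blue in panel C of Figure 2).
restrained = sorted(r for r, flag in flags.items() if flag == 1.0)
print(restrained)
```

In practice, flags of this kind are typically written into a spare per-atom column of the model file, such as the occupancy column mentioned in the caption, so that the simulation engine can read which atoms to restrain.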

Table 1. Publicly available refinement web servers, based on methods tested in the CASP experiments.

Name	URL
PREFMD [85]	http://feiglab.org/prefmd
locPREFMD [86]	http://feig.bch.msu.edu/web/services/locprefmd/
GalaxyRefine [54]	http://galaxy.seoklab.org/refine
KoBaMIN [66]	http://csb.stanford.edu/kobamin
Princeton_TIGRESS 2.0 [56]	http://atlas.engr.tamu.edu/refinement/
ModRefiner [67]	http://zhanglab.ccmb.med.umich.edu/ModRefiner
3Drefine [41,122]	http://sysbio.rnet.missouri.edu/3Drefine/
ReFOLD [43]	http://www.reading.ac.uk/bioinf/ReFOLD/
FG-MD [110]	http://zhanglab.ccmb.med.umich.edu/FG-MD/

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).