Exploring Routes to Enhance the Calculation of Free Energy Differences via Non-Equilibrium Work SQM/MM Switching Simulations Using Hybrid Charge Intermediates between MM and SQM Levels of Theory or Non-Linear Switching Schemes

Schöller, Andreas; Woodcock, H. Lee; Boresch, Stefan

doi:10.3390/molecules28104006

Open AccessArticle

Exploring Routes to Enhance the Calculation of Free Energy Differences via Non-Equilibrium Work SQM/MM Switching Simulations Using Hybrid Charge Intermediates between MM and SQM Levels of Theory or Non-Linear Switching Schemes

by

Andreas Schöller

^1,2

,

H. Lee Woodcock

³

and

Stefan Boresch

^1,*

¹

Faculty of Chemistry, Department of Computational Biological Chemistry, University of Vienna, Währingerstr. 17, A-1090 Vienna, Austria

²

Vienna Doctoral School in Chemistry (DoSChem), University of Vienna, Währingerstr. 42, A-1090 Vienna, Austria

³

Department of Chemistry, University of South Florida, 4202 E. Fowler Ave., CHE205, Tampa, FL 33620-5250, USA

^*

Author to whom correspondence should be addressed.

Molecules 2023, 28(10), 4006; https://doi.org/10.3390/molecules28104006

Submission received: 27 March 2023 / Revised: 25 April 2023 / Accepted: 26 April 2023 / Published: 10 May 2023

(This article belongs to the Special Issue Advances in Computational and Theoretical Chemistry)

Download

Browse Figures

Versions Notes

Abstract

:

Non-equilibrium work switching simulations and Jarzynski’s equation are a reliable method for computing free energy differences,

Δ A^{low \to high}

, between two levels of theory, such as a pure molecular mechanical (MM) and a quantum mechanical/molecular mechanical (QM/MM) description of a system of interest. Despite the inherent parallelism, the computational cost of this approach can quickly become very high. This is particularly true for systems where the core region, the part of the system to be described at different levels of theory, is embedded in an environment such as explicit solvent water. We find that even for relatively simple solute–water systems, switching lengths of at least 5 ps are necessary to compute

Δ A^{low \to high}

reliably. In this study, we investigate two approaches towards an affordable protocol, with an emphasis on keeping the switching length well below 5 ps. Inserting a hybrid charge intermediate state with modified partial charges, which resembles the charge distribution of the desired high level, makes it possible to obtain reliable calculations with 2 ps switches. Attempts using step-wise linear switching paths, on the other hand, did not lead to improvement, i.e., a faster convergence for all systems. To understand these findings, we analyzed the solutes’ properties as a function of the partial charges used and the number of water molecules in direct contact with the solute, and studied the time needed for water molecules to reorient themselves upon a change in the solute’s charge distribution.

Keywords:

free energy; indirect thermodynamic cycle; non-equilibrium simulation

1. Introduction

Since the first applications reported shortly before 1990 [1,2], so-called alchemical free energy simulations (FESs) have become an essential tool of computational chemists in academia and industry alike [3,4,5,6]. The free energy difference determines the spontaneities of chemical or biochemical processes, and hence, makes it possible to predict, e.g., the binding affinities of potential drugs [7,8,9]. In addition to computationally estimating an important macroscopic, thermodynamic quantity, FESs can also provide insight in the microscopic origins [10]. One source of error is the accuracy with which interactions within and between molecules are modeled in the underlying molecular dynamics simulations. In certain applications, force field-based descriptions may be insufficient [11]. Even if one is not interested in studying chemical reactions, the neglect of induced electronic polarization in classical force fields may lead to problems, even for seemingly simple applications, such as the computation of relative free energies of hydration of mono- or bivalent ions [12,13,14]. In such situations, hybrid quantum-mechanical/molecular mechanical (QM/MM) Hamiltonians are called for. However, even when using only semi-empirical quantum chemical methods (henceforth abbreviated as SQMs) for the core region, i.e., the part of the system that one is particularly interested in, the computational cost can quickly become prohibitive. Further, several tricks that are frequently used in alchemical FES do not work with SQM/MM Hamiltonians [15]. Early on, Warshel, Gao, and others pioneered the use of indirect cycles, also referred to as “multilevel” free energy simulations, as depicted in Figure 1a, which have become a widely used way of performing FES with an (S)QM/MM description of interactions [16,17,18].

To illustrate the indirect approach, we consider the calculation of the aqueous solvation free energy of a solute X. As can be seen in Figure 1a, the quantity of interest and the solvation free energy difference

Δ A_{X g a s \to s o l v}^{S Q M / M M}

at the high SQM/MM level of theory (dashed, black arrow in Figure 1a), can also be obtained in three steps according to

Δ A_{X g a s \to s o l v}^{S Q M / M M} = - Δ A_{X g a s}^{M M \to S Q M} + Δ A_{X g a s \to s o l v}^{M M} + Δ A_{X s o l v}^{M M \to S Q M / M M}

(1)

In addition to computing the solvation free energy

Δ A_{X g a s \to s o l v}^{M M}

at the force field (MM) level of theory (solid black arrow), two correction steps, accounting for the free energy difference of X in the gas phase and the aqueous solution between the two levels of theory (green and red arrows), are required. The development in this field over recent years shows that there is not only enormous potential, but also a high degree of interest in the FES community to use this strategy to compute free energy differences at the SQM/MM levels of theory [19,20,21,22,23,24,25,26,27].

Work by Heimdal and Ryde [28] made clear that the challenging step of indirect cycle SQM/MM FES is the calculation of the low-to-high corrections

Δ A_{X g a s}^{M M \to S Q M}

and

Δ A_{X s o l v}^{M M \to S Q M / M M}

: the green and red arrows of Figure 1a. Various approaches have been used, often requiring a sophisticated decomposition of the cycle and the employment of fitting procedures in order to be successful [18,29]. In the past, we showed that the corrections

Δ A^{M M \to S Q M}

can be computed reliably and accurately using non-equilibrium work (NEW) methods [30,31,32,33,34,35]. In our work to date, computationally expensive protocols were used, which are unsuitable for most real-world applications. Recently, we optimized the NEW protocols regarding switching length and the number of switches, and presented a sufficiently efficient workflow for practical applications [36]. However, these optimized protocols were tested so far only in the gas phase, i.e., the green arrow of Figure 1a (

Δ A_{X g a s}^{M M \to S Q M}

).

Here, we investigate whether the protocols suggested in Ref. [36] are also suitable for aqueous solution, i.e., when the region of interest is to be treated at a high level of theory, e.g., SQM, interacts with an MM environment (

Δ A_{X s o l v}^{M M \to S Q M / M M}

, red arrow in Figure 1a). In earlier work, we had noted that NEW switching simulations from a pure MM to an SQM/MM level of theory converged more slowly compared to gas phase transformations involving only MM to SQM, and observed that the charge distribution resulting from the fixed charges of the force field and the average charge distribution obtained at the SQM/MM level of theory were noticeably different [31]. We speculated that the necessary reorientation of water in response to such a change in the charge distribution of the solute might be responsible for the slower convergence. Using equilibrium methods to compute

Δ A_{X s o l v}^{M M \to S Q M / M M}

, Ito and Cui reported that inserting an intermediate state with fixed charges more similar to the charge distribution at the SQM/MM target state improved convergence considerably [37].

The solvent-reorientation dynamics of water in response to charge changes is experimentally well characterized and understood [38,39]. Two relaxation times, one faster than 0.5 ps and the other approximately 2 ps, can be discerned. Therefore, if the charge distribution of the region of interest is significantly different at the two levels of theory, NEW switching simulations of only 2 ps, as used in Ref. [31], may be too short to obtain converged results. With this in mind, we tested two strategies. (i) Similarly to the work by Ito and Cui, we introduce an intermediate state, which is still purely MM, but with the atomic partial charges modified to resemble the average charge distribution of the SQM description at the high level of theory. (ii) Since water reorientation in response to a change in charge distribution occurs on two timescales, the linear switching protocols we have used so far may be suboptimal. In particular, we test whether protocols in which switching is carried out initially at a faster rate, e.g., switching from the MM to a 35% SQM/MM description at just 10% of the switching length, followed by a second slower phase during the remaining 90% of the switching length, might lead to improved convergence. Obviously, both strategies can be combined. Model calculations exploring these two approaches are complemented by a series of analyses in which we investigate the effects of a solute’s charge distribution on properties such as its dipole moment, as well as on the interactions with surrounding waters.

The focus of this work is on the effect by which the charge distribution of a solute described at different levels of theory has on the convergence of calculating

Δ A_{X s o l v}^{M M \to S Q M / M M}

. To avoid unrelated complications from too-flexible molecules, where conformational preferences might be different at the low and high levels of theory, respectively, we mostly chose rather rigid model compounds for the test calculations. Rather than using (a subset of) the so-called “HiPen” test set [34], we selected tautomeric pairs (cf. Methods). Most of the compounds are rather rigid without rotatable bonds. Therefore, the pure MM and the SQM/MM descriptions of the systems—the solute (tautomer) are described by SQM, and all waters remain classical—differing primarily by the charge distributions of the solute at the two levels of theory.

The remainder of the manuscript is organized as follows. In Results, we first study the convergence of NEW switching simulations, using the optimized protocol from our preceding study [36] (switching lengths of 2 ps, 200 NEW switches per transformation). We then investigate how convergence

Δ A_{X s o l v}^{M M \to S Q M / M M}

improves when employing switching lengths of 5 ps, which, however, may be too costly for many applications. Next, we present results using the two mitigation strategies outlined above. Our findings are rationalized using additional data characterizing the differences in solute properties, in particular the dipole moment at the two levels of theory, as well as details on the solvent-reorientation dynamics of water under the simulation conditions. In Section 4, we summarize the theoretical background, introduce the model systems, and provide the technical details of all simulations and analyses carried out.

2. Results

2.1. Overview of Calculated Free Energy Differences and Paths

Below, several approaches to calculate the free energy difference between two levels of theory are compared. To specify the method and the exact meaning of a free energy difference, we adopt the following labeling scheme:

Δ A_{X p h a s e}^{l e v e l o f t h e o r y I \to / \leftrightarrow l e v e l o f t h e o r y I I}

Here, the subscript X indicates the compound; the qualifier phase can either be “gas” (gas phase) or “solv” (solvated phase). In the superscript, level of theory I and level of theory II can be “MM”, “MULL(gas)”, “MULL(solv)”, “MULL(solv*)”, or “SQM(/MM)”. MM indicates the use of the regular force field. The abbreviation SQM indicates that the SCC-DFTB semi-empirical method was used to compute interactions, either for the full system (the gas phase), or for the solute (the aqueous solution; waters were always described classically). The labels MULL(gas), MULL(solv), and MULL(solv*) refer to three methods used to derive the average Mulliken charges described in Section 4.3.1.

A simple arrow → stands for a one-sided method to compute the free energy difference; in this work, this always means Jarzynki’s equation (JAR) [40]. The double-pointed arrow ↔ indicates the use of a two-sided method. As described in Section 4, we used Crooks’ equation (CRO) [41] to compute the reference results. Furthermore, the free energy differences between MM and the intermediate state with modified partial charges, i.e., MULL(gas), MULL(solv), or MULL(solv*), were computed using Bennett’s acceptance ratio method (BAR) [42].

As in our previous work, all results are reported relative to a reference result; i.e., instead of reporting, e.g.,

Δ A_{X p h a s e}^{M M \to S Q M}

, we report the corresponding double free energy difference

δ Δ A_{X p h a s e} = Δ A_{X p h a s e}^{M M \to S Q M} - Δ A_{X p h a s e}^{CRO_{Ref}}

. In the past, the reference results

Δ A_{X p h a s e}^{CRO_{Ref}}

were obtained with CRO, based on 200 instances of 2 ps forward and backward NEW switches. As described in Section 2.2 below, we decided to use the CRO results obtained from 5 ps switches as the reference (CRO

_{Ref}

). The exact definition of

δ Δ A_{X p h a s e}

depends on whether the free energy difference between two levels of theory is computed in a single step, i.e.,

δ Δ A_{X p h a s e} = Δ A_{X p h a s e}^{MM \to / \leftrightarrow SQM} - Δ A_{X p h a s e}^{CRO_{Ref}}

(2a)

or via an hybrid charge intermediate state. Here, we have

δ Δ A_{X p h a s e} = Δ A_{X p h a s e}^{MM \leftrightarrow MULL} + Δ A_{X p h a s e}^{MULL \to SQM} - Δ A_{X p h a s e}^{CRO_{Ref}}

(2b)

to account for the correction step between the MM representation and the hybrid Mulliken charge intermediate state; cf. Equation (8).

2.2. Performances of 2 and 5 ps Linear Switching Protocols in Solution

Before carrying out calculations in aqueous solution, the optimized JAR protocol from Ref. [36] was tested for all compounds in the gas phase. All results can be found in Table S3 of SI. The model systems used in this work are described in Section 4.2. Except for 1-t1 (see further down),

|δ Δ A|

was

< 0.1

kcal/mol. While

δ Δ A (1 - t 1) \approx 0.35

kcal/mol is slightly larger than the ideal maximum deviation of

\pm 0.25

kcal/mol, this value is still acceptable for most practical applications. Thus, the protocol recommendations of our previous study [36] are applicable to the model compounds studied here. Based on the gas phase results, any noticeable deterioration of

δ Δ A

therefore has to be caused by differences in the description of solute–solvent interactions at the two levels of theory.

In Figure 2, the

δ Δ A

values for all compounds obtained via the three protocols/methods are compared against the reference results obtained from 200 forward/backward switches of 5 ps length: 200 MM to SQM switches of 2 ps length, the recommended protocol from Ref. [36], (JAR(2ps), red circles), 200 MM to SQM switches of 5 ps length (JAR(5ps), orange circles), and the CRO results obtained from 200 forward/backward switches of 2 ps length (CRO(2ps), blue circles). The data visualized in Figure 2 are tabulated in Table S2; even more detailed results can be found in Tables S4 and S5. For the shortest protocol, JAR(2ps), four results lie outside the

\pm 0.25

kcal/mol threshold indicated by the two dashed lines in the figure. Taking error bars into account (see Table S2), the results for 1-t1 and 8-t2 can be considered as acceptable,

|δ Δ A| < 0.4

kcal/mole; furthermore,

δ Δ A

(1-t1) is comparable to the value obtained in the gas phase (see Table S3). However, for 5-t2 and 6-t2, the deviation from the reference result is larger than 1 kcal/mol and would lead to a sizable systematic error.

The use of a 5 ps switching length (JAR(5ps), orange circles) certainly helps, but one poor result, 6-t2,

δ Δ A \approx 0.8

kcal/mol, remains. Thus, the relatively inexpensive protocol suggested in Ref. [36] cannot be recommended for calculations in solution. Even worse, the use of more costly switching simulations of 5 ps length does not help in all cases.

Figure 2 also shows results for the CRO(2ps) protocol (blue circles). Compared to our findings in the gas phase [36], the differences to the reference result CRO(5ps) are noticeably more pronounced. Furthermore, as can be seen in Tables S4 and S5, the statistical uncertainty of the CRO(2ps) results was consistently higher than those of the CRO(5ps) results, even though the overlap between the forward and backward work distributions seemed adequate in all cases. While all CRO(2ps)

δ Δ A

values lie within the

\pm 0.25

kcal/mol threshold, we nevertheless decided to choose CRO(5ps) as the reference protocol.

Three of the four compounds failing the

\pm 0.25

kcal/mol threshold using the JAR(2ps) protocol, 5-t2, 6-t2, and 8-t2, are lactams. Interestingly, the corresponding tautomeric lactim states do not cause any problems. The pair 1-t1/1-t2 belongs to the keto-enol class of tautomerism. For 1-t1,

δ Δ A \approx 0.35

kcal/mol in both the gas phase and in aqueous solution; therefore, different conformational preferences at the two levels of theory may be responsible for the poor convergence when using switching lengths of 2 ps.

2.3. Performances of Hybrid Charge Intermediates

In Figure 3, we summarize the performances of the indirect NEW switching protocols using a hybrid intermediate charge state in terms of

δ Δ A

, cf. Equation (2b). As described in Methods (Section 4.3.1), three approaches to obtain average Mulliken charges were tested: averaging over gas phase configurations (MULL(gas), over configurations sampled in aqueous solution at the MM level of theory (MULL(solv)), and over configurations sampled in aqueous solution at the SQM/MM level of theory (MULL(solv*)). The results obtained with the direct JAR(2ps) protocol (red circles), already shown in Figure 2 and henceforth referred to as just MM, are included as well.

Clearly, all three protocols employing an intermediate hybrid charge state perform much better than the direct NEW switches (MM), though not to the same degree. Using the MULL(gas) intermediate state (orange triangles), the

\pm 0.25

kcal/mol threshold for 5-t2 is still missed,

δ Δ A = 0.32

kcal/mol, and the result for Ala becomes worse (

δ Δ A = - 0.49

kcal/mol). For MULL(solv), the blue squares, all results lie de facto within the

\pm 0.25

kcal/mol threshold (

δ Δ A (5 - t 2) = - 0.28

kcal/mol,

δ Δ A (Ser) = - 0.26

kcal/mol). It should be noted that taking the statistical uncertainty of the results into account, none of these deviations from the

\pm 0.25

kcal/mol threshold are statistically significant. Finally, all MULL(solv*) results (green diamonds) lie well within the

\pm 0.25

kcal/mol threshold.

Figure 3 shows that in most cases, the use of a hybrid intermediate charge state lowers

δ Δ A

, and typically, the use of MULL(solv*) leads to the best results, followed in this order, using MULL(solv) and MULL(gas). To quantify this, we report in Table 1 the mean absolute deviation (MAD) for the data shown in Figure 3, together with the spread of

δ Δ A

for each of the protocols. Already, the use of MULL(gas) lowers the MAD from 0.29 to 0.12 kcal/mol, and the largest error

δ Δ A

reduces from 1.80 to 0.49 kcal/mol. The MULL(solv) intermediate state leads to a further improvement (

MAD = 0.09

, and the largest error is

0.28

kcal/mol). Finally, the MAD for MULL(solv*) is 0.07 kcal/mol and the largest

δ Δ A

is as low as 0.20 kcal/mol.

The improvements resulting from the hybrid intermediate charge states are roughly inversely proportional to the computational effort to obtain the average Mulliken-like charges. The MULL(gas) and MULL(solv) charges are obtained from configurations at the MM level of theory; the required computations at the SQM(/MM) level of theory to obtain the Mulliken charges are slightly more costly for the solvated system. By contrast, to obtain the MULL(solv*) charges, sampling needs to be carried out at the high (SQM/MM) level of theory. Given that the improvements of using MULL(solv*) rather than MULL(solv) are relatively small, this leads to an additional computational cost that, at least for the systems studied here, seems not to be worthwhile. In other words, the use of the MULL(solv) hybrid partial charges seems to be a good compromise between correctness and computational cost. Each of the indirect NEW switching protocols requires the calculation of

Δ A^{MM \leftrightarrow MULL}

; however, this is a strict force field-based free energy calculation, which can be computed quickly and efficiently on GPUs using the CHARMM/OpenMM interface (cf. Section 4.4.1).

2.4. Performances of Modified Switching Protocols

Figure 4 summarizes the results obtained with stepwise linear switching protocols for a subset of the model compounds. For details, see Section 4.3.2 and Figure 11. In all cases, the baseline results, indicated as red, solid circles, are the free energy difference obtained from one-step MM → SQM switches, i.e., the MM/JAR(2ps) results already presented. For the first problematic case, 5-t2, all three stepwise linear protocols reduce

δ Δ A

considerably. However, in the case of 6-t2, it is difficult to speak of improvements. Two protocols, L2-1 and L3-1, lower

δ Δ A

, but the values are still far outside of the

\pm 0.25

kcal/mol threshold. The third protocol, L3-2, which worked extremely well for 5-t2, actually increases

δ Δ A

for 6-t2 compared to L1, the default linear protocol. The other three compounds included in the subset were chosen as negative controls because they already performed well with the JAR(2ps) protocol. The performance of the stepwise linear protocol is comparable, except for one poor result for 4-t2 when using L3-1.

Turning to the combination of stepwise linear and indirect NEW switching protocols with the MULL(solv) hybrid intermediate charge state (shown as squares), most of the results lie within the

\pm 0.25

kcal/mol threshold. However, since in this case, the performance of the linear protocol was excellent to begin with, any improvements are marginal. Moreover, for 4-t2, the stepwise linear protocols perform slightly poorer, and for 6-t2,

δ Δ A

obtained with the L3-1 is >1 kcal/mol.

Based on these findings, the performances of the stepwise linear switching protocols seems mixed at best. None of the protocols tested consistently improved the results. Furthermore, even when the results were improved and fell within the

\pm 0.25

kcal/mol threshold, the standard deviation of the work values,

σ_{W}

, was always larger than for the linear switching protocol. Since

σ_{W}

is a sensitive indicator of whether one can expect the convergence of JAR calculations [32], the utility of the protocols that increase it seems doubtful.

2.5. A Detailed Analysis of the Factors Affecting Convergence

2.5.1. Effects of Charge Distribution on Solute Properties

To quantify the differences between the various charge models used in this study, we calculated the root-mean-square deviation of the atomic charge differences

{RMSD}_{q}

between the MM, MULL(gas), MULL(solv) representations, using the MULL(solv*) charges as the reference value (see Section 4.3.3, Equation (9)). Since the latter are the average of the Mulliken charges derived from simulations at the SQM/MM level of theory, they are closest to the fluctuating charge distribution at the target high level of theory. The variability of the Mulliken charges derived from the SQM(/MM) reevaluations about their mean values were always extremely low, fluctuating typically less than

\pm 0.03

e about the mean; the largest value observed was 0.08 e. These low standard deviations make the use of average values meaningful in the first place. In addition to

{RMSD}_{q}

, we looked at the magnitude of the differential dipole moment

|Δ \vec{μ}| = Δ μ

(Equation (10)) and the angle

Θ_{SQM}

between the dipole moment of a charge distribution and that of the MULL(solv*) reference charges (Equation (11)).

The individual results for each compound are listed in Table S12 and depicted in Figures S4–S6 of SI. Some overall trends can be seen from the MAD values of

{RMSD}_{q}

,

Δ μ

, and

Θ_{SQM}

in Table 2. As expected, the MAD(

{RMSD}_{q}

) is largest for MM and smallest for MULL(solv), with MULL(gas) in between; the same is the case for MAD(

Δ μ

). This is in accord with the MAD(

δ Δ A

) results of Table 1 in Section 2.3, where the use of the MULL(gas) intermediate charge state led to a noticeable improvement, and the MULL(solv) charges lowered the MAD(

δ Δ A

) even further. The MAD(

Θ_{SQM}

) behaves somewhat differently; here, the MULL(gas) value is lower than that of MULL(solv). As can be seen in Table S12 and Figure S6, this holds true for most compounds. Nevertheless,

Θ_{SQM}

of MULL(solv) is always smaller than that of the MM charges. The findings strongly suggest that the differences between the charge distribution at the two levels of theory have a strong influence on the convergence of the calculations of

Δ A^{high \to low}

via JAR, with the difference in magnitude of the two dipole moments being more important than the orientation of the respective dipole moment vectors.

Based on the overall convergence results (Figure 2), we surmised that compounds containing a lactam moiety tended to converge more slowly than the corresponding lactim state, e.g., 5-t2 vs. 5-t1, etc. One reason for this is that the differences between the MM and the SQM(/MM) descriptions are greater for the lactams than for the lactims. The analysis of partial charges points in this direction as well. Several differences can be discerned in the detailed charge distribution data for the individual molecules, e.g., Figure S4. For all lactim–lactam pairs (compounds 2, 3, 4, 5, 6, 8),the MM

{RMSD}_{q}

of the lactam state is noticeably higher than that of the corresponding lactim state. Similarly, the MM

{RMSD}_{q}

of each of the lactams (red circle) is at least 0.1e higher than for MULL(solv) (blue squares).

In Figure 5, we visualize some points made above for the two molecules, for which obtaining a converged

Δ A^{MM \to SQM}

was most difficult, 5-t2 (top) and 6-t2 (bottom). For each of the three charge representations, MM, MULL(gas), and MULL(solv), we display the differential atomic charges, both as labels, as well as a color gradient from blue to red, and the resulting differential dipole moment vectors (orange arrows). The magnitude of the differential dipole moment

Δ μ = |Δ \vec{μ}|

and the angle

Θ_{SQM}

between the dipole moment of a charge distribution and that of the MULL(solv*) reference charge distribution are listed directly. The difference in partial charges and the MULL(solv*) charges can be large, e.g., the difference for the -C=O part of the lactam moiety is almost

\pm 0.5

e for 5-t2 (top, left in Figure 5). One further sees that the charge differences become smaller between the MM and the two MULL charge sets, with atoms colored in clear blue or red for MM; e.g., the -C=O group (left side) has just a shade of blue or red for MULL(solv) (right side). Accordingly, the length of the orange arrows, i.e.,

Δ μ

of the MULL charge states, is noticeably smaller than that for MM. Even more detailed plots, including the exact values of the charge difference for each atom, for 3-t2, 4-t2, 5-t2, 6-t2, and 8-t2, can be found in Figures S7–S11.

2.5.2. Effects of Charge Distribution on the First Solvation Shell

Figure 6 displays the difference between the average number of waters

Δ N_{W a t e r s}

within

\leq 3

Å of the solute. The numbers are relative to the target high-level state, i.e., the average number of waters found in the SQM/MM simulations. First, one sees that

Δ N_{W a t e r s}

for the MULL(solv*) charges (green diamonds), which were also derived from the SQM/MM simulations, are very close to zero, in accord with expectation. Several results for the MULL(solv) charges (blue squares) are also quite close to zero, but mostly for the lactams (particularly 2-t2, 3-t2, 4-t2, 5-t2) the difference in water molecules

Δ N_{W a t e r s} \approx - 1

. For the MULL(gas) charges (orange triangles), all

Δ N_{W a t e r s}

values are negative, i.e., on average, there are fewer water molecules in close contact with the solute, compared to SQM/MM. This is not too surprising, because the charges were derived from SCC-DFTB gas phase calculations; hence, the solute did not experience polarization from its interaction with the solvent. This may also explain why on average, the use of the MULL(gas) hybrid intermediate state performed worse than MULL(solv), even though it performed significantly better than MM (cf. Table 1).

The MM results (red circles) fall into two distinct groups, with either more or less water present than in SQM/MM. For most lactams, with the slightly surprising exception of 6-t2,

Δ N_{W a t e r s}

is negative, whereas the lactims, i.e., have a positive

Δ N_{W a t e r s}

. Thus, one sees again distinct differences in properties for the force field representations of lactims and lactams, respectively.

2.5.3. Water Reorientation Dynamics

In the previous subsections, we characterized the influences of different charge distributions (atomic partial charges) on the static properties of the solute and the surrounding solvent layer in several ways. As the endpoints of the NEW switches have different charge distributions, it is of considerable interest to study the dynamics of solvent reorientation. We therefore computed the unnormalized Stoke shift relaxation

\bar{Δ Δ U (t)}

for several compounds, and the MM and MULL(solv) charge distributions (see Section 4.3.3, Equation (13)). The results for 5t-2 are shown in Figure 7; analogous plots for 4t-2, 6t-2, and 8t-2 can be found in Figures S12–S14 in SI. All fit parameters are summarized in Table S13 of SI.

As described in Section 4.3.3, the mono-exponential fit was carried out in such a way that it picks out the slow process(es) of water reorientation. Looking at Figure 7, one sees that the relaxation times for MM and MULL(solv) are relatively comparable (

τ \approx 1

ps), but that the prefactor

Δ U_{0}

is quite different (>10 kcal/mol for MM and ≈4 kcal/mol for MULL(solv)). Thus, while

\bar{Δ Δ U (t)}

has for all practical reasons reached 0 after about 3 ps in the MULL(solv) case (blue curve), for MM (red curve), this is the case only after 5 ps. Since we would ideally carry out NEW switches of only 2 ps length, the obvious question is the value of

Δ U (t = 2 ps)

. Using the fit parameters, we find that MM ≈ 1.7 kcal/mol, and that MULL(solv) ≈ 0.5 kcal/mol. These values roughly mirror the values for

δ Δ A

for MM/JAR(2ps) and MULL(solv), respectively.

For the second problem case, 6-t2, very similar results were obtained; see Figure S13 and Table S13.

Overall, the slow decay process of

\bar{Δ Δ U (t)}

always occurs with a time constant

τ \approx 1

ps. Thus, the degree to which the system is still out of equilibrium regarding the SQM/MM target state therefore depends crucially on how big the difference is at

t = 0

. Thus, it becomes clear that convergence is facilitated if the solute charge distributions of the initial and final state resemble each other, explaining why better results were obtained for all the MULL intermediate charge states. Phrased differently, to obtain converged JAR results from 2 ps switching simulations, the charge distributions of the initial and final states have to be very similar. Since the determining factor seems to be the initial difference in charge distribution, using stepwise linear switching paths cannot help much, explaining at least in part why we did not obtain any real improvements from the stepwise linear protocols we attempted.

2.5.4. Interplay between Charge Distribution and Conformational Preferences

The compounds used in this work were mostly rigid and specifically chosen to avoid complications from conformational degrees of freedom, the exceptions being 1-t1 and the blocked amino acids Ala and Ser. Modifying partial charges as for the MULL hybrid intermediate states may have an effect on conformational preferences. As can be seen in Figure 3, all three MULL hybrid intermediate states improve the convergence for 1-t1. For the two blocked amino acids, MULL(gas) results in a poorer convergence for Ala, and MULL(solv) results in a slightly poorer convergence for Ser. Given that Ala and Ser are the smallest possible peptide-like molecules with protein backbone-like

ϕ

and

ψ

dihedral angles, we performed some analyses on the dihedral angle conformational preferences. In earlier work [31], we showed that purely classical Hamiltonians and SQM/MM Hamiltonians resulted in different preferred conformations of

ϕ

and

ψ

, as well as

χ_{1}

in the Ser case. In Figure 8 and Figure 9,

ϕ / ψ

maps for Ala and Ser are shown for MM, MULL(solv), and SQM. The differences between MM (left) and SQM (right) are clearly visible. For both blocked amino acids, MM has a single narrow minimum at

ϕ \approx 150^{\circ}

,

ψ \approx - 50^{\circ}

, whereas SQM has a broader distribution at

ϕ \approx 150^{\circ}

, and

- 150^{\circ} < ψ < - 50^{\circ}

. While the MULL

ϕ / ψ

maps (middle plots) are more similar to MM than to SQM, a second minimum at

ϕ \approx 150^{\circ} / ψ \approx - 150^{\circ}

has appeared. Thus, although the effect is small, the use of hybrid charge intermediates also makes this state slightly more similar to a high-level state, in terms of conformational preferences.

3. Discussion

The optimized protocol proposed in Ref. [36] (200 switches of 2 ps length) would be efficient enough for many practical applications. However, it has only been validated in the gas phase. If water reorientation is responsible for the slower convergence in aqueous solution, as observed previously [31,37], then, given the experimentally known slower relaxation time constant of >1 ps, switching lengths of 2 ps may not be long enough. This suspicion was confirmed in this work, and we found that in one case, even a switching length of 5 ps was insufficient.

We explored two approaches to keep the switching length at 2 ps: (i) introducing a hybrid intermediate charge state, which makes the solute charge distribution more similar to that at the SQM/MM level of theory. (ii) Since the solvent reorientation also has an ultrafast component, we used a stepwise linear switching scheme instead of the standard one, switching faster, initially followed by one or two slower steps. While the use of hybrid intermediate charge states significantly improved convergence, the benefits of multi-stage switching protocols were mixed.

The partial charges of the intermediate hybrid states were obtained by averaging over the Mulliken charges obtained from 200 energy evaluations at the SQM level of theory. The computational cost of this is relatively small; the overhead, if any, comes from sampling the configurations over which one averages in the first place. Not surprisingly, the best results (lowest overall

δ Δ A

; see Table 1) were obtained from coordinate sets generated during SQM/MM simulations, i.e., calculations at the target high level of theory (MULL(solv*)). While we have performed such calculations to obtain reference results with CRO, we would ideally prefer to avoid them in practical applications. In contrast, the cheapest approach is to use the coordinate sets saved during the MM gas phase simulations (MULL(gas)). While these results were an improvement over the direct MM→SQM/MM switches, the MAD(

δ Δ A

) was higher than that of MULL(solv*). The results for the third method tested, MULL(solv), fell in between. Here, the configurations to be reevaluated at the SQM/MM level of theory were generated from the MM simulations in the solvated phase. Since one can reuse the stored coordinates/restart files to start the NEW switching simulations, the computational cost is practically zero. All results obtained with the MULL(solv) hybrid intermediate charge state met our rather stringent quality criteria; the improvements over the direct MM→SQM/MM switches were noticeable, and the difference in convergence over the more expensive MULL(solv*) approach was small (a MAD of 0.09 instead of 0.07 kcal/mol). Thus, the MULL(solv) approach to derive an intermediate charge representation seems to be a good compromise between computational cost and convergence improvement. While for this study, averaged Mulliken charges were the logical choice because of the use of SCC-DFTB, other charge assignment procedures could be explored if different QM methods were used, such as ESP or CM5-symmetrized charges [19,43]. All methods require an additional, free energy simulation to obtain

Δ A^{MM \leftrightarrow MULL}

at the low level of theory, but its computational cost is small compared to that of the NEW switching simulations needing SQM/MM Hamiltonians.

As reported in Section 2.4, we have not seen consistent convergence improvements for all systems using the two-stage and three-stage switching protocols. This prompted a detailed analysis of how the MM and SQM/MM representations differ. Comparing the MM charges of the force field to the average Mulliken charges, one sees that there are indeed surprisingly large differences in charge distribution, and hence, the solute dipole moment. The differences in properties of the solute have a noticeable effect on the number of water molecules in close contact with the solute. Furthermore, by studying the time correlation function

\bar{Δ Δ U (t)}

after switching from one level of theory to the other, we were able to confirm the time scales of experimental solvation spectroscopy. There clearly is a fast process with a relaxation time

≪ 1

ps, and a slower process with

τ \approx 1

ps. This slower time constant does not change much if the initial charge distributions are equal to MM or MULL(solv). If the difference between the initial and final distribution is large, then the effect of the unfinished solvent reorientation after 2 ps (the switching length we are aiming for), will also be large. If the two charge distributions are more similar, as is the case for MULL(solv), then

Δ U (t = 2 ps)

is small enough to have little effect on the results, even though the solvent reorientation may not be complete. On the one hand, this observation explains the effectiveness of using hybrid charge intermediate states. On the other hand, it helps to rationalize our failure to obtain consistently convergent results with the multistage switching schemes. While one could certainly construct more complex switching paths, the speed of the solvent reorientation would remain the same.

4. Materials and Methods

4.1. Computing Free Energy Differences between the Levels of Theory Using NEW Methods

In several previous studies, [30,31,32,33,34,35,36] we demonstrated that NEW methods are a robust means of computing free energy differences between levels of theory, such as an MM and an SQM/MM description of a system. Given a series of non-equilibrium work

W^{M M \to S Q M / M M}

values obtained from switching simulations, the free energy difference is computed according to Jarzynski’s equation [40]:

Δ A^{M M \to S Q M / M M} = - k_{B} T ln 〈exp (- β W^{M M \to S Q M / M M})〉

(3)

where

k_{B}

, T, and

β

have the usual meanings of Boltzmann’s constant, absolute temperature, and

β = 1 / k_{B} T

. The angular brackets indicate averaging over a sufficiently large number of individual work values. To realize this scheme, one has to save snapshots during an equilibrium MD simulation at the MM level of theory. These serve as the starting point for the switching simulations using a hybrid energy function:

U (λ_{t}) = (1 - λ_{t}) U^{M M} + λ_{t} U^{S Q M / M M}

(4)

Here,

0 \leq λ_{t} \leq 1

is the coupling parameter which is incremented, e.g., linearly at each step of the switching simulation, according to

λ_{t} = λ_{t} (i) = i / N_{s w i t c h}, i = 1, 2, \dots, N_{s w i t c h}

(5)

In our protocols,

N_{s w i t c h}

is, e.g., 2000 or 5000 steps, corresponding to a switching length of 2 or 5 ps with a timestep

δ t = 1

fs, and we carry out two hundred of such switches (see below for details). Equation (5) describes the simplest switching protocol; in this work, we also test modified protocols where

λ_{t}

is incremented at different speeds. The switching simulations are costly because the evaluation of the mixed potential energy function

U (λ_{t})

requires the high-level energy function

U^{S Q M / M M}

and its gradient. However, since the switches are independent, they can be carried out in parallel. For each timestep at which

λ_{t}

is incremented, the work is accumulated according to (cf. Ref. [44])

W (t + δ t) = W (t) + U (x_{t + δ t}, λ_{t + δ t}) - U (x_{t + δ t}, λ_{t})

(6)

One-sided methods, such as Jarzynski’s equation, are less efficient and reliable than the two-sided approaches [45]. The process outlined above to obtain

W^{M M \to S Q M / M M}

can be reversed, so one ends up with distributions

P^{M M \to S Q M / M M} (W)

of work values in the forward, and

P^{S Q M / M M \to M M} (W)

in the backward direction. Crooks showed that these two distributions are related according to [41]

P^{M M \to S Q M / M M} (W) = P^{S Q M / M M \to M M} (- W) exp [β (W - Δ A^{M M \to S Q M / M M})]

(7)

from which one can deduce the free energy difference between the two states, i.e., between the two levels of theory. The notation in Equation (7) has been adapted to the present application. The practical use of Crooks’ equation is computationally costly, since one needs to generate equilibrium configurations at the high level of theory. As in the past, Crooks’ equation will therefore only be used to generate reference results to assess the convergence of free energy differences obtained using Jarzynski’s equation. In this work, Jarzynski’s and Crooks’ equations (or their use) are abbreviated as JAR and CRO, respectively.

The mixing of Hamiltonians as required by Equation (4) was realized with the MSCALE module of CHARMM [46], which makes it possible to combine energies and forces from different sources, including separate programs, in an almost arbitrary manner. The MSCALE functionality is coupled to one of CHARMM’s tools (PERT) to compute free energy differences [47]. Realizing that short/fast slow-growth thermodynamic integration calculations yield a non-equilibrium work rather than a converged free energy differences [48], one can exploit PERT’s slow-growth mode to carry out NEW switches. Using PERT, it is also possible to carry out stepwise linear switching protocols; see Section 4.3.2 below.

4.2. Choice of Model Systems

This study focuses on the optimization of the calculation of

Δ A_{X s o l v}^{M M \to S Q M / M M}

(red arrow in Figure 1a); in other words, the free energy difference between a system in which all interactions are described using classical mechanics, and a hybrid system, in which a small region of interest is described using SQM, and the remainder described using MM. Examples are solute–solvent systems, in which the solute is modeled either via molecular mechanics or via quantum chemistry, whereas the solvent (water) is always treated classically. As outlined earlier, difficulties may arise from the differences in the charge distributions of the MM and SQM representations of the solute molecule X. To exclude other factors hindering convergence, suitable model systems should be relatively rigid and not have multiple conformational substates.

Many tautomer pairs fulfil this requirement. Therefore, we picked seven such pairs from the tautomer part of the SAMPL2 challenge [49]. One additional pair was taken from Tautobase [50]. We also included N-acetyl-alanine-methylamide (Ala) and N-acetyl-serine-methylamide (Ser) as model solutes, since these were the two compounds for which we noticed slow convergences in Ref. [31]. All model compounds used in this work are shown in Figure 10.

The prediction of tautomer preferences in aqueous solution remains a challenging problem [51] and is out of the scope of this study. Here, we concentrate on finding efficient protocols to compute the correction

Δ A_{X s o l v}^{M M \to S Q M / M M}

, the red arrow in Figure 1. Since (S)QM/MM methods are likely necessary for answering questions involving tautomer preferences, the present work establishes the necessary methodology to enable such calculations efficiently.

4.3. Strategies to Improve the Convergences of NEW Simulations

4.3.1. Hybrid Charge Intermediates

As outlined in the Introduction, cf. Figure 1b, we investigated whether the quantity of interest

Δ A_{X s o l v}^{M M \to S Q M / M M}

can be computed more efficiently by inserting an intermediate state. We refer to it as “MULL”; the rationale for this label will become clear shortly. Thus, we have

Δ A_{X s o l v}^{M M \to S Q M / M M} = Δ A_{X s o l v}^{M M \leftrightarrow M U L L} + Δ A_{X s o l v}^{M U L L \to S Q M / M M}

(8)

This intermediate state should have a charge distribution resembling that of the SQM/MM end state. To be useful in practice, the free energy difference between the initial MM and the intermediate state

Δ A_{X s o l v}^{M M \leftrightarrow M U L L}

must be easy (=fast) to compute. In practice, this means that the interactions at the intermediate state must also be classical. The obvious choice, therefore, is to use the force field of the initial MM state with suitably modified partial charges.

In all our methodological work to date [30,31,32,33,34,35,36], we used the SCC-DFTB SQM method [52,53] as the high level of theory. On the one hand, SCC-DFTB is expected to be sufficiently similar to ab initio QM methods, so that insights obtained will be transferable to full DFT QM methods. On the other hand, SCC-DFTB is fast enough to explicitly carry out simulations at the high level of theory, something which is not feasible for ab initio QM methods. The ability to carry out simulations at the SQM/MM level of theory makes it possible to obtain rigorous reference results using Crooks’ equation; cf. the previous section.

Since Mulliken charges and Mulliken charge analysis are at the core of the SCC-DFTB methodology [52,53], a logical choice for the intermediate state is the combination of the MM force field (see below for details) with fixed partial charges derived from the Mulliken charges used at the SCC-DFTB level of theory; hence, the superscript MULL in Figure 1b and Equation (8). In SCC-DFTB calculations, the converged Mulliken charges change at every simulation step; therefore, an averaging procedure is needed. There are several possibilities for how to obtain a representative sample of configurations for which Mulliken charges are computed. The following three approaches are tested in this study. (i) Clearly, the most exact method is to use simulations at the high level of theory (the solute treated using SCC-DFTB, surrounded by MM waters) to obtain configurations for which the Mulliken charges are calculated and then averaged. All simulations and results using this intermediate state are labeled MULL(solv*). Our goal, however, is to avoid simulations at the high level of theory as much as possible. Therefore, we also used configurations sampled at the MM level of theory to derive Mulliken charges for the intermediate representation. Specifically, we used (ii) the Mulliken charges obtained from snapshots of the MM gas phase simulation of the solute (labeled as MULL(gas)), or (iii) we used snapshots from the corresponding MM simulations in water to obtain Mulliken charges for the solute (labeled as MULL(solv)).

The three sets of MULL charges were obtained by averaging over configurations saved during equilibrium MD simulations, i.e., MULL(gas) charges were derived from the MM gas phase simulations, MULL(solv) from the MM simulations in solutions, and MULL(solv*) from the SQM/MM simulations in solution. These are the simulations needed to save restart files for the NEW switching simulations to the respective other levels of theory. As described below (cf. Section 4.4.1), we always carried out eight repetitions, i.e., eight MD simulations per state and system. From each of these, 25 configurations were taken and reevaluated at the SQM(/MM) level of theory, writing out the converged Mulliken charges. Thus, each MULL charge set was obtained by averaging over 200 configurations.

If one inserts one of the MULL intermediate states to compute

Δ A^{M M \to S Q M / M M}

as described above, then one also must compute

Δ A_{X s o l v}^{M M \leftrightarrow M U L L}

to close the thermodynamic cycle (cf. Figure 1b). Since the MM and the MULL states differ only in the partial atomic charges, we calculated

Δ A_{X s o l v}^{M M \leftrightarrow M U L L}

using equilibrium free energy methods. Specifically, in addition to the end states, we used three equidistant intermediate states and calculated the free energy difference using Bennett’s acceptance ratio method (BAR) [42].

4.3.2. Stepwise Linear Switching Protocols

All protocols that are employed to switch from MM to SQM/MM are quite rapid, so the system will not remain in equilibrium. In the context of slow-growth thermodynamic integration, this has been referred to as Hamiltonian lag, and has been shown to lead to poorly converged results [48,54,55]. Although JAR [40] makes it possible to obtain the equilibrium free energy difference for a process from multiple irreversible work values, the convergence of the results depends on the variance of the work values, which in turn depends on the deviation of the switching trajectories from the equilibrium conditions [44,56].

In addition to inserting an intermediate state with partial charges resembling the high level of theory, as just described, we explored the use of modified switching protocols. Results from solvation spectroscopy [38,39] indicate that the faster process of water reorientation in response to a change in the charge distribution of a solute is completed after approximately 0.5 ps. We therefore tested piecewise linear switching protocols consisting of two or three stages, referred to as L2-X and L3-X, respectively. The three protocols investigated in detail are shown in Figure 11, which also include the default linear protocol L1 of Equation (5) (leftmost plot). In all cases, the total switching length

τ

is 2000 fs. In L2-1, instead of incrementing

λ_{t}

linearly, there are two stages: first,

λ_{t}

is incremented from

λ_{t} = 0.00 \to 0.35

in 200 fs, followed by slower linear switching from

λ_{t} = 0.35 \to 1.00

over 1800 fs. The two L3-X protocols use three stages. In L3-1,

λ_{t}

is switched from 0.0 to 0.35 to 0.8 to 1.0 over 200, 900, and 900 fs, respectively. Finally, L3-2 is very similar, but

λ_{t}

is switched from 0.0 to 0.35 to 0.8 to 1.0 over 200, 400, and 1400 fs, respectively. In all modified protocols,

λ_{τ}

is initially changed more rapidly compared to L1, followed by a slower change in

λ_{τ}

during the remainder of the switch.

To verify the applicability and the performance of the modified compared to the standard linear switching protocol, systematic tests were carried out for five compounds (1-t2, 4-t2, 5-t2, 6-t2, and 8-t1). This selection was motivated by the results obtained with the linear protocol, and includes both “easy” and “difficult” solutes with respect to convergence (cf. Results, Section 2.2).

4.3.3. Analyses Carried Out

The main focus of this study lies on computing

Δ A_{X s o l v}^{M M \to S Q M / M M}

efficiently, and most of our results are concerned with the accuracy of JAR-based protocols compared to the reference results obtained with CRO. Additional analyses and characterizations were carried out to understand how solute–solvent interactions change and thus affect convergence when switching from an MM to an SQM representation of the solute.

Characterizing Charge Distributions

As described in Section 4.3.1, we use three intermediate charge distributions to facilitate the transformation from the low to the high level of theory. It is therefore of interest to quantify the differences between the charge sets used. In the following, the reference charge set is always the one that is derived from averaging over the Mulliken charges obtained from SQM/MM trajectories [MULL(solv*)]. To quantify the difference between two charge sets, we computed the root-mean-square deviation

{RMSD}_{q}

, defined as

{RMSD}_{q} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(q_{i}^{m e t h o d} - q_{i}^{M U L L (s o l v *)})}^{2}}

(9)

Here, N is the number of atoms of the molecule, the

q_{i}^{M U L L (s o l v *)}

charges serve as the reference, and the

q_{i}^{m e t h o d}

is one of the other charge sets used; i.e., method can be MM, MULL(gas), or MULL(solv).

Because all our solutes are neutral, the most important quantity affected by a change in partial charges is the dipole moment

\vec{μ}

. To compare the dipole moments resulting from the various charge sets, we consider the “differential” dipole moments between two charge distributions of a solute, i.e.,

Δ \vec{μ} = \sum_{i = 1}^{N} (q_{i}^{m e t h o d} - q_{i}^{M U L L (s o l v *)}) \cdot \vec{r_{i}}

(10)

In practice, we mostly compare the length of this vector, i.e.,

|Δ \vec{μ}| = Δ μ

.

Furthermore, the angle between the dipole moment vectors of two charge representations is of interest, which can be obtained in the usual way, is:

Θ_{SQM} = {cos}^{- 1} (\frac{{\vec{μ}}^{m e t h o d} \cdot {\vec{μ}}^{M U L L (s o l v *)}}{| {\vec{μ}}^{m e t h o d} | \cdot | {\vec{μ}}^{M U L L (s o l v *)} |})

(11)

Dipole moments were computed using the COOR DIPOle command of CHARMM (see https://academiccharmm.org/documentation/version/c47b1/corman (accessed on 29 April 2023)); if the difference between two charge sets is assigned as partial charges, one obtains the corresponding differential dipole moment.

Characterization of the First Solvation Shell

We analyzed all MM, MULL (gas), MULL (solv), MULL (solv*), and SQM/MM equilibrium trajectories to determine the average number of solvent water molecules in proximity to the solute. Specifically, a water molecule was considered to be in close contact if the distance between the water oxygen and a non-hydrogen atom of the solute was

\leq 3

Å. These analyses were carried out with the atom selection facility of CHARMM. (See https://academiccharmm.org/documentation/version/c47b1/select (accessed on 29 April 2023)).

Dynamics of Solvent Reorientation

To investigate the detailed dynamics of solvent reorientation upon changing the description of the solute from the MM to the SQM level of theory, we resort to ideas from computational spectroscopy, specifically the computation of the normalized Stokes shift

S (t)

from non-equilibrium MD simulations.

S (t)

is defined as [57,58]:

S (t) = \frac{ν (t) - ν (\infty)}{ν (0) - ν (\infty)} \approx \frac{\bar{Δ U (t)} - \bar{Δ U (\infty)}}{\bar{Δ U (0)} - \bar{Δ U (\infty)}}

(12)

where

ν (t)

is the time-dependent frequency of the emitted fluorescence light of the probe molecule’s chromophore. In computation, one assumes that changes in interactions between the solute and the solvent leading to

ν (t)

are purely electrostatic in nature. Thus, one takes equilibrated configurations from solute–solvent simulations, changes the partial charges from those describing the ground state to those describing the excited state, restarts the simulations, and monitors the difference in electrostatic solute–solvent interactions between the ground (

U^{0}

) and the excited state (

U^{*}

),

Δ U (t) = U^{*} (t) - U^{0} (t)

as a function of time. Just as the measured fluorescence signal is an average of the emissions from all the chromophores present in solution, one averages over at least a few hundred simulations; hence the bar in

\bar{Δ U (t)}

.

In the context of our work, the low- and high-level representations of the solute (region of interest) play the roles of the ground and excited states in solvation spectroscopy. In both cases, the solvent water has to reorient itself following a change in the charge distribution of the solute. Therefore, we adopted the computational approach to compute

S (t)

[57,58] as follows. We used starting configurations equilibrated either at (i) the MM level of theory or (ii) the MULL(solv) intermediate state, and restarted simulations with the full SQM/MM description of interactions. For each system studied (4-t2, 5-t2, 6-t2 and 8-t2), we computed 800 simulations of 10 ps length; based on the experimental findings, it is reasonable to expect that solvent reorientation is completed after this time [38,39]. At each timestep, we saved

Δ U (t) = U^{S Q M / M M} (t) - U^{m e t h o d} (t)

, where method was either MM or MULL(solv). This task was facilitated by a locally modified version of CHARMM, but could equally well be accomplished through the post-processing of trajectories saved during the simulations. The 800

Δ U (t)

time series were then averaged, resulting in the averaged time series

\bar{Δ U (t)}

. Next, we estimated

\bar{Δ U (\infty)}

, i.e., the limit of

t \to \infty

, by averaging (again) over the last 2000 entries of

\bar{Δ U (t)}

(

8 ps \leq t \leq 10 ps

).

Equation (12) is the definition of the normalized Stokes shift, whereas in the present context omitting the normalization, it turned out to be advantageous. Thus, we operate with the averaged time series,

\bar{Δ Δ U (t)} = \bar{Δ U (t)} - \bar{Δ U (\infty)},

(13)

which essentially is an unnormalized Stokes shift. To interpret it more easily,

\bar{Δ Δ U (t)}

was fitted to a mono-exponential target function

\bar{Δ Δ U (t)} \propto Δ U_{0} \cdot e^{- t / τ}

(14)

where

Δ U_{0}

is the prefactor of the exponential decay at

t = 0

, t is the time in fs, and

τ

the relaxation time constant in fs. As described in the Introduction, it is known experimentally that the orientation of solvent water occurs on two timescales [38,39]. Because any effects on the convergence of JAR calculations will result from the slower process, we intentionally always discarded the first 0.5 ps when carrying out the fit. Thus, the

τ

found using the fitting procedure should correspond to the slower reorientation process.

4.4. Overview of Simulations Carried Out

All simulations were carried out with CHARMM (developmental versions c44a2 and c47a1) [47]. The calculation of

Δ A_{X s o l v}^{M M \to S Q M / M M}

requires equilibrium simulations to generate the configurations from which the NEW switching simulations are started. As described above, in addition to switching directly from the MM to the SQM/MM level of theory, we also inserted three force field-based intermediate states with different atomic partial charges. For each of these states, we also had to carry out equilibrium simulations, followed by NEW switches to the SQM/MM level of interest. The NEW switching simulations themselves were carried out either in a strictly linear fashion (protocol L1 in Figure 11), or with a two- or three-staged linear protocol. To make sure that any convergence problems observed did not arise without solvent, we also carried out gas phase simulations for all compounds shown in Figure 10 using the optimized protocol of Ref. [36].

4.4.1. Simulation Details

Preparation and Initial Equilibration

For each molecule shown in Figure 10, starting coordinates were generated using CHARMM-GUI [59,60]. Missing force field parameters were generated using CGenFF 2.4.0 [61,62,63], as invoked by CHARMM-GUI. The solute–solvent systems were set up by placing each molecule into a cubic box of water molecules with an initial side-length of 26 Å containing 572 (CHARMM-modified) TIP3 waters [64]. Any water molecule overlapping with a solute atom was deleted. Each system was then equilibrated as follows: 100 steps of steepest descent minimization were followed by a constant pressure/temperature (CPT) simulation of 100 ps length with a timestep of 1 fs, applying the Langevin piston barostat [65] (mass of the pressure piston: 400 Da, Langevin piston collision frequency: 20 ps⁻¹, Langevin piston bath temperature: 300 K). The final 20 ps of these equilibration runs were used to determine the average box size. In Table S1 of the SI, we list the size of the simulation box determined in this manner, as well as the number of water molecules present, for each of the compounds studied.

Force Field-Based Equilibrium Simulations

Starting from the initial solute coordinates and the equilibrated solute–solvent systems, eight Langevin dynamics (LD) simulations (timestep 1 fs, friction coefficient 5 ps⁻¹) were carried out in the gas phase and in solution, respectively. Each of these simulations was initialized with different random velocities drawn from a Maxwell-Boltzmann distribution at 300 K. In each gas phase run, 5 ns of simulation time were discarded as equilibration, followed by 10 ns of production, during which the restart files were saved every 1000th step. In the analogous solution simulations at constant volume, 0.5 ns were discarded as equilibration. During the subsequent 1 ns production phase, restart files were saved every 1000th step. Thus, during the cumulative simulation length of 8 ns (solution)/80 ns (gas phase), 8000 (solution)/80,000 (gas phase) restart files were saved. These served as the pool of configurations sampled in the canonical ensemble, from which non-equilibrium switching simulations to the high (SQM(/MM)) level of theory were started.

The solute molecules were always fully flexible; the TIP3 waters were held rigid using SHAKE [66]. In the gas phase, the nonbonded interactions were not truncated (“infinite” cutoff radius). In the solution simulations, Lennard-Jones interactions were smoothly switched off between 10 and 12 Å, and electrostatic interactions were computed with the particle-mesh-Ewald method [67] (

κ = 0.34

Å⁻¹, spline interpolation to order 6, FFT grid size of 32).

The protocol just described was used both for the simulations at the MM level of theory, i.e., using the CGenFF force field, and for all simulations with a hybrid intermediate representation (force field with Mulliken-derived partial charges).

SQM(/MM) Equilibrium Simulations

To calculate the reference values via Crook’s equation, we also carried out simulations at the SQM(/MM) level of theory. Specifically, the solute was treated with the self-consistent-charge density-functional tight-binding (SCC-DFTB) method, as implemented in CHARMM [68], using the 3ob-3-1 [69,70,71,72] parameter set (https://www.dftb.org/parameters/download/3ob/3ob-3-1-cc/(accessed on 29 April 2023)). Water molecules were always treated classically. Analogous to the MM case just described, eight LD simulations started with different random initial velocities were carried out in the gas phase and in solution. The simulation length in all cases was 1 ns (1 million steps with a timestep of 1 fs). Restart files were written every 1000th step in solution and every 100th step in the gas phase, thus resulting in a total of 800 (solution)/8000 (gas phase) restart files generated during a cumulative simulation length of 8 ns. Nonbonded interactions were treated identically, as in the MM case.

4.4.2. Non-Equilibrium Work Simulations

From each set of restart files written during the equilibrium simulations at the MM, MULL, and SQM(/MM) levels of theory, every 40th was selected as the starting point for a NEW switch to the respective other level of theory. Thus, 200 switches per molecule were carried out to compute the free energy difference between two levels of theory. In all switching simulations, we used a timestep of 1 fs. Unless otherwise noted, the switching length was 2000 fs (2000 steps), both for linear and stepwise linear switching paths.

4.4.3. Calculation of $Δ A_{X s o l v}^{MM \leftrightarrow MULL}$ :

As described in Section 4.3.1, to close the thermodynamic cycle, one needs to compute

Δ A_{X s o l v}^{MM \leftrightarrow MULL}

. Since the only difference between the

M M

and

M U L L

descriptions of a system are the partial charges of the solute,

{q_{i}}

, intermediate states can be set up easily according to

q_{i} (λ) = (1 - λ) q_{i}^{M M} + λ q_{i}^{M U L L}

. To ensure an overlap between neighboring states, we used three intermediate states (

λ = 0.25, 0.5, 0.75

) in addition to the MM (

λ = 0

) and MULL (

λ = 1

) end states. All simulation settings were the same as described in the force field-based equilibrium simulations; the production length per state was 1 ns. At each

λ

state, 1000 coordinate sets were saved and the energies were reevaluated at the neighboring states. The free energy difference between the neighboring states was computed with BAR; these were summed up to give

Δ A_{X s o l v}^{MM \leftrightarrow MULL}

. All calculations were repeated eight times, starting from initial random velocities. The standard deviation obtained from these eight repetitions were used to estimate the standard error via Gaussian error propagation. All MM calculations were carried out with the OpenMM [73] GPU acceleration available in CHARMM (https://academiccharmm.org/documentation/version/c47b1/openmm(accessed on 29 April 2023)).

5. Conclusions

In summary, the reliable calculation of free energy differences between the MM and SQM/MM levels of theory in aqueous solution require either switching lengths of at least 5 ps, or the use of an appropriate intermediate charge state. In terms of aiding convergence, the use of intermediate states is not new. For example, the internal degrees of freedom of a molecule or region that should be described at two levels of theory can be made more similar to the target high-level representation via force matching [29,33,74]. Thus, the use of intermediate charge states plays the same role for electrostatic interactions between the core region (the region changed from e.g., MM to SQM) and the environment (the region always described at the low level of theory), as force-matched parameters play for the bonded energy terms of the core region. Since intermediate charge states can be rationally designed without too much computational effort, the poor convergence of MM→SQM/MM simulations due to slow solvent reorientation can be easily circumvented.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/molecules28104006/s1, Figure S1:

δ Δ A

including error bars; Figure S2:

δ Δ A

, all points; Figure S3:

δ Δ A

, all points, including error bars; Figure S4: RMSD[q]; Figure S5:

Δ μ

[D]; Figure S6:

θ_{S Q M} [^{\circ}]

; Figure S7: 3-t2 charge analysis; Figure S8: 4-t2 charge analysis; Figure S9: 8-t2 charge analysis; Figure S10: 4-t2 TIP3P relaxation; Figure S11: 6-t2 TIP3P relaxation; Figure S12: 8-t2 TIP3P relaxation; Table S1: Additional information on model compounds; Table S2: Summary table of all

δ Δ A

results in solution; Table S3: Detailed results for

N_{s w i t c h}

= 2000 fs

N_{r e p l i c a t e}

= 200

Δ A_{X s o l v}^{M M \to S Q M}

in solution; Table S4: Detailed results for

N_{s w i t c h}

= 2000 fs

N_{r e p l i c a t e}

= 200

Δ A_{g a s}^{M M \to S Q M}

; Table S5: Detailed results for

N_{s w i t c h}

= 5000 fs

N_{r e p l i c a t e}

= 200

Δ A_{X s o l v}^{M M \to S Q M}

in solution; Table S6: Detailed results for

N_{s w i t c h}

= 2000 fs

N_{r e p l i c a t e}

= 200

Δ A_{X s o l v}^{M U L L (g a s) \to S Q M}

in solution; Table S7: Detailed results for energy correction term via FEP

Δ A_{X s o l v}^{M U L L (g a s) \to S Q M}

in solution; Table S8: Detailed results for

N_{s w i t c h}

= 2000 fs

N_{r e p l i c a t e}

= 200

Δ A_{X s o l v}^{M U L L (s o l v) \to S Q M}

in solution; Table S9: Detailed results for energy correction term via FEP

Δ A_{X s o l v}^{M U L L (s o l v) \to S Q M}

in solution; Table S10: Detailed results for

N_{s w i t c h}

= 2000 fs

N_{r e p l i c a t e}

= 200

Δ A_{X s o l v}^{M U L L (s o l v *) \to S Q M}

in solution; Table S11: Detailed results for energy correction term via FEP

Δ A_{X s o l v}^{M U L L (s o l v *) \to S Q M}

in solution; Table S12: Mulliken Charge Analysis; Table S13: TIP3P Water Relaxation. Refs. [75,76] are cited in the Supplementary Materials.

Author Contributions

Conceptualization, A.S., H.L.W. and S.B.; methodology, H.L.W. and S.B.; validation, A.S. and S.B.; formal analysis, A.S.; investigation, A.S.; resources, S.B.; writing—original draft preparation, A.S.; writing—review and editing, H.L.W. and S.B.; visualization, A.S.; supervision, S.B.; project administration, S.B.; funding acquisition, H.L.W. and S.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Institute of General Medical Sciences, grant number 1R01GM129519-01 (HLW), and by the Austrian Science Fund (FWF), grant number P-31024 (SB). The APC for this article were also covered by the Austrian Science Fund (FWF).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Any additional data not available in the main manuscript or the SI are available on request from the corresponding authors.

Acknowledgments

We thank Christian Schröder for helping with the calculations of Stoke shift relaxations.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

MD	Molecular Dynamics
MM	Molecular Mechanics
(S)QM	(Semi-empirical-)Quantum Mechanics
(S)QM/MM	(Semi-empirical-)Quantum Mechanical/ Molecular Mechanical hybrid methods
FES	Free energy simulation
FEP	Free energy perturbation
TI	Thermodynamic integration
BAR	Bennett’s acceptance ratio
NEW	Non-equilibrium work methods
JAR	Jarzynski’s Equation
CRO	Crook’s Equation

References

Bash, P.A.; Singh, U.C.; Langridge, R.; Kollman, P.A. Free Energy Calculations by Computer Simulation. Science 1987, 236, 564–568. [Google Scholar] [CrossRef]
Gao, J.; Kuczera, K.; Tidor, B.; Karplus, M. Hidden Thermodynamics of Mutant Proteins: A Molecular Dynamics Analysis. Science 1989, 244, 1069–1072. [Google Scholar] [CrossRef]
Chodera, J.D.; Mobley, D.L.; Shirts, M.R.; Dixon, R.W.; Branson, K.; Pande, V.S. Alchemical free energy methods for drug discovery: Progress and challenges. Curr. Opin. Struct. Biol. 2011, 21, 150–160. [Google Scholar] [CrossRef]
Barros, E.P.; Ries, B.; Böselt, L.; Champion, C.; Riniker, S. Recent developments in multiscale free energy simulations. Curr. Opin. Struct. Biol. 2022, 72, 55–62. [Google Scholar] [CrossRef]
Mey, A.S.; Allen, B.K.; Macdonald, H.E.B.; Chodera, J.D.; Hahn, D.F.; Kuhn, M.; Michel, J.; Mobley, D.L.; Naden, L.N.; Prasad, S.; et al. Best practices for alchemical free energy calculations [Article v1.0]. Living J. Comput. Mol. Sci. 2020, 2, 18378. [Google Scholar] [CrossRef]
Schindler, C.E.; Baumann, H.; Blum, A.; Boöse, D.; Buchstaller, H.P.; Burgdorf, L.; Cappel, D.; Chekler, E.; Czodrowski, P.; Dorsch, D.; et al. Large-scale assessment of binding free energy calculations in active drug discovery projects. J. Chem. Inf. Model. 2020, 60, 5457–5474. [Google Scholar] [CrossRef]
Cournia, Z.; Allen, B.; Sherman, W. Relative Binding Free Energy Calculations in Drug Discovery: Recent Advances and Practical Considerations. J. Chem. Inf. Model. 2017, 57, 2911–2937. [Google Scholar] [CrossRef]
Limongelli, V. Ligand binding free energy and kinetics calculation in 2020. Comput. Mol. Sci. 2020, 10, e1455. [Google Scholar] [CrossRef]
Maier, S.; Thapa, B.; Erickson, J.; Raghavachari, K. Comparative assessment of QM-based and MM-based models for prediction of protein–ligand binding affinity trends. Phys. Chem. Chem. Phys. 2022, 24, 14525–14537. [Google Scholar] [CrossRef]
Yang, W.; Gao, Y.Q.; Cui, Q.; Ma, J.; Karplus, M. The missing link between thermodynamics and structure in F₁-ATPase. Proc. Natl. Acad. Sci. USA 2003, 100, 874–879. [Google Scholar] [CrossRef]
Mulholland, A.J. Chemical accuracy in QM/MM calculations on enzyme-catalysed reactions. Chem. Cent. J. 2007, 1, 19. [Google Scholar] [CrossRef] [PubMed]
Lev, B.; Roux, B.; Noskov, S.Y. Relative free energies for hydration of monovalent ions from QM and QM/MM simulations. J. Chem. Theory Comput. 2013, 9, 4165–4175. [Google Scholar] [CrossRef] [PubMed]
Riahi, S.; Roux, B.; Rowley, C.N. QM/MM molecular dynamics simulations of the hydration of Mg (II) and Zn (II) ions. Can. J. Chem. 2013, 91, 552–558. [Google Scholar] [CrossRef]
Delgado, J.A.; Wineman-Fisher, V.; Pandit, S.; Varma, S. Inclusion of High-Field Target Data in AMOEBA’s Calibration Improves Predictions of Protein–Ion Interactions. J. Chem. Inf. Model. 2022, 62, 4713–4726. [Google Scholar] [CrossRef] [PubMed]
Yang, W.; Cui, Q.; Min, D.; Li, H. QM/MM Alchemical Free Energy Simulations: Challenges and Recent Developments. Annu. Rep. Comput. Chem. 2010, 6, 51–62. [Google Scholar] [CrossRef]
Gao, J. Absolute free energy of solvation from Monte Carlo simulations using combined quantum and molecular mechanical potentials. J. Phys. Chem. 1992, 96, 537–540. [Google Scholar] [CrossRef]
Luzhkov, V.; Warshel, A. Microscopic models for quantum mechanical calculations of chemical processes in solutions: LD/AMPAC and SCAAS/AMPAC calculations of solvation energies. J. Comput. Chem. 1992, 13, 199–213. [Google Scholar] [CrossRef]
Cui, Q.; Pal, T.; Xie, L. Biomolecular QM/MM Simulations: What Are Some of the “Burning Issues”? J. Phys. Chem. B 2021, 125, 689–702. [Google Scholar] [CrossRef]
Hudson, P.S.; Aviat, F.; Meana-Pañeda, R.; Warrensford, L.; Pollard, B.C.; Prasad, S.; Jones, M.R.; Woodcock, H.L.; Brooks, B.R. Obtaining QM/MM binding free energies in the SAMPL8 drugs of abuse challenge: Indirect approaches. J. Comput.-Aided Mol. Des. 2022, 36, 263–277. [Google Scholar] [CrossRef]
Wang, X.; He, Q.; Sun, Z. BAR-based multi-dimensional nonequilibrium pulling for indirect construction of a QM/MM free energy landscape. Phys. Chem. Chem. Phys. 2019, 21, 6672–6688. [Google Scholar] [CrossRef]
Sun, Z.; Liu, Z. BAR-Based Multi-Dimensional Nonequilibrium Pulling for Indirect Construction of QM/MM Free Energy Landscapes: Varying the QM Region. Adv. Theory Simula. 2021, 4, 2100185. [Google Scholar] [CrossRef]
Wang, J.N.; Liu, W.; Li, P.; Mo, Y.; Hu, W.; Zheng, J.; Pan, X.; Shao, Y.; Mei, Y. Accelerated Computation of Free Energy Profile at Ab Initio Quantum Mechanical/Molecular Mechanics Accuracy via a Semiempirical Reference Potential. 4. Adaptive QM/MM. J. Chem. Theory Comput. 2021, 17, 1318–1325. [Google Scholar] [CrossRef] [PubMed]
Hudson, P.S.; Han, K.; Woodcock, H.L.; Brooks, B.R. Force matching as a stepping stone to QM/MM CB [8] host/guest binding free energies: A SAMPL6 cautionary tale. J. Comput.-Aided Mol. Des. 2018, 32, 983–999. [Google Scholar] [CrossRef] [PubMed]
König, G.; Pickard, F.C.; Mei, Y.; Brooks, B.R. Predicting hydration free energies with a hybrid QM/MM approach: An evaluation of implicit and explicit solvation models in SAMPL4. J. Comput.-Aided Mol. Des. 2014, 28, 245–257. [Google Scholar] [CrossRef]
König, G.; Brooks, B.R.; Thiel, W.; York, D.M. On the convergence of multi-scale free energy simulations. Mol. Simul. 2018, 44, 1062–1081. [Google Scholar] [CrossRef]
König, G.; Brooks, B.R. Predicting binding affinities of host-guest systems in the SAMPL3 blind challenge: The performance of relative free energy calculations. J. Comput.-Aided Mol. Des. 2012, 26, 543–550. [Google Scholar] [CrossRef]
Zeng, J.; Tao, Y.; Giese, T.J.; York, D.M. Modern semiempirical electronic structure methods and machine learning potentials for drug discovery: Conformers, tautomers, and protonation states. J. Chem. Phys. 2023, 158, 124110. [Google Scholar] [CrossRef]
Heimdal, J.; Ryde, U. Convergence of QM/MM free-energy perturbations based on molecular-mechanics or semiempirical simulations. Phys. Chem. Chem. Phys. 2012, 14, 12592–12604. [Google Scholar] [CrossRef]
Giese, T.J.; York, D.M. Development of a robust indirect approach for MM→QM free energy calculations that combines force-matched reference potential and Bennett’s acceptance ratio methods. J. Chem. Theory Comput. 2019, 15, 5543–5562. [Google Scholar] [CrossRef]
Hudson, P.S.; Woodcock, H.L.; Boresch, S. Use of Nonequilibrium Work Methods to Compute Free Energy Differences Between Molecular Mechanical and Quantum Mechanical Representations of Molecular Systems. J. Phys. Chem. Lett. 2015, 6, 4850–4856. [Google Scholar] [CrossRef]
Kearns, F.L.; Hudson, P.S.; Woodcock, H.L.; Boresch, S. Computing Converged Free Energy Differences Between Levels of Theory via Nonequilibrium Work Methods: Challenges and Opportunities. J. Comput. Chem. 2017, 38, 1376–1388. [Google Scholar] [CrossRef] [PubMed]
Boresch, S.; Woodcock, H.L. Convergence of single-step free energy perturbation. Mol. Phys. 2017, 115, 1200–1213. [Google Scholar] [CrossRef]
Hudson, P.S.; Boresch, S.; Rogers, D.M.; Woodcock, H.L. Accelerating QM/MM Free Energy Computations via Intramolecular Force Matching. J. Chem. Theory Comput. 2018, 14, 6327–6335. [Google Scholar] [CrossRef]
Kearns, F.L.; Warrensford, L.; Boresch, S.; Woodcock, H.L. The Good, the Bad, and the Ugly: “HiPen”, a New Dataset for Validating (S)QM/MM Free Energy Simulations. Molecules 2019, 24, 681. [Google Scholar] [CrossRef]
Hudson, P.S.; Woodcock, H.L.; Boresch, S. Use of Interaction Energies in QM/MM Free Energy Simulations. J. Chem. Theory Comput. 2019, 15, 4632–4645. [Google Scholar] [CrossRef]
Schöller, A.; Kearns, F.; Woodcock, H.L.; Boresch, S. Optimizing the Calculation of Free Energy Differences in Nonequilibrium Work SQM/MM Switching Simulations. J. Phys. Chem. B 2022, 126, 2798–2811. [Google Scholar] [CrossRef]
Ito, S.; Cui, Q. Multi-level free energy simulation with a staged transformation approach. J. Chem. Phys. 2020, 153, 044115. [Google Scholar] [CrossRef]
Maroncelli, M.; Fleming, G.R. Computer simulation of the dynamics of aqueous solvation. J. Chem. Phys. 1988, 89, 5044–5069. [Google Scholar] [CrossRef]
Maroncelli, M. The dynamics of solvation in polar liquids. J. Mol. Liq. 1993, 57, 1–37. [Google Scholar] [CrossRef]
Jarzynski, C. Nonequilibrium equality for free energy differences. Phys. Rev. Lett. 1997, 78, 2690. [Google Scholar] [CrossRef]
Crooks, G.E. Path-ensemble averages in systems driven far from equilibrium. Phys. Rev. E 2000, 61, 2361. [Google Scholar] [CrossRef]
Bennett, C.H. Efficient estimation of free energy differences from Monte Carlo data. J. Comput. Phys. 1976, 22, 245–268. [Google Scholar] [CrossRef]
Marenich, A.V.; Jerome, S.V.; Cramer, C.J.; Truhlar, D.G. Charge Model 5: An Extension of Hirshfeld Population Analysis for the Accurate Description of Molecular Interactions in Gaseous and Condensed Phases. J. Chem. Theory Comput. 2012, 8, 527–541. [Google Scholar] [CrossRef] [PubMed]
Dellago, C.; Hummer, G. Computing Equilibrium Free Energies Using Non-Equilibrium Molecular Dynamics. Entropy 2014, 16, 41–61. [Google Scholar] [CrossRef]
Pohorille, A.; Jarzynski, C.; Chipot, C. Good Practices in Free-Energy Calculations. J. Phys. Chem. B 2010, 114, 10235–10253. [Google Scholar] [CrossRef]
Woodcock, H.L.; Miller, B.T.; Hodoscek, M.; Okur, A.; Larkin, J.D.; Ponder, J.W.; Brooks, B.R. MSCALE: A general utility for multiscale modeling. J. Chem. Theory Comput. 2011, 7, 1208–1219. [Google Scholar] [CrossRef]
Brooks, B.R.; Brooks, C.L., III; Mackerell, A.D., Jr.; Nilsson, L.; Petrella, R.J.; Roux, B.; Won, Y.; Archontis, G.; Bartels, C.; Boresch, S.; et al. CHARMM: The biomolecular simulation program. J. Comput. Chem. 2009, 30, 1545–1614. [Google Scholar] [CrossRef]
Hunter, J.E.; Reinhardt, W.P.; Davis, T.F. A finite-time variational method for determining optimal paths and obtaining bounds on free energy changes from computer simulations. J. Chem. Phys. 1993, 99, 6856–6864. [Google Scholar] [CrossRef]
Geballe, M.T.; Skillman, A.G.; Nicholls, A.; Guthrie, J.P.; Taylor, P.J. The SAMPL2 Blind Prediction Challenge: Introduction and Overview. J. Comput.-Aided Mol. Des. 2010, 24, 259–279. [Google Scholar] [CrossRef]
Wahl, O.; Sander, T. Tautobase: An open tautomer database. J. Chem. Inf. Model. 2020, 60, 1085–1089. [Google Scholar] [CrossRef]
Wieder, M.; Fass, J.; Chodera, J.D. Fitting quantum machine learning potentials to experimental free energy data: Predicting tautomer ratios in solution. Chem. Sci. 2021, 12, 11364–11381. [Google Scholar] [CrossRef] [PubMed]
Elstner, M.; Porezag, D.; Jungnickel, G.; Elsner, J.; Haugk, M.; Frauenheim, T.; Suhai, S.; Seifert, G. Self-consistent-charge density-functional tight-binding method for simulations of complex materials properties. Phys. Rev. B 1998, 58, 7260. [Google Scholar] [CrossRef]
Elstner, M. The SCC-DFTB method and its application to biological systems. Theor. Chem. Acc. 2006, 116, 316–325. [Google Scholar] [CrossRef]
Pearlman, D.A.; Kollman, P.A. The lag between the Hamiltonian and the system configuration in free energy perturbation calculations. J. Chem. Phys. 1989, 91, 7831–7839. [Google Scholar] [CrossRef]
Hermans, J. Simple analysis of noise and hysteresis in (slow-growth) free energy simulations. J. Phys. Chem. 1991, 95, 9029–9032. [Google Scholar] [CrossRef]
Jarzynski, C. Rare events and the convergence of exponentially averaged work values. Phys. Rev. E 2006, 73, 046105. [Google Scholar] [CrossRef]
Heid, E.; Harringer, S.; Schröder, C. The small impact of various partial charge distributions in ground and excited state on the computational Stokes shift of 1-methyl-6-oxyquinolinium betaine in diverse water models. J. Chem. Phys. 2016, 145, 164506. [Google Scholar] [CrossRef]
Schröder, C.; Heid, E. Computational solvation dynamics: Implementation, application, and validation. In Annual Reports in Computational Chemistry; Elsevier: Amsterdam, The Netherlands, 2020; pp. 93–154. [Google Scholar] [CrossRef]
Jo, S.; Kim, T.; Iyer, V.G.; Im, W. CHARMM-GUI: A web-based graphical user interface for CHARMM. J. Comput. Chem. 2008, 29, 1859–1865. [Google Scholar] [CrossRef]
Lee, J.; Cheng, X.; Swails, J.M.; Yeom, M.S.; Eastman, P.K.; Lemkul, J.A.; Wei, S.; Buckner, J.; Jeong, J.C.; Qi, Y.; et al. CHARMM-GUI input generator for NAMD, GROMACS, AMBER, OpenMM, and CHARMM/OpenMM simulations using the CHARMM36 additive force field. J. Chem. Theory Comput. 2016, 12, 405–413. [Google Scholar] [CrossRef]
Vanommeslaeghe, K.; MacKerell, A.D., Jr. Automation of the CHARMM General Force Field (CGenFF) I: Bond perception and atom typing. J. Chem. Inf. Model. 2012, 52, 3144–3154. [Google Scholar] [CrossRef]
Vanommeslaeghe, K.; Raman, E.P.; MacKerell, A.D., Jr. Automation of the CHARMM General Force Field (CGenFF) II: Assignment of Bonded Parameters and Partial Atomic Charges. J. Chem. Inf. Model. 2012, 52, 3155–3168. [Google Scholar] [CrossRef] [PubMed]
Yu, W.; He, X.; Vanommeslaeghe, K.; MacKerell Jr, A.D. Extension of the CHARMM general force field to sulfonyl-containing compounds and its utility in biomolecular simulations. J. Comput. Chem. 2012, 33, 2451–2468. [Google Scholar] [CrossRef] [PubMed]
Neria, E.; Fischer, S.; Karplus, M. Simulation of activation free energies in molecular systems. J. Chem. Phys. 1996, 105, 1902. [Google Scholar] [CrossRef]
Feller, S.E.; Zhang, Y.; Pastor, R.W.; Brooks, B.R. Constant pressure molecular dynamics simulation: The Langevin piston method. J. Chem. Phys. 1995, 103, 4613–4621. [Google Scholar] [CrossRef]
Ryckaert, J.P.; Ciccotti, G.; Berendsen, H.J. Numerical integration of the cartesian equations of motion of a system with constraints: Molecular dynamics of n-alkanes. J. Comput. Phys. 1977, 23, 327–341. [Google Scholar] [CrossRef]
Essmann, U.; Perera, L.; Berkowitz, M.L.; Darden, T.; Lee, H.; Pedersen, L.G. A smooth particle mesh Ewald method. J. Chem. Phys. 1995, 103, 8577–8593. [Google Scholar] [CrossRef]
Cui, Q.; Elstner, M.; Kaxiras, E.; Frauenheim, T.; Karplus, M. A QM/MM implementation of the self-consistent charge density functional tight binding (SCC-DFTB) method. J. Phys. Chem. B 2001, 105, 569–585. [Google Scholar] [CrossRef]
Gaus, M.; Goez, A.; Elstner, M. Parametrization and Benchmark of DFTB3 for Organic Molecules. J. Chem. Theory Comput. 2013, 9, 338–354. [Google Scholar] [CrossRef]
Gaus, M.; Lu, X.; Elstner, M.; Cui, Q. Parameterization of DFTB3/3OB for Sulfur and Phosphorus for Chemical and Biological Applications. J. Chem. Theory Comput. 2014, 10, 1518–1537. [Google Scholar] [CrossRef]
Lu, X.; Gaus, M.; Elstner, M.; Cui, Q. Parametrization of DFTB3/3OB for Magnesium and Zinc for Chemical and Biological Applications. J. Phys. Chem. B 2015, 119, 1062–1082. [Google Scholar] [CrossRef]
Kubillus, M.; Kubař, T.; Gaus, M.; Řezáč, J.; Elstner, M. Parameterization of the DFTB3 Method for Br, Ca, Cl, F, I, K, and Na in Organic and Biological Systems. J. Chem. Theory Comput. 2015, 11, 332–342. [Google Scholar] [CrossRef] [PubMed]
Eastman, P.; Swails, J.; Chodera, J.D.; McGibbon, R.T.; Zhao, Y.; Beauchamp, K.A.; Wang, L.P.; Simmonett, A.C.; Harrigan, M.P.; Stern, C.D.; et al. OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLoS Comput. Biol. 2017, 13, e1005659. [Google Scholar] [CrossRef] [PubMed]
Pan, X.; Li, P.; Ho, J.; Pu, J.; Mei, Y.; Shao, Y. Accelerated computation of free energy profile at ab initio quantum mechanical/molecular mechanical accuracy via a semi-empirical reference potential. II. Recalibrating semi-empirical parameters with force matching. Phys. Chem. Chem. Phys. 2019, 21, 20595–20605. [Google Scholar] [CrossRef] [PubMed]
Wood, R.H.; Muhlbauer, W.C.F.; Thompson, P.T. Systematic errors in free energy perturbation calculations due to a finite sample of configuration space: Sample-size hysteresis. J. Phys. Chem. 1991, 95, 6670–6675. [Google Scholar] [CrossRef]
Wu, D.; Kofke, D.A. Model for small-sample bias of free-energy calculations applied to Gaussian-distributed nonequilibrium work measurements. J. Chem. Phys. 2004, 121, 8742–8747. [Google Scholar] [CrossRef]

Figure 1. (a) Indirect alchemical thermodynamic cycle to compute

Δ A_{X g a s \to s o l v}^{S Q M / M M}

(dashed arrow) in three steps according to Equation (1). (b) Indirect alchemical thermodynamic cycle to compute

Δ A_{X s o l v}^{M M \to S Q M / M M}

(red arrow) via hybrid charge intermediates; cf. Equation (8).

Figure 1. (a) Indirect alchemical thermodynamic cycle to compute

Δ A_{X g a s \to s o l v}^{S Q M / M M}

(dashed arrow) in three steps according to Equation (1). (b) Indirect alchemical thermodynamic cycle to compute

Δ A_{X s o l v}^{M M \to S Q M / M M}

(red arrow) via hybrid charge intermediates; cf. Equation (8).

Figure 2. Comparison of

δ Δ A_{X s o l v}^{M M \to S Q M}

using three protocols without intermediate state: JAR(2ps): 2 ps forward switches from MM to SQM, JAR(5ps): 5 ps forward switches from MM to SQM, and CRO(2ps): 2 ps forward and backward switches. In all cases, the CRO results obtained from 5 ps switches were considered as the reference values CRO

_{Ref}

. For 5t-2 and 6t-2, the JAR(2ps) results are off-scale, which is indicated by the red arrows.

Figure 2. Comparison of

δ Δ A_{X s o l v}^{M M \to S Q M}

using three protocols without intermediate state: JAR(2ps): 2 ps forward switches from MM to SQM, JAR(5ps): 5 ps forward switches from MM to SQM, and CRO(2ps): 2 ps forward and backward switches. In all cases, the CRO results obtained from 5 ps switches were considered as the reference values CRO

_{Ref}

. For 5t-2 and 6t-2, the JAR(2ps) results are off-scale, which is indicated by the red arrows.

Figure 3. Comparison of

δ Δ A

obtained with three hybrid intermediate charge states: MULL(gas), orange triangles; MULL(solv), blue squares; MULL(solv*), green diamonds. All results were obtained from 200 NEW switches of 2 ps length to compute

Δ A^{MULL \to SQM}

and include the correction

Δ A^{MM \leftrightarrow MULL}

; cf. Equation (2b). The JAR(2ps) results already shown in Figure 2 (red circles) are included for comparison purposes.

Figure 3. Comparison of

δ Δ A

obtained with three hybrid intermediate charge states: MULL(gas), orange triangles; MULL(solv), blue squares; MULL(solv*), green diamonds. All results were obtained from 200 NEW switches of 2 ps length to compute

Δ A^{MULL \to SQM}

and include the correction

Δ A^{MM \leftrightarrow MULL}

; cf. Equation (2b). The JAR(2ps) results already shown in Figure 2 (red circles) are included for comparison purposes.

Figure 4. Performances of the three modified switching protocols for a subset of the compounds. Results for the direct NEW switching protocols MM → SQM are shown as red circles; the results for calculations using the MULL(solv) hybrid charge intermediate state are blue squares. The fill color of the circles/squares indicates the stepwise linear switching protocol used (green = L2-1, orange = L3-1 and salmon = L3-2). For comparison purposes, the MM/JAR(2ps) (red circles) and MULL(solv) results (blue squares), both using the default switching protocol L1, are included as well.

Figure 5. Graphical representation of differences in partial atomic charges compared to MULL(solv*) for 5-t2 (top) and 6-t2 (bottom). Charge differences are indicated as a color gradient from blue (

δ q = - 0.2 e

) to red (

δ q = + 0.4 e

). The differential dipole moment

Δ \vec{μ}

is displayed as an orange arrow. The magnitude

Δ μ = |Δ \vec{μ}|

and the angle

Θ_{SQM}

between the dipole moment of the respective charge distribution with that of the MULL(solv*) reference distribution are given below each structure.

Figure 5. Graphical representation of differences in partial atomic charges compared to MULL(solv*) for 5-t2 (top) and 6-t2 (bottom). Charge differences are indicated as a color gradient from blue (

δ q = - 0.2 e

) to red (

δ q = + 0.4 e

). The differential dipole moment

Δ \vec{μ}

is displayed as an orange arrow. The magnitude

Δ μ = |Δ \vec{μ}|

and the angle

Θ_{SQM}

between the dipole moment of the respective charge distribution with that of the MULL(solv*) reference distribution are given below each structure.

Figure 6. Difference in the average number of water molecules closer than 3 Å, compared to SQM/MM. The average number of water molecules in the SQM/MM simulations was used as the reference value.

Figure 7. Water reorientation dynamics of TIP3P solvent for 5-t2 when switching from MM and MULL(solv) to SQM/MM. Raw data are shown in gray; fit for MM in red and fit for MULL(solv) in blue. The fit parameters (

Δ U_{0}

in kcal/mol,

τ

in fs) are listed in the inset; cf. Equation (14).

Figure 7. Water reorientation dynamics of TIP3P solvent for 5-t2 when switching from MM and MULL(solv) to SQM/MM. Raw data are shown in gray; fit for MM in red and fit for MULL(solv) in blue. The fit parameters (

Δ U_{0}

in kcal/mol,

τ

in fs) are listed in the inset; cf. Equation (14).

Figure 8. Ramachandran plots of pseudo-dipeptide/blocked alanine.

Figure 9. Ramachandran plots of pseudo-dipeptide/blocked serine.

Figure 10. Model systems used in this work: eight tautomer pairs, seven taken from the SAMPL2 challenge [49], and one (7-t1 ⇌ 7-t2) from Tautobase [50]. In addition, we used the blocked amino acids Ala and Ser as model solutes.

Figure 11. The standard linear (L1) and three modified switching protocols investigated. The total switching length

τ

was 2 ps in all four cases. The color code used for L2-1, L3-1, and L3-2 is also used in Results.

Figure 11. The standard linear (L1) and three modified switching protocols investigated. The total switching length

τ

was 2 ps in all four cases. The color code used for L2-1, L3-1, and L3-2 is also used in Results.

Table 1. MAD (mean absolute deviation) for the results shown in Figure 3, as well as the spread of

δ Δ A

(respectively, the smallest and largest absolute deviations).

Table 1. MAD (mean absolute deviation) for the results shown in Figure 3, as well as the spread of

δ Δ A

(respectively, the smallest and largest absolute deviations).

Pathway	MAD [kcal/mol]	Spread MAD [kcal/mol]
Pathway	MAD [kcal/mol]	Min	Max
MM	0.29	0.01	1.80
MULL(gas)	0.12	0.01	0.49
MULL(solv)	0.09	0.01	0.28
MULL(solv*)	0.07	0.01	0.20

Table 2. MAD (Mean Absolute Deviation) over all compounds for

{RMSD}_{q}

, the magnitude of the differential dipole moment

Δ μ

, and the angle

Θ_{SQM}

between the dipole moment vector of the respective charge distribution and that of the MULL(solv*) charge distribution.

Table 2. MAD (Mean Absolute Deviation) over all compounds for

{RMSD}_{q}

, the magnitude of the differential dipole moment

Δ μ

, and the angle

Θ_{SQM}

between the dipole moment vector of the respective charge distribution and that of the MULL(solv*) charge distribution.

Pathway	MAD RMSD $_{q} [e]$	MAD $Δ μ$ [D]	MAD $θ_{S Q M}$ [°]
MM	0.14	3.38	29.8
MULL(gas)	0.06	2.44	7.6
MULL(solv)	0.04	1.81	16.0

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Schöller, A.; Woodcock, H.L.; Boresch, S. Exploring Routes to Enhance the Calculation of Free Energy Differences via Non-Equilibrium Work SQM/MM Switching Simulations Using Hybrid Charge Intermediates between MM and SQM Levels of Theory or Non-Linear Switching Schemes. Molecules 2023, 28, 4006. https://doi.org/10.3390/molecules28104006

AMA Style

Schöller A, Woodcock HL, Boresch S. Exploring Routes to Enhance the Calculation of Free Energy Differences via Non-Equilibrium Work SQM/MM Switching Simulations Using Hybrid Charge Intermediates between MM and SQM Levels of Theory or Non-Linear Switching Schemes. Molecules. 2023; 28(10):4006. https://doi.org/10.3390/molecules28104006

Chicago/Turabian Style

Schöller, Andreas, H. Lee Woodcock, and Stefan Boresch. 2023. "Exploring Routes to Enhance the Calculation of Free Energy Differences via Non-Equilibrium Work SQM/MM Switching Simulations Using Hybrid Charge Intermediates between MM and SQM Levels of Theory or Non-Linear Switching Schemes" Molecules 28, no. 10: 4006. https://doi.org/10.3390/molecules28104006

Article Menu

Exploring Routes to Enhance the Calculation of Free Energy Differences via Non-Equilibrium Work SQM/MM Switching Simulations Using Hybrid Charge Intermediates between MM and SQM Levels of Theory or Non-Linear Switching Schemes

Abstract

1. Introduction

2. Results

2.1. Overview of Calculated Free Energy Differences and Paths

2.2. Performances of 2 and 5 ps Linear Switching Protocols in Solution

2.3. Performances of Hybrid Charge Intermediates

2.4. Performances of Modified Switching Protocols

2.5. A Detailed Analysis of the Factors Affecting Convergence

2.5.1. Effects of Charge Distribution on Solute Properties

2.5.2. Effects of Charge Distribution on the First Solvation Shell

2.5.3. Water Reorientation Dynamics

2.5.4. Interplay between Charge Distribution and Conformational Preferences

3. Discussion

4. Materials and Methods

4.1. Computing Free Energy Differences between the Levels of Theory Using NEW Methods

4.2. Choice of Model Systems

4.3. Strategies to Improve the Convergences of NEW Simulations

4.3.1. Hybrid Charge Intermediates

4.3.2. Stepwise Linear Switching Protocols

4.3.3. Analyses Carried Out

Characterizing Charge Distributions

Characterization of the First Solvation Shell

Dynamics of Solvent Reorientation

4.4. Overview of Simulations Carried Out

4.4.1. Simulation Details

Preparation and Initial Equilibration

Force Field-Based Equilibrium Simulations

SQM(/MM) Equilibrium Simulations

4.4.2. Non-Equilibrium Work Simulations

4.4.3. Calculation of Δ A X s o l v MM ↔ MULL :

5. Conclusions

Supplementary Materials

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

4.4.3. Calculation of $Δ A_{X s o l v}^{MM \leftrightarrow MULL}$ :