Wednesday, 20 April 2016

Tutorial: estimating the stability effect of a mutation with FoldX - release 4

Note: this is the exact same tutorial as published on 25 March 2015, except it is now based on FoldX4. The reason is that we have to update the FoldX license every year (from 31st of December to 1st of January). This means that if you ran jobs over Christmas Holidays, the jobs are killed at New Year's Eve. And the problem is they shifted from release 3 to 4, which was accompanied by a complete change in the interface. Which is good as it is  much simpler now.

They also made some changes in the energy computation by adding some parameters, but this is not documented yet. So if you started your project using FoldX3, and you need again to use FoldX, it is might be better to re-run FoldX4 on your whole dataset, for coherence reasons.

Finally, as usual, if you have any questions or comments, your are welcome!



Here is a brief tutorial on how to use FoldX to estimate the stability effect of a mutation in a 3D structure. The stability (ΔG) of a protein is defined by the free energy, which is express in kcal/mol. The lower it is, the more stable it is. ΔΔG is difference in free energy (in kcal/mol) between a wild-type and mutant. A mutation that brings energy (ΔΔG > 0 kcal/mol) will destabilise the structure, while a mutation that remove energy (ΔΔG < 0 kcal/mol) will stabilise the structure. A common threshold is to say that a mutation has a significant effect if ΔΔG is >1 kcal/mol, which roughly corresponds to the energy of a single hydrogen bond.

A good way to compute the free energy is to use molecular dynamics. Main problem: it can be very time-consuming.

FoldX uses an empirical method to estimate the stability effect of a mutation. The executable is available here:

You need to register, but it is free for Academics.

NB: I strongly encourage to read the manual (before or in parallel of this tutorial).
[NB2 (20/04/2016): I haven't found a proper manual for FoldX4. Only html pages per command)]


FoldX was used in many studies, i.e.:
Tokuriki N, Stricher F, Serrano L, Tawfik DS. How protein stability and new functions trade off. PLoS Comput Biol. 2008 Feb 29;4(2):e1000002

Dasmeh P, Serohijos AW, Kepp KP, Shakhnovich EI. Positively selected sites in cetacean myoglobins contribute to protein stability. PLoS Comput Biol. 2013;9(3):e1002929.
And I personally used it in three of my studies:
Studer RA, Christin PA, Williams MA, Orengo CA. Stability-activity tradeoffs constrain the adaptive evolution of RubisCO. Proc Natl Acad Sci U S A. 2014 Feb 11;111(6):2223-8.
Studer RA, Opperdoes FR, Nicolaes GA, Mulder AB, Mulder R. Understanding the functional difference between growth arrest-specific protein 6 and protein S: an evolutionary approach. Open Biol. 2014 Oct;4(10). pii: 140121.

Rallapalli PM, Orengo CA, Studer RA, Perkins SJ. Positive selection during the evolution of the blood coagulation factors in the context of their disease-causing mutations. Mol Biol Evol. 2014 Nov;31(11):3040-56.

Tutorial by example:

The structure is a bacterial cytochrome P450 (PDB:4TVF). You can download its PDB file (4TVF.pdb) from here:

Or directly with wget:

We would like to test the stability of mutation at position 280, from a leucine (L) to an aspartic acid (D). Here is the original structure, with Leu280 in green, and residues around 6Å in yellow:

FoldX works in two steps:

1) Repair the structure.

There are frequent problems in PDB structures, like steric clashes. FoldX will try to fix them and lower the global energy (ΔG). The "RepairPDB" command is better than the "Optimize" command. Here is how to launch FoldX:
foldx --command=RepairPDB --pdb=4TVF.pdb --ionStrength=0.05 --pH=7 --water=CRYSTAL --vdwDesign=2 --outPDB=true --pdbHydrogens=false

We indicate which PDB file it needs to use, that we want to repair it (RepairPDB), that it will use water and metal bonds from the PDB file (--water=CRYSTAL) and that we want a PDB as output (--outPDB=true). All other parameter are by default.

This process is quite long (around 10 minutes). Here is the result (the original structure is now in white, while the repaired structure is in yellow/green):

 We can see that some side chains have slightly moved (in particular Phe16).

The starting free energy ΔG was +73.22 kcal/mol, and it was lowered to -46.97 kcal/mol, which is now stable (remember that a "+" sign means unstable while a "-" sign means stable).

Once it's finished, it will produce a file named "4TVF_Repair.pdb", which you will use in the next step.

2) Perform the mutation

The mutation itself is performed by the BuildModel function. There are other methods, but the BuildModel is apparently the most robust (I said apparently, but there are no proper benchmarks against the other method PositionScan or PSSM). You also need to specify the mutation in a separate file "individual_list.txt". Here is the file "individual_list.txt" (yes, just one line):
It contains the starting amino acid (L), the chain (A), the position (280) and the amino acid you want at the end (D). One line correspond to one mutant. It means you can mutate many residues at the same per line (mutant) and also  produce different mutants by different numbers of lines.

In the following command line, you will see that is 4TVF_Repair.pdb and not 4TVF.pdb that is mutated. You will also notice "--numberOfRuns=3". This is because some residues can have many rotamers and could have some convergence problems. You may to increase this values to 5 or 10, in case you are mutating long residues (i.e. Arginine) that have many rotamers.

You can run it by:
foldx --command=BuildModel --pdb=4TVF_Repair.pdb --mutant-file=individual_list.txt --ionStrength=0.05 --pH=7 --water=CRYSTAL --vdwDesign=2 --outPDB=true --pdbHydrogens=false --numberOfRuns=3
It is much faster this time (i.e. a few seconds) and will produce many files.

FoldX will first mutate the target residue (L) to itself (L) and move it as well as all neighbouring side chains multiple times. We can see that Leu280 (green) was rotated:

=> This is will give the free energy of the wild-type (let's call it ΔGwt).

Then, it will mutate the target residue (L) to the desired mutant (D) and move it as well as all neighbouring side chains multiple times. We can see that Leu280 is mutated to Asp280 (see the two oxygen atoms in red):

=> This is will give the free energy of the mutant (let's call it ΔGmut).

The difference in free energy (ΔΔG) is given by ΔGmut-ΔGwt.

In the file "Raw_4TVF_Repair.fxout", you can retrieve the energy of the three runs for both WT and Mutant.

  • ΔGmut = 4TVF_Repair_1.pdb = -42.7114 kcal/mol
  • ΔGwt = WT_4TVF_Repair_1_0.pdb = -47.6248 kcal/mol
  • => ΔΔG = ΔGmut-ΔGwt = (-42.7114)-(-47.6248) = +4.9134 kcal/mol

One file contains the average difference over all runs: "Average_4TVF_Repair.fxout".

(You will notice that the difference in free energy ΔΔG is +4.85 kcal/mol [+- 0.06 kcal/mol]).

=> It means the mutation L280D is highly destabilising (positive value, and much above 1.0 kcal/mol).

PS: A way to define the threshold is to use the standard deviation (SD) by multiple:

The reported accuracy of FoldX is 0.46 kcal/mol (i.e., the SD of the difference
between ΔΔGs calculated by FoldX and the experimental values). We can bin the ΔΔG values into seven categories:
  1. highly stabilising (ΔΔG < −1.84 kcal/mol); 
  2. stabilising (−1.84 kcal/mol ≤ ΔΔG < −0.92 kcal/mol); 
  3. slightly stabilising (−0.92 kcal/mol ≤ ΔΔG < −0.46 kcal/mol); 
  4. neutral (−0.46 kcal/mol < ΔΔG ≤ +0.46 kcal/mol);
  5. slightly destabilising (+0.46 kcal/mol < ΔΔG ≤ +0.92 kcal/mol);
  6. destabilising (+0.92 kcal/mol < ΔΔG ≤ +1.84 kcal/mol);
  7. highly destabilising (ΔΔG > +1.84 kcal/mol).


  1. I am trying to analyze the stability of some the mutation on stability of proteins. On the way...I am facing some problems as follows

    """Starting BuildModel
    I did not find the wild-type DNA or PROTEIN sequence in the X-tal structure
    There was a problem in BuildModel
    in mutant_file.txt found: H40A,H87A;""""

    Can you help please guide me in solving them...Thanks in Advance

    1. I had the same problem and solved by just renaming the "mutant_file.txt" to "individual_list.txt"

  2. Thank you so much. I've been looking for the terminology for understanding a paper, but I couldn't find it anywhere until now.

  3. Hi Romain,
    can you explain what the --water parameter is doing exactly? The manual isn't very helpful in that regard.

    Thanks for that tutorial!


    1. Hi,

      I agree it is not straightforward. This is the full definition in the manual:

      Default: -IGNORE

      If -CRYSTAL uses the X-ray waters bridging two protein atoms. If -PREDICT, waters that make 2 or more hydrogen bonds to the protein are predicted. If -COMPARE it compares the predicted water bridges with the X-ray ones.

      From the paper:
      "Water. As discussed above, FoldX uses explicit representation of those water molecules that specifically interact with the protein through two or more hydrogen bonds. These water molecules are essential to correctly predict mutational free energy changes near the surface of the protein or in protein–protein or protein–DNA complexes. As the prediction of water bridges slows down the FoldX calculation and the water bridges are not essential to the analysis of amino acid residues in the core of the protein, the Water option allows three settings. (i) With the PREDICT setting, FoldX will predict the water bridges (J.W.H. Schymkowitz, F. Rousseau, I.C. Martins, J. Ferkinghoff-Borg, F. Stricher and L. Serrano, manuscript in preparation). (ii) With the CRYSTAL setting, FoldX reads the positions of the water molecules that are present in the pdb file that make at least two hydrogen bonds with the protein. (iii) With the IGNORE setting, FoldX is instructed not to include explicit consideration of water molecules."

      And the other paper:
      "The prediction algorithm operates in the following two-step manner. First, the structure of interest is screened for consensus sites by considering the canonical binding modes between metal ions and protein atoms found in the Protein Data Bank (PDB). Second the consensus positions found are used as initiation sites for an optimization (see Supporting Materials) using the Fold-X force field. The second step increases the computational time but improves the prediction accuracy on average by 0.5 Å. The optimization step is essential because the prediction of binding free energies of ion ligands is very sensitive to small changes of position due to the electrostatic nature of the interactions (see below)."

      So, as I understand:

      FoldX will take the protein structure and all surrounding water molecules. In the PDB, bonds between waters and amino acid residues are annotated as water bridges. I found this example:

      SITE 1 AC1 6 ASP A 20 ASP A 22 ASP A 24 THR A 26
      SITE 2 AC1 6 GLU A 31 HOH A 181
      SITE 1 AC2 6 ASP A 56 ASP A 58 ASN A 60 THR A 62
      SITE 2 AC2 6 GLU A 67 HOH A 240
      SITE 1 AC3 6 ASP A 93 ASP A 95 ASN A 97 TYR A 99
      SITE 2 AC3 6 GLU A 104 HOH A 188
      SITE 1 AC4 6 ASP A 129 ASP A 131 ASP A 133 GLN A 135
      SITE 2 AC4 6 GLU A 140 HOH A 233

      So the option:

      -IGNORE: FoldX will ignore any water molecules when predicting the stability of the protein.

      -CRYSTAL: FoldX will use the water molecules and the bonds between waters and amino acids.

      -PREDICT: FoldX will use the water molecules, but will predict the bonds between waters and amino acids.

      -COMPARE: FoldX will do both and compare them.

      I hope that helps.