Dynamics and deformability of α -, 310-and π -helices

: Protein structures are often represented as seen in crystals as (i) rigid macromolecules (ii) with helices, sheets and coils. However, both definitions are partial because (i) proteins are highly dynamic macromolecules and (ii) the description of protein structures could be more precise. With regard to these two points, we analyzed and quantified the stability of helices by considering α-helices as well as 3 10 - and π-helices. Molecular dynamic (MD) simulations were performed on a large set of 169 representative protein domains. The local protein conformations were followed during each simulation and analyzed. The classical flexibility index (B-factor) was confronted with the MD root mean square flexibility (RMSF) index. Helical regions were classified according to their level of helicity from high to none. For the first time, a precise quantification showed the percentage of rigid and flexible helices that underlie unexpected behaviors. Only 76.4% of the residues associated with α-helices retain the conformation, while this tendency drops to 40.5% for 3 10 -helices and is never observed for π-helices. α-helix residues that do not remain as an α-helix have a higher tendency to assume β-turn conformations than 3 10 - or π-helices. The 3 10 -helices that switch to the α-helix conformation have a higher B-factor and RMSF values than the average 3 10 -helix but are associated with a lower accessibility. Rare π-helices assume a β-turn, bend and coil conformations, but not α-or 3 10 -helices. The view on π-helices drastically changes with the new DSSP (Dictionary of Secondary Structure of Proteins) assignment approach, leading to behavior similar to 3 10 -helices, thus underlining the importance of secondary structure assignment methods.


INTRODUCTION
Before the first protein 3D structure was solved at atomic resolution [1], Pauling and Corey provided evidence that polypeptide chains can adopt a limited number of repetitive local protein structures stabilized by intramolecular hydrogen bonds (HB) [2][3][4].The two major local folds are: (i) the α-helix (or 3.613 helix) with HB between amino acid residues i and i +4, and (ii) the β-sheet composed of extended strands with HB between adjacent strands.They roughly represent 1/3 rd and 1/5 th of the residues found in proteins, respectively.Since their discovery, extensive explorations have been conducted to better understand their formation and their role in the kinetics of protein folding.Although a general view of the folding kinetics is too complex to define, theoretical models as well as recent cutting-edge experiments have underlined their significant contribution through different examples [5].Furthermore, with high-resolution experimental 3D structure accumulating, many studies have been carried out to decipher the sequence features that preferentially drive towards a given local fold than to another one.
In this context, α-helices have been intensely analyzed [6][7][8].It has been emphasized that the length of an α-helix depends on its amino acid composition [9,10] and that its extremities (or caps) have specific signatures [11][12][13][14][15].These caps can be stabilized by hydrophobic interactions between helical residues and residues outside the repetitive structures [16][17][18].The importance of such interactions was highlighted for instance in the case of class α-glutathione transferases where, using computational approaches [19,20], it was shown that the highly conserved helix 9 modulates their catalytic and binding function and that a mutation of N-cap residue Asp-209 destabilizes the enzyme's function.
Aside from the α-helix, other helical motifs have been identified, i.e. 310 and π-helices, which are less frequent and represent about 4% and 0.02% of the residues in proteins, respectively.The 310-helix is characterized by intramolecular hydrogen bonds between residues i and i+3 [21][22][23], and is usually short, containing three (representing one turn) or four residues.Nonetheless, two-turn and longer 310-helices have also been reported (22).In terms of location, they are preferentially observed at the termini of α-helices [24,25] and are considered as connectors between two α-helices [23,25].However, the 310-helix is often found in the region connecting strands within β-hairpin or β-β-corner motifs [26].In terms of sequence, their amino acid content is different from the α-helix [27].A specific analysis of a 310-helix adjoining the α-helix and β-strand has shown that the composition of 310-helices and β-strands is much more conserved among family members of homologous structures than those adjacent to two helices [26].The preferred length of the 310-helix occurring between an α-helix and β-strand is equal to 3 residues, but extends to 4 residues when located between two α-helices (α-310-α) [28].
In the π-helix (i.e.4.416-helices), hydrogen bonds are formed between amino acid residues i and i+5.This helix conformation is less stable due to steric constraints [29][30][31][32], which could explain why π-helices are rare.In 2000, Weaver [33] found only 14 well-defined π-helices in the available PDB files (i.e. about 13500 structures).However, the π-helix should occur more frequently in protein structures than has been previously described [34], and should be conserved within functionally related proteins [35].π-helices show distinct residue preferences that differ from those of α-helices [34].Interestingly, it was shown that on a limited number of π-helices they were directly linked to the formation or stabilization of a specific binding site [33].
Overall, these analyses point out significant differences in the amino acid composition of various helical motifs, which can be exploited for their prediction.For instance, the secondary structure prediction method SSPRO8 performs reasonably well for 8 different states (~62-63%) [36].However, although it aims at separate predictions of α-, 310-and π-helices, the 310-helix prediction rate is very low and the π-helix state is rarely predicted [36].The latest approach with RaptorX Property performs slightly better for 310-helix [37], but remains unsuccessful for the π-helix.The rare occurrences of these motifs largely explain the low rates.It may also arise from the difficulty to assign 310-and πhelices, which might be due to the enhanced flexibility profile of these structures [38].
Indeed, a dynamic relationship would exist between the different kinds of helices, for instance between α-and π-helices as shown in [39].Importantly, 310-helices and to a lesser extent π-helices, have been proposed to be intermediates in the folding/unfolding of α-helices [40][41][42].However, these studies are often based on model systems such as polyalanine peptides and use molecular dynamics (MD) simulations to inspect the effect of chain-lengths and N-terminus residues in α-helix folding [43].Unfortunately, flexibility profiles and putative interconversion between helical states have never been conducted for a large set of real sequences.
In this paper, we present the first large scale MD simulation study from a large number of folds to depict how helical regions evolve.Our study provides new insights into the flexibility and deformability of the different helical states, which are an essential component of the structure and function of biological macromolecules.

Datasets
A databank of 169 X-ray structures taken from Protein DataBank (PDB) [44] was extracted using ASTRAL 2.03 [45,46] (PDB ids and corresponding chain are provided in the Supplementary data S1).The dataset is filtered out based on structure resolution better than 1.5 Å, and in the absence of heteroatoms (other than water), alternate or modified residues in the chain.Moreover, the chain length should range between 50 and 250 residues.Only globular proteins were used.An in-house parser was used to filter out and fetch the information [47].

MD simulations
Three MD simulations were performed for all protein structures with GROMACS 4.5.7 software [48], using AMBER99sb force field [49].Each protein structure was immersed in a periodic dodecahedron box using TIP3P water molecules [50] and neutralized with Na + or Cl - counterions.The system was then energetically minimized with a steepest-descent algorithm for 2000 steps.The MD simulations were performed in isothermal-isobar thermodynamics ensemble (NPT) with temperature fixed at 300 K and pressure at 1 bar.A short run of 1 ns was performed to equilibrate the system using the Berendsen algorithm for temperature and pressure control [51].The coupling time constants were equal to 0.1 ps for each physical parameter.A production step of 50 ns was done using the Parrinello-Rahman algorithm [52] for temperature and pressure control, with coupling constants of T=0.1 ps and P=4 ps.All bond lengths were constrained with the LINCS algorithm [53], which allowed an integration step of 2 fs.The PME algorithm [54] was applied for long-range electrostatic interactions using a cut-off of 1 nm for nonbonded interactions.
This protocol was applied on each of the 169 protein chains.From each MD simulation, the conservation of the secondary structures was observed and the structural deviation of each snapshot from the initial structure was measured.Conformations were saved every ps.For each MD simulation, the secondary structures were analyzed and the structural deviation of each snapshot from the initial structure was measured.Trajectory analyses were performed with the GROMACS software, in-house Python and R scripts.Root mean square deviations (RMSD) and root mean square fluctuations (RMSF) were computed on Cα atoms.Normalized RMSFs and normalized B-factors were computed as in [55].

Local protein conformation analyses
Secondary structure assignment was performed using DSSP [56] (CMBI version 2000) with default parameters.A second assignment was performed later with a recent DSSP (2015) version 2.2.1 since the protocol of assignment had slightly changed.Protein Blocks (PBs), which are a structural alphabet composed of 16 local prototypes [57], was also employed to analyze local conformations.Each specific PB is characterized by the φ, ψ dihedral angles of five consecutive residues.PBs give a reasonable approximation of all local protein 3D structures [58] and are very efficient in tasks such as protein superimpositions [59] and MD analyses [60].PB assignment was carried out for every residue obtained from every snapshot extracted from MD simulations using our PBxplore tool at GitHub [61].From this description we developed a useful measure that helps in quantifying the flexibility of each amino acid, the so-called Neq (for an equivalent number of PBs) [57].Neq is a statistical measurement similar to entropy and represents the average number of PBs that a residue may adopt at a given position.Neq is calculated as follows [57]: where fx is the frequency of PB x in the position of interest.An Neq value of 1 indicates that only one type of PB is observed while a value of 16 is equivalent to an equal probability for each of the 16 states, i.e. random distribution.

Clustering approach
A residue is initially associated to one of the 8 defined secondary states assigned by DSSP [56], namely, α-, 310-and π-helices, β-strand, turns, bends, β-bridges and coil.During MDs, DSSP is used again to assign the protein chain secondary structures.Hence, each residue is associated to a vector of size S=8 representing the 8 defined secondary states and, more specifically, the occurrence of each observed state.To define common behaviors between residues, a k-means clustering approach was used [62].Thus, at first a subset is created.It represents all the residues that were associated to this state before MD, e.g.310-helices.Then a fixed number of clusters k is determined, with k=5 (selected after few tests).The k clusters are of size S=8.All the data of the subset is compared to each k cluster and the one with the minimal Euclidean distance is considered the winner.After one cycle (and after all subsets had been used), the values of the k clusters are modified to correspond to the associated observations, i.e. each cluster is the barycenter of the associated observations.After a few cycles, the k clusters are stable and can be analyzed, e.g.how the residues associated to 310helix assignment have evolved.

Analyses of protein structures
From the ASTRAL 2.03 compendium [45,46] we selected a nonredundant dataset at 40% sequence identity.It consists of 5580 protein chains resulting from 4432 PDB files.By restricting chain lengths to between 50-250 residues, the resolution is better than 1.5 Å and excluding those with discontinuity in position numbering, missing residues, modified and/or incomplete residues, only 169 domains were selectively used.The 169 domains represent an equilibrated repartition among the different SCOP classes: all-α represent 18.9% of the chains, all-β 29.6%, α/β 24.8% and α+β 26.7%.
In the dataset, the distribution of the helices assigned using DSSP [56] is 31.5% for the α-helix, 3.99% for the 310-helix and 0.28% for the π-helix, which is similar to the distribution observed by Tyagi et al. in 2009 [63].As shown in Table 1 (and Supplementary data S2), the lowest B-factors are associated to α-helices, an expected feature since α-helices are found most prominently in the ordered state [64].Interestingly, π-helices are observed in the dataset to be less flexible than 310-helices with average normalized B-factor values of 0.09 and 0.24, respectively.Both correspond to the flexible region as defined in [64][65][66][67].This tendency is correlated with the relative accessibility of the residues computed by DSSP [56], a higher accessibility being observed for 310-helices than for π-helices (Table 1 and Supplementary data S3).

Analyses of molecular dynamics
The distribution of normalized B-factors (Fig. 2A) and normalized RMSF (Fig. 2B) was highly similar to the distribution observed in a smaller dataset previously used [55,68].Fig. 2C shows the correlation between normalized B-factor values and normalized RMSF (Pearson's coefficient r = 0.43).This correct correlation is very close to the one previously observed [55], which was computed from fewer observations.Interestingly, 60.2% of the positions do not change at all (i.e.no local deformability was observed as computed with Neq, as it equals 1.0).Furthermore, the behavior observed with B-factors is confirmed with normalized RMSF analysis.The most rigid helical structure are α-helices; π-helices are more flexible but less than 310-helices, which are always the most flexible ones (see Table 1).
In the following sections, we detail the dynamic behavior of each type of helical structure and discuss the results in the light of previous studies [34,[39][40][41][42][43].α-helices α-helices represent nearly 30% of the residues of the complete dataset.Of those residues (Fig. 3A), 31.5% always remain α-helical residues during the entire simulation; 91.4% of such residues remain as an α-helix for more than 50% of the simulation time, while only 3.9% remain as an α-helix for less than 25% of time (Supplementary data S5).These statistics illustrate the very stable behavior of the α-helix.

310-helices
In comparison, despite its relative importance, the 310-helix was observed to be a less stable local structure during simulations.As can be seen in Fig. 3B, the tendency of 310-helix residues to remain in the 310-configuration is very limited.Indeed, no residue was found to retain completely the 310-helix conformation during the 50 ns of simulation (Supplementary data S5).Among the residues initially observed in the 310-helix state (3.9% of all the residues), only 15.7% of the residues were seen to have more than 90% representation as a 310-helix, but 54.1% of the residues were observed more than half of the time in this state.π-helices π-helices are extremely rare, observed 14-times less than the 310-helices in the initial dataset.They are slightly less accessible and with regard to the average B-factor and RMSF values, they are supposedly more stable.However, Fig. 3C shows that this is not the case.Indeed, only 2.4% of the residues remain more than half of the time as a π-helix and the rest is not stable, i.e. 97.6% are observed less than 1/4 th of the times as a π-helix.This result has already been presented [69].

Behavior of helical structures
As seen in the previous section, helical structures display different behaviors.The next question to be addressed relates to the nature of the conformational change, i.e. when residues are changing conformations during MD, what new conformations do they adopt?Do they preferentially explore the conformational space of the other helical conformations or the nonhelical ones?Table 2 depicts the helical exchange rates and also the change rate towards nonhelical states.For example, positions that start as an α-helix adopt non-α-helical states in 11.6% of the cases.Among them, changes towards the 310-helix represent only 14.1% of cases, the remaining (85.6%) not being helical.For positions starting as 310-helix, 46.6% of changes occur.Most of them transit to nonhelical states (82.2% of the cases) while transitions to the α-helix represent 17.7%.Hence, transitions to the π-helix are extremely rare (0.0096 and 0.0037% only for the αand 310-helix, respectively).Lastly, the π-helix residues only remain as a π-helix in 3.97% of cases; they are mainly associated with the α-helix (59.2%) and only 0.97% with the 310-helix.The change to nonhelical states represents only 39.8% of the simulation times, which is surprisingly smaller than observed either for the -helix or the 310-helix.Thus, the π-helix is associated more with helical conformations during dynamics than expected.

Helical structure behavior during dynamics
To go further into the analysis of transitions, we performed a clustering of the ensemble of conformations considering the 8 possible states of DSSP (using the k-means approach [62] with k=5 clusters for comparative purpose).The clusters are named according to their major DSSP state, with subscript indicating a minor DSSP state (T for turn or C for coil).The detailed composition of each cluster starting from the α-helix, 310-helix and π-helix initial state is given in Figs. 4, 5 and 6, respectively.
The classification highlights that a residue departing from the α-helix state (Fig. 4 and Table 3) tends to transit preferentially towards a β-turn state.Indeed, apart from the most populated cluster α1 (76.4% of the residues) that is composed of residues remaining as an αhelix, the second cluster α2 (11.5% of the residues) reveals a decreased content of α-helices in favor of β-turn states.In clusters αT1 and αT2 (4.2% and 6.6% of the residues, respectively), apart from a small subcluster depicting conformations that switch to the 310-helix conformation, β-turn conformations are even increased.The least populated cluster αC is associated with nonrepetitive structures (1.1% of the residues).Table 3 also underlines the correlation between flexibility and the presence of b-turns: the higher the b-turn or coil content, the larger the normalized RMSF (nRMSF).A similar correlation is observed with accessibility and Neq values.Clearly, the cluster α1 represents more buried and stable αhelices.
For residues initially assigned as 310-helices (Fig. 5), the most populated cluster 310 (40.5% of the residues) represents residues that remain in the 310-helix conformation.It is also the most rigid (both low B-factor and RMSF).From cluster 310 T1 to cluster 310 T2 (25.0 and 17.5% of the residues respectively), the 310-helix content decreases and the content of β-turns increases.The flexibility increases concomitantly.It also perfectly correlates with the relative accessibility (going from 32.8 to 36.3 and 41.8) and Neq values (1.34, 1.49 and 1.59, respectively).The preferred transition to the β-turn was expected as the 310-helix was shown to overlap by nearly 90% [70].This is one of the reasons for the disappearance of β-turn type III [71].The cluster 310 C has the highest content of nonrepetitive structures and is, as expected, associated with high flexibility.Surprisingly, residues in this cluster are less accessible than those in clusters 310 T1 and 310 T2 .Cluster 310 α (10.5% residues) that represents the transition to α-helical conformation exhibits low accessibility values, the lowest RMSF values compared to clusters 310 C , 310 T1 and 310 T2 , and the lowest Neq values, which shows the slightest local conformation change of all the 310 clusters.However, it is associated to high Bfactor values.
Fig. 6 (see also Supplementary data S8) summarizes the dynamical evolution of the rare π-helices.As seen previously, they are not stable, hence the cluster named π (10.1% of the residues), which has the highest frequency of π-helices, is also associated with the β-turn, bends and some coil conformations, but not αor 310-helices.This, however, is the tip of the iceberg of contradictions in this cluster.Cluster π is associated with the lowest crystallographic B-factors, with the highest accessibility observed.During the MD, it had a very flexible behavior, with the highest RMSF values and also the highest Neq values observed.The contradictions could be explained with the following three clusters, π α1 , π α2 and π αT that have a higher α-helical content with no π-helix residues and few β-turns.They have a slight increase in their Neq values.Cluster π α1 represents π-helix residues (38.7%) that had medium B-factor values associated with a higher stability as an α-helix because they are buried compared to other clusters.Clusters π T and π αT have intermediate flexibility behaviors, while the most flexible cluster is cluster π α2 .An interesting question was raised in this work regarding the potential bias in the assignment of the π-helix in the different version of DSSP.We owe it to an interesting comment by a reviewer and other studies of this bias.Fodje and Al-Karadaghi [35] nearly 15 years ago re-implemented a DSSP-like algorithm named SECSTR, which is nowadays not available.They changed the parameters to increase the visibility of the non-α helices.Recently, the DSSP order to assign helical state has been updated to prevent the underestimation of the number of π-helices [78].We decided to perform the analyses using the new DSSP (version 2.2.1) [72].Although no change was observed for 310-helices (still 3.5%), this was not the case for π-helices, which increase 15-fold going from 0.02 to 0.32%.This change comes from 2/3 of the previously assigned α-helices and 1/3 of turns.This new repartition provides a different view of π-helices.The π newDSSP -helices are more stable than expected.Their exchange rate (Table 2) is 28.5% to α-helices, 1.43% to 310-helices, 27.7 to others and remains as π newDSSP -helices 42.2% of the times.Moreover, 15% of these helices stay as π newDSSP -helices more than 90% of the times; 39.6% more than 50% of the time and 38.6% are seen at less than 25% of the time.Fig. 7 shows the new clustering for these π newDSSP -helices.This time a pure π-helix cluster was found (named nπ) that represents 20% of the residues.It is totally different from the previous cluster π, which was a mix of β-turn, bends and some coil conformations.Two other clusters (nπ α1 and nπ α2 ) are associated with the transition to α-helices; they represent 23.4% and 16.5%, while cluster nπ T is mainly associated with turns (25.5%).These three share common features with previous clusters π α1 , π α2 and π T , but the proportion of π-helix residues in these is drastically higher.The only fuzzy cluster is cluster nπ V , which is a mix of α-helix, π-helix, β-turn, and β-bridge (14.7%).These results show that the previous assignment of the π-helix strongly impacted our views of this local protein conformation.

CONCLUSION
As we have seen in a previous work on the β-bulge, sometimes nonclassical conformations associated with a classical repetitive structure can show some unexpected behavior, as for instance a protein with 15 β-bulges in which one β-bulge disappears after 2 /5 of the MD simulation and never returns [73].Similarly, we have shown that some α-helices were highly dynamic, with 5% of them being often highly accessible and shorter [55].
Herein we focused specifically on the helical structures.The first pertinent result is the quantification of the persistence of helical residues in their original local conformation.More than ¾ of α-helix residues remain in this helical conformation while it drops to 40.5% for 310helices.Surprisingly, even if they are mostly buried, π-helices are not observed to be stable in less than 20% of the simulation times.
The second interesting result on the flexibility and deformability of helical structures is the huge difference between the three types of helix.The α-helix shows good correlation between stability of the α-helical content and (i) the flexibility as seen through B-factors and RMSf, (ii) accessibility of the residues, and (iii) Neq.α-helix residues with a higher tendency to assume β-turn conformations than either the 310-or π-helices.
The 310-helix shows a similar general behavior in 90% of cases.Indeed, correlation is good in terms of flexibility (both crystallographic and in silico), accessibility (with the exception of cluster 310 C ) and Neq values.Nonetheless, the 310-helix that switches to the αhelix conformation shows drastically different characteristics.It retains higher B-factor and RMSF values than the average of cluster 310 but is associated with lower accessibility and lower Neq.This cluster seems to have the dynamic characteristics of a local protein conformation that can adopt an α-helix configuration [32].
Using classical DSSP, the π-helices cannot be described as stable.The residues that stayed mostly associated with the π-helix conformation are also associated with β-turn, bend and coil conformations, but never αor 310-helices.A counterintuitive finding is that they are also associated with low B-factors, but, due to the high accessibility, they are very flexible/deformable, having the highest RMSF and Neq values.The other residues lose their initial π-helix conformation and mainly assume an α-helical conformation or to a lesser extent a β-turn conformation.Using a recent DSSP, these results are completely mixed, with the πhelices exhibiting 310-helix behaviors.
These results are important to define more properly the properties of these helical conformations.As was done for PolyProline II [74][75][76][77] or β-turns [70,78], these less frequent states are also of interest.Numerous questions remain.It was recently shown that π-helices help the protein chain to fold properly and also in helix packing.They facilitate favorable nonbonded interactions by positioning the functionally important helical residues in the correct orientation [34].
In this work we showed and quantified for the first time precisely the exchange between different helical states that underlie unexpected behaviors.
Protein chains were selected with a limited redundancy from SCOP.They were of high quality.However, we must not downplay the fact that crystallization also produces some crystal contact packing effects, but these were found in a limited number of cases [79,80] and they did not have a significant effect on the results.It is important to properly define the properties of these helical conformations that have implications in both experimental and computational studies, i.e. analyses of flexibility of protein local conformations [58,64,81], force field parameters [69,82], disorder [83], etc. Table 1.Behaviors of helices.Average normalized B-factors (from X-ray structures), average normalized RMSF (from the MDs) and the average relative accessibility surface area (for Xray structures) of α-310-and π-helices are presented.

Fig. 1 .
Fig. 1.Different types of helices.Two views are provided from the side and from the bottom.

Fig. 2 .
Fig. 2. Normalized B-factors and normalized RMSf behaviors on the whole dataset.A -Normalized B-factor distribution; B -Normalized RMSf distribution; C -Correlation between normalized B-factor distribution and normalized RMSf.

Fig. 3 .
Fig. 3. Persistence of the initial helical state.The frequency of residues remaining in the original assigned state for the three types of helices during simulations with A -α-, B -310-, and C -π-helices.

Fig. 7 .
Fig. 7. Different clusters obtained for the π-helix using the new DSSP assignment protocol.See the legend to Fig. 4.

Table 2 .
Exchange rates of helices expressed as percentages.