INVESTIGATION INTO THE FLUX DISTRIBUTION OF CENTRAL CARBON METABOLISM IN CORYNEBACTERIUM GLUTAMICUM USING PRINCIPAL COMPONENT ANALYSIS

Central carbon metabolism is the main source of energy required by organisms and it provides precursors for other in vivo metabolic processes. The flux flowing through the pathways involved in central carbon metabolism characterizes its biological function and genetic readout between species or environments. In recent years, using a 13C tracer technique, researchers have measured the flux of central carbon metabolism in Corynebacterium glutamicum under a variety of nutritional and environmental changes or genetic modifications. However, there is no integrated and comparative analysis of these measured flux values. In this study, the flux values of central carbon metabolism in Corynebacterium glutamicum that were obtained in other recent studies were consolidated. A preliminary examination of the distribution characteristics of flux values in each metabolic pathway was conducted and the regression relationship between different fluxes was investigated. The principal components of the flux vector were further extracted and aggregated based on the components, and the general features of flux distribution of central carbon metabolism as well as the influence of environmental and genetic factors on the flux distribution were determined. This study provides a foundation for further investigation into the flux distribution and regulation characteristics of central carbon metabolism.


INTRODUCTION
Corynebacterium glutamicum is an aerobic, Grampositive bacterium (Nishimura et al., 2007).C. glutamicum has been utilized in the production of industrial amino acids for nearly 40 years, and is used to produce over 1 million tons of amino acids every year (Wittmann et al., 2004).The main carbon sources for industrial amino acid production are cane molasses, beet molasses, and starch hydrolysate from corn or cassava (Ikeda, 2003).
For decades, improvements have been achieved in fermentation strategies as well as optimization of bacterial strains by genetic engineering techniques, leading to progressively increasing rates of production and/or yields (Jetten and Sinskey, 1995).These optimization strategies have often been based on overcoming the natural feedback regulation mechanisms specific to each biosynthetic pathway of interest (Sonntag et al., 1993;Reinscheid et al., 1994).However, it is now apparent that further improvements in production will require a better understanding of the factors influencing carbon flux through the central pathways to efficiently supply specific biosynthetic pathways of interest with the necessary carbon precursors and coenzymes (Dominguez et al., 1998).Traditionally, the central carbon metabolism (CCM) includes the Embden-Meyerhof-Parnas (EMP) pathway, the pentose phosphate (PP) pathway and the tricarboxylic acid (TCA) cycle (Büescher et al., 2009).CCM is the main source of the energy required by organisms and provides precursors for other in vivo metabolic processes (Bennett et al., 2009).The enzyme activity and protein expression levels of the key enzymes in CCM are genetically distinguishable between different species (Sauer et al., 1999).The process of CCM occurs upstream of microbial fermentation and is thus an essential step in and the main switch for improving the production of metabolites.Investigations into the metabolic regulatory properties of C. glutamicum have made an important contribution to the optimization of production strains and processes (Shi et al., 2014;Zhang et al., 2014).
Considering the significance of CCM in controlling the efficiency of metabolite production and total production volume, an accurate quantification of CCM flux in C. glutamicum under a variety of conditions is required to clarify the regulatory mechanisms underlying CCM.Fluxes of biochemical reactions and pathways can be assessed most extensively and precisely using 13 C isotopic labeling techniques, and to gain detailed information on fluxes through the CCM, 13 C isotopomer analysis is generally carried out in combination with a stoichiometric reaction network.Such approaches have proven useful in clarifying the functional and regulatory activities of cells in their entirety.In recent years, through the use of 13 C tracer technique, researchers have measured the CCM flux in C. glutamicum under a variety of nutritional and environmental changes or genetic modifications such as the response of the central metabolism in C. glutamicum to the use of an NADH-dependent glutamate dehydrogenase and comparative 13 C metabolic flux analysis of pyruvate dehydrogenase complex-deficient, L-valine-producing C. glutamicum.However, an integrated comparative analysis of these measured flux values has not yet been conducted.
In this study, the flux values of CCM in C. glutamicum measured in other recent studies were consolidated, a preliminary exploration of the distribution characteristics of the flux values in each metabolic pathway was conducted, and the regression relationships between different fluxes were investigated.Furthermore, the principal components of the flux vector were extracted and aggregated based on the components, and the general features of flux distribution of CCM and the influence of environmental and genetic factors on the flux distribution were explored.This study provides a foundation for further investigation into the flux distribution and regulation characteristics of CCM.

Network definition
The bioreaction network was taken from previous work and the following metabolic pathways were included in this study: the Embden-Meyerhof-Parnas (EMP) pathway, the pentose phosphate (PP) pathway, the tricarboxylic acid (TCA) cycle, the anaplerotic reaction and the glyoxylate shunt pathway.

Flux value formatting and assembly
The diversity of the reactions and network definitions, the quantity of experimental data, and the required genetic and cultivation information made the assembly of the data both difficult and time consuming.According to the standard of the Kyoto Encyclopedia of Genes and Genomes, the lumped reactions in these studies were broken down into their original forms and flux values were mapped to their precisely corresponding reactions.About 48 groups of flux values from 20 studies were acquired (Supplementary Table ).

Multiple regressions to fill flux data gaps
Since the culture conditions and experimental goals of each specific study were different, the pathways of the metabolic flux values listed in each paper were not identical.Therefore, the question of how to ensure matching of sample data values for statistical analysis had to be addressed.Our strategy was to construct a multiple linear regression equation where the unknown flux value was set as the unknown variable and the known flux value was set as the predictor variable in each sample.In this way, the unknown flux value in each metabolic flux case was established.SPSS 19.0 software was used as the regression tool, and the regression equation was built using the discontinuity method.

Principal component analysis
In the case of all pathways, the metabolic flux values were considered the subject of analysis.Principal component analysis was conducted using the "princomp" function in MATLAB to generate eigenvectors and eigenvalues (Nakayama et al., 2014).The resulting principal components were sorted according to their eigenvalues and a scree plot was constructed.Based on the plot, a component transformation matrix was constructed.Three of the largest principal components (PC1, PC2, and PC3) were chosen for the construction of scatter plots, which showed the distribution of all samples in the principal component (PC1, PC2, or PC3) space.

Flux correlation
Based on the collected data sets, the simple correlation coefficients of each flux reaction were calculated and represented using HeatMap (Fig 1).Certain patterns could be discerned from these results.Due to the relationship between network constraints, there was a strong positive correlation between the flux values of the EMP pathway, the PP pathway, and the TCA cycle, and the absolute values of the correlation coefficients were often >0.7.These observations can easily be explained by the fact that the internal reactions of these pathways are interconnected.Due to material balance constraints, there is naturally a strong correlation between the different flux values.
The intensity maps of correlation coefficients reveal a strong correlation between the flux values of various reactions in the TCA cycle and the EMP pathway, in particular for the reactions occurring after glyceraldehyde 3-phosphate production in the EMP pathway.The downstream reactions of the EMP pathway directly provide substrates for the TCA cycle and a certain correlation between these two pathways is therefore reasonable.The mean correlation coefficients (the mean value of the absolute value of the correlation coefficients) between each reaction and other reactions were obtained from the HeatMap figures.These coefficients reflect the impact of the reaction flux values on the overall flux distributions.The highest coefficient value was determined for the key rate-limiting steps catalyzed by glucose-6-phosphate dehydrogenase (G6PDH), glucose-6-phosphate isomerase (PGI), and several other major branch reactions; and the second highest values were observed for reactions in secondary branches such as those catalyzed by triosephosphate isomerase (TPI), ribulose-phosphate 3-epimerase (RPE), and other enzymes.

FILLING DATA GAPS
The C. glutamicum experimental data in literature were based on different experimental goals and were obtained from different strains and under different experimental conditions.This meant that in some studies, the flux values from one part of the branch pathways in CCM were measured, while in others the flux values from another part of the branch pathways were measured.This resulted in only a small number of studies reporting on values for the same pathways, thus limiting the comparison of flux values across different studies.
In order to ensure that flux values were comparable across different studies, existing flux data were used to speculate on undetermined flux data by the regression method.Specifically, in Tomokazu's study (Shirai, 2007), the flux values of reactions R08549 and R00405, which control the conversion of 2-oxoglutarate to succinate, were not measured, while other values were measured; in other studies, the flux values of reactions R08549, R00405, and 15 other reactions were measured.Therefore, based on the available flux values of R08549, R00405 and 15 other reactions, the regression equations of reactions R08549 and R00405 were constructed by the regression discontinuity method in this study, allowing for the values of R08549 and R00405 to be deduced based on the flux value in study X.
Similarly, based on the different combinations of known and unknown flux values in various studies, the regression relationships between R04779, R01070, R01015, R00200, R00209, R00351, R01324, R00267, R08549, R00405, R00402, R01082 and R00342 were constructed.Some of the main regression equations are listed in Table 1 (Supplementary Tables).The values derived from these regression equations are of course not entirely accurate, but play an important role in the subsequent analysis.

FLUX VALUE DISTRIBUTION
A basic statistical analysis was conducted on the flux values for each reaction, and the degree of variation and mean flux values were compared.The magnitude of flux value change in each reaction was represented as the standard deviation of the flux value.Based on these changes, the reactions were categorized into three types.
The first type includes reactions with a large magnitude of flux value changes, where standard deviations ranged from 700 to 2200 (Fig 2).Besides the pyruvate (PYR), phosphoenolpyruvate (PEP) and citrate synthase reactions, these reactions included mainly those of the PP pathway.This branch pathway had a complementary relationship with glycolysis in terms of glucose-6-phosphate (G6P) consumption, thus forming a redundant relationship.Meanwhile, since the main reaction of this pathway is reversible, it was prone to exhibiting a large range of flux values.
The second type includes reactions with a relatively large magnitude of flux value changes, where the standard deviations were ~500.These reactions are mainly the downstream pathways of glycolysis and the TCA cycle.Since these downstream pathways take up the flux from the upstream pathways and the PP pathway, flux values at this point were always found to have a certain proportional relationship with substrate intake, such that the observed changes were not large.Flux value changes for the TCA cycle fell within a small range due the lower overall carbon flux.
The third type includes reactions with small magnitudes of flux value changes with standard deviations of ~250, and includes mainly the upstream pathways of glycolysis, such as the formation of glyceraldehyde-3-phosphate (GAP) and dihydroxyacetone phosphate (DHAP) from Fructose 1,6-bisphosphate (FBP).The small flux changes observed for these reactions may be due to the limited regulation of the reactions.
The analysis of the mean flux values of each reaction resulted in an interesting observation: the mean flux values of each of the reactions were not entirely random, but were instead close to a few relatively fixed values (Fig 3).For the reactions in the downstream branches of the glycolytic pathway where PEP is generated from GAP, for example, the mean flux values were relatively large (~140).This can be explained by the fact that for these reactions, the majority of the carbon flux is not silenced into biomass, and the carbon exists in the form of three-atom molecules.In contrast, the flux values for the nonoxidative reactions of the PP pathway were found to fall in a very small range (~20).Aside from the two aforementioned reaction types, the mean flux values of other reactions, such as the upstream portion of glycolysis, the TCA cycle and the oxidized portion of the PP pathway, were mainly found to be concentrated in the 40-70 range.This analysis demonstrated the contribution of each enzyme to the reaction flux under physiological conditions.For those enzymes that always catalyze a small amount of reaction flux, this may indicate the total capacity of its enzymatic activity.Therefore, to change the flux value through such an enzyme, the total activity of the enzyme must be increased.Similar analyses may provide further insights.

PRINCIPAL COMPONENT DISTRIBUTION
Principal component analysis was performed on the complete reconstructed data sets.The eigenvalues of the principle components are shown in Fig 4 .The eigenvalue of the first principal component was 12.3, the second 6.2 and the third 2.2.The contribution of the first three principal components to the variance was ≤80%.
All the samples were divided into two groups according to the environment or gene modifica-tion.One group of samples was collected after the growth environment was changed and was compared to the control; this group was known as the environmental change group.The other group of samples was collected from genetically modified strains and was called the genetic modification group.The first three principal components were collected as coordinates and data for all samples is summarized in Fig. 5, where the red and green data points represent the environmental change and genetic modification groups, respectively.From this representation, the genetically and environmentally modified cases were found to be interspersed, while the distribution of the genetically modified cases was sparser than that of the changed environment group.These differences in distribution indicate that, of the data included, environmental changes had a more stringent effect on flux distribution than genetic manipulations did.At the same time, it was also found that genetic modifications had a powerful impact on certain changes in flux values.The PC2 values of multiple genetically modified strains were less than -1, which was not the case for the environmental change group.

Fig 1 .
Fig 1.The HeatMap of Flux Correlation coefficients between each reaction.The x-and y-axes represent the reaction index number for different reactions.The color represents the correlation coefficient values between the reactions indicated by the indices on x and y axes.

Fig 2 .
Fig 2. Flux range of the reactions in central carbon metabolism.The reaction ID on the x-axis represents the reaction index number for different reactions.The bar value on the y-axis is the maximal flux values minus the minimal flux values for the corresponding reaction across all data.

Fig 3 .
Fig 3. Average flux value of the reactions in central carbon metabolism.The reaction ID on x axis represents the reaction index number for different reactions.The bar value on the y-axis is the average flux values for the corresponding reaction across all data.

Fig 4 .
Fig 4. Scree plot of the principle component sorted by Eigen value.The y-axis represents the Eigen value for the principle components.The principle components are ordered, and by definition are therefore assigned a number label, which is the number in the x-axis.

Fig 5 .
Fig 5. Scatter plot of the flux distributions in principle component space.The x-, y-and z-axes represent the values of principle components 1, 2 and 3, respectively.The red point represents the flux distribution of genetically modified strains; the green point denotes the flux distribution of strains from changed environments.