Application of factor analysis in identification of dominant hydrogeochemical processes of some nitrogenous groundwater of Serbia

Multivariate statistical analyses are used to reduce large datasets to a smaller number of variables that explain the main hydrogeochemical processes controlling water geochemistry. Factor analysis (FA) reveals intercorrelations within the data matrix and groups similar variables, i.e. chemical parameters. In this way new variables, called factors, are extracted, and each factor is interpreted in terms of a hydrogeochemical process. Applying FA to a dataset of 15 chemical parameters measured on 40 groundwater samples from Serbia yielded four factors, which explain 73.9% of the total variance in the analyzed dataset. Interpretation of the obtained factors indicated several hydrogeochemical processes: the impact of seawater intrusion and volatiles in previous geological periods, solute diffusion from marine clay, cation exchange, and dissolution of carbonate and silicate minerals.


Introduction
Assessment of the results of chemical analyses of groundwater often involves a large number of data, rendering the interpretation and presentation of all the information available to the researcher rather challenging. Multivariate statistical methods are very useful tools in hydrogeochemical research, as they allow for the organization and simplification of large datasets. They are a significant contributor to the establishment of correlations between the analyzed chemical parameters, but also to the assessment of similarities between samples (i.e. groundwater occurrences).
The goal of multivariate statistical methods is to identify the hydrogeochemical processes that govern the formation of groundwater composition. If the geological and hydrogeological characteristics of the aquifer are known, by applying these methods it is possible to determine the origin and circulation pathways of groundwater. Multivariate statistical methods are also used to define migration factors and the distribution of certain elements. They can point out certain anomalies in the chemical composition of groundwater, for example those of anthropogenic nature (HELENA et al. 1999; CLOUTIER et al. 2008; YIDANA et al. 2008; SUVEDHA et al. 2009).

Study area
In this research 40 occurrences of Serbian groundwater (Fig. 1) were analyzed, and a total of 15 chemical parameters were determined for each sample (macro and micro components, temperature and pH). The analyzed groundwaters are of nitrogenous composition, with a relatively low content of carbon dioxide (in most cases < 100 mg/L CO₂). The sampled groundwaters belong to different geological formations, comprised of igneous, sedimentary and metamorphic rocks, and the majority of these groundwater occurrences are located in the Inner Dinarides (14 samples), the Vardar Zone (20 samples) and the Serbian-Macedonian Massif (six samples). Geological, structural and hydrogeological conditions in the area of the investigated groundwaters are very complex. Different types of Proterozoic to Paleozoic crystalline schists are present, along with varieties of Paleozoic and Mesozoic sediments, granitoid intrusions, Tertiary volcanic rocks and characteristic oceanic elements (DIMITRIJEVIĆ 1995). The analyzed groundwaters are from different types of aquifers formed in these rocks, with a predominance of fracture aquifers.
Factor analysis was applied to this dataset to identify the dominant hydrogeochemical factors and processes that lead to the formation of the groundwater composition.

Methods
Factor analysis was applied to a set of hydrogeochemical data comprised of 15 measured chemical composition parameters of 40 groundwater samples collected in Serbia. The concentrations (in mg/L) of the following constituents were analyzed: calcium, magnesium, sodium, potassium, chlorine, hydrocarbonate, sulfate, silicon, fluorine, boron, lithium, strontium and carbon dioxide; temperature (°C) and pH were also measured. IBM SPSS Statistics 19.0 software was used for statistical analysis.
Elementary statistical quantities (arithmetic mean, minimum and maximum values, median, etc.) were determined for the analyzed set of the hydrochemical data. All the variables were subjected to ln-transformation (computation of natural logarithm of all the analyzed data). The transformed data complied with the normal distribution criterion, corroborated by the Kolmogorov-Smirnov test.
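The ln-transformation and normality check described above can be sketched as follows. This is a minimal illustration in Python with NumPy and SciPy, not the SPSS procedure used in the study; the function name `ln_transform_and_ks` and the synthetic input are assumptions made for the example, and no values are taken from Table 1.

```python
import numpy as np
from scipy import stats


def ln_transform_and_ks(values):
    """ln-transform one positive-valued variable and test it for normality.

    Returns the transformed data, the skewness before and after the
    transformation, and the p-value of a one-sample Kolmogorov-Smirnov test
    against a standard normal distribution (after standardizing with the
    sample mean and standard deviation).
    """
    x = np.asarray(values, dtype=float)
    if np.any(x <= 0):
        raise ValueError("ln-transformation requires strictly positive values")
    ln_x = np.log(x)
    # Standardize so the sample can be compared with N(0, 1).
    # Caveat: estimating mean and sd from the same sample makes the
    # KS p-value approximate (it tends to be conservative).
    z = (ln_x - ln_x.mean()) / ln_x.std(ddof=1)
    ks = stats.kstest(z, "norm")
    return ln_x, stats.skew(x), stats.skew(ln_x), ks.pvalue


# Synthetic, positively skewed data for 40 hypothetical samples
# (NOT the measurements from the paper).
rng = np.random.default_rng(0)
synthetic = rng.lognormal(mean=3.0, sigma=1.0, size=40)
ln_x, skew_raw, skew_ln, p_value = ln_transform_and_ks(synthetic)
print(f"skewness raw: {skew_raw:.2f}, after ln: {skew_ln:.2f}, KS p: {p_value:.3f}")
```

A large p-value here only means that the test does not reject normality for the transformed variable; in practice the check would be repeated for each of the 15 parameters.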
The number of the extracted factors was determined based on the Kaiser criterion (KAISER 1960), according to which only those factors whose eigenvalue (characteristic value of the correlation matrix) is greater than one are taken into account. This was consistent with Cattell's scree plot, where factors constituted the X axis and their eigenvalues the Y axis. The curve was cut off at the point of inflexion and the portion of the curve that exhibited a less steep decline was discarded (CATTELL 1966). To facilitate interpretation of the extracted factors, varimax orthogonal rotation was applied to enhance the contribution of significant variables and reduce that of less significant ones (HELENA et al. 1999; FIELD 2005).
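The factor-retention and rotation steps described above can be illustrated with a short NumPy sketch. Several points are assumptions rather than details from the source: principal-component extraction is assumed (the text does not name the extraction method), the varimax routine is the plain iterative SVD form without Kaiser normalization (which SPSS may apply), and all function names and the synthetic data are hypothetical.

```python
import numpy as np


def kaiser_n_factors(data):
    """Eigen-decompose the correlation matrix and count eigenvalues > 1."""
    corr = np.corrcoef(data, rowvar=False)      # variables are columns
    eigvals, eigvecs = np.linalg.eigh(corr)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals, eigvecs, int(np.sum(eigvals > 1.0))


def pc_loadings(eigvals, eigvecs, n_factors):
    """Unrotated loadings, assuming principal-component extraction."""
    return eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])


def varimax(loadings, max_iter=100, tol=1e-8):
    """Raw varimax rotation (no Kaiser normalization) via iterated SVD."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    rotation = np.eye(k)
    d_old = 0.0
    for _ in range(max_iter):
        LR = L @ rotation
        col_ss = np.sum(LR**2, axis=0)          # column sums of squares
        u, s, vt = np.linalg.svd(L.T @ (LR**3 - LR @ np.diag(col_ss) / p))
        rotation = u @ vt                       # stays orthogonal
        d_new = s.sum()
        if d_new < d_old * (1.0 + tol):         # criterion stopped improving
            break
        d_old = d_new
    return L @ rotation, rotation


# Synthetic example: 40 samples, 6 variables driven by 2 latent factors
# (illustrative only, NOT the groundwater dataset).
rng = np.random.default_rng(1)
latent = rng.normal(size=(40, 2))
noise = 0.3 * rng.normal(size=(40, 6))
data = np.column_stack([latent[:, [0, 0, 0]], latent[:, [1, 1, 1]]]) + noise

eigvals, eigvecs, n_factors = kaiser_n_factors(data)
unrotated = pc_loadings(eigvals, eigvecs, n_factors)
rotated, rotation = varimax(unrotated)
print("eigenvalues:", np.round(eigvals, 2), "retained factors:", n_factors)
```

Plotting `eigvals` against the factor index gives the scree plot used as a cross-check. Because the rotation is orthogonal, it redistributes variance among the retained factors without changing each variable's communality.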

Results
Based on the elementary statistical quantities shown in Table 1, it was concluded that the concentrations of most of the measured parameters did not follow normal distribution. Their distribution histograms were positively skewed, as indicated by distinctly positive coefficients of asymmetry (Table 1). For this reason ln-transformed data were used in factor analysis.
The application of factor analysis to the set of 15 variables (i.e. chemical parameters) determined for 40 groundwater samples produced four factors that together accounted for 73.9% of the total variance of the analyzed data. Table 2 shows the extracted factors, their factor loadings and the attributed percentage of the variance. Factor loadings represent coefficients of correlation between the variables and factors or, in other words, they indicate the relative contribution of a certain variable to each of the extracted factors (FIELD 2005). In this example, only the factor loadings whose absolute value was greater than 0.5 (bolded values in Table 2) were interpreted (STEVENS 1992). It was apparent that several variables exhibited high loadings on each factor, such that the 15 initial variables were classified into four groups, depending on their mutual similarity, to facilitate subsequent interpretation.
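How a loading matrix is turned into variable groups and variance percentages can be shown with a small sketch. It assumes standardized variables and orthogonal factors, in which case the share of total variance attributed to a factor is the sum of its squared loadings divided by the number of variables. The function name, variable names and loading values below are hypothetical and are not those of Table 2.

```python
import numpy as np


def summarize_loadings(loadings, names, threshold=0.5):
    """Percent of variance per factor and variables with |loading| > threshold."""
    L = np.asarray(loadings, dtype=float)
    n_vars, n_factors = L.shape
    # Orthogonal factors on standardized variables: each variable has unit
    # variance, so a factor's share is sum(loading^2) / number of variables.
    pct_variance = 100.0 * (L**2).sum(axis=0) / n_vars
    groups = [
        [names[i] for i in np.flatnonzero(np.abs(L[:, j]) > threshold)]
        for j in range(n_factors)
    ]
    return pct_variance, groups


# Hypothetical loadings for four made-up variables on two factors.
names = ["var_a", "var_b", "var_c", "var_d"]
loadings = [
    [0.9, 0.1],
    [0.8, -0.2],
    [0.1, 0.7],
    [-0.3, -0.6],
]
pct_variance, groups = summarize_loadings(loadings, names)
print(pct_variance)  # percent of total variance per factor
print(groups)        # variables interpreted on each factor
```

The absolute value matters in the threshold test: a strongly negative loading, such as that of pH on the second factor in this study, is interpreted just like a strongly positive one, only with the opposite sign of association.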
The first two factors accounted for nearly 50% of the variance, while the third and the fourth factors accounted for 13.3% and 11.6%, respectively. The first factor featured very high positive loadings of B, Na⁺ and Cl⁻ (> 0.85), as well as high positive loadings of K⁺, Li⁺ and HCO₃⁻ (> 0.6). The relatively high loading of F⁻ (0.495) should also be noted. The second factor was characterized by high positive loadings of Ca²⁺, Sr²⁺, Mg²⁺ and CO₂, as well as a high negative loading of pH; the loading of HCO₃⁻ (0.473) should not be disregarded either. All this is also shown in Fig. 2, where the factor loadings of all variables are plotted: the X axes represent factor 1 (left) and factor 3 (right), and the Y axes represent factor 2 (left) and factor 4 (right). The variables that dominate each factor are apparent (marked by ellipses).
The third and fourth factors accounted for a smaller portion of the variance. This was attributed to hydrogeochemical processes of a more local nature, which take place only in a certain number of groundwater occurrences (CLOUTIER et al. 2008). The third factor was characterized by high positive loadings of temperature and SiO₂. The fourth factor was dominated by SO₄²⁻, but the factor loadings of CO₂ and F⁻ were also relatively high.

Discussion
If the extracted factors are viewed in a geological (primarily lithological) context, it is possible to gain insight into the main hydrogeochemical processes that lead to the formation of the chemical composition of the analyzed groundwater. In factor analysis, all or at least the main factors are often assigned conditional names, indicative of the variables that dominate the given factor. The first factor was dominated by B, Na⁺, Cl⁻, K⁺, Li⁺ and HCO₃⁻, such that this factor could be called "natural mineralization" because it contains Na⁺, Cl⁻, K⁺ and HCO₃⁻, which represent the ions of the basic chemical composition. Very high positive loadings of B, Na⁺ and Cl⁻ (> 0.85) in the first factor were attributed to groundwater mixing with seawater in the geological past, but also to solute diffusion from clays of marine origin (CLOUTIER et al. 2008; REIMANN & BIRKE 2010). Another possible process is cation exchange between Ca²⁺ and Mg²⁺ from the water and Na⁺ from the aquifer matrix. Namely, as carbonate minerals dissolve, the groundwater becomes enriched with calcium, magnesium and hydrocarbonates, followed by the previously mentioned cation exchange, such that Ca²⁺ and Mg²⁺ concentrations in groundwater decrease while the Na⁺ concentration increases. This interpretation was supported by the negative factor loadings of Ca²⁺ and Mg²⁺, and the positive factor loadings of Na⁺ and HCO₃⁻ (GUO et al. 2007; CLOUTIER et al. 2008; SALIFU et al. 2011). The positive loadings of boron, potassium, lithium and fluorine on the first factor should also be noted; they were attributed to the paragenesis of these microelements and their similar hydrogeochemical behavior.
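The cation-exchange step mentioned above can be written schematically as follows. These are generic textbook forms rather than equations given in the source, and X is an assumed notation for a single-charge exchange site on the aquifer matrix (for example a clay mineral surface).

```latex
\begin{align*}
\mathrm{Ca^{2+} + 2\,NaX} &\rightleftharpoons \mathrm{2\,Na^{+} + CaX_2} \\
\mathrm{Mg^{2+} + 2\,NaX} &\rightleftharpoons \mathrm{2\,Na^{+} + MgX_2}
\end{align*}
```

Read from left to right, each reaction removes one divalent cation from solution and releases two sodium ions, which is the direction consistent with loadings of opposite sign for Ca²⁺ and Mg²⁺ versus Na⁺ on the same factor.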
The second factor featured elevated positive loadings of Ca²⁺, Sr²⁺, Mg²⁺ and CO₂, and an elevated negative loading of pH. Here too, HCO₃⁻ needed to be taken into consideration. This factor can be called the "carbonate factor" because the dominant variables indicate the processes of dissolution of carbonate minerals. The presence of carbon dioxide tends to render groundwater aggressive and enables the dissolution of calcite, dolomite etc., whereby Ca²⁺, Mg²⁺ and HCO₃⁻ ions are released into the groundwater. This is consistent with the high positive loadings of Ca²⁺, Mg²⁺, CO₂ and HCO₃⁻. The process takes place in an acidic environment, where the concentration of CO₂ and the pH level are inversely related, resulting in a negative factor loading of pH. The high positive factor loading of strontium was attributed to its paragenesis with Ca²⁺. These two elements are chemically similar and Sr²⁺ is therefore a frequent constituent of Ca²⁺ minerals (HITCHON 1999).
The third factor highlighted the loadings of temperature and SiO₂, attributed to the fact that the solubility of silicate minerals increases with increasing temperature (MATTHESS 1981), such that this factor could be called the "silicate factor". The fourth factor featured elevated loadings of SO₄²⁻, CO₂ and F⁻. This association is indicative of the volatiles from volcanic activity in the geological past and the factor was given the name "volcanic volatiles".
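The dissolution of calcite and dolomite by CO₂-bearing water, as invoked for the carbonate factor, follows the standard overall reactions below. They are textbook stoichiometry and are not written out in the source.

```latex
\begin{align*}
\mathrm{CaCO_3 + CO_2 + H_2O} &\rightleftharpoons \mathrm{Ca^{2+} + 2\,HCO_3^{-}} \\
\mathrm{CaMg(CO_3)_2 + 2\,CO_2 + 2\,H_2O} &\rightleftharpoons \mathrm{Ca^{2+} + Mg^{2+} + 4\,HCO_3^{-}}
\end{align*}
```

Both reactions consume dissolved CO₂ and release Ca²⁺ (and Mg²⁺) together with HCO₃⁻, which is why these variables would be expected to load with the same sign on one factor.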

Conclusions
Factor analysis is an efficient tool for assessing hydrogeochemical data, which exhibit high variance caused by a series of geological, hydrogeological and other factors. It enables the identification of the correlations between the analyzed chemical parameters and also their grouping into factors based on similarity, which facilitates subsequent interpretation. In the present case study, factor analysis was applied to extract four dominant factors that accounted for most of the variance (73.9%) of the input dataset, which consisted of 15 chemical parameters measured on 40 groundwater samples from Serbia. The interpretation of the obtained factors has indicated several hydrogeochemical processes: the effects of a marine environment and volcanic volatiles in the geological past, solute diffusion from clays of marine origin, cation exchange, and the dissolution of carbonate and silicate minerals. The results uphold the significance of multivariate statistical analysis in the determination of groundwater genesis, or of the factors and processes that govern the formation of the chemical composition of groundwater.

Fig. 2. Plot of factor loadings for the first and the second factors (a) and for the third and the fourth factors (b). The variables that dominate each factor are marked by the ellipse.

Table 1. Elementary statistical quantities for the 40 groundwater samples.

Table 2. Factor loadings and percentage of variance explained by the four extracted factors, with varimax rotation (values in bold represent loadings with absolute values > 0.5).