Discriminating cereal and pseudocereal species using a binary system of GC–MS data – A pattern recognition approach

Various cultivars of different cereal and pseudocereal species (9 wheat, 8 barley, 1 rye, 3 oat, 2 triticale, 3 spelt, 12 corn, 3 amaranth and 9 buckwheat cultivar samples) were milled into flour, extracted with n-hexane, derivatized with trimethylsulfonium hydroxide solution, and subjected to GC–MS analysis. Fatty acid methyl esters and non-saponifiable compounds (phytosterols, α-tocopherol and squalene) were identified by comparing their mass spectra with the Wiley MS library. A binary system was applied in further data processing: the presence or absence of a particular lipid component in each sample was coded as 1 or 0, respectively. Major lipid components that were present in all analyzed flour samples were removed from further data analysis, leaving only those that form a useful pattern for differentiating the flour samples according to the corresponding cereal/pseudocereal species. Pattern recognition tools (cluster analysis and principal component analysis) were applied to visualize groupings and separations among the samples. The presented approach enables the rapid differentiation of flour samples made from various cereal/pseudocereal species according to their botanical origin and gluten content, without the need for exact quantitative determinations.


INTRODUCTION
Cereal grains and the flour made from them are a very important staple food in daily human nutrition, especially in the form of various bakery products, such as bread, cakes, biscuits and many others [1]. Pseudocereals are increasingly becoming a part of this concept, due to their excellent nutritive properties and the health benefits they provide [2,3]. Unfortunately, economically motivated fraud and food adulteration have been common in food manufacture since ancient times [4]. These facts give priority to authenticity testing procedures as valuable means of protecting consumers against fraudulent practices [5]. Furthermore, beyond the economic aspects, the authentication of cereal products is very important for consumer health protection, since the consumption of products containing undeclared constituents may cause intoxication or problems such as allergy in sensitized individuals or gluten intolerance [6]. There is a clear trend in the international market towards labeling products with information about their composition and quality, which brings about the need to develop and standardize analytical procedures in order to confirm the information given on the label and to uncover adulteration [7]. Many analytical approaches are being applied, with chromatographic techniques coupled to mass spectrometric detection being of great importance because of their outstanding separation ability and molecular identification capability [8,9]. Other analytical techniques used for food authenticity and quality testing include various spectroscopic techniques (UV, NIR, MIR, visible, Raman, fluorescence, NMR and ICP-OES), isotopic analysis, electronic nose, PCR and real-time PCR, enzyme-linked immunosorbent assay and thermal analysis. Among all of these, chromatographic techniques represent one of the most important groups used in food authentication and adulteration detection [1,5,10,11].
The main disadvantage of spectroscopic techniques is their inability to adequately address the individual contributions of mixture components, which often coincide in broad, overlapped and unresolved spectral bands [12]. [...] [16–18] Many authors have proposed various methods utilizing GC-MS combined with advanced data analysis for the verification of geographic origin, the discrimination of different species and cultivars, and the detection of adulterants in various foods [19–35]. Fats and oils are usually analyzed using a gas chromatograph coupled with a flame ionization detector (GC-FID) or a mass spectrometer (GC-MS) after they have been extracted and derivatized. However, GC-FID can suffer from co-elution of naturally occurring matrix-interfering compounds and may be insufficiently sensitive. GC-MS, while able to reveal the presence of co-eluting compounds, cannot always accurately quantify their amounts [27]. By applying the proposed binary system in chemometric data processing, the need for accurate quantification of the identified lipid compounds, as well as for analytical standards, was avoided. In this manner, the complete analysis procedure was significantly simplified and shortened. Furthermore, the influence of the various possible peak integration modes, which depend on the GC-MS instrumentation and available software, on the differentiation results could be neglected with this method.
Oil content and lipid distribution (fatty acids and non-saponifiable compounds) within seeds have been analyzed in detail in many cereal and pseudocereal species and their products using chromatographic techniques, in particular GC and HPLC, combined with various detection systems [36–52]. Multivariate data analysis is often coupled with data-rich instrumental methods. With respect to food fraud, unsupervised multivariate chemometric approaches may be used as a powerful data-reduction tool for qualitatively grouping or classifying unknown samples with similar characteristics [53,54]. Although systems based on GC-MS combined with various multivariate classification and pattern recognition methods have been extensively used to measure lipids and chemical composition in general (e.g., protein, moisture, oil) of different cereal and pseudocereal species, to the best of the authors' knowledge, no studies have been reported on the use of this method for varietal discrimination and traceability of cereals and pseudocereals.
Considering all these aspects, the aim of this study was to develop an integrated approach utilizing chemometric analysis of GC-MS data in binary form for the discrimination of experimental flour samples according to their botanical origin. The flour samples were produced from various cultivars of different cereal (order Poales) and pseudocereal (order Caryophyllales) species, i.e., wheat (Triticum aestivum L.), barley (Hordeum vulgare L.), rye (Secale cereale L.), oat (Avena sativa L.), triticale (Triticosecale Wittm.), corn (Zea mays L.), spelt (Triticum spelta L.), amaranth (Amaranthus L.) and buckwheat (Fagopyrum esculentum Moench.). In order to avoid time-consuming determinations of the concentrations of the lipid compounds, a rapid, semi-quantitative approach was developed by creating binary matrices of the obtained GC-MS data. Pattern recognition techniques, i.e., principal component analysis (PCA) and cluster analysis (CA), were applied to the experimental GC-MS binary data (used as descriptors) to characterize and differentiate the observed samples according to their botanical origin.

Sampling
Samples of all cereal and pseudocereal species analyzed in this study were obtained from the cultivated, living collection of the Institute of Field and Vegetable Crops "NS Seme", Novi Sad, Republic of Serbia (Table I). All species were grown in the same year and on the same experimental field, so that the comparison is independent of differences in environmental conditions.
[Fragment of Table I: Austrija (S1), Eko-10 (S2), Nirvana (S3); Amaranth, Amaranthus L.]

Sample preparation
About 10 g of each of the 50 cereal and pseudocereal cultivar samples from Table I were ground using a laboratory mill (Falling Number 3100, Sweden) and homogenized to obtain a uniform flour sample matrix. Each flour sample (≈ 0.5 g) was accurately weighed and transferred into a 12 mL cuvette. The cuvette was then filled with 5 mL of n-hexane and vortexed for 2 min, after which the mixture was centrifuged at 2000 rpm for 5 min. Then 3 mL of the clear supernatant of each sample was transferred into a 10 mL glass beaker and dried under a nitrogen flow. The residue was first dissolved in 400 μL of dichloromethane, and then 100 μL of 0.2 M trimethylsulfonium hydroxide in methanol (TMSH, Macherey-Nagel) was added, thereby derivatizing the fatty acids into volatile methyl esters [55].

GC-MS parameters
The analysis was performed on a GC-MS system consisting of an Agilent Technologies 7890 instrument coupled with an MSD 5975 detector (Agilent Technologies, Palo Alto, CA, USA) operating in the electron ionization mode at an energy of 70 eV. A DB-5 MS column (30 m length, 0.25 mm i.d., 0.25 μm film thickness, 5 % phenyl methylpolysiloxane polymer, Agilent Technologies) was used. The temperature program was 50-130 °C at 30 °C min⁻¹ and 130-300 °C at 10 °C min⁻¹. The injector temperature was 250 °C. Helium was used as the carrier gas at a constant flow rate of 0.8 mL min⁻¹. A split ratio of 1:50 was used for the injection of 1 μL of the sample solutions.
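As a quick plausibility check of the oven program, the ramp time can be worked out from the stated temperatures and heating rates. The sketch below assumes that there are no isothermal hold periods, since none are reported in the text; the names `RAMPS`, `ramp_minutes` and `total_run_minutes` are illustrative only.

```python
# Oven program as stated in the text: (start °C, end °C, rate in °C/min).
# Assumption: no isothermal holds, because none are reported.
RAMPS = [(50.0, 130.0, 30.0), (130.0, 300.0, 10.0)]


def ramp_minutes(start_c, end_c, rate_c_per_min):
    """Time needed to heat linearly from start_c to end_c."""
    return (end_c - start_c) / rate_c_per_min


def total_run_minutes(ramps):
    """Sum of all ramp segments; hold times would have to be added on top."""
    return sum(ramp_minutes(*segment) for segment in ramps)


if __name__ == "__main__":
    for start, end, rate in RAMPS:
        print(f"{start:.0f}-{end:.0f} °C at {rate:.0f} °C/min: "
              f"{ramp_minutes(start, end, rate):.2f} min")
    print(f"total ramp time: {total_run_minutes(RAMPS):.2f} min")
```

Under this assumption the two ramps take 80/30 min and 170/10 min, i.e., slightly less than 20 min per injection.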

Data processing
The GC-MS data in the form of full-scan chromatograms (Supplementary material) were acquired by Agilent MSD Productivity ChemStation software. Compound identifications involved comparisons of the mass spectra with the Wiley 275 MS database using a probability-based matching algorithm (a match quality of 95 % minimum was used as a criterion).

Data analysis
The binary data were analyzed using the Statistica 10.0 (StatSoft Inc., Tulsa, OK, USA) software package. Principal component analysis (PCA) was used to discover possible relations among the measured parameters (variables), while cluster analysis (CA) was primarily used to identify patterns among individual objects and their groups. In this study, the distance between objects was defined by the City-block (Manhattan) distance metric, and the complete linkage method was employed for the amalgamation of clusters.
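The clustering settings described above can be illustrated with a small, dependency-free sketch. It is not the Statistica procedure used in the study; it only shows how complete-linkage agglomeration with the City-block distance behaves on 0/1 vectors, where the distance between two samples is simply the number of compounds coded differently. The sample labels and the eight marker columns in `TOY_ROWS` are invented for illustration.

```python
from itertools import combinations

# Hypothetical binary profiles (1 = compound detected, 0 = not detected).
TOY_ROWS = {
    "corn_1":     [1, 1, 1, 0, 0, 0, 0, 0],
    "corn_2":     [1, 1, 0, 0, 0, 0, 0, 0],
    "amaranth_1": [0, 0, 0, 1, 1, 1, 0, 0],
    "amaranth_2": [0, 0, 0, 1, 1, 0, 0, 0],
    "wheat_1":    [0, 0, 0, 0, 0, 0, 1, 1],
    "wheat_2":    [0, 0, 0, 0, 0, 0, 1, 0],
}


def manhattan(u, v):
    """City-block distance; for 0/1 data this counts mismatched compounds."""
    return sum(abs(a - b) for a, b in zip(u, v))


def complete_linkage(rows, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Under complete linkage, the distance between two clusters is the
    largest pairwise distance between their members.
    """
    clusters = [[label] for label in rows]

    def cluster_distance(c1, c2):
        return max(manhattan(rows[a], rows[b]) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(clusters[pair[0]],
                                              clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so j is safe to drop
        del clusters[j]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    for group in complete_linkage(TOY_ROWS, n_clusters=3):
        print(group)
```

In this toy data set the cultivars of one species differ by at most one compound, while different species differ by at least three, so cutting the tree at three clusters recovers the species.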

Binary system
The binary system was applied in Table II to label either the consistent presence (code "1") or the consistent absence (code "0") of a particular lipid component, or to indicate that some cultivars of a given cereal/pseudocereal species contained a certain lipid compound while others did not (labels 0/1 and 1/0). If a certain lipid compound was detected in less than 50 % of the investigated cultivars of a particular cereal/pseudocereal species, it was labeled 0/1, and if it was present in more than 50 % of the cultivars, it was labeled 1/0 (Table II). Although this coding system was applied in Table II to present the lipid profiles of the collected flour samples, only the coding values "1" and "0" were subjected to chemometric analysis.
The major lipid compounds identified in the flour samples of every cereal and pseudocereal cultivar analyzed are given in bold; they include the methyl esters of the most abundant fatty acids: hexadecanoic (C16:0), 9,12-octadecadienoic (C18:2), 9-octadecenoic (C18:1), octadecanoic (C18:0), docosanoic (C22:0) and tetracosanoic (C24:0) acid. Since in binary form they show no differences, having the value "1" in every single case, they cannot serve as a pattern for flour discrimination. These non-influential compounds were, therefore, excluded from further data analysis, leaving only those that have a high impact on the separation between the cultivars of the plant species.
Some lipid compounds were found to be specific to one or two species, mostly corn and amaranth. Thus, 10-nonadecenoic acid (C19:1) was detected only in some cultivars of the corn flour samples, nonadecanoic acid (C19:0) in all samples of corn and amaranth cultivars, 9,10-dihydroxyoctadecanoic acid (DHSA) in all cultivars of oat and some cultivars of barley, tricosanoic acid (C23:0) in all amaranth cultivars and some wheat cultivars, campesterol (CA) in some cultivars of corn, pentacosanoic acid (C25:0) in all samples of corn and amaranth cultivars, stigmasterol (ST) in some cultivars of corn, γ-sitosterol (γSI) in all amaranth cultivars, α-tocopherol (βT) in all samples of wheat, and ethylcholestanol (ECH) in all samples of amaranth cultivars.
Consequently, the application of very sophisticated but expensive techniques, such as triple quadrupole GC- and LC-MS/MS systems, for the selective quantification of the target compounds [17] is unnecessary in this case, since the presented approach does not require compound quantification. Instead, it was enough to determine whether these compounds were present in the sample (code "1") or not (code "0"). It is, therefore, sufficient to use a more accessible and common single quadrupole GC-MS instrument for the analysis.
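The coding scheme described in this section can be summarized in a short sketch. It is only an illustration of the stated rules, not the authors' actual workflow: the function names and the toy detection sets below are hypothetical, and the case of a compound found in exactly 50 % of the cultivars is left undefined because the text does not cover it.

```python
# Hypothetical detection results: sample -> set of compounds identified in it.
TOY_COMPOUNDS = ["C16:0", "C18:2", "C19:0", "DHSA"]
TOY_DETECTED = {
    "corn_1":  {"C16:0", "C18:2", "C19:0"},
    "oat_1":   {"C16:0", "C18:2", "DHSA"},
    "wheat_1": {"C16:0", "C18:2"},
}


def binary_matrix(detected, compounds):
    """Code each compound as 1 (present) or 0 (absent) for every sample."""
    return {sample: [1 if c in found else 0 for c in compounds]
            for sample, found in detected.items()}


def drop_always_present(matrix, compounds):
    """Remove compounds coded 1 in every sample; they cannot discriminate."""
    keep = [k for k in range(len(compounds))
            if not all(row[k] == 1 for row in matrix.values())]
    reduced = {s: [row[k] for k in keep] for s, row in matrix.items()}
    return reduced, [compounds[k] for k in keep]


def species_label(flags):
    """Species-level label for one compound, following the Table II rule.

    flags holds the 0/1 codes of that compound for all cultivars of a species.
    """
    present, total = sum(flags), len(flags)
    if present == total:
        return "1"
    if present == 0:
        return "0"
    if 2 * present > total:
        return "1/0"
    if 2 * present < total:
        return "0/1"
    return None  # exactly 50 %: not specified in the text


if __name__ == "__main__":
    full = binary_matrix(TOY_DETECTED, TOY_COMPOUNDS)
    reduced, kept = drop_always_present(full, TOY_COMPOUNDS)
    print(kept)
    print(reduced)
    print(species_label([1, 1, 0]))
```

Only the per-sample 0/1 codes of the retained compounds would then be passed on to cluster analysis and PCA; the mixed labels are a presentation device for the species-level table.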

Cluster analyses
A dendrogram of minor fatty acids and non-saponifiable compounds detected in the n-hexane extracts of different wheat, rye, triticale, oat, barley, corn, spelt, amaranth and buckwheat cultivars using complete linkage as an amalgamation rule and the City-block (Manhattan) distance as a measure of the proximity between the samples is shown in Fig. 1.
A dendrogram of classes of various wheat, rye, triticale, oat, barley, corn, spelt, amaranth and buckwheat cultivars is shown in Fig. 2.
The dendrogram of the GC-MS binary data showed a good overall distinction between the investigated species (wheat, rye, triticale, oat, barley, corn, spelt, amaranth and buckwheat), due to the high variability between their genotypes. Cultivars of the small grain species, particularly triticale, oat and barley, do not separate completely according to botanical origin: flour samples of some barley cultivars (B2, B3) show strong similarities with samples of triticale cultivars (T1, T2), while flour samples of oat cultivars (O1, O2, O3) show strong similarities with some other cultivars of barley (B1, B7, B8). On the other hand, using this method, flour samples of every spelt cultivar analyzed (spelt belongs to the genus Triticum [56] and is mostly considered a subspecies of common wheat) could be completely distinguished from flour samples of every wheat cultivar and of the small grain species in general. In terms of buckwheat, amaranth, spelt, corn and the small grain species (wheat, rye, triticale, oat and barley), the proposed GC-MS method allows visualization of the intrinsic structure of the data set without a priori assumptions about the origin of the samples. It is also important to note that the samples of gluten-free corn and pseudocereals (amaranth and buckwheat) are clearly discriminated from the gluten-containing small grain cereals (wheat, rye, triticale, barley and oats) [57-59]. Although spelt taxonomically belongs to the Triticum genus and is botanically very similar to wheat, distinguishing it is very important because of its higher quality and price compared with common wheat; spelt is suitable for organic agriculture and is commonly produced in this way [56].

Principal component analysis
The PCA of the obtained data showed that the first three principal components accounted for 73.11 % of the total variance in the fifteen variables (29.21 %, 26.35 % and 17.55 %, respectively), Fig. 3.
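The percentages above are shares of the total variance carried by each principal component. The following sketch shows how such shares can be obtained from a binary data matrix; it assumes PCA on the covariance matrix (the text does not state whether covariances or correlations were used) and uses a plain Jacobi rotation routine in place of a linear algebra library. All names and the toy matrix are illustrative, not the data of this study.

```python
import math

# Percentages reported in the text for PC1-PC3.
REPORTED_PERCENT = [29.21, 26.35, 17.55]


def covariance_matrix(rows):
    """Sample covariance of the columns (variables) of a data matrix."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-24):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Angle that zeroes the (p, q) element after the rotation.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def explained_variance_ratio(rows):
    """Share of the total variance carried by each principal component."""
    eigenvalues = jacobi_eigenvalues(covariance_matrix(rows))
    total = sum(eigenvalues)
    if total == 0:
        raise ValueError("all variables are constant")
    return [value / total for value in eigenvalues]


if __name__ == "__main__":
    toy = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]  # hypothetical 0/1 data
    print([round(r, 3) for r in explained_variance_ratio(toy)])
    print(round(sum(REPORTED_PERCENT), 2))  # cumulative share of PC1-PC3
```

The last line simply confirms that the three reported contributions add up to the stated cumulative value of 73.11 %.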
The samples of amaranth (A region), corn (C region) and buckwheat flour (H region) are grouped together and clearly separated from the flour samples of the other analyzed species. These samples represent the gluten-free species analyzed in this study. Samples of flour produced from the gluten-containing small grain species (wheat, barley, oats, rye and triticale) could be distinguished from the other analyzed flour samples by forming a separate group. However, as in the cluster analysis, the principal component analysis was not able to separate them fully according to botanical origin, owing to their close biological relatedness. Samples of spelt flour (S region) are also grouped together, but close to the flour samples produced from the wheat varieties W3 and W4, because wheat (Triticum aestivum L.) is the closest botanical relative of spelt (classified not only as Triticum spelta L., but also as Triticum aestivum subsp. spelta L.) among all cereal and pseudocereal species analyzed in this study.
Available online at www.shd.org.rs/JSCS/ (CC) 2018 SCS.

CONCLUSIONS
The derivatized hexane extracts of 50 flour samples made from various cultivars belonging to different cereal and pseudocereal species were subjected to GC-MS analysis. The lipid components detected in the flour samples were coded using the binary system to signify the presence (1) or the absence (0) of a particular component. By applying pattern recognition tools (cluster analysis and principal component analysis) to the binary matrices of the GC-MS data, differentiations and groupings of flour samples according to their botanical origin and gluten content were obtained. The study of the experimental flour samples demonstrated the strong effect of botanical origin on the lipid profile of the analyzed samples. The presence of minor fatty acids and non-saponifiable compounds (phytosterols, α-tocopherol and squalene) could be used as botanical authenticity markers to establish the differences between flour samples of various cereals and pseudocereals. These results constitute a useful basis for developing a system for adulteration detection in cereal and pseudocereal flours used in the production of various bakery products. The proposed semi-quantitative approach requires neither the analytical standards that are typically used in this kind of analysis nor time-consuming accurate determinations of the concentrations of the profiled lipids.

SUPPLEMENTARY MATERIAL
The obtained total ion current chromatograms overlaid by Agilent MSD Productivity ChemStation software according to the corresponding cereal and pseudocereal species are available electronically at the pages of the journal website: http://www.shd.org.rs/JSCS/, or from the corresponding author on request.
TABLE II. Lipid components detected in the investigated cereal and pseudocereal cultivars, the corresponding abbreviations and retention times (t_R)

Fig. 1. Cluster analysis of the lipid components identified in the n-hexane extracts of the analyzed cereal and pseudocereal cultivars.

Fig. 2. Cluster analysis of the analyzed cultivars belonging to the different cereal and pseudocereal species.

Fig. 3. PCA of the cultivars of the investigated cereal and pseudocereal species and the corresponding lipid components detected: a) projection in the PC2-PC3 plane, b) projection in the PC1-PC3 plane, c) 3D scatter plot and d) projection in the PC1-PC2 plane.

Table S-I of the Supplementary material to this paper.

Table S-II of the Supplementary material.

TABLE I. Various cultivars of cereal and pseudocereal species analyzed in this study