IDENTIFICATION AND CHARACTERIZATION OF GENIC MICROSATELLITES IN CUNNINGHAMIA LANCEOLATA (LAMB.) HOOK (TAXODIACEAE)

Genomic resources for conventional breeding programs are extremely limited for coniferous trees, and existing simple sequence repeat markers are usually identified through the laborious process of hybridization screening. Therefore, this study aimed to identify gene-based microsatellites in the Chinese fir, Cunninghamia lanceolata (Lamb.) Hook, by screening transcript data. We identified 5200 microsatellites. Trinucleotide motifs were most common (47.94%), followed by tetranucleotide motifs (24.92%). The AG/CT motif (43.93%) was the most abundant dinucleotide repeat, whereas AAG/CTT (25.07%) was the most common trinucleotide repeat. A total of 411 microsatellite primer pairs were designed, and 97 polymorphic loci were identified among eight genotypes. The number of alleles per locus (Na) at these polymorphic loci ranged from 2 to 5 (mean, 2.640), the observed heterozygosity (Ho) values were 0.000-1.000 (mean, 0.479), and the expected heterozygosity (HE) values were 0.125-0.775 (mean, 0.462). The polymorphism information content (PIC) values were 0.110-0.715 (mean, 0.383). Seventy-two of the 97 polymorphic markers (74.23%) were present within genes with predicted functions. In addition, in genetic diversity and segregation analyses of 16 genotypes, only 5.88% of the polymorphic loci displayed segregation distortion at the p<0.05 level. Cross-species amplification of a randomly selected set of 30 genic microsatellites showed that transferability decreased with increasing evolutionary distance between C. lanceolata and the target conifers. Thus, these 97 genic markers will be useful for genetic diversity analysis, germplasm characterization, genome mapping and marker-assisted breeding in C. lanceolata, and for evolutionary genetic analysis in Taxodiaceae.


INTRODUCTION
Chinese fir (Cunninghamia lanceolata (Lamb.) Hook) is mainly distributed in the tropical and subtropical mountainous areas of China and Vietnam. It is an important tree species for timber production and has been extensively planted in southern China for over 3000 years [1,2]. Despite great success in conventional breeding programs [3], highly polymorphic genetic markers remain very limited for coniferous trees [4].
Microsatellites (or simple sequence repeats, SSRs) provide codominant markers for a wide range of genetic applications [5], but identifying them has traditionally been tedious, labor-intensive, expensive and low-throughput [6][7][8]. Thus, to date, only 11 genomic SSRs and 28 EST-SSRs have been reported for C. lanceolata [9,10].
In recent years, next-generation sequencing technologies have provided exciting means of developing genic SSR markers for non-model organisms [11]. Compared with genomic SSRs, genic SSRs are more likely to be linked to gene loci that contribute to morphological phenotypes [5,11]. Moreover, they represent useful tools for marker-assisted selection. These markers can also facilitate evolutionary analyses, as they are more commonly shared across related taxa than "anonymous" SSRs [11,12]. In the present study, we evaluate the frequency and distribution of various types of genic SSRs and develop polymorphic genic SSR markers as genetic tools for C. lanceolata. These novel markers will be useful in future genetic studies and breeding applications in this conifer.

Plant material and data analysis
In this study, a set of eight unique individuals (Table 1), including one sample of C. konishii (= C. lanceolata var. konishii), from natural populations and a first-generation seed orchard, was used for the analysis of SSR allele diversity and for polymorphism screening. The number of alleles (Na), observed heterozygosity (Ho) and expected heterozygosity (HE) were calculated using POPGENE 1.32 [16]. Polymorphism information content (PIC) was derived according to the following formula [17]:

$$\mathrm{PIC} = 1 - \sum_{i=1}^{n} P_i^2 - \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2 P_i^2 P_j^2$$

where n is the number of alleles at one locus; P_i and P_j are the frequencies of the ith and jth alleles at that locus; and the inner sum starts at j = i + 1.
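The diversity statistics named above can be sketched in a few lines of Python. This is only an illustrative sketch, not the POPGENE implementation: the function names and the example genotypes are hypothetical, the PIC follows the standard pairwise-allele form (with the inner sum starting at j = i + 1), and the expected heterozygosity is the simple uncorrected form 1 - sum(P_i^2), which may differ slightly from a small-sample-corrected estimate.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(genotypes):
    """Allele frequencies from diploid genotypes given as (allele1, allele2) tuples."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles (Ho)."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def expected_heterozygosity(freqs):
    """Uncorrected expected heterozygosity, 1 - sum(P_i^2) (assumed form)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(P_i^2) - sum over pairs i<j of 2 * P_i^2 * P_j^2."""
    freqs = list(freqs)
    homozygous = sum(p * p for p in freqs)
    pairs = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygous - pairs


# Hypothetical allele sizes (bp) for eight individuals at one locus.
example = [(150, 150), (150, 154), (154, 154), (150, 158),
           (150, 154), (154, 158), (150, 150), (154, 154)]
freqs = allele_frequencies(example)
print(len(freqs), observed_heterozygosity(example),
      round(expected_heterozygosity(freqs.values()), 3),
      round(pic(freqs.values()), 3))
```

With only eight individuals, each allele frequency moves in steps of 1/16, so the per-locus values are coarse estimates.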
All unigenes containing polymorphic microsatellites were queried against the GenBank non-redundant protein database using BLASTX with an expected value (E-value) cutoff of 10^-5 to predict the functions of the genic SSRs. An additional set of 14 progeny from a "D110" × "6421" cross and their parents were used for the analysis of SSR allele segregation (Table 1). Chi-square tests were used to analyze the allele segregation of the 97 loci with JoinMap 4.1 [18]. Then, a cluster analysis of all polymorphic loci and all 24 genotypes was conducted based on Nei's unbiased measures of genetic distance [19] using POPGENE 1.32 and MEGA 5 [20]. To investigate the transferability of SSR markers to related species, 30 markers randomly selected from the 97 polymorphic SSR markers were used to amplify genomic DNA from six other species: two Taxodiaceae (Metasequoia glyptostroboides and Glyptostrobus pensilis), two Pinaceae (Pinus massoniana and Cedrus deodara), one Cupressaceae (Platycladus orientalis), and one Cephalotaxaceae (Cephalotaxus fortunei).
All genotypes were conserved at the National Forest Germplasm preservation base of C. lanceolata in Yangkou Forest Farm, Fujian Province, China. Genomic DNA was extracted from fresh needles of each genotype using the cetyltrimethylammonium bromide method [21].

Genic SSR marker amplification
Each PCR reaction mixture (10 μl) contained 1 μl of 10× reaction buffer (100 mM Tris-HCl, pH 9.0, 100 mM KCl, and 80 mM (NH4)2SO4), 2 mM of total dNTP, 0.2 μM each of the forward and reverse primers, approximately 30 ng of genomic DNA, 0.5 U of Taq DNA polymerase (TaKaRa Biotechnology, Dalian, China), and 1.25 mM MgCl2. PCR was performed in a Veriti 96-well thermal cycler (Applied Biosystems, Foster City, California, USA). Samples were incubated at 94°C for 5 min, followed by 20 touchdown cycles of 94°C for 45 s, Tm+10°C for 45 s (with a 0.5°C reduction in each subsequent cycle), and 72°C for 1 min. Next, the samples were subjected to 20 cycles of 93°C for 45 s, Tm for 45 s, and 72°C for 1 min. A final 10-min extension at 72°C was then performed. The Tm values of the different primers are listed in Table S1.
Fragments resulting from PCR amplification were detected using 8% polyacrylamide gel electrophoresis (1× TBE buffer at 200 V for 1.5 h). Band sizes were compared with those of a 50-bp standard DNA ladder (TaKaRa Biotechnology, Dalian, China). To further verify the accuracy of the novel genic SSR polymorphisms, the forward primers of 22 selected SSR loci were labeled with one fluorescent dye (FAM or HEX) at the 5ʹ end and tested in eight individuals. Then, we performed capillary electrophoresis using an ABI3730xl DNA Automatic Analyzer with a GeneScan-500LIZ size standard (Applied Biosystems). Based on these data, allele sizes were determined using GeneMarker software (Soft Genet-
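The touchdown program described above can be written out as an explicit step list, which makes the annealing ramp easy to check. The following is a minimal sketch, assuming the annealing temperature starts at Tm+10°C and drops by 0.5°C per cycle, so that the last touchdown cycle anneals at Tm+0.5°C; the function name and the example Tm of 55°C are hypothetical, and ramp times between steps are ignored.

```python
def touchdown_profile(tm, td_cycles=20, start_offset=10.0, step=0.5,
                      plain_cycles=20):
    """Return the PCR program as a list of (step name, temperature in C, seconds)."""
    program = [("initial denaturation", 94.0, 300)]
    # Touchdown phase: annealing temperature falls by `step` every cycle.
    for cycle in range(td_cycles):
        anneal = tm + start_offset - step * cycle
        program += [("denature", 94.0, 45),
                    ("anneal", anneal, 45),
                    ("extend", 72.0, 60)]
    # Constant phase: annealing at the primer-specific Tm.
    for _ in range(plain_cycles):
        program += [("denature", 93.0, 45),
                    ("anneal", tm, 45),
                    ("extend", 72.0, 60)]
    program.append(("final extension", 72.0, 600))
    return program


profile = touchdown_profile(55.0)  # hypothetical Tm; real values are in Table S1
anneal_temps = [temp for name, temp, _ in profile if name == "anneal"]
hold_minutes = sum(seconds for _, _, seconds in profile) / 60
print(anneal_temps[0], anneal_temps[19], anneal_temps[20], hold_minutes)
```

Excluding ramping, the hold times add up to 115 min per run regardless of the primer Tm.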

RESULTS AND DISCUSSION
A total of 5200 putative SSRs were identified in 4470 sequences from 62895 unigenes using MISA. SSRs occurred at an overall transcript density of 163.79 SSRs/Mbp. This low density is consistent with previous studies, which have generally found lower SSR frequencies in conifers than in other plants [22,23]. It is possible that the low SSR density in conifers is associated with the evolutionary rate and/or level of adaptive evolution in large, long-lived conifer trees, leading to slow substitution rates and retention of beneficial genomic blocks and/or mutations [24][25][26].
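To illustrate how perfect SSRs can be located in transcript sequences and how a density in SSRs/Mbp is obtained, a minimal regular-expression sketch is given below. It is not MISA itself: the minimum repeat numbers (10 for mono-, 6 for di-, and 5 for tri- to hexanucleotide motifs) are assumed illustrative thresholds, since the search criteria are not stated here, and all function names and the example sequence are hypothetical.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem repeats.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % d == 0 and motif == motif[:d] * (n // d)
                   for d in range(1, n))


def find_ssrs(seq):
    """Return sorted (start, motif, repeat count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # One copy of the motif followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)


def ssr_density(n_ssrs, total_bp):
    """SSR density expressed as SSRs per megabase pair."""
    return n_ssrs / total_bp * 1e6


example = "CCT" + "AG" * 7 + "CTC" + "AAG" * 5 + "T"
print(find_ssrs(example))
print(ssr_density(2, len(example)))
```

Read in reverse, the reported figures imply that roughly 5200 / 163.79 ≈ 31.7 Mbp of transcript sequence was screened.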
Among the mononucleotide repeats, A/T was dramatically overrepresented (n=83; 97.6%). The most abundant dinucleotide repeat was AG/CT (n=152; 43.93%), followed by AT/AT (n=132; 38.15%). Conversely, CG/CG was not found in the C. lanceolata transcriptome (Fig. 2A). This is consistent with most angiosperms studied, in which AG/CT is the dominant EST-SSR dinucleotide repeat [28][29][30], but differs from most conifer research, including studies of loblolly pine [22,23], spruce [22], and sugi [11], in which AT/AT was the most abundant dinucleotide repeat. The percentage of AT/AT repeats in C. lanceolata was nevertheless greater than in most angiosperms, and the frequency of AT/AT tended to decline from gymnosperms to angiosperms. However, the specific functions of SSR motifs within specific plant genes or genomes remain poorly understood.
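Motif classes such as AG/CT pool every rotation of a motif together with its reverse complement (AG, GA, CT and TC all count as one class). A small sketch of that grouping follows, assuming the usual convention of labeling a class by the alphabetically smallest rotation on each strand; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    """Reverse complement of a DNA motif."""
    return motif.translate(COMPLEMENT)[::-1]


def rotations(motif):
    """All cyclic rotations of a motif, e.g. AAG -> {AAG, AGA, GAA}."""
    return {motif[i:] + motif[:i] for i in range(len(motif))}


def motif_class(motif):
    """Strand- and phase-independent class label, e.g. 'TC' -> 'AG/CT'."""
    motif = motif.upper()
    forward = min(rotations(motif))
    reverse = min(rotations(reverse_complement(motif)))
    return "/".join(sorted((forward, reverse)))


for motif in ("GA", "TC", "TA", "GC", "TTC", "T"):
    print(motif, "->", motif_class(motif))
```

Self-complementary motifs such as AT and CG map to themselves, which is why they appear as AT/AT and CG/CG in the text.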
A total of 411 primer pairs were designed. Overall, 97 of the 411 primer pairs were polymorphic among the eight genotypes, including C. konishii (Tables S1 and S2). The other primer pairs were monomorphic or gave no product; therefore, we excluded them from further analysis. Polymorphism has previously been reported for 28 EST-SSRs [11] and 10 genomic SSRs [10] in C. lanceolata. Comparison with these previously reported SSRs showed that all 97 polymorphic markers were novel primer pairs for new loci, and all were submitted to the NCBI Probe database (IDs Pr032066750 to Pr032066846). This demonstrated that our method is more efficient for polymorphic EST-SSR development. The high rate of successful amplification of these 97 polymorphic primers in C. konishii was consistent with previous results describing only a few variations between C. konishii and C. lanceolata [31].
To verify the accuracy of the novel genic SSR polymorphisms after polyacrylamide gel electrophoresis, 22 SSR loci were randomly selected for both capillary electrophoresis and automatic sequencing. In general, the results of the three methods were similar (Fig. 3). For only one locus (CFeSSR98 in Fig. 3A), the PCR product size from capillary electrophoresis appeared smaller than that shown in the photograph of the polyacrylamide gel. Fifty-four amplicons of the 22 SSR loci were successfully sequenced with the genic SSR primer pairs. Multiple sequence alignment of the allele amplicons and the expected sequences revealed a variable number of repeat motifs among amplicons, along with a few point mutations and insertions/deletions (Fig. 3IV).
Of the unigenes associated with the 97 polymorphic genic SSRs, 72 (74.23%) shared significant homology with genes of known function, whereas 25 (25.77%) had no significant match (Table S2). These results were in general agreement with previous work demonstrating that 30%-49% of conifer genes had little or no sequence similarity with plant genes of known function [24,32]. These genic SSR markers are more likely than genomic SSR markers to be linked to gene loci that contribute to morphological phenotypes [5,10,12,33].
The number of alleles (Na) per locus for the 97 markers (Table S2) ranged from 2 to 5, with an average of 2.64 alleles. In addition, the Ho values were 0.000-1.000 with an average of 0.479, and the HE values were 0.125-0.775 with an average of 0.462. Lastly, the PIC values were 0.110-0.715 with an average of 0.383. Similar observations were made in previous studies of Cryptomeria japonica, for which the average PIC value was 0.33 [11]. However, the mean PIC in our study was lower than the average of 0.573 reported in previous EST-SSR research [10]. Differences in the polymorphism detected may be due to different marker techniques or different plant populations. Nevertheless, nearly 32% of the markers in the genetic diversity analysis contained high levels of genetic information according to the suggested criteria of high (PIC>0.5), moderate (0.25<PIC<0.5) and low (PIC<0.25) [34]. Of the remaining markers, 47% contained moderate and only 20% contained low levels of genetic information. Thus, the SSR loci in this study have higher genetic information content than single biallelic markers, such as single nucleotide polymorphisms (SNPs) [35].
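The PIC classes used above, and the comparison with biallelic markers, can be made concrete with a short sketch. The function names are hypothetical, and because the criteria leave the boundary values 0.25 and 0.5 unassigned, placing them in the moderate class is an assumption. The second function evaluates the PIC of a two-allele locus, which peaks at 0.375 when both alleles have frequency 0.5.

```python
def classify_pic(value):
    """Assign a PIC value to the high / moderate / low class."""
    if value > 0.5:
        return "high"
    if value < 0.25:
        return "low"
    return "moderate"  # boundary values 0.25 and 0.5 land here (assumption)


def biallelic_pic(p):
    """PIC of a locus with two alleles at frequencies p and 1 - p."""
    q = 1.0 - p
    return 1.0 - p * p - q * q - 2.0 * (p * p) * (q * q)


# Reported minimum, mean and maximum PIC values.
for value in (0.110, 0.383, 0.715):
    print(value, classify_pic(value))

# Scan allele frequencies to find the ceiling for a biallelic marker.
best = max(biallelic_pic(i / 100) for i in range(1, 100))
print(best)
```

Because a biallelic marker can never exceed 0.375, no SNP-like locus can reach the high class, whereas multi-allelic SSR loci can.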
Twenty-nine loci were monomorphic in the segregation analysis of the investigated population. In the chi-square test, only 5.88% of the remaining 68 polymorphic markers showed segregation distortion at the p<0.05 level (Table S2). A dendrogram showed that the 24 C. lanceolata individuals fell into two distinct clusters (Fig. 4). In one cluster, the 14 offspring were grouped with their two parents. The other cluster consisted of seven unique individuals from natural populations and C. konishii. Similar results were also observed for amplified fragment length polymorphism (AFLP) data from C. lanceolata and C. konishii [31].
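A goodness-of-fit chi-square test of the kind used to flag segregation distortion can be sketched as follows. This is not the JoinMap implementation: the function names and the observed genotype counts are hypothetical, and the critical values are the standard 5% points of the chi-square distribution for 1 to 3 degrees of freedom.

```python
# Standard 5% critical values of the chi-square distribution by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def chi_square(observed, expected_ratio):
    """Goodness-of-fit statistic for observed counts against a Mendelian ratio."""
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    stat = 0.0
    for obs, ratio in zip(observed, expected_ratio):
        expected = total * ratio / ratio_sum
        stat += (obs - expected) ** 2 / expected
    return stat


def is_distorted(observed, expected_ratio):
    """True if the locus deviates from the expected ratio at p < 0.05."""
    df = len(observed) - 1
    return chi_square(observed, expected_ratio) > CHI2_CRIT_05[df]


# Hypothetical counts for 14 progeny at loci expected to segregate 1:1 or 1:2:1.
print(is_distorted((10, 4), (1, 1)))       # statistic about 2.57
print(is_distorted((12, 2), (1, 1)))       # statistic about 7.14
print(is_distorted((3, 8, 3), (1, 2, 1)))  # statistic about 0.29
```

With 68 polymorphic loci, a distortion rate of 5.88% corresponds to 4 loci, close to the number expected by chance alone at the 5% level; with only 14 progeny, the test has limited power to detect moderate distortion.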
Considering their relatively high level of polymorphism, low segregation distortion rate, stable codominance and reproducibility, the genic SSR markers developed in the present study will be useful for marker-assisted selection, linkage mapping, quantitative trait locus (QTL) mapping and population genetic studies to improve breeding of C. lanceolata.
Cross-species amplification of a randomly selected set of 30 genic SSRs showed that 30.0%-33.3% of the SSRs could be amplified in Metasequoia glyptostroboides and Glyptostrobus pensilis, 16.67%-20.0% could be amplified in Pinus massoniana and Cedrus deodara, whereas 6.67% could be amplified in Platycladus orientalis and Cephalotaxus fortunei (Table S3). In general, transferability decreased with increasing evolutionary distance between C. lanceolata and the target species. Additionally, these markers will be useful tools for comparative genome mapping and evolutionary studies in conifer species because gene-based SSRs are more transferable among related taxa than anonymous SSRs.
Previous methods of SSR discovery have been tedious, labor-intensive, expensive, and of low throughput [7,8]. Moreover, microsatellite markers from genomic libraries represent only those motifs for which the initial selection was performed by hybridization or enrichment [6,9,36]. In addition, the low density and unique distribution of SSRs in C. lanceolata shown in this study suggest that it might be more challenging to develop SSR markers for this species by traditional methods. By mining EST sequences, in contrast, we developed 97 novel polymorphic microsatellite markers much more efficiently than a previous study of C. lanceolata that used traditional methods [10]. These markers will be beneficial for genetic diversity analyses, germplasm characterization, genomic mapping, marker-assisted breeding and evolutionary genetic analysis of C. lanceolata. Thus, our results also confirm that the identification of SSRs from the transcriptome is an efficient method for developing gene-based microsatellites for C. lanceolata.

Fig. 1. Frequencies of the various SSR motifs present in C. lanceolata.

Fig. 3. PCR products amplified with two EST-SSR markers and analyzed by three methods. I: Representative gel showing the amplification profiles of two microsatellite markers (A: CFeSSR98; B: CFeSSR435) and their fragment length polymorphism among eight unique individuals of C. lanceolata. The amplicons were resolved in an 8% polyacrylamide gel along with a 50-bp DNA size standard. The order of DNA samples from lane 1 to lane 8 within each primer pair image panel is shown in Table 1. GT: Genotype of amplification. II: Representative genotypes of the amplifications in I (A: CFeSSR98; B: CFeSSR435) determined by capillary electrophoresis. III: Representative gel showing the amplification profiles of the microsatellite markers (A: CFeSSR98; B: CFeSSR435) among 14 progeny from a "D110" × "6421" cross and their parents. SER: SSR allele segregation. IV: Multiple sequence alignment of the expected sequences and amplicons of the microsatellite markers (A: CFeSSR98; B: CFeSSR435) showing the presence of the microsatellite repeat motif. The alignment reveals a variable number of repeat motifs in different allele amplicons along with a few point mutations and insertions/deletions. ES: expected sequences; AMP: amplicons of the microsatellite marker and amplicon sizes.

Fig. 4. Dendrogram generated using UPGMA cluster analysis based on the genetic diversity of 24 C. lanceolata genotypes.

Table 1. C. lanceolata genotypes used in the SSR diversity and segregation analyses. All tree needle samples were deposited in the Key Laboratory of Forest Genetics and Biotechnology of the Ministry of Education at Nanjing Forestry University, Nanjing 210037, China.
a Experimental forests were supplied by the National Forest Germplasm preservation base of Chinese fir (Yangkou National Forest Farm in Fujian Province, China).