IN SILICO ANALYSIS OF TRANSCRIPTION FACTOR BINDING SITES IN PROMOTERS OF GERMIN-LIKE PROTEIN GENES IN RICE

Germins (GERs) and germin-like proteins (GLPs) play important roles in responses to various stresses; however, their function is still not fully understood. Significant insight into their function can be obtained by analyzing their promoters. In the present study, the 5’ upstream promoter regions (1000 bp) of 43 Asian rice (Oryza sativa var. Japonica) GLP genes were retrieved from Ensembl Plants, based on the Rice Annotation Project database (RAP-DB). Phylogenetic analysis via MEGA6 showed a narrow genetic background (0.2%) with a Tajima neutrality value (π) of 0.69. Overall, 4234 transcription factor (TF) binding sites (TFBSs) were found on chromosomes 1, 2, 3, 4, 5, 8, 9, 11 and 12 via MatInspector from 90 different TF families using a total of 444. Common TFs and DiAlign analyses showed that Arabidopsis homeobox protein (AHBP), MYB-like proteins (MYBL) and vertebrate TATA-box-binding protein (VTBP) were the most abundant, common and evolutionarily conserved elements in the upstream region from 0 to -800 bp. Analysis of their mutual interaction with Frameworker uncovered three new cis-regulatory modules (VTBP_VTBP, MYBS_MYBS and AHBP_VTBP), which appear to be decisive for OsGLP regulation. In silico functional analysis via ModelInspector revealed 77 cis-regulatory modules, each comprising two elements, among which DOFF_OPAQ_03 and GTBX_MYCL_01 were the most frequent and were mostly found on chromosomes 8 and 12, indicating that the combinatorial interaction of these elements has a fundamental role in various biological processes. The study revealed the importance of these elements in regulating the expression of OsGLPs, which will help in predicting the role of these genes in various stresses and may have applications in biotechnology.


INTRODUCTION
Germin (GER) was initially identified in the wheat embryo as a germination-specific marker [1] and later recognized as an oxalate oxidase. Proteins with an average similarity of 50% with GER were referred to as germin-like proteins (GLPs). GERs and GLPs constitute a diverse and ubiquitous family of plant glycoproteins belonging to the cupin superfamily, which is involved in many developmental and stress-related processes [2]. They possess a single enzymatic activity or a combination of activities, including oxalate oxidase (OXO), superoxide dismutase (SOD), ADP glucose pyrophosphatase/phosphodiesterase (AGPPase) and polyphenol oxidase (PPO), and either act as structural proteins or participate in signal transduction through their receptor function [3]. A GLP from wheat leaf apoplast was reported for its ability to inhibit serine protease [4]. The roles of GERs and GLPs in the development of leaf, root, fruit and seed, in floral senescence, and in defense against various biotic (bacteria, viruses, fungi, insects, nematodes, parasites), abiotic (salinity, drought, cold, heat, metal, nutritional) and physical stresses have been validated [3,5,6], but their functions are not fully understood. However, modern bioinformatics tools provide an opportunity to obtain insight into their functions by analyzing important molecular components that control their spatiotemporal regulation.
In this context, promoter analysis is an important step towards an improved understanding of gene functioning and regulation and is considered a prerequisite for the development of resilient crops through genetic modification. Plant promoters that direct a high level of gene expression induced by various stresses are critical for the application of crop biotechnology [7]. Further, these promoters can be used for achieving tissue-specific expression against various stresses. The protein binding sites of a promoter and the corresponding transcription factors (TFs) are crucial for transcription and regulation [8]. Accurate spatiotemporal regulation of gene expression is vital for the developmental and environmental adaptation of an organism, and is in large part accomplished by cis-elements acting as binding sites for TFs [9]. Analysis of these individual elements in the promoter and their combinatorial effects can improve our understanding of gene expression.
Previously, numerous databases and software tools have been used for in silico promoter analysis to predict the role of promoters in gene regulation against various stresses. Chawade et al. [10] developed a putative cold acclimation network in Arabidopsis using microarray data, known promoter-binding sites and corresponding TFs. Similarly, analyses of the AtCHS7 and AtCHS8 promoters [11] and of the sucrose transporter gene promoter families of Arabidopsis and rice [12] were performed to investigate potential TFBSs using PlantCARE, PLACE and MatInspector (a Genomatix software suite). A similar approach was adopted for the OsGα subunit (RGA1) [13] and AtPrx gene (Arabidopsis thaliana peroxidases) promoters. Similarly, using information about putative TFBSs, the role of the 276-bp promoter region in tissue-specific expression and development was predicted and verified for the HvGERB and HvGERF gene promoters [14]. Likewise, the roles of the ZmGLP1 and EgGLP promoters in the control of the circadian rhythm-oscillated pattern [15,16], PcGer1 in various hormonal stresses [17], TaGLP3 in powdery mildew [18] and HvGer4c and AtGLP13 in pathogenicity [19,20] were first predicted by TFBS analysis using various bioinformatics tools and subsequently verified. Similarly, due to the presence of seed-specific TFBSs, the BnGLP gene promoter was used to direct and enhance the accumulation of omega-3 long chain molecules by achieving seed-specific expression in transgenic Arabidopsis [21].
In rice, in silico analysis of OsRGLP1 [22], OsRGLP2 [23] and 52 GLP gene promoters from various plant species including rice [24,25] has predicted the existence of numerous TFBSs that participate in responses to wounding, dehydration, light, dark-induced senescence and stresses (pathogen and salt), in pollen-specific expression and in responses to plant growth regulators, as well as elements related to seed storage proteins; the roles of OsRGLP2 in the response to wounding, dehydration stress and pathogenicity have been confirmed [26,27]. Thus far, no comprehensive study has been conducted on rice GLP promoters, their TFBSs and their putative roles in various processes that could predict their functioning. In view of the importance of promoter analysis and its role in the functional predictability of genes, the current study was designed to analyze all monocupin GLP gene promoters of Asian rice (Oryza sativa var. Japonica) found in the Ensembl database(s) on 30 October 2015, with the aim of identifying the TFBSs in these gene regions and predicting GLP gene functions in rice by applying appropriate bioinformatics tools.

Data retrieval
Forty-three OsGLP gene promoters were retrieved from the Asian rice (Oryza sativa ssp. Japonica) genome using the online server of Ensembl Plants (http://plants.ensembl.org/index.html), based on the information obtained from the Rice Genome Annotation Project Database (http://rapdb.dna.affrc.go.jp/), including two already analyzed promoters of OsRGLP1 [28] and OsRGLP2 [23,26] for better comparison. The length of each promoter was deliberately fixed at a constant 1000 bp for uniformity.
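The retrieval of a fixed-length upstream region can be illustrated with a minimal sketch. It is not the procedure of the Ensembl Plants server; the function names, the 1-based inclusive coordinates and the choice of the annotated gene start as the reference point are assumptions made only for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N)."""
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(complement)[::-1]


def upstream_promoter(chrom_seq: str, gene_start: int, gene_end: int,
                      strand: str, length: int = 1000) -> str:
    """Return `length` bp upstream of a gene, written 5'->3' on the gene's strand.

    Assumptions (hypothetical, not from the paper): coordinates are 1-based
    and inclusive with gene_start <= gene_end, and the region is simply
    truncated if it would run past either end of the chromosome.
    """
    if strand == "+":
        # Upstream lies to the left of the gene start on the forward strand.
        begin = max(gene_start - 1 - length, 0)
        return chrom_seq[begin:gene_start - 1]
    if strand == "-":
        # Upstream lies to the right of the gene end; flip to the gene's strand.
        region = chrom_seq[gene_end:gene_end + length]
        return reverse_complement(region)
    raise ValueError("strand must be '+' or '-'")
```

With length=1000, every returned sequence has the same size unless the gene lies within 1000 bp of a chromosome end, which is the uniformity the fixed promoter length is meant to provide.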

Phylogenetic analysis
Phylogenetic analysis of the above sequences was conducted using the Molecular Evolutionary Genetics Analysis 6 (MEGA6) tool [29] with the neighbor-joining tree-making method. Similarly, Tajima's neutrality test of selection was conducted using the same software to estimate nucleotide diversity.
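Nucleotide diversity (π) is conventionally defined as the average proportion of nucleotide differences over all pairs of aligned sequences. The sketch below illustrates that definition only; it is not MEGA6's implementation, and the pairwise handling of gaps and ambiguous bases is an assumption.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') or 'N' are skipped
    (pairwise deletion; an assumption, other treatments are possible).
    """
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        differing += x != y
    return differing / compared if compared else 0.0


def nucleotide_diversity(aligned: list[str]) -> float:
    """Mean pairwise p-distance over all pairs of aligned sequences."""
    if len(aligned) < 2:
        raise ValueError("at least two sequences are required")
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(aligned, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)
```

For 43 promoters this averages over 43 × 42 / 2 = 903 sequence pairs.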

Analysis of TFBSs
Promoters were searched for putative TFBSs using MatInspector (ver. 9.1) [8], with core and matrix similarities of 1/1, identifying the most frequent and unique cis-elements. Common TFBSs were further searched using the online server of Common TFs with core and matrix similarities of 0.75/0.75. The position analyses of common TFBSs were performed in Excel (2010).
The sequences were aligned and searched for TFBSs that were common to at least 10 sequences (23%) in the aligned regions using DiAlign software with core and matrix similarities of 0.75/0.75.
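The two bookkeeping steps described above (selecting TF families shared by a minimum number of promoters, and tabulating element positions) can be sketched as follows. The sketch does not reproduce Common TFs, DiAlign or the Excel workflow; the input layout, the function names, the 200-bp bin width and the convention that the last promoter base is position -1 are all assumptions.

```python
from collections import Counter, defaultdict

# A hit is (promoter_id, tf_family, position), with position counted
# from the 5' end of the promoter, starting at 1 (assumed layout).
Hit = tuple[str, str, int]


def common_families(hits: list[Hit], min_sequences: int) -> dict[str, int]:
    """Map each TF family found in >= min_sequences promoters to that count."""
    promoters_by_family = defaultdict(set)
    for promoter_id, family, _position in hits:
        promoters_by_family[family].add(promoter_id)
    return {family: len(ids) for family, ids in promoters_by_family.items()
            if len(ids) >= min_sequences}


def tss_relative(position: int, promoter_length: int = 1000) -> int:
    """Convert a 5'-end position to an upstream coordinate (last base = -1)."""
    return position - promoter_length - 1


def bin_positions(hits: list[Hit], family: str, bin_size: int = 200,
                  promoter_length: int = 1000) -> dict[tuple[int, int], int]:
    """Count hits of one family per upstream bin, keyed like (0, -200)."""
    counts = Counter()
    for _promoter_id, hit_family, position in hits:
        if hit_family != family:
            continue
        offset = -tss_relative(position, promoter_length)  # 1..promoter_length
        index = (offset - 1) // bin_size
        counts[(-index * bin_size, -(index + 1) * bin_size)] += 1
    return dict(counts)
```

With 43 promoters, the threshold of 10 sequences used for the aligned regions corresponds to 10/43 ≈ 23%.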

Module analysis
The common pattern of TFBSs in all studied promoters and their role in gene regulation by mutual interaction were detected with Frameworker software (ver. 5.5.8), with a minimum of two elements in each module. The resulting cis-regulatory promoter modules common to the studied sequences were identified with respect to the organization and relative position of TFBSs using data from MatInspector. In silico functional analysis of the studied promoters was performed by searching predefined, already reported and confirmed functional modules (Plant Modules, ver. 5.7) with ModelInspector (ver. 5.6.8.7) [30]. The software tools used for TFBS analysis (MatInspector, Common TFs, DiAlign, Frameworker and ModelInspector) were provided by the online server of the Genomatix software suite (http://www.genomatix.de/cgibin/eldorado/main.pl?s=78f50a57a64fa8ae8b6532b5fd0a410e; Genomatix Software, Munich, Germany).
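The idea of a two-element module, i.e. an ordered pair of TFBSs lying within a limited distance of each other and recurring across promoters, can be illustrated with a simplified stand-in. This is not Frameworker's algorithm, which is not described here; strand orientation is ignored, and the function name and the max_distance and min_sequences parameters are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Input layout (assumed): promoter_id -> list of (tf_family, position)
Elements = dict[str, list[tuple[str, int]]]


def pair_modules(hits: Elements, max_distance: int = 50,
                 min_sequences: int = 2) -> dict[tuple[str, str], set[str]]:
    """Find ordered TF-family pairs that recur across promoters.

    A pair (A, B) is recorded for a promoter when an A element is followed
    (5'->3') by a B element at a distance of 1..max_distance bp. Only pairs
    seen in at least min_sequences promoters are returned.
    """
    found = defaultdict(set)
    for promoter_id, elements in hits.items():
        ordered = sorted(elements, key=lambda element: element[1])
        # combinations() keeps the sorted order, so pos_b >= pos_a.
        for (fam_a, pos_a), (fam_b, pos_b) in combinations(ordered, 2):
            if 0 < pos_b - pos_a <= max_distance:
                found[(fam_a, fam_b)].add(promoter_id)
    return {pair: ids for pair, ids in found.items()
            if len(ids) >= min_sequences}
```

Module names such as VTBP_VTBP or AHBP_VTBP then correspond to the keys ("VTBP", "VTBP") and ("AHBP", "VTBP") in the returned dictionary, and the size of each value set gives the number of promoters carrying the module.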

Sequence retrieval
Forty-three 5ʹ upstream promoter regions of GLP genes of the Asian rice (Oryza sativa ssp. Japonica) genome were retrieved using Ensembl Plants. The name of each sequence, number of base pairs, accession number, chromosomal position and associated reference are given in Table 1. The sequences included two already cloned and computationally analyzed promoter regions of OsRGLP1/OsGLP8-11 [22] and OsRGLP2/OsGLP8-10 [23] for better comparison. All the OsGLP promoters belong to the monocupin domain subfamily and are mostly located on chromosomes 3 and 8.

Module analysis
Common TFBSs sharing the same framework of cis-regulatory elements were investigated, revealing three novel cis-regulatory modules, of which VTBP_VTBP was the most frequent, with 56 copies in 25 sequences (58%); this was followed by MYBS_MYBS (49 copies) and AHBP_VTBP (33 copies). The name, element type, element orientation, parameter used and distance to the next element for each module are shown in Table 3. In silico functional analysis via ModelInspector revealed 77 modules in 33 sequences. Detailed information about the modules, including their names, number of copies in each strand, start and end positions, frequency of occurrence and promoters with the highest number of these modules, is presented in Tables 4 and 5. The most frequently occurring modules were DOFF_OPAQ_03 and GTBX_MYCL_01, occurring in 19 and 11 sequences, respectively. Most modules were found on chromosomes 3, 8 and 12, and were mostly related to endosperm-specific expression, dehydration and etiolation, but no such module was found on chromosome 4. An overview of the potential role of GLPs in the light of the predicted modules with respect to their chromosomal position is presented in Table 5. More functional modules were found on the sense strand (41) than on the antisense strand (36).

Phylogenetic analysis
Phylogenetic analysis of 43 OsGLP gene promoters revealed a narrow genetic background (0.2%), suggesting high similarity. This value is smaller than the reported value of 31% [24] for seven GLP promoters from different plant species, but concurs with the previous report [25] in which 44 GLP promoters, mostly from rice, were considered. This could be due to the fact that in our analysis all of the promoters belong to the same species (Oryza sativa ssp. Japonica). Promoters located on the same chromosome shared the highest sequence similarity, which may be due to duplication and representation of the same pattern of cis-regulatory elements, and thus their similar roles in gene expression [31]. The phenomenon is more prominent in GLP promoters located on chromosomes 3, 8 and 12, which may be due to either recent or older duplication events that created highly similar cis-regulatory elements, which were selectively preserved as such in order to enable the co-expression of these genes. These results are also supported by earlier studies of OsGLPs [23] and of the GmCHS7 and GmCHS8 gene promoters [11] that reported the existence of common regions in these promoters. However, GLP promoters on chromosomes 1, 2, 4, 5, 9 and 11 exhibited variation that could be the result of diversification in their cis-regulatory elements [32], either as a result of selection pressure and changes in the environment, or because of the accumulation of mutations (due to reduced selection pressure), during which TFBS copies were modified over time by involvement in new and diverse functional pathways, ultimately resulting in diverse expression patterns. The results are also supported by the high Tajima value (0.69), which reflects the change in their cis-regulatory elements. All members of the chromosome 8 GLPs (cluster 4), which form a separate lineage, displayed a close relationship with each other, suggesting a similar pattern in their cis-regulatory elements. Previously it was shown that the promoter of the OsRGLP8-10/OsRGLP2 gene was induced by salt, BAP and wounding stresses when analyzed via promoter-GUS fusion, with prominent expression in the cell wall, cell membrane, cytoplasm, vein and interveinal area [26]. Similarly, most genes of these promoters possess a strong link with the disease resistance pathotype [33], of which OsGLP8-1-12 and OsGLP8-14/OsGLP1 are part of the QTL that provides resistance against rice blast (Magnaporthe oryzae) and sheath blight (Rhizoctonia solani) [2,34].
The close relationship of their promoters points to their functional similarity and thus demands further study against multiple stresses. Similarly, the close relationship of OsRGLP1, OsGLP8-12 and OsGLP8-13 with OsGLP9-1, OsGLP1-4 and OsGLP3-1, respectively, points to their functional similarity. However, the distant relationship between OsGLP8-11, -12 and OsGLP8-13 can be explained by diversification in their cis-regulatory elements. The mechanism of defense provided by chromosome 8 in fungal pathogenicity is conserved among Gramineae members, such as wheat [35], rice [34] and barley [19,36], and needs to be properly tested for these promoters. Similar observations were also noted for promoters on chromosomes 3 (cluster 5) and 12 (cluster 1), which points to recent duplication and the presence of similar cis-regulatory elements. None of these genes or promoters has thus far been tested against any stresses, but their close relationship with chromosome 8 promoters suggests similar regulatory mechanisms and expression patterns. Of all considered promoters, those located on chromosomes 1 and 5 have a distant relationship to all others, suggesting that they possess distinctive patterns of cis-regulatory elements.

TFBSs analysis
MatInspector revealed considerable diversity in TFBSs, pointing to their putative roles in various plant processes. Previously, the roles of OsRGLP1 [22] and OsRGLP2 [24] were predicted by TFBS analysis with PLACE/Signal Scan. However, the present study provides a more detailed analysis. A large number of TFBSs were found on GLP promoters located on chromosomes 3, 8 and 12, which could be the result of clustering and duplication [31]. The presence of AHBP, VTBP and MYBL in all promoters suggests that their fundamental roles in regulation are conserved in the upstream aligned region from 0 to -800 bp. Conserved regions were mostly found on the promoters of chromosomes 8 and 12, suggesting their close relationship and similar expression patterns, which is in accordance with the result of the phylogenetic analysis presented in the previous section. These observations not only suggested the importance of these elements from an evolutionary point of view, "in which nature congregated these elements to a specific region of the GLP promoter in accordance with their demanding function", but also indicated their fundamental role in OsGLP regulation. AHBP is the most abundant element and is reportedly involved in embryo, shoot and root patterning, shade growth control, organ fate and stem cell proliferation [37]. Most copies of this element were found in the OsGLP2-3 (15) and OsGLP3-2 (13) promoters, which shows their importance. Similarly, most of the MYBL elements, which have important roles in GA-regulated expression [38], cotton fiber development [39], endosperm development [40], organogenesis [41], gibberellin signaling [42], seed development [43], BR-induced gene expression, vascular differentiation, senescence, stress responses [44] and nitrate enhancement [39], were found in OsGLP12-3 (10 copies) and OsGLP8-4 (11 copies). In the same way, VTBP is critical for promoter activity in gene-specific expression in plants, animals and viruses alike [45]. We observed that OsGLP12-4 has 14 copies of this element, showing its crucial role. Interestingly, the highest numbers of copies of AHBP (-200 to -800 bp), MYB (0 to -400 bp) and VTBP (-600 to -800 bp) elements were congregated at upstream positions (-200 to -800 bp) in the form of clusters, possibly because of increased environmental and selection pressure [46] that led to subfunctionalization and/or neofunctionalization of genes [47]. However, these regions need to be examined further by deletion and mutational analysis to confirm their crucial role in gene regulation, as has already been reported for the HvGerB, HvGerF [14] and AtGER3 gene promoters [48]. Aside from these common elements, other important elements include GTBX, which mostly resides in OsGLP3-7 (14 copies) and has a role in light responsiveness (LRE), senescence [49], drought [50], cold, salt stress [51] and water use efficiency [39]. Similarly, the presence of 10 copies of the plant-specific NAC transcription factor in OsGLP1-2 points to a role in homeostasis [52] and leaf senescence [53]. Likewise, Myc-like basic helix-loop-helix binding factors (MYCL) are involved in controlling light response, tissue-specific activation of phenylpropanoid biosynthesis genes [54], fruit development [55] and auxin response [56]. Analysis showed that OsRGLP1 contained 10 copies of this element. Moreover, several unique TFBSs, including EREF and IDRS, were also found in the OsGLP8-7 and OsGLP1-2 promoters, pointing to roles in the control of the intracellular iron status [57] and floral development [58]. Similarly, two copies of the Arabidopsis CDC5 homolog, which is involved in pre-mRNA splicing [59], were found in OsGLP1-2 and OsGLP12-4. The presence of unique TFBSs in rice GLPs may define their differential promoter activity, which is responsible for distinct gene expression [11]. The presence of these novel elements suggests differential expression and novel functions against various stresses.

Analysis of module
The presence of three novel cis-regulatory modules (AHBP_VTBP, MYBS_MYBS and VTBP_VTBP) in all promoters further confirmed the crucial co-regulated role of VTBP, MYB and AHBP in OsGLP gene expression. The co-occurrence of these elements in such a regular pattern in all promoters points to their fundamental role in OsGLP regulation. Previously, a similar module (MYCL_MYBL_01) was found to be active during brassinosteroid (BR)-targeted gene expression [44]. Similarly, the combined role of MYCS_P1BS and GAMYB_DOF in the regulation of mycorrhiza-activated phosphate transporters [60], seed development and germination was previously validated [43].
Likewise, the synergetic effect of the GC-rich region and TATA box was found to be critical for adam8 promoter activation [45], and the role of DOF and HD-Zip transcription factors was observed to be important in the regulation of cell-specific expression of the Atsuc2 gene (Arabidopsis thaliana sucrose transporter-2 gene) [61]. However, a detailed study is needed to further clarify the role of these novel modules in rice GLP gene expression. Further, in silico functional analysis revealed the presence of various functional modules, the most frequent being DOFF_OPAQ_03 and GTBX_MYCL_01, which have roles in endosperm-specific expression [62], dehydration, etiolation, tuberization and cotyledon-specific expression [63]. This result is in close agreement with the observed function of GLPs as germination markers [1]. OsGLP3-5 and OsGLP3-6 possess the highest number of these modules, supporting their role in endosperm-specific expression, dehydration and etiolation. Three unique modules, NACF_LEGB_01, OCSE_DOFF_01 and OPAQ_DOFF_01, which cause iron deficiency-, glutathione S-transferase (GST)- and seed storage-specific expression, were identified in OsGLP8-12, OsGLP8-7 and OsGLP3-5, respectively. Other unique modules include AREF_MYCL_01 (found in OsGLP3-4 and OsGLP8-1) and GARP_GARP_01 (found in LOC_Os05g10830, OsGLP3-3), which play a role in BR- and cytokinin-induced expression. Similarly, OsGLP3-1, OsGLP8-1, -6, OsGLP12-1 and OsGLP12-2 each contained 4 modules, pointing to their roles in dehydration, endosperm-specific expression and transcription control. The presence of novel modules revealed the functional diversity of rice GLP promoters. The highest numbers of modules were found in promoters situated on chromosomes 8 (30), 3 (23) and 12 (12), which revealed their functional importance related to tuberization-, cotyledon- and endosperm-specific expression, while the presence of several new modules related to hormonal stress and light responsiveness pointed to their diverse role. Most promoters on chromosomes 8 and 12 have nearly the same patterns of cis-regulatory elements, suggesting their co-regulated role in response to environmental stresses.
This finding is in close agreement with the previous works of different authors that together validate the importance of this region (5185878-7994721) located on chromosome 8 [2,5,23,26,34] in responses to different stresses. This has been established not only in rice but also in other members of Gramineae, i.e. Hordeum vulgare, Triticum aestivum and Brachypodium distachyon [34]. Overall, the analysis showed that most genes on chromosomes 2, 3 and 12 are regulated by endosperm-specific activity, while those of chromosome 8 exhibited more diverse roles in dehydration, tuberization and signaling pathways (brassinosteroid, cytokinin, gibberellin, etc.), possibly due to the congregation of specific TFBSs in their promoters. No functional module was found on the promoter of chromosome 4 (LOC_Os04g52720) or on some other promoters (OsGLP1-1, -2, -3, OsGLP2-3, -4, OsGLP3-2, OsGLP8-2 and OsGLP9-3), which may be due to the accumulation of mutations under reduced selection pressure. A more detailed analysis is presented in Table 5, which provides information about the possible regulatory role of these promoters in different parts of the plant in response to various stresses.

CONCLUSIONS
Recognition of regulatory cis-acting elements is an important step towards an improved understanding of gene expression and its regulatory mechanisms. The presented data show that OsGLP gene promoters are under considerable environmental pressure, which has resulted in variations in their cis-regulatory elements and phylogenetic relationships. Certain regions (-200 to -800 bp) of these promoters harbor a large number of specific cis-regulatory elements (AHBP, MYBL and VTBP) whose interaction appears to be decisive for their regulation. The presence of these elements in the form of functional modules provides evidence for their significant involvement in various fundamental biological processes in response to various stresses. Using the above data, the functioning and expression patterns of these genes and promoters can be predicted to a very high level of certainty, which will pave the way for their use in crop biotechnology. Certain promoters, particularly those located on chromosomes 3, 8 and 12, are of considerable importance and can be used in the development of cultivars resistant to various stresses.

Fig. 1. Phylogenetic analysis of Asian rice (Oryza sativa ssp. Japonica) germin-like protein (GLP) gene promoters using MEGA6 with the neighbor-joining method. Numerical values indicate bootstrap support for each node. Bootstrap support values were based on 1000 replicates and are given as percentages. Clade-1 can be divided into 5 clusters, each shown in a different color. Promoters located on the same chromosome are shown with circles of the same color. Labelled parentheses are used to represent each clade and cluster.

Fig. 2. Positions of common TFBSs found in rice Germin-like protein (OsGLPs) promoters.Each element is represented by a different color.

Fig. 5. The distribution of MYBL elements in rice germin-like protein gene (OsGLPs) promoters. The positions of elements in the graph start from the 5' end. The red box indicates the highest number of occurrences at two positions, equivalent to 0 to -400 bp and -600 to -800 bp relative to the transcription start site. The frequency is given as the number of elements in all sequences.

Fig. 4. Position analysis of vertebrate TATA box binding protein (VTBP) regulatory elements in OsGLPs promoters. The positions in the graph start from the 5' end. The red box represents the highest number of elements at two positions of OsGLPs promoters, equivalent to the upstream regions from 0 to -400 bp and -600 to -800 bp relative to the transcription start site. The frequency is given as the number of elements in all sequences.

Fig. 6. Conserved positions of MYB (grey), PTBP (blue) and VTBP (pink) in the aligned region of OsGLPs promoters. All conserved elements were found in the upstream region of 0 to -800 bp.

Table 1. List of selected germin-like protein gene promoters from Asian rice (Oryza sativa ssp. Japonica).

Table 3. Description of three novel common cis-regulatory modules found in rice germin-like protein (OsGLPs) gene promoters using Frameworker analysis.
S.N. - serial number; Element type - cis-regulatory elements in each module; Matrix sim - conditions used for analysis; Copies - number of copies of each module; Sequences - number of promoters in which a particular module was found (also given as a percentage); Nature - nature of modules: whether they overlap or not.

Table 4. Description of the total number of predefined functional modules found in rice germin-like protein (OsGLPs) gene promoters using ModelInspector.
Columns: S.N.; Module Name; Total Copies; No of Seq; Distribution; Function.
S.N. - serial number; Total - total number of each module found in all promoters; plus/minus (+/-) - strand orientation ("+" is the sense strand, "-" is the antisense strand); No of seq - number of OsGLPs promoters; Distribution - percentage distribution of the modules in promoters of the plant ModelInspector database.

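Table 5. Description of the predefined, already reported functional modules found in OsGLPs promoters using ModelInspector.
[...] - end position; the position of each module is given relative to the 5' end; Strand Ori - strand orientation ("+" is the sense strand, "-" is the antisense strand); Chrom. No. - chromosome number; a hyphen represents the same value as the cell above.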