ISOLATION AND STRUCTURAL ANALYSIS OF A GENE CODING FOR A NOVEL TYPE OF ASPARTIC PROTEINASE FROM BUCKWHEAT SEED (FAGOPYRUM ESCULENTUM MOENCH)

A novel type of aspartic proteinase (AP) gene was isolated from a cDNA library of developing buckwheat seeds. This cDNA, FeAPL1, encodes an AP-like protein lacking the plant-specific insert (PSI) domain characteristic of typical plant aspartic proteinases. In addition, the corresponding genomic fragment was isolated, and it was demonstrated that this gene does not contain introns. Since bioinformatics analysis of the Arabidopsis genome showed that most potential AP genes are intron-less and PSI-less, it appears that “atypical” is an inappropriate word for that class of AP. Isolation of this specific buckwheat gene, one of a small group of such genes isolated from plant species, provides a new perspective on the diversity of AP family members in plants.


INTRODUCTION
Proteolytic enzymes are intricately involved in many aspects of plant physiology and development and are therefore the subject of intensive research. As one of the major catalytic classes, aspartic proteinases (APs; EC 3.4.23) are widely distributed, not only in plants but also among vertebrates, yeasts, nematode parasites, fungi, and viruses. According to the MEROPS database (http://merops.sanger.ac.uk/), plant APs are distributed among families A1, A3, A11, and A12 of clan AA and family A22 of clan AD. The majority of plant APs belong to the A1 family, together with pepsin-like enzymes of many origins.
Plant APs have been detected in or purified from monocotyledonous and dicotyledonous species as well as gymnosperms (Mutlu and Gal, 1999). They are proteases that are most active at acidic pH, are specifically inhibited by pepstatin A, and contain two aspartic acid residues indispensable for catalytic activity. They are typically distinguished from their non-plant homologs by the presence of a so-called plant-specific insert (PSI), which is removed during processing and is absent from most mature plant APs. However, there are exceptions to this well-known primary structural organization. These include barley nucellin (Chen and Foolad, 1997), its rice ortholog OsAsp1 (Bi et al., 2005), Arabidopsis CDR1 (Xia et al., 2004), nepenthesins I and II from carnivorous plants (Athauda et al., 2004), and tobacco CND41 (Kato et al., 2005). A common feature of this group, often referred to as atypical, AP-like, or a novel class of APs, is the absence of the PSI domain. Bioinformatics analysis of the Arabidopsis genome sequence revealed 59 AP-like proteins, which provides a new perspective on the diversity of AP family members in plants (Beers et al., 2004).
For the great majority of plant APs, biological functions are still hypothetical and represent a provocative field of investigation. Data related to possible functions come from analysis of specific expression in certain tissues or under specific conditions, from co-localization studies with putative protein substrates, and from experimental evidence for the processing or degradation of those substrates. A variety of functions have been proposed for atypical APs. For example, CDR1 is an apoplastic aspartic proteinase, overexpression of which causes resistance to virulent Pseudomonas syringae by activating inducible resistance mechanisms (Xia et al., 2004). CND41 is the only DNA-binding aspartic proteinase found in chloroplasts and is involved in degradation of rubisco during leaf senescence (Nakano et al., 1997; Murakami et al., 2000; Kato et al., 2004; Kato et al., 2005). Extracellular nepenthesins have a role in prey digestion (Athauda et al., 2004), while barley nucellin may be involved in nucellar cell death (Chen and Foolad, 1997).
The present paper describes the isolation and characterization of a cDNA from buckwheat seed encoding a specific type of aspartic protease. On the basis of its primary structure (characterized by the absence of a PSI), it should be classified as an AP-like protease. In addition, a genomic fragment containing the corresponding gene, consisting of the coding sequence and the 5' upstream region, was also isolated.

MATERIAL AND METHODS
Plant material
Buckwheat (Fagopyrum esculentum Moench, cv. Darja) was field-grown in the greenhouse of the Institute of Molecular Genetics and Genetic Engineering, Belgrade.

DNA isolation
Genomic DNA was isolated as described by Dellaporta et al. (1983). Plasmid DNA was isolated using the Qiagen MiniPrep Kit (Qiagen).

Amplification of genomic sequences
The complete coding sequence of FeAPL1 was amplified by PCR with Clontech Advantage™ Polymerase, using 100 ng of genomic DNA as the template and primers P1 (5'-ATGCCCACTTCTCTCCTCTTC-3') and P2 (5'-TTAATTTTTGGATCGATCACATTG-3'). The program consisted of an initial step at 94°C for 3 min, followed by 30 cycles (94°C/30 s; 60°C/30 s; 68°C/90 s) and a final post-treatment for 10 min at 70°C. The sequence of primer P1 was derived from the extreme 5'-end and that of P2 from the extreme 3'-end of the coding sequence of FeAPL1.
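The primer and cycling details above can be sanity-checked with a short Python sketch. It is only an illustration: the helper names are hypothetical, the run time ignores ramping between temperatures, and the check that P2 is the antisense primer (its reverse complement ends in a stop codon) is an inference from the printed sequences rather than a statement from the authors.

```python
# Primer sequences as printed in the text (5'->3').
P1 = "ATGCCCACTTCTCTCCTCTTC"
P2 = "TTAATTTTTGGATCGATCACATTG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

# Cycling program from the text, in seconds (hold times only, no ramping).
INITIAL_S = 3 * 60
CYCLE_S = 30 + 30 + 90
CYCLES = 30
FINAL_S = 10 * 60
program_seconds = INITIAL_S + CYCLES * CYCLE_S + FINAL_S

if __name__ == "__main__":
    for name, seq in (("P1", P1), ("P2", P2)):
        print(name, len(seq), "nt, GC fraction", round(gc_fraction(seq), 3))
    print("P1 starts with ATG:", P1.startswith("ATG"))
    print("P2 reverse complement:", reverse_complement(P2))
    print("Hold time (min):", program_seconds / 60)
```

P1 begins with the ATG start codon and the reverse complement of P2 ends in TAA, which is consistent with the statement that the two primers span the extreme ends of the coding sequence.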
To obtain the 5' regulatory region of the FeAPL1 gene, we used a modified RACE (rapid amplification of cDNA ends) method (Chenchik et al., 1995). Adaptor-ligated genomic DNA (100 ng), partially digested with Eco RV restriction enzyme (10 U/µg DNA at 37°C for 5 h), was used as the template. For 5' RACE, touchdown PCR amplification (Biometra T1 Thermocycler) was performed in 50 µl of a reaction mixture containing Advantage 2 Polymerase Mix (Clontech), Adaptor Primer 1 (Clontech), and the gene-specific primer R482 (5'-AGACAGTAGGAAAATTTGGTGAGACC-3'). Nested PCR was performed with Nested Adaptor Primer 2 (Clontech) and primer R470 (5'-GGAAAATTTGGTGAGACCAAGCTGAG-3'). The reaction conditions for both PCRs are described in Brkljacic et al. (2005). The PCR products obtained were subjected to Southern blot analysis using the BioPrime® DNA Labeling System (Gibco BRL) and the BluGene® Nonradioactive Nucleic Acid Detection System (Gibco BRL), with FeAPL1 cDNA as the probe. The PCR products that hybridized with the probe were eluted and cloned.
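The relationship between the two gene-specific primers can be illustrated with a small Python sketch. The sequences are taken as continuous 26-mers, assuming any internal hyphen in the printed form is a line-break artifact; the function name is hypothetical.

```python
# Gene-specific primers (5'->3'), read as continuous 26-mers (assumption).
R482 = "AGACAGTAGGAAAATTTGGTGAGACC"
R470 = "GGAAAATTTGGTGAGACCAAGCTGAG"

def suffix_prefix_overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

overlap = suffix_prefix_overlap(R482, R470)
shift = len(R482) - overlap  # how far R470 is displaced toward its 3' end

if __name__ == "__main__":
    print("shared bases:", overlap)
    print("3' displacement of nested primer (nt):", shift)
```

Under that assumption the two primers share 18 bases, and the nested primer R470 extends 8 nt beyond the 3' end of R482, as expected for a primer that anneals inside the first-round product.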

Computer-assisted analysis
Protein sequences were compared using the BLASTP search program (Altschul et al., 1990), exploring all available sequence databases at the www.ncbi.nlm.nih.gov Web server. Sequence analysis was done using the ExPASy program package (www.expasy.org). Sequence alignment was performed with the ClustalW program. The MatInspector professional (www.genomatix.de), PLACE (www.dna.affrc.go.jp/PLACE/), and PlantCARE (http://intra.psb.ugent.be:8080/PlantCARE/) databases were used for prediction of regulatory elements in the 5' regulatory sequence of the FeAPL1 gene. Prediction of the 3D structure of the FeAPL1 protein was performed using the GeneSilico metaserver (http://genesilico.pl/meta) (Kurowski and Bujnicki, 2003).
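Similarity values derived from pairwise alignments are commonly expressed as the percentage of identical residues. The following Python sketch shows one way such a value can be computed from two already aligned sequences. The choice of denominator (columns without a gap in either sequence) is an assumption, since the text does not state which convention was used, and the two short sequences are invented for illustration only.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns where neither sequence has a gap.

    Both arguments must be rows of the same alignment (equal length).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)

if __name__ == "__main__":
    # Hypothetical aligned fragments, not taken from FeAPL1 or any database entry.
    a = "DTGSS-LW"
    b = "DSGSSNLW"
    print(round(percent_identity(a, b), 1))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so values from different tools are not always directly comparable.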

RESULTS
We selected and sequenced several clones from the cDNA library of developing buckwheat seed (19-23 DAF stage). Selection was made by subtraction of cDNAs coding for the 13S storage polypeptide (Samardzic et al., 2004) and metallothionein (Brkljacic et al., 2005), both over-expressed at the defined developmental stage. The deduced amino acid sequence of one of the selected clones (AY536047) showed homology with a novel class of aspartic proteinases, and the clone was named FeAPL1.
The nucleotide sequence of FeAPL1 consisted of a 1344 bp long open reading frame (coding for 447 amino acids) and 63 bp and 51 bp of 5'- and 3'-UTRs, respectively, followed by a poly(A) tail. The polypeptide deduced from the FeAPL1 coding region had a predicted Mw of 48.6 kDa and an isoelectric point (pI) of 8.78. Analysis of the deduced amino acid sequence of FeAPL1 identified the Asp92 and Asp313 residues in the DTG/DSG active-site sequence motif characteristic of APs, but unlike most plant aspartic proteinases identified so far, it lacked a plant-specific insert (PSI). A hydrophobic signal peptide was identified in the N-terminal region, with a predicted cleavage site between amino acid positions 20 and 21. Twelve Cys residues, in most cases conserved in atypical plant APs, as well as the so-called "flap" Tyr residue (Tyr168, corresponding to Tyr75 in the porcine pepsin numbering), conserved in all animal and plant APs, were also noted. Four N-glycosylation sites were predicted at positions 113 (NCTF), 130 (NKSS), 154 (NCST), and 357 (NITG). Comparison of the deduced amino acid sequence of FeAPL1 with the GenBank and EMBL databases revealed the highest similarity with other known AP-like proteins, such as nepenthesins (NEP I and II), Arabidopsis CDR1, tobacco CND41, barley nucellin, and rice OsAsp1 (Fig. 1). Similarity was also notable with typical APs and with microbial and animal APs, but it was restricted to the active-site domains. An NCBI CD search revealed the conserved domain structure, which can be classified as KOG1339, the same as for porcine pepsinogen, cathepsin E, chicken pepsinogen, cow chymosin, and candidapepsin 2. Low similarity with atypical APs was clearly evident (Table 1).
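The figures reported above are internally consistent, as the following Python sketch illustrates. It checks the reading-frame arithmetic and tests the four reported motifs against the widely used N-glycosylation pattern N-{P}-[ST]-{P}. Using that pattern is an assumption, since the text does not say which predictor produced the sites; the variable names are hypothetical.

```python
import re

# Values reported in the text for the FeAPL1 cDNA.
ORF_BP = 1344
UTR5_BP = 63
UTR3_BP = 51

codons = ORF_BP // 3                      # includes the stop codon
residues = codons - 1                     # amino acids encoded
cdna_bp = UTR5_BP + ORF_BP + UTR3_BP      # length without the poly(A) tail

# N-glycosylation sequon: Asn, any residue but Pro, Ser or Thr, any residue but Pro.
SEQUON = re.compile(r"N[^P][ST][^P]")
# Active-site tripeptide of aspartic proteinases as given in the text (DTG or DSG).
ACTIVE_SITE = re.compile(r"D[TS]G")

reported_sites = {113: "NCTF", 130: "NKSS", 154: "NCST", 357: "NITG"}
sequon_ok = {pos: bool(SEQUON.fullmatch(motif)) for pos, motif in reported_sites.items()}

if __name__ == "__main__":
    print("ORF divisible by 3:", ORF_BP % 3 == 0)
    print("encoded residues:", residues)
    print("cDNA length without poly(A):", cdna_bp)
    print("reported motifs matching the sequon:", sequon_ok)
    print("DTG and DSG match the active-site pattern:",
          all(ACTIVE_SITE.fullmatch(m) for m in ("DTG", "DSG")))
```

A 1344 bp reading frame corresponds to 448 codons, that is, 447 amino acids plus a stop codon, which agrees with the stated protein length.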
Prediction showed the tertiary structure of FeAPL1 (Fig. 2) to be similar to those of other enzymes of the aspartic proteinase family whose crystallographic structures have been determined (Kervinen et al., 1999). It consisted largely of β-sheets organized in two lobes divided by a deep active-site cleft, which contained the catalytic aspartates. A six-stranded β-sheet covered the bottom of the cleft. A flexible loop [...]
The FeAPL1 cDNA sequence allowed us to develop a strategy for isolation of the corresponding gene from buckwheat genomic DNA. A genomic fragment, gFeAPL1, was obtained by PCR using the P1 and P2 primers derived from FeAPL1, as described in the Material and Methods section. The isolated genomic sequence was identical in length to the cDNA sequence amplified with the same primers, indicating that the FeAPL1 gene contained no introns. This was confirmed by sequencing.
In order to isolate the 5'-regulatory region of the corresponding gene, a modified 5'-RACE approach was applied using a gene-specific primer designed from the 5' end of FeAPL1. The longest PCR product was cloned and sequenced. The sequence covered 747 bp overlapping with FeAPL1 and an additional 1071 bp of the 5'-upstream noncoding region (DQ241824). Computer analysis of the 5'-regulatory region predicted three putative TATA boxes at positions 52 bp, 166 bp, and 343 bp relative to the translation start site. Regulatory sequences that could be involved in different responses [hormonal (ABRE, ERE), light (G-box, RITA1), and pathogen defense (W box)] and elements responsible for seed-specific expression (DOF1, GCN4, SPF1, SEF3, SEF4) were found using overlapping data from three different databases.
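The kind of motif search performed by such databases can be sketched in Python as a simple consensus scan. This is not the method of MatInspector, PLACE, or PlantCARE: the core motifs below are simplified textbook consensus strings chosen as assumptions, only the forward strand is scanned, and the upstream sequence is an invented toy example, not the FeAPL1 promoter.

```python
import re

# Simplified consensus cores (assumed for illustration only).
MOTIFS = {
    "TATA box": "TATAAA",
    "G-box": "CACGTG",
    "ABRE core": "ACGTG",
    "W box": "TTGAC",
}

def scan_upstream(upstream: str, motifs=MOTIFS):
    """Return (name, distance) pairs sorted by distance.

    `upstream` is assumed to end immediately before the translation start
    codon; distance is the number of bases from the first base of the motif
    to the end of `upstream`.
    """
    upstream = upstream.upper()
    hits = []
    for name, core in motifs.items():
        # A zero-width lookahead also reports overlapping occurrences.
        for m in re.finditer(f"(?={core})", upstream):
            hits.append((name, len(upstream) - m.start()))
    return sorted(hits, key=lambda hit: hit[1])

# Sizes reported in the text for the longest cloned product.
clone_bp = 747 + 1071

if __name__ == "__main__":
    toy_upstream = "GGTTGACCATTCACGTGTTATATAAAGGCTCACC"  # hypothetical sequence
    for name, distance in scan_upstream(toy_upstream):
        print(f"{name}: {distance} bp upstream of the start codon")
    print("cloned fragment length (bp):", clone_bp)
```

Real predictions additionally weigh matrix similarity, strand, and context, which is one reason to retain only the elements reported by more than one database.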

DISCUSSION
Here we report the isolation from buckwheat of a full-length cDNA coding for a specific type of aspartic proteinase, one that can be assigned to the much less explored category of plant APs characterized by the lack of a PSI region.
The new class of APs is currently represented by only a few members, but information on this family is increasing. Interestingly, bioinformatics analysis of the Arabidopsis genome showed that most of the potential AP genes do not contain a PSI domain (Faro and Gal, 2005). Thus, it appears that "atypical" is an inappropriate word for this class of AP. On the other hand, these observations raise the question of why PSI-containing APs are discovered more often in plants if they are not the most abundant. This may be related to their function in intensively investigated processes such as seed germination and flowering. It is also possible that they are expressed in greater amounts and are thus more easily detected. Another possibility is that not all predicted Arabidopsis AP genes produce active enzymes, or that some are pseudogenes. According to the MEROPS database, the majority of plant APs belong to the pepsin-like (A1) family. The plant A1 proteases (with or without PSI) from Arabidopsis have been annotated and grouped into five distinct subfamilies according to their gene structure and amino acid phylogeny. The largest group, A1-1, consists mostly of intron-less genes. The nucellin-like genes, belonging to group A1-2, have seven introns, while twelve introns are found in members of group A1-4, most of which are related to barley phytepsin. Group A1-4 is the only one whose members contain a PSI domain (Beers et al., 2004). A grouping with similar putative domain organization and active-site sequence motifs has recently been described (Faro and Gal, 2005). Thus, the intron-less and PSI-less FeAPL1 should be classified as belonging to group A1-1. Comparison of the deduced amino acid sequence of FeAPL1 with those of the other PSI-less APs showed lower similarity than is seen among typical plant APs. Evidently, structural similarity is not directly related to similarity of function, as versatile functions have been found for both classes of APs (Simões and Faro, 2004).
The isolated FeAPL1 cDNA will enable us to produce recombinant protein. This is important as a way of obtaining information on its biochemical properties, substrate specificity, and mechanism of activation. In addition, the recombinant protein will serve as a source for producing specific antibodies, which can be used to determine protein localization within tissues and co-localization with potential substrates. These results, together with analyses of expression in different physiological processes and under stress conditions, could be important steps towards clarifying the function of FeAPL1. Since the functions of plant aspartic proteases remain elusive, this will be an important contribution to the resolution of that biological puzzle.

Fig. 2. Tertiary structure of the FeAPL1 protein predicted by molecular modeling with the GeneSilico MetaServer. The catalytic Asp92 and Asp313, the conserved Tyr168 in the flexible loop (flap), and the six-stranded β-sheet at the bottom of the cleft are shown.

Table 1. Similarity between FeAPL1 and PSI-less aspartic proteinases from other plant species, shown as the percentage of identical amino acids.