ANALYSIS OF MBOAT FAMILY REVEALS THE DIVERSITY OF MBOAT1 AMPLIFICATION IN SOLANACEAE

Genes containing an MBOAT (membrane-bound O-acyltransferase) domain form a large gene family in plants whose members play important roles in triacylglycerol biosynthesis. Most of these genes belong to the MBOAT1 subfamily. Here we describe the identification and analysis of MBOAT genes in Solanaceae. Through data mining of four sequenced Solanaceae genomes, we identified 52 MBOAT members. The MBOAT genes fell into four distinct groups, with MBOAT1 subfamily genes accounting for about half of the total. Several MBOAT1 genes were present in the genomes of hot pepper, tomato and potato, whereas only one was identified in Nicotiana benthamiana. Most tomato MBOAT1 genes were located in clusters on the chromosomes, as were those of potato, indicating that the expansion of the MBOAT1 subfamily was mainly the result of tandem duplication. Some tomato MBOAT1 genes were not expressed, and all MBOAT1 genes were devoid of introns and were significantly shorter than other MBOAT members. While average pairwise Ka/Ks values were significantly lower within the MBOAT1 subfamily, some MBOAT1 genes showed signs of positive selection.


INTRODUCTION
Gene duplications are one of the principal creative forces in evolution (Ohno, 1970). A consequence of duplication is the emergence of a gene family, a group of genes sharing similar sequences and often similar functions (Demuth and Hahn, 2009; Dayhoff, 1976). In plant evolutionary history, genome or chromosome duplications have played important roles in gene duplication. More recent gene duplications were the results of short segmental or single-gene duplications. There are other mechanisms of gene duplication, though, such as retroposition and unequal crossing-over (Ober, 2010). Repeated rounds of different types of duplication can lead to divergence in the number of members of a gene family among species or among sub-groups within a species, namely copy number variation (CNV) (Korbel et al., 2008; Freeman et al., 2006).
Whole genome sequencing of a number of genomes makes it possible to identify complete sets of a gene family, providing a foundation for analyzing the relationships of gene family members among species and for inferring how the family or subgroup formed (Zhang and Ma, 2012). Recently, members of the membrane-bound O-acyltransferase (MBOAT) family were comprehensively identified in the genomes of 14 plant species. MBOAT members include diacylglycerol acyltransferase 1 (DGAT1), which catalyzes the last and committed step of triacylglycerol biosynthesis in plants, and lysophosphatidylcholine acyltransferase (LPCAT), which converts lysophosphatidylcholine into phosphatidylcholine and vice versa (Ichihara et al., 1988; Stahl et al., 2008; Wang et al., 2013). This family also contains homologs of Saccharomyces cerevisiae glycerol uptake protein (Holst, 2000; Wang et al., 2013).
Analysis of the MBOAT family led to the identification of a subfamily, which was named MBOAT1 (Wang et al., 2013). It emerged in terrestrial plants and shows copy number variation (CNV) among species: MBOAT1 genes are low-copy in spore plants, while this subfamily is present in angiosperms (Wang et al., 2013). Among the many functionally characterized MBOAT genes, only two belong to this subfamily; they were shown to have sterol O-acyltransferase and diacylglycerol sn-3 acetyltransferase activity, respectively (Chen et al., 2007; Durrett et al., 2010).
Every MBOAT family member contains an MBOAT domain, a sequence region that is conserved among all MBOAT members. In the Pfam database, its CLAN code is CL0517 (http://pfam.xfam.org/). Members of MBOAT1 are devoid of introns, while other MBOAT family members contain multiple introns (Wang et al., 2013).
In this article, through comprehensive identification and subsequent analysis of the MBOAT family in four completed genomes in Solanaceae, we observed a vast variation in copy numbers in the MBOAT1 subfamily among the four species. Smaller sequence length, clustering of genes in chromosomes and loss of introns led us to propose the reasons for the population and variety of MBOAT1 family members.

Identification of MBOAT gene family members in Solanaceae
Four Solanaceae species with genome sequences and annotation information available were used for the gene identification: tomato (Solanum lycopersicum), potato (Solanum tuberosum), Nicotiana benthamiana (Sato et al., 2012; Bombarely et al., 2012; Xu et al., 2011) and Capsicum annuum (Kim et al., 2014). General feature format (GFF) files and FASTA files of peptide and coding sequences were downloaded from the FTP server of Sol Genomics Network (ftp://ftp.solgenomics.net) (Bombarely et al., 2011). The downloaded files were preprocessed to delete asterisk symbols and short sequences to avoid problems with the subsequent BLAST searches. For MBOAT1 family member identification, the methods of Wang et al. (2013) were used. Upon identification of genes, peptide and coding region sequences were retrieved.
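The preprocessing step can be illustrated with a minimal Python sketch. It is not the authors' script: the function names are hypothetical, and the length cutoff (min_len=30) is an assumed value, since the text does not state which sequences counted as short.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def clean_records(records, min_len=30):
    """Remove '*' characters and drop sequences shorter than min_len.

    min_len is an assumed cutoff chosen only for illustration.
    """
    for header, seq in records:
        seq = seq.replace("*", "")
        if len(seq) >= min_len:
            yield header, seq
```

In use, the cleaned records would be written back to a FASTA file before building the BLAST database.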

Sequence length calculation, prediction of peptide molecular weight and chromosome localization of sequences
GFF data of tomato MBOAT sequences were retrieved from the tomato GFF3 file. Based on the GFF data, chromosome localization data were retrieved, and intron numbers were calculated. Domain lengths were calculated by running Pfam_scan downloaded from ftp://sanger.ac.uk/. Sequence lengths were also calculated, and molecular weights of putative peptides were predicted. Chromosome localization information was visualized with MapChart 2.2 (Voorrips, 2002).
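One way to derive intron numbers from GFF3 data is to count the exon features of each transcript and subtract one. The sketch below assumes that exons are annotated as "exon" features with a Parent attribute pointing at their transcript; the function name and this counting rule are illustrative assumptions rather than the procedure actually used.

```python
from collections import Counter


def intron_counts(gff_lines, feature="exon"):
    """Return {transcript_id: intron_count} from GFF3 lines.

    Assumes each exon row carries a Parent attribute naming its transcript;
    introns are taken as (number of exons - 1).
    """
    exons = Counter()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature:
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[parent] += 1
    return {tx: n - 1 for tx, n in exons.items()}
```

A transcript annotated with a single exon yields an intron count of zero, which is the pattern reported here for MBOAT1 genes.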

In silico expression analysis of tomato MBOAT family genes
Seven files, representing transcriptome sequencing results of seven tissues (root, stem, leaf, flower and fruit at three different maturation stages: mature green, breaker and ripe) of S. lycopersicum, were downloaded from NCBI under GEO accession number GSE33507 (Sato et al., 2012), and were combined into a single file. The format of the file was converted from SRA to FASTQ using fastq-dump from the SRA Toolkit, and then converted to FASTA format and curated with our in-house Perl scripts. All the MBOAT coding sequences were used to query the resulting file using BLAT (Kent, 2002).
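The FASTQ-to-FASTA conversion was done with in-house Perl scripts that are not shown. As a stand-in, a minimal Python sketch of the same conversion is given here, assuming plain four-line FASTQ records; the function name is hypothetical and no curation step is included.

```python
from itertools import islice


def fastq_to_fasta(lines):
    """Yield FASTA lines from an iterable of four-line FASTQ records."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            break
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, plus, _qual = (s.rstrip("\n") for s in record)
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record: " + header)
        # Keep the read name, drop the quality string.
        yield ">" + header[1:]
        yield seq
```

Because the function is a generator, a large read file can be streamed line by line without loading it into memory.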

Identification and phylogenetic analysis of Solanaceae MBOAT sequences
To identify MBOAT sequences in Solanaceae, the peptide sequences of all 16 putative MBOAT members in Arabidopsis, identified by Wang et al. (2013), were queried one by one against peptide datasets using standalone BLAST. Through screening using Pfam_scan and visual sequence checking as described in Materials and Methods, we retained sequences with the expected MBOAT domains and deleted duplicated sequences. In potato, PGSC0003DMT400004759, PGSC0003DMT400004760, PGSC0003DMT400004761 and PGSC0003DMT400004762 were deleted, as they are putatively alternative transcripts of PGSC0003DMT400004763. Also, PGSC0003DMT400032625 and PGSC0003DMT400042543 were removed, as they are putative alternative transcripts of PGSC0003DMT400032627 and PGSC0003DMT400042544, respectively. PGSC0003DMT400034540 was deleted as well, since its amino acid sequence differed from that of PGSC0003DMT400034541 by only six residues at the C-terminus. In N. benthamiana, NbS00001605g0003.1 was deleted as its sequence showed 100% identity with NbS00001605g0007.1. Finally, 17, 14, 7 and 14 putative MBOAT members were identified in tomato, potato, N. benthamiana and hot pepper, respectively. Thus, a total of 52 putative MBOAT sequences were identified in Solanaceae. HMM searches gave exactly the same result.
Aligned peptide sequences of the 52 Solanaceae MBOAT members, together with 16 Arabidopsis MBOAT genes, were used to conduct phylogenetic inference using MrBayes. Using a WAG model, a well-resolved Bayesian phylogenetic tree was obtained (Fig. 1). Solanaceae MBOAT members were classified phylogenetically into four subfamilies. Based on how each subfamily of genes grouped with Arabidopsis isoforms, the four subfamilies were named MBOAT1, LPLAT, GUP and DGAT1, respectively, as in Wang et al. (2013) (Supplemental Fig. 1). Except for tomato and potato GUP genes, more than one isoform was identified for each subfamily in each of three species. As in other eudicots, populated sets of MBOAT1 members were identified in tomato, potato and hot pepper, but not in N. benthamiana, where only one sequence was identified as belonging to the MBOAT1 subfamily (Table 1). A possible reason is that there are unidentified MBOAT1 genes in unsequenced regions of N. benthamiana, as only 70% of this genome has been assembled (Bombarely et al., 2012).
At least two genes belonging to the MBOAT1 subfamily were shown to be functional (Chen et al., 2007; Durrett et al., 2010; Wang et al., 2013). However, very few studies have reported characterization of the gene members in this subfamily. More research is necessary to demonstrate their biological and evolutionary significance.
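The removal of sequences with 100% identity can be sketched as follows. This is an illustrative Python fragment, not the procedure actually used: the function name is hypothetical, exact string equality stands in for 100% identity, and which copy of a duplicated pair is kept (here, the first one seen) is an arbitrary choice. Alternative transcripts and near-identical sequences, which were also removed, require manual inspection and are not handled.

```python
def drop_identical(records):
    """Split (name, sequence) records into kept and dropped lists.

    A record is dropped when its sequence is identical to one already kept;
    the dropped list pairs each removed name with the name that was retained.
    """
    first_seen = {}
    kept, dropped = [], []
    for name, seq in records:
        if seq in first_seen:
            dropped.append((name, first_seen[seq]))
        else:
            first_seen[seq] = name
            kept.append((name, seq))
    return kept, dropped
```

Returning the dropped pairs alongside the kept records makes it easy to report which entries were collapsed.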

Chromosomal localization of tomato MBOAT1 genes suggested several tandem duplication events
Identification of populated sets of MBOAT1 members indicated that MBOAT1 genes underwent several duplication events in plants. To investigate how they were duplicated, MBOAT members in tomato were localized to tomato chromosomes, as the tomato genome sequence has been assembled to the pseudomolecule level and is well annotated. The 17 genes were localized to 6 of the 12 tomato chromosomes, among which 11 MBOAT1 genes were localized to chromosomes 7, 11, and 12 (Fig. 2). Interestingly, there were two clusters of MBOAT1 genes. In particular, 7 MBOAT1 genes (locus IDs: Solyc11g012200.1.1 − Solyc11g012260.1.1) were localized in tandem near the tip of chromosome 11, indicating that there were extensive tandem duplications in the evolutionary history. Likewise, there were several MBOAT1 clusters in the potato and hot pepper genomes. For example, there were two five-gene clusters in the same linkage group (PGSC0003DMP400003366 - PGSC0003DMP400003366 and PGSC0003DMP400023478 - PGSC0003DMP400023482, respectively) in the potato genome. The same was true in Arabidopsis, in which there were several MBOAT1 clusters (Wang et al., 2013).
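Clusters of this kind can be flagged automatically from gene coordinates. The Python sketch below groups genes on the same chromosome whose start positions lie within a fixed distance of the previous gene. The function name, the 50 kb gap and the minimum cluster size are assumptions made for illustration; physical proximity is only a proxy for tandem duplication, and the clusters reported here were identified from the chromosome map.

```python
from collections import defaultdict


def tandem_clusters(genes, max_gap=50_000, min_size=2):
    """Group genes into clusters of neighbours on the same chromosome.

    genes: iterable of (gene_id, chromosome, start) tuples.
    max_gap: largest allowed distance between consecutive starts (assumed).
    Returns a list of clusters, each a list of gene IDs in positional order.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start in genes:
        by_chrom[chrom].append((start, gene_id))

    clusters = []
    for chrom in sorted(by_chrom):
        current, prev_start = [], None
        for start, gene_id in sorted(by_chrom[chrom]):
            if prev_start is not None and start - prev_start > max_gap:
                if len(current) >= min_size:
                    clusters.append(current)
                current = []
            current.append(gene_id)
            prev_start = start
        if len(current) >= min_size:
            clusters.append(current)
    return clusters
```

Restricting the input to members of one subfamily keeps unrelated neighbouring genes from being merged into a cluster.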

Several MBOAT1 members were not expressed
Considering the distinct differences in gene number among subfamilies in the MBOAT family, we checked whether some members are silenced by investigating tomato RNA-Seq data. BLAT, a BLAST-like alignment tool, is used for mapping cDNA sequences to genomes. Here we used BLAT to map short RNA-Seq reads from seven representative tomato tissues to the putative tomato MBOAT sequences to decide whether the in silico sequences are actually expressed as RNAs. When running BLAT, we used an identity cutoff of 100 to ensure 100% matching of sequences between the two datasets. At least one read mapped perfectly to every tomato DGAT1, LPLAT and GUP sequence. However, only 5 of the 11 MBOAT1 sequences were found to have short reads mapped (Fig. 2). The results indicated that while DGAT1, LPLAT and GUP sequences are expressed, a subset of the MBOAT1 sequences that arose through frequent duplication events is not expressed.
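Deciding which sequences have at least one perfectly matching read can be done by filtering BLAT's tab-separated PSL output. The sketch below is one possible filter, not the authors' code. It assumes the MBOAT coding sequences were the queries and the reads the targets, and it treats a hit as perfect when it has no mismatches, no gaps, and covers the whole read; the function name and this definition are assumptions.

```python
def expressed_queries(psl_lines):
    """Return the set of query names with at least one perfect full-read hit.

    PSL columns used (0-based): 0 matches, 1 misMatches, 2 repMatches,
    4 qNumInsert, 6 tNumInsert, 9 qName, 14 tSize.
    """
    expressed = set()
    for line in psl_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 21 or not cols[0].isdigit():
            continue  # skip PSL header lines and malformed rows
        matches, mismatches, rep_matches = int(cols[0]), int(cols[1]), int(cols[2])
        q_gaps, t_gaps = int(cols[4]), int(cols[6])
        q_name, t_size = cols[9], int(cols[14])
        full_read = matches + rep_matches == t_size
        if mismatches == 0 and q_gaps == 0 and t_gaps == 0 and full_read:
            expressed.add(q_name)
    return expressed
```

Sequences absent from the returned set would correspond to those marked as not detected in the RNA-Seq data.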

MBOAT1 subfamily members had shorter MBOAT domain and no introns
The MBOAT domain, a conserved sequence region, is shared by all MBOAT members, which facilitates definition of the MBOAT family. Lengths of sequences and domains, peptide molecular weights and intron numbers were calculated for all tomato MBOAT genes. Genes in the tomato MBOAT1 subfamily were generally shorter, and thus their putative molecular weights were smaller, than those of other subfamily members. The main reason was that the domains of this subfamily were 191-263 amino acids shorter than those of other members. Further, unlike other MBOAT subfamily members, no introns were identified in MBOAT1 genes (Table 2). Likewise, in potato and N. benthamiana, MBOAT1 subfamily members were significantly shorter than other MBOAT genes (data not shown).

Nonsynonymous and synonymous divergence within subfamilies
To explore nonsynonymous and synonymous divergence, we calculated Ka/Ks values between members within subfamilies. Ka/Ks values within the MBOAT1 subfamily were significantly lower than those within other subfamilies (Fig. 3). In the MBOAT1 subfamily, the average Ka/Ks value was 0.095, while for the three others it was 0.13, 0.19 and 0.13, respectively. The difference in medians was more pronounced: while the three other median values were all close to 0.14, the median Ka/Ks value within MBOAT1 was only 0.02, which suggested a more restricted evolutionary path in the MBOAT1 subfamily than in the others. However, there were some outliers in the MBOAT1 subfamily, which indicated that some MBOAT1 genes experienced strong positive selection.
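The gap between the MBOAT1 mean (0.095) and median (0.02) is what one would expect when a few high pairwise values sit on top of many low ones. The Python sketch below shows the effect on a toy list; the numbers are invented purely for illustration and are not taken from this study, and the function name is hypothetical.

```python
from statistics import mean, median


def summarize(values):
    """Return the mean and median of a list of pairwise Ka/Ks values."""
    return {"mean": mean(values), "median": median(values)}


# Invented toy values (NOT data from this study): four low ratios plus
# one high outlier, to show how an outlier raises the mean but not the median.
toy_values = [0.01, 0.02, 0.02, 0.03, 0.60]
summary = summarize(toy_values)
print(summary)
```

For this reason the boxplot and the medians are more informative than the averages when judging the typical constraint within a subfamily.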

Fig. 1. Phylogram of MBOAT peptides in Solanaceae and Arabidopsis. Bayesian phylogeny was performed with 1 million generations. Posterior probabilities are shown by nodes. The scale measures evolutionary distance in substitutions per peptide. Tip labels are symbols for sequences, with the first three letters representing species sources of genes, and letters after vertical symbols showing loci IDs. Clade representing MBOAT subfamily is labeled in red. Ath − Arabidopsis thaliana; Sly − Solanum lycopersicum (tomato); Can − Capsicum annuum (hot pepper); Stu − Solanum tuberosum (potato); Nbe − Nicotiana benthamiana.

Fig. 2. Chromosomal localization of tomato MBOAT genes. Locus IDs of MBOAT1 genes are underlined. Locus IDs whose sequences were not detected to be expressed in RNA-Seq data are labeled with N.

Fig. 3. Boxplot of pairwise Ka/Ks values within respective subfamilies. Pairwise alignment for each pair of sequences within one of the four subfamilies was carried out by PAL2NAL, and the Ka/Ks value was obtained from CodeML in the PAML package.

Table 1. Numbers of MBOAT members classified in subfamilies

Table 2. Sequence information of tomato MBOAT members