IN SILICO ANALYSIS OF BETA-LACTOGLOBULIN GENE IN SOME SELECTED MAMMALIAN SPECIES

This study investigated in silico, the genetic diversity of BetaLactoglobulin (β-Lg) and their evolutionary and differentiation within and among selected mammalian species; and also examined the attendant effects of polymorphism on the functionality of the gene. A total of 21 β-Lg gene sequences with corresponding amino acids belonging to 6 species [cattle (4), buffalo (4), sheep (3), goat (3), pig (3) and horse (4)] were retrieved from GenBank (www.ncbi.nlm.nih.gov). All sequences were trimmed to equal length (500bp) corresponding to the same region. Sequences’ alignment, translation and comparison were done with ClustalW using IUB substitution matrix, gap open penalty of 15 and gap extension penalty of 6.66. The alignment revealed high polymorphism of sequences among extant species. The Dxy inferred using pdistance revealed that sheep and goat had the lowest distance of 0.05 with a maximum distance of 0.65 between goat and horse. The hypothesis of strict neutrality (dN = dS) was rejected for all extant species as allelic sequence evolution was driven by both purifying and positive selection. Only those of pig and buffalo were driven by positive selection. In-silico functional analysis of non-synonymous mutations using PANTHER revealed that, all the 12 amino acid substitutions (10 in cattle and 2 in sheep) did not impair protein function. The Neighbour-Joining phylogeny revealed trans-species evolution, but a species-wise phylogeny was obtained for UPGMA with consensus sequences. Thus, all probed SNPs from this study have no deleterious effect and can be tolerated by breeders when selecting stocks for milk improvement.


Introduction
To select for greater milk production, breeders have traditionally relied upon phenotypic evaluation and selection. In developed countries, phenotypic selection has yielded appreciable degree of successes in predicting genetically superior animal (Staiger, 2007). In United State, average milk production per lactation has doubled in the last forty (Dekkers and Hospital, 2002) while in United Kingdom, production has tripled in the last seventy years (Simm, 1998). More than half of this increased in milk production has been due to improvement in genetics. Despite these increases, phenotypic selection could be improved to give better accuracy by utilizing molecular genetics and bioinformatics selection techniques. Because milk production is a quantitative trait, heritability value is not a perfect predictor of genetic merit of an individual (Staiger, 2007). In addition, milk production can only be measured in females that have reached sexual maturity or calved; this makes it very difficult to analyze males and prepubescent animals. However, with recent development and advances in DNA technology to identify genes, QTL and SNPs, many of these pitfalls can be overcomed because DNA extraction and analysis is not limited by age, sex or time.
Milk proteins have been grouped into casein and whey. Casein accounts for about 80% while whey accounts for 20% of milk protein (Hoffman and Falvo, 2004). Beta-lactoglobulin (β-Lg) is a lipocalin, a widely diverse family, most of which bind small hydrophobic ligands and thus may act as specific transporters, as does serum retinol binding protein. In bovine, β-Lg gene is located on chromosome 11 (Berry et al., 2010).
Milk proteins exhibit genetic polymorphism at both protein and DNA levels and this polymorphism could be due to amino acid substitution or deletion of small peptides along the polypeptide chain (Chin, 1998). Today, a number of studies have indicated that milk production, composition and quality are affected by genetic variants of milk proteins (Chin, 1998). For instance, it has been reported that β-casein A2 and A3 and K-casein are associated with higher milk yield when compared with other variants (Ng-Kwai-Hang et al., 1986). Therefore, identification and probing of milk protein variants provides an important tool for complementing traditional breeding methods in improving the yield and quality of milk and dairy production (Chin, 1998). This study aimed at examining the genetic diversity of β-Lg gene in-silico especially on its evolution and differentiation within and among species and also examining the attendant effects of polymorphism on its functionality in the selected mammalian species.

Multiple Sequence Alignment (MSA) and Translation
Sequences' alignment, translation and comparison were done with Clustal W as described by Larkin et al. (2007) using IUB substitution matrix, gap open penalty of 15 and gap extension penalty of 6.66.

Test of Selection
The relative proportion of non-synonymous substitution per non-synonymous site (d N ) and the number of synonymous substitutions per synonymous site (d S ) were estimated using the method of Nei and Gojobori (1986). The ratio of nonsynonymous to synonymous divergence "ω" or (d N /d S ) was also tested for departure from the neutral expectation of unity using the codon based Zdistribution model as modified by Nei-Gojobori, applying proportion correction.

Functional Analysis
In-silico functional analysis of non-synonymous mutation was estimated using PANTHER (Thomas et al., 2003). PANTHER tools estimates the likelihood of a given non-synonymous (mutation) Coding SNP to initiate a functional change in protein. The subPSEC (substitution position-specific evolutionary conservation) score was also estimated based on alignment of evolutionarily related proteins. The probability that a given variant will cause a deleterious effect on protein function is estimated by P deleterious , such that a subPSEC score of -3 corresponds to a P deleterious of 0.5 (Brunham et al., 2005). The subPSEC score is the negative logarithm of the probability ratio of the wild-type and mutant amino acids at a particular position. PANTHER subPSEC scores, derived from the probabilities of observing the variant amino acids in a PANTHER Hidden Markov Model (HMM), are continuous values from 0 to -10. When subPSEC = 0, the substitution is interpreted as functionally neutral, whereas more negative values of subPSEC predict more deleterious substitutions (Brunham et al., 2005). The published SNPs used for this study include: 10 (cattle): Gly64Asp, Ala118Val, Gln59His, Glu45Gln, Glu158Gly, Pro50Ser, Asp130Tyr, Ile78Met, Glu108Gly, Pro126Leu and 2 (sheep): Tyr20His, Arg148Glu.

Phylogenetic Analysis
Neighbor-Joining NJ tree was constructed each using P-distance model and pairwise deletion gap/missing data treatment. The construction was on the basis of genetic distances, depicting phylogenetic relationships among β-Lg nucleotide sequences of the investigated species. The reliability of the tree was calculated by bootstrap confidence values (Felsenstein, 1985), with 1000 bootstrap iterations using MEGA 5.1 software (Tamura et al., 2011). Similarly, UPGMA tree for each gene was also constructed with consensus sequences; using same model as that of the NJ tree. All sequences were trimmed to (500bp ) equal length corresponding to same region before generating the tree.

Results and Discussion
In Table 1 is shown the estimated distance matrix for β-Lg between consensus sequences of 6 mammalian species. Maximum Dxy was between goat and horse and minimum Dxy was between sheep and goat. The average genetic distance Dxy is an index of divergence within and across species; where Dxy=distance between sequence x and sequence y. The higher the value of Dxy the far apart the two species are. Amongst ruminants, the maximum Dxy value of 0.62 obtained between cattle and buffalo is also evident on the dendrogram which shows evidence of trans-species evolution. The high value between these two species might have arisen from the different regions of DNA used for this study or probably, the included sequences are the divergent regions of the two species. The minimum value between sheep and goat is consistent with recent molecular grouping of sheep and goat. Also the maximum value between goat and horse is consistent with classical grouping as goat is expected to be closer to sheep, cattle, buffalo and pig than horse. This is similar to the findings of Vincent et al. (2014) on β-CN, who reported maximum Dxy between goat and horse. Table 2 shows the means of d S and d N , omega (ω), z-statistics and probability. The 'ω' value reveals that evolution is driven by positive selection for only pig and buffalo whereas all other species are under purifying selection. Insight into the mechanism by which natural selection drives gene functional diversification across different species and lineages is a key issue in biology (Toll-Riera et al. 2011;Yakubu et al. 2013a).  The varying substitutions of amino acids within and across species might be as a result of separate divergence from their common ancestor. According to Marini et al. (2010), as orthologs diverge from their most recent common ancestor, their different evolutionary trajectories lead to divergence in the selective constraints on homologous sites. The comparison of the number of nonsynonymous substitution per non-synonymous sites (d N ; amino acid altering) to the number of synonymous mutations per synonymous sites (d S ; silent mutation) also known as omega (ω = d N /d S ); is a useful estimate of gene selective pressure (Yakubu et al. 2013a). Zhang et al. (2005) noted that omega (ω) >1 implies positive selection, that is, selection has caused some amino acid substitutions that are non-deleterious and that the operative effect of purifying selection is not strong enough to overcome the effect of positive selection. In this study, the null hypothesis of strict neutrality (d N -d S ) was rejected. The estimated (ω) values which range from 0.55-1.09 symbolize the operation of both positive and purifying selection. Only pig and buffalo had ω >1 signifying positive selection; while other species with ω<1 are under purifying selection. Yakubu et al. (2013b), in their study on MHC reported that, positive selection favours new variants thereby increasing allelic polymorphism (Bergstronm and Gyllensten, 1995) which in turn favours the ability of antigen-presenting cells to bind a wide array of self and nonself-peptides, thus conferring higher resistance to infectious diseases (Hedrick and Kim, 2000). Mutation in pig and buffalo increases fitness (Kosakovsky Pond, 2011) and this is expected to reflect on performance. The d N -d S shows the surplus of beneficial or deficit of deleterious non-synonymous substitution. The high 'ω' values might enhance retinol binding, immune response and metabolism of phosphate in the mammary gland (Kontopidis et al., 2003) Table 3 depicts predicted effect of non-synonymous amino acid variants in β-Lg gene. Several SNPs have been reported to serve as biomarkers for exploring the genetic bases of disease conditions (Tariq et al., 2013) and production traits. The prediction of SNPs status is promising in modern genetics analysis and breeding programmes as they have been used to identify those animals with higher breeding value. But it is still a great challenge to identify functional SNPs in production or disease related gene. However, computational approach has helped in overcoming this challenge and this has increased the success rate of genetic association studies. Major interest in both human and animal genetics is to distinguish mutations that increase fitness from those that reduce fitness (Tariq et al., 2013).
In this study, a total of 12 SNPs were sourced and probed, however, all the probed SNPs were beneficial i.e. protein function is not altered in anyway. Therefore, for milk improvement programmes, all probed SNPs should be noted. Figure 1 depicts the deduced amino acid by ClustalW excluding sites with missing/ambiguous data and gaps. A greater level of polymorphism was shown within and across species. The topology of distance-based β-Lg NJ-tree is a clear reflection of trans-species evolution (figure 2) which could be attributed to the different regions used for the study. Random clustering of sequences was exhibited by all the studied species except for horse whose entire sequences aggregated. This distinct embranchment shown by horse is premised on the identical promoter region. Generally, the random clustering may largely be due to regional variability such that those of similar region clustered together. On the contrary, the UPGMA tree revealed species-wise evolutionary history (figure 3); a congruence with classical taxonomy.

Conclusion
In this study, the hypothesis of strict neutrality (d N =d S ) was rejected for all extant species as allelic sequence evolution was driven by both purifying and positive selection. However, only those of pig and buffalo were driven by positive selection. In-silico functional analysis of non-synonymous mutations using PANTHER revealed that, all the 12 amino acid substitutions (10 in cattle and 2 in sheep) did not impair protein function. The phylogeny revealed trans-species evolution, but was species-wise on with consensus sequences.