PNAS, August 24, 2004, 101 (34): 12588-12591
A diverse array of mechanisms regulate tissue-specific protein levels. Most research, however, has focused on the role of transcriptional regulation. Here we report systematic differences in synonymous codon usage between genes selectively expressed in six adult human tissues. Furthermore, we show that the codon usage of brain-specific genes has been selectively preserved throughout the evolution of human and mouse from their common ancestor. Our findings suggest that codon-mediated translational control may play an important role in the differentiation and regulation of tissue-specific gene products in humans.
With the advent of mRNA expression arrays, researchers have begun to delineate which genes are selectively expressed in which tissues and, in a fundamental way, distinguish one tissue from another (, ). Although such studies help to elucidate expression patterns, the processes underlying the differentiation and regulation of tissue-specific proteins remain outstanding problems in developmental and molecular biology. Here, we show that genes selectively expressed in one human tissue can often be discriminated from genes expressed in another tissue purely on the basis of their synonymous codon usage. In particular, we demonstrate that brain-specific genes show characteristically different codon usage from liver-specific genes, uterus genes differ from testis genes, and ovary genes differ from vulva genes. Several other pairs among these six tissues also differ.
Although it came as a surprise to early neutral theorists (), it is now clear that codon usage is not random: Among synonymous codons, some codons are used preferentially. Moreover, taxa differ in their codon usage. For example, various species of Drosophila each have their own particular codon biases, and their usage differs significantly from Escherichia coli or Saccharomyces cerevisiae (–). The dominant theory of codon bias for organisms ranging from E. coli to Drosophila posits that preferred codons correlate with the relative abundances of isoaccepting tRNAs, thereby increasing translational efficiency (–).
Synonymous codon choice also affects gene expression in mammals: When nonmammalian genes are to be expressed in mammalian cells, the replacement of mammalian-rare codons with more common synonyms greatly increases gene expression (–). Nevertheless, there is little evidence in mammals of selection on synonymous codons for translational efficiency. Instead, mammalian genomes exhibit large-scale variation in GC content [e.g., isochores ()] in both coding and noncoding regions. The GC content in noncoding regions is correlated with the GC content at the third position of coding regions from the same isochore. Thus, codon biases observed in the human genome have been attributed to neutral processes [such as biased mutation () and gene conversion ()] rather than to selection (). [Early studies on cDNA clones derived from a diverse set of vertebrate genes failed to find evidence for tissue-specific or taxon-specific codon usage ().]
The most common measure of codon bias, called the effective number of codons (ENC), is analogous to the effective number of alleles in population genetics. ENC does not describe the particulars of which codons are more frequent than others but rather measures the overall departure from random synonymous codon choice. As a result, two genes may exhibit the same degree of overall bias (ENC value) and yet differ dramatically in their particular choice of synonymous codons.
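A minimal Python sketch of Wright's ENC illustrates the difference between the degree of bias and the particular codons used. The sketch makes several assumptions. It uses the standard genetic code and expects an uppercase, in-frame coding sequence. The per-amino-acid homozygosity F = (nΣp² − 1)/(n − 1) and the class weights 9, 1, 5, 3 follow Wright's formula. The interpolation applied when Ile is absent is a common convention and is not stated in this paper. The two hypothetical genes use completely different synonymous codons, yet they receive the same ENC.

```python
from collections import Counter
from itertools import product

# Standard genetic code; codons are enumerated in TCAG order, first base slowest.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Amino acid -> synonymous codons (stop codons excluded).
SYNONYMS = {}
for codon, aa in GENETIC_CODE.items():
    if aa != "*":
        SYNONYMS.setdefault(aa, []).append(codon)


def codon_counts(seq):
    """Count codons in an uppercase, in-frame coding sequence."""
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))


def effective_number_of_codons(seq):
    """Wright's ENC, Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6.

    Minimal sketch: raises KeyError if a 2-, 4- or 6-fold class is absent and
    ZeroDivisionError if a class-average F is 0 (very short genes).
    """
    counts = codon_counts(seq)
    f_by_class = {}  # degeneracy (2, 3, 4, 6) -> per-amino-acid F values
    for codons in SYNONYMS.values():
        n = sum(counts[c] for c in codons)
        if len(codons) < 2 or n < 2:
            continue  # Met/Trp, or too few observations to estimate F
        sum_sq = sum((counts[c] / n) ** 2 for c in codons)
        f_by_class.setdefault(len(codons), []).append((n * sum_sq - 1) / (n - 1))
    f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    if 3 not in f:  # Ile absent: interpolate (a common convention)
        f[3] = (f[2] + f[4]) / 2
    return min(61.0, 2 + 9 / f[2] + 1 / f[3] + 5 / f[4] + 3 / f[6])


# Hypothetical genes with identical amino acid content: gene A always uses the
# first synonymous codon, gene B always the last.
gene_a = "".join(codons[0] * 5 for codons in SYNONYMS.values())
gene_b = "".join(codons[-1] * 5 for codons in SYNONYMS.values())
print(effective_number_of_codons(gene_a), effective_number_of_codons(gene_b))  # 20.0 20.0
```

Both genes are maximally biased (ENC = 20), so ENC alone cannot tell them apart. Distinguishing them requires a codon-by-codon comparison.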
For this study, we require a detailed measure of the “distance” between the synonymous codon usage of two genes. We are not concerned with the degree of codon bias in the usual sense, that is, the departure from random synonymous codon choice. Instead, we are concerned with the degree to which two genes differ in how they encode each amino acid. Given the coding sequences of a pair of genes, we first tabulate the absolute frequency of each codon in each gene. For each amino acid, we then compute a two-tailed Fisher exact test () on the n × 2 contingency table formed by the frequencies of that amino acid's n synonymous codons in the two genes (e.g., for Ala, n = 4: GCC, GCG, GCA, and GCT). The result is one P value per amino acid, indicating whether the two genes use significantly different codons to encode it. The following table summarizes an example of this analysis, comparing the codon usage of two human genes.
For each codon, we report its absolute frequency of occurrence in each gene and its relative frequency compared with its synonymous codons. The P value for each amino acid reflects whether the two genes differ in their encoding of that amino acid (Fisher exact test). A complete comparison of all 61 codons is given as Table 2, which is published as supporting information on the PNAS web site. The comparison between these genes is typical of comparisons between other genes from their respective tissues, testis and uterus. Gene A, testis-specific glycerol kinase (GI 516123); gene B, endometrial bleeding factor (GI 2058537).
The number of amino acids that exhibit a statistically different encoding is a biologically relevant metric of the distance between the codon usage of two genes. All other things being equal (e.g., RNA folding, protein–RNA recognition, and transport), for a fixed pool of tRNAs this metric should correlate with the difference in translation rates between the two genes. Metrics such as “relative synonymous codon usage” () are noisy when applied to individual genes. Our measure instead relies on the Fisher exact test, which is valid for small sample sizes, so it can be applied to genes that contain only a few instances of each amino acid.
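The per-amino-acid test and the resulting distance can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code:

- The two-tailed P value for an n × 2 table is taken to be the Freeman-Halton extension of Fisher's test. It sums the probabilities of all tables with the observed margins that are no more probable than the observed table.
- Sequences are assumed to be uppercase and in frame.
- The function names are hypothetical.

Exhaustive enumeration is exact but can be slow for six-fold amino acids in long genes. An optimized r × c routine such as R's `fisher.test` would be preferable in practice.

```python
from collections import Counter
from itertools import product
from math import comb

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = {}  # amino acid -> synonymous codons (stops excluded)
for codon, aa in GENETIC_CODE.items():
    if aa != "*":
        SYNONYMS.setdefault(aa, []).append(codon)


def fisher_exact_nx2(col_a, col_b):
    """Two-tailed exact P value for an n x 2 table (Freeman-Halton extension).

    Rows are synonymous codons, columns are the two genes. With margins fixed,
    a table's probability is prod_i C(r_i, a_i) / C(N, A); integer weights keep
    the "no more probable than observed" comparison exact.
    """
    rows = [(a, a + b) for a, b in zip(col_a, col_b) if a + b > 0]
    total_a = sum(a for a, _ in rows)
    total = sum(r for _, r in rows)
    observed = 1
    for a, r in rows:
        observed *= comb(r, a)
    tail = 0

    def walk(i, remaining, weight):
        # Enumerate gene-A counts row by row; keep tables whose column sum is total_a.
        nonlocal tail
        if i == len(rows):
            if remaining == 0 and weight <= observed:
                tail += weight
            return
        r = rows[i][1]
        for a in range(min(r, remaining) + 1):
            walk(i + 1, remaining - a, weight * comb(r, a))

    walk(0, total_a, 1)
    return tail / comb(total, total_a)


def codon_distance(seq_a, seq_b, alpha=0.01):
    """Number of amino acids whose synonymous codon usage differs at P < alpha."""
    ca = Counter(seq_a[i:i + 3].upper() for i in range(0, len(seq_a) - 2, 3))
    cb = Counter(seq_b[i:i + 3].upper() for i in range(0, len(seq_b) - 2, 3))
    distance = 0
    for codons in SYNONYMS.values():
        if len(codons) < 2:
            continue  # Met and Trp have no synonymous alternatives
        p = fisher_exact_nx2([ca[c] for c in codons], [cb[c] for c in codons])
        if p < alpha:
            distance += 1
    return distance


print(fisher_exact_nx2([3, 0], [0, 3]))        # 0.1
print(codon_distance("GCC" * 20, "GCA" * 20))  # 1 (only Ala differs)
```

An amino acid absent from both genes yields a single possible table and therefore P = 1, so it never adds to the distance.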
The uterus- and testis-specific genes used in this study (Table 3, which is published as supporting information on the PNAS web site) were obtained directly from the tissue-specific lists compiled by Warrington et al. (). The brain, liver, ovary, and vulva genes (Table 3) were taken from the online expression database of Hsiao et al. (). A gene was considered brain-specific if, according to the Hsiao database (), its mRNA transcript is present in brain and present in at most two of the other tissue types tested by Hsiao et al. The same criteria were applied to liver, ovary, and vulva.
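The tissue-specificity rule can be written as a small predicate. The input layout is a hypothetical simplification: a mapping from tissue name to a present/absent call for one transcript. It does not reflect the actual format of the Hsiao et al. database.

```python
def is_tissue_specific(calls, target, max_other=2):
    """Present in `target` and present in at most `max_other` other tissues.

    `calls` (hypothetical layout) maps tissue name -> True if the transcript
    was called present in that tissue.
    """
    if not calls.get(target, False):
        return False
    present_elsewhere = sum(
        1 for tissue, present in calls.items() if tissue != target and present
    )
    return present_elsewhere <= max_other


calls = {"brain": True, "liver": False, "heart": True, "lung": True, "kidney": False}
print(is_tissue_specific(calls, "brain"))  # True: present in only two other tissues
```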
Given a dendrogram that represents the codon usage of genes from a pair of tissues (e.g., the testis and uterus tree), we calculate a P value to test whether the observed clustering of genes is nonrandom. We compute the sum of squared distances along the tree between genes of the same tissue. The P value compares this observed sum against a null distribution produced by randomly permuting the tissue labels of the leaves.
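The following is a minimal sketch of this permutation test. It assumes that the along-tree (patristic) distances between leaves have already been computed from the dendrogram and stored in a square matrix. The statistic is the summed squared distance between same-tissue leaves. The test is taken to be one-sided, because smaller sums mean tighter clustering. The number of permutations, the seed, and the add-one estimator are illustrative choices, not details from the text.

```python
import random
from itertools import combinations


def within_tissue_ss(tree_dist, labels):
    """Sum of squared along-tree distances over pairs of leaves with the same label."""
    return sum(
        tree_dist[i][j] ** 2
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    )


def clustering_p_value(tree_dist, labels, trials=10_000, seed=0):
    """One-sided permutation P value: how often do shuffled labels cluster as tightly?"""
    rng = random.Random(seed)
    observed = within_tissue_ss(tree_dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if within_tissue_ss(tree_dist, shuffled) <= observed:
            hits += 1
    return (hits + 1) / (trials + 1)  # add-one so the estimate is never exactly 0


# Hypothetical 8-leaf tree: leaves 0-3 and 4-7 form two tight clades.
tree = [[0 if i == j else (1 if i // 4 == j // 4 else 10) for j in range(8)] for i in range(8)]
print(clustering_p_value(tree, ["testis"] * 4 + ["uterus"] * 4))  # small: 2 of 70 labelings are this tight
```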
A dendrogram reflecting the codon usage of 26 genes selectively expressed in human testis (red) and 16 genes selectively expressed in uterus (blue). Genes are denoted by their GI numbers. The pairwise distances underlying this tree reflect the degree to which the genes differ in their codon usage. As this tree demonstrates, testis-expressed genes can generally be distinguished from uterus-expressed genes purely on the basis of their synonymous codon usage. The observed separation between these two classes of genes would not have occurred by random chance (P = 0.0008).
For each of the 44 brain-specific genes, the corresponding mouse ortholog was obtained from the Ensembl web site by using EnsMart. The human and mouse sequences were then aligned by using ClustalW (). The same procedure was used to produce orthologous alignments of the genes specific to ovary, testis, uterus, liver, and vulva.
On the basis of two extensive microarray mRNA expression studies (, ), we have identified genes that are selectively expressed in six healthy adult human tissues: testis (26 genes), uterus (16 genes), total brain (44 genes), liver (34 genes), ovary (36 genes), and vulva (42 genes). By analyzing expression patterns from only two studies, we limited ourselves to fewer data than are available in large compilations of many expression studies. On the other hand, the expression data we used are comparable (both studies used the GeneChip HuGeneFL microarray), and they provide a consistent, unbiased method of assigning tissue specificity. The total number of tissue-specific genes identified is smaller than in previous studies () because we use a conservative, stringent definition of tissue specificity (see Methods). The genes selectively expressed in each of these six tissues are distributed throughout the genome (Table 3). They also have similar distributions of gene sizes: the mean gene length within each tissue is well within one standard deviation of the means of all other tissues.
We compared codon usage between pairs of the six tissues. When comparing testis with uterus, for example, we calculate the distance between the codon usage of every pair of genes (including pairs from the same tissue), obtaining a 42-by-42 symmetric matrix of pairwise distances. The distance between two genes is the number of amino acids that exhibit significantly different (P < 0.01) codon usage, as defined above. Our results are not sensitive to the particular choice of threshold P value between 0.001 and 0.05. Using the neighbor-joining method (PHYLIP v3.5), we produced a dendrogram that graphically represents the measured pairwise distances between the codon usage of the study genes.
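The neighbor-joining step can be illustrated with a textbook implementation of the Saitou and Nei algorithm in Python. This is not PHYLIP's code. Tie-breaking, the handling of negative branch lengths, and the Newick formatting are simplified assumptions. In the analysis described here, the input would be the 42 × 42 matrix of codon-usage distances. The example uses a small additive matrix whose tree is known.

```python
def neighbor_joining(names, dist):
    """Textbook neighbor-joining; returns an unrooted Newick string.

    names: leaf labels; dist: symmetric matrix (list of lists) of pairwise
    distances, e.g. numbers of differently encoded amino acids.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [list(map(float, row)) for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        totals = [sum(row) for row in d]
        # Pick the pair minimizing Q(i, j) = (n - 2) d(i, j) - R_i - R_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node to i and j.
        li = d[i][j] / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        # Distances from the new node to every node, then drop rows/cols i and j.
        to_new = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in range(n)]
        keep = [k for k in range(n) if k not in (i, j)]
        d = [[d[a][b] for b in keep] + [to_new[a]] for a in keep]
        d.append([to_new[k] for k in keep] + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Three nodes remain: attach them to one central node.
    d01, d02, d12 = d[0][1], d[0][2], d[1][2]
    lengths = ((d01 + d02 - d12) / 2, (d01 + d12 - d02) / 2, (d02 + d12 - d01) / 2)
    return "(" + ",".join(f"{x}:{length:g}" for x, length in zip(nodes, lengths)) + ");"


d = [[0, 3, 5, 6], [3, 0, 6, 7], [5, 6, 0, 3], [6, 7, 3, 0]]
print(neighbor_joining("ABCD", d))  # (C:1,D:2,(A:1,B:2):3);
```

Because the example matrix is additive, the sketch recovers the generating tree exactly: cherries (A, B) and (C, D) joined by an internal branch of length 3.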
The dendrogram built from the codon usage of testis- and uterus-specific genes places virtually all testis-associated genes in a clade separate from the uterus-associated genes. This clustering results from systematic differences in codon usage between the testis- and uterus-specific genes. It indicates that testis- and uterus-expressed genes can generally be discriminated on the basis of their codon usage alone.
The separation of testis and uterus genes seen in this tree would not have occurred by random chance (P < 0.0008, see Methods). Similarly, the corresponding dendrogram for brain- and liver-specific genes indicates that these two gene sets are easily distinguishable on the basis of their codon usage (P < 0.00018). We also find (trees not shown) the following distinguishable pairs:

- ovary-specific genes versus vulva genes (P < 0.0032)
- brain genes versus testis genes (P < 0.0044)
- brain genes versus ovary genes (P < 0.00008)
- vulva genes versus testis genes (P < 0.0092)

All but one of these results remain significant even after Bonferroni–Holm correction for multiple hypotheses.
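A short Python sketch shows how Holm's step-down correction applies to the six P values reported above. The text does not state how many hypotheses entered the correction, so the number of tests m is a parameter here. The second call is hypothetical. It assumes that all 15 pairs of the six tissues were tested and that the nine unreported P values all exceed the six listed. The significance level α = 0.05 is likewise an assumption.

```python
def holm_reject(p_values, alpha=0.05, m=None):
    """Holm's step-down procedure; returns one reject flag per P value.

    m is the total number of hypotheses tested. It defaults to len(p_values);
    a larger m assumes the unlisted P values are all larger than the listed ones.
    """
    m = len(p_values) if m is None else m
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    reject = [False] * len(p_values)
    for rank, i in enumerate(order):
        if p_values[i] > alpha / (m - rank):
            break  # the first non-rejection stops the procedure
        reject[i] = True
    return reject


# testis/uterus, brain/liver, ovary/vulva, brain/testis, brain/ovary, vulva/testis
p = [0.0008, 0.00018, 0.0032, 0.0044, 0.00008, 0.0092]
print(sum(holm_reject(p)))        # 6 if only these six tests are counted
print(sum(holm_reject(p, m=15)))  # 5 under the hypothetical m = 15
```

Under the m = 15 assumption, five of the six comparisons survive correction. With m = 6, all six would survive.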
A dendrogram reflecting the codon usage of 44 brain-specific genes (red) and 34 liver-specific genes (blue). The observed separation between these two classes of genes would not have occurred by random chance (P = 0.00018).
Despite the results presented above, many pairs of tissue-specific gene sets do not exhibit significantly different codon usage (e.g., liver versus uterus). The evolutionary processes that produce differential codon usage between certain pairs of tissues but not others pose an intriguing question for further research.
It is tempting to hypothesize that the highly nonrandom, tissue-specific codon usage we have observed serves an adaptive function. Although we cannot establish such a function, we can demonstrate that the codon usage of brain-specific genes has been selectively preserved far more than expected by chance during the evolution of human and mouse from their common ancestor. For this analysis, we identified and aligned mouse orthologs for the 44 brain-specific human genes (see Methods) and for the genes of the other study tissues.
We considered only those sites in the alignment of the human and mouse brain genes that exhibited either identical or synonymous codons. There are 31,050 such codons, which we concatenated into a single sequence for each organism. The resulting aligned mouse and human sequences are fairly similar in their codon usage. There are only two amino acids that have a significantly different encoding (P < 0.01) between the orthologous sequences.
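The site selection and concatenation can be sketched in Python as follows. The sketch assumes a codon-aware alignment in which both sequences are in frame and gaps occupy whole codons; a plain nucleotide alignment does not guarantee this. The second helper reports the fraction of retained codons that differ, the kind of figure quoted later for the brain genes (28%). All function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def conserved_codon_sites(human_aln, mouse_aln):
    """Concatenate aligned codons that are identical or synonymous in both species.

    Assumes a codon-aware alignment (in frame, gaps spanning whole codons).
    Gapped codons, stop codons, and nonsynonymous pairs are dropped.
    """
    keep_h, keep_m = [], []
    for i in range(0, min(len(human_aln), len(mouse_aln)) - 2, 3):
        h, m = human_aln[i:i + 3].upper(), mouse_aln[i:i + 3].upper()
        aa = GENETIC_CODE.get(h)  # None for gapped or ambiguous codons
        if aa is not None and aa != "*" and GENETIC_CODE.get(m) == aa:
            keep_h.append(h)
            keep_m.append(m)
    return "".join(keep_h), "".join(keep_m)


def fraction_synonymous_differences(seq_a, seq_b):
    """Fraction of aligned codons that differ (all differences are synonymous here)."""
    pairs = [(seq_a[i:i + 3], seq_b[i:i + 3]) for i in range(0, len(seq_a) - 2, 3)]
    return sum(a != b for a, b in pairs) / len(pairs)


human = "GCC" "CTG" "---" "AAA" "TGG"
mouse = "GCT" "TTG" "TTT" "AGA" "TGG"
h, m = conserved_codon_sites(human, mouse)
print(h, m)  # GCCCTGTGG GCTTTGTGG
print(fraction_synonymous_differences(h, m))  # 2 of the 3 retained codons differ
```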
The overall similarity of codon usage between the mouse and human brain-specific genes does not in itself imply that codon usage has been selectively preserved, because the human and mouse sequences are similar by descent. There are only 8,837 (synonymous) nucleotide mutations between the two sequences. We applied a randomization test to compare the codon usage of the human and mouse sequences while controlling for their sequence similarity. Each randomization trial started with the mouse sequence. We introduced the observed number of nucleotide changes at randomly chosen synonymous locations, preserving even the number of mutations of each type (A→C, A→T, A→G, C→A, etc.). The result is a randomized version of the human sequence. This randomized sequence has exactly the same amino acid and nucleotide composition as the observed human sequence. Moreover, the randomized human sequences have virtually the same CpG dinucleotide content as the actual human sequence. The mean number of CpG occurrences in the codons of the randomized sequences agrees with the number observed in the human sequence, and all randomization trials fall within 2% of the observed human CpG content.
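One way to implement a single randomization trial is sketched below. The text specifies only that the observed number of each substitution type is placed at randomly chosen synonymous locations in the mouse sequence. The remaining details are assumptions:

- Sites are chosen one substitution at a time.
- Each site is changed at most once.
- Synonymy is checked against the codon as it stands after earlier changes.

Because the substitution counts by type are preserved, the randomized sequence has the same nucleotide composition as the human sequence. The toy example checks this property.

```python
import random
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate an uppercase, in-frame sequence whose length is a multiple of 3."""
    return "".join(GENETIC_CODE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def mutation_spectrum(source, target):
    """Count substitutions of each type, e.g. ('C', 'T'), between aligned sequences."""
    return Counter((a, b) for a, b in zip(source, target) if a != b)


def randomize(source, spectrum, rng):
    """Apply the substitution counts in `spectrum` to `source` at random synonymous sites.

    Each site changes at most once, and a change is allowed only if it keeps the
    amino acid of its current codon. Quadratic in length: fine for illustration.
    """
    seq = list(source)
    mutated = set()
    changes = [kind for kind, k in spectrum.items() for _ in range(k)]
    rng.shuffle(changes)
    for x, y in changes:
        candidates = []
        for pos, base in enumerate(seq):
            if base != x or pos in mutated:
                continue
            start, offset = pos - pos % 3, pos % 3
            codon = "".join(seq[start:start + 3])
            alt = codon[:offset] + y + codon[offset + 1:]
            if GENETIC_CODE[alt] == GENETIC_CODE[codon]:
                candidates.append(pos)
        if not candidates:
            raise RuntimeError(f"no synonymous site left for {x}->{y}")
        pos = rng.choice(candidates)
        seq[pos] = y
        mutated.add(pos)
    return "".join(seq)


mouse = "GCCCTGGGATCA" * 10
human = "GCTCTGGGGTCA" + "GCCCTGGGATCA" * 9  # two synonymous changes (toy data)
spectrum = mutation_spectrum(mouse, human)
shuffled = randomize(mouse, spectrum, random.Random(1))
print(translate(shuffled) == translate(mouse), Counter(shuffled) == Counter(human))  # True True
```

Repeating the trial many times yields the null distribution. For each randomized sequence, one counts the amino acids whose codon usage differs significantly from the mouse sequence. The empirical P value is the fraction of trials at or below the observed count.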
Among 10,000 such randomization trials, on average 7.53 amino acids exhibited significantly different encodings between the mouse sequence and the randomized human sequence. In no trial did the two sequences exhibit fewer than four amino acids with different encodings. In other words, the human and mouse genes are far more similar in synonymous codon usage than expected by random chance (P < 10⁻⁴), given the mutations that have occurred between them. This holds even when controlling for amino acid, nucleotide, and CpG composition. The aligned mouse and human sequences exhibit synonymous differences in 28% of their codons, yet these differences compensate for one another so as to preserve the overall codon usage. This result suggests that there has been selection to preserve the codon usage of these brain-specific genes throughout the evolution of mouse and human from their common ancestor.
In addition to brain-specific genes, the genes associated with most of the other study tissues also show highly significant preservation of synonymous codon usage relative to their mouse orthologs (P < 0.0032 each for liver, uterus, and vulva). Notably, however, the synonymous codon usage of testis-specific and, to a lesser extent, ovary-specific genes does not show significant preservation between human and mouse (P = 0.48 and P = 0.058, respectively). This result is analogous to the well-established fact that the protein sequences of reproductive genes, particularly those related to spermatogenesis, have undergone rapid evolution in primates (