zhlingl的个人博客分享 http://blog.sciencenet.cn/u/zhlingl


2020-12-12=Condon Usage

已有 779 次阅读 2020-12-12 19:30 |个人分类:文献阅读|系统分类:科研笔记

Translational efficiency across healthy and tumor tissues is proliferation‐related


Different tissues express genes with particular codon  usage and anticodon tRNA repertoires. However, the codon–anticodon  co‐adaptation in humans is not completely understood, nor is its effect  on tissue‐specific protein levels. Here, we first validated the accuracy of small RNA‐seq for tRNA quantification across five human cell lines.  We then analyzed the tRNA abundance of more than 8,000 tumor samples  from TCGA, together with their paired mRNA‐seq and proteomics data, to  determine the Supply‐to‐Demand Adaptation. We thereby elucidate that the dynamic adaptation of the tRNA pool is largely related to the  proliferative state across tissues. The distribution of such tRNA pools  over the whole cellular translatome affects the subsequent translational efficiency, which functionally determines a condition‐specific  expression program both in healthy and tumor states. Furthermore, the  aberrant translational efficiency of some codons in cancer, exemplified  by ProCCA and GlyGGT, is associated with poor patient survival. The  regulation of these tRNA profiles is partly explained by the tRNA gene  copy numbers and their promoter DNA methylation.

Quantification of tRNA expression over thousands of small RNA‐seq  samples from The Cancer Genome Atlas unveils the existence of  tissue‐specific translational efficiencies related to proliferation.


  • tRNA abundance is tissue‐ and cancer‐type‐specific.

  • Translational efficiency is globally controlled and the cellular translatome needs to compete for a limiting tRNA pool.

  • Proliferation is the major  determinant of translational efficiency differences among tissues, and  the codon ProCCA appears particularly favored in cancer.

  • Differences at the tRNA pool affect protein translation and subsequently determine specific functional phenotypes.


In the light of the genetic code, multiple 3‐letter  combinations of nucleotides in the mRNA can give rise to the same amino  acid, which are known as synonymous codons. However, despite the  homology at the protein level, these different codons are recognized  distinctly by the transcriptional and translational machineries (Supek, 2016; Hanson & Coller, 2017) and ultimately cause changes at multiple levels of gene expression.  Therefore, the non‐uniform abundance of synonymous codons across  different tissues and among distinct functional gene sets has been  proposed as an adaptive mechanism of gene expression regulation  (Najafabadi et al, 2009), particularly linked to the proliferative state (Gingold et al, 2014). Nevertheless, in human, it is still under debate whether the efficiency of gene expression is the main selective pressure driving the evolution of genomic codon usage (Pouyet et al, 2017).        

The 61 amino‐acid‐coding codons need to be recognized by 46  different tRNA isoacceptors distributed across 428 Pol‐III‐transcribed  tRNA genes (Chan & Lowe, 2016), thus requiring wobble interactions (non‐Watson‐Crick base pairing).  This complexity of the tRNA repertoire is further enhanced by an average of 11–13 base modifications per tRNA and all possible combinations  thereof (Schimmel, 2018). The underlying mechanisms regulating tRNA gene expression and modification are far from resolved (Pan, 2018; Rak et al, 2018). However, it has been established that different conditions and tissues showcase distinct tRNA abundances (Dittmar et al, 2006; Gingold et al, 2014) and codon usages (Najafabadi et al, 2009; Waldman et al, 2010).        

In order to understand such changes in codon–anticodon  co‐adaptation, orthogonal datasets of gene expression including tRNA  quantification are required, which needs to overcome the challenges of  strong secondary structures and abundant chemical modifications. Recent  technological developments have paved the way for sensitive  high‐throughput tRNA sequencing across tissues and conditions (Zheng et al, 2015a; Gogakos et al, 2017a). Aside from these methods and despite the lower coverage, tRNA reads can also be detected from generic small RNA‐seq datasets (Guo et al, 2015, 2016; Pundhir & Gorodkin, 2015; Torres et al, 2015a; Hoffmann et al, 2018). In this context, The Cancer Genome Atlas (TCGA) has been recently used  to investigate the alteration of tRNA gene expression and translational  machinery in cancer, which may play a role in driving aberrant  translation (Zhang et al, 2018, 2019).        

To validate the use of small RNA‐seq for tRNA  quantification, we first compare tRNA levels determined in HEK293 by  well‐established tRNA sequencing methods (Hydro‐tRNAseq and  demethylase‐tRNAseq) (Zheng et al, 2015a; Gogakos et al, 2017a; Mattijssen et al, 2017a), with those obtained by small RNA‐seq. Then, we quantify the tRNA  repertoire of five cell lines using Hydro‐tRNAseq and perform small  RNA‐seq in parallel. Comparison of the tRNA measures obtained by both  approaches shows that it is possible to accurately estimate relative  tRNA abundance of cells and tissues using small RNA‐seq. Furthermore, we show that both types of quantification are informative enough to  distinguish between the five analyzed human cell lines covering multiple tissue types. In consequence, we apply a tRNA‐specific computational  pipeline to re‐analyze 8,534 small RNA‐seq datasets from TCGA (Chu et al, 2016). We find that the tissue specificity of tRNA profiles is largely  proliferation‐related, even within healthy tissues. The tRNA  quantification of TCGA samples enables their comparison with paired and  publicly available mRNA‐seq, proteomic, DNA methylation, and copy number data, which underscores the role of tRNAs in globally controlling a  condition‐specific translational program. We discover multiple codons,  including ProCCA and GlyGGT, whose translational efficiency is  compromised and leads to poor prognosis in cancer. Finally, promoter DNA methylation and tRNA gene copy number arise as two regulatory  mechanisms controlling tRNA abundances in cancer.        


tRNA quantification and modifications from small RNA‐seq data

In order to test how accurately we can extract tRNA  abundance information contained in small RNA sequencing data, we  re‐analyze four publicly available datasets of the cell line HEK293  (Flores et al, 2014a; Data ref: Flores et al, 2014b; Mefferd et al, 2015a; Data ref: Mefferd et al, 2015b; Torres et al, 2015a; Data ref: Torres et al, 2015b,c). In contrast to previous studies analyzing tRNA expression from small RNA‐seq data (Zhang et al, 2018, 2019), we use a computational pipeline specifically developed for the accurate mapping of tRNA reads (Hoffmann et al, 2018) in order to quantify all different isoacceptor species (Fig 1A, see Materials and Methods). To validate the accuracy of these small RNA‐seq quantifications, we  retrieve four datasets of well‐established tRNA sequencing methods  (Hydro‐tRNAseq and demethylase‐tRNAseq) applied to the same cell type  (Zheng et al, 2015a; Data ref: Zheng et al, 2015b; Gogakos et al, 2017a; Data ref: Gogakos et al, 2017b; Mattijssen et al, 2017a; Data ref: Mattijssen et al, 2017b; preprint: Benisty et al, 2019a; Data ref: Benisty et al, 2019b), which autocorrelate in the range of 0.75–0.85 among themselves (Table EV1, Fig EV1A). In comparison, our four HEK293 small RNA‐seq quantifications show an  average Spearman correlation against these four conventional datasets of 0.73. Compared to the Zhang et al (2018) quantification, which correlate in the range of 0.60–0.77 (Table EV1, Fig EV1A), our tRNA‐specific mapping pipeline performs slightly better than the  previously published protocol. It has been reported that there are  tRNA‐derived fragments naturally produced and having other functions  different from translation (Schimmel, 2018), which could confound the tRNA quantification. Although we cannot  exclude the presence of tRNA‐derived fragments in small RNA‐seq datasets (Torres et al, 2019), we found that no differences between reads with or without mismatches  are found when compared to tRNAseq protocols in which tRFs are  specifically removed before sequencing. 我们无法排除小RNA测序数据集中存在tRF,但是我们发现与去除tRF方法中,reads有或无错配并没有差别


  Figure 1.tRNA quantification and modifications from small RNA‐seq data                        

  1. Schematic pipeline for accurate mapping of tRNA reads.

  2. Correlations  between tRNA quantifications by small RNA‐seq and Hydro‐tRNAseq of  matching (correlations within the same cell line) versus non‐matching  (different cell lines) samples. Center values represent the median. The P‐value corresponds to a one‐tailed Wilcoxon rank‐sum test, with nmatching = 9 and nnon‐matching = 63.                          

  3. Overlap of the detected tRNA modifications upon variant calling by both methods.

  4. The TCGA network contains small RNA‐seq data alongside mRNA‐seq, DNA methylation arrays, non‐targeted proteomics, and copy number alteration quantification  comprising 17 tissues.

Correlations between small RNA‐seq and hydro‐tRNAseq protocols, related to Fig 1

  1. The Spearman correlations between all HEK293 tRNA quantifications, including four small RNA‐seq datasets (Data ref: Flores et al, 2014b; Data ref: Mefferd et al, 2015b; Data ref: Torres et al, 2015b,c), four well‐established tRNAseq datasets using Hydro‐tRNAseq and DM‐tRNAseq (Data ref: Zheng et al, 2015b; Data ref: Gogakos et al, 2017b; Data ref: Mattijssen et al, 2017b; Data ref: Benisty et al, 2019b), and one previously published quantification from small RNA‐seq data by Zhang et al (2018). The scatter plots are square‐root‐normalized for better visualization.                          

  2. Principal  component analysis (PCA) and linear discriminant analysis (LDA) of all  absolute tRNA quantifications of HCT116, HELA, HEK293, and BJ  fibroblasts, both from small RNA‐seq and from Hydro‐tRNAseq data. On the left, the first component of the PCA (method‐related) explains 31.59%  variance and the second component (cell‐line‐related) explains 30.83%  variance. On the right, the first component of the LDA separates  completely all four cell lines, regardless of the method.

                       Comparison between raw sequencing data from small RNA‐seq (left) and Hydro‐tRNAseq (right), related to Fig 1

  1. Length distribution of all reads mapping to tRNA genes.

  2. Fraction of the tRNA sequence that is covered by 1 or more reads, for each of the tRNA isoacceptors.

  3. Absolute number of reads mapping along the sequence of tRNAProAGG. The relative abundance between cell lines is not comparable given their different sequencing depths.                          

                       Nucleotide modifications of HEK293 on the consensus tRNA structure, related to Fig 1

  • A, B.  Over the consensus mature tRNA structure of tRNAdb (Jühling et al, 2009), the most frequent detected variants are depicted for both Hydro‐tRNAseq (A) and small RNA‐seq (B). The color code represents the number of  tRNAs showing the modification at each specific position of the tRNA  model.

  • C.  Consensus modifications detected by both methods. The color intensity refers to  the sum of mismatch calls of the two methods altogether. Refer to Table EV3 for a detailed list of modifications for all cell lines.

                       Proliferation is the major driver of tissue specificity in tRNAs                        

  1. Medians of  square‐root‐normalized tRNA abundances across all TCGA tissues. The  color of the tissue labels corresponds to the average Ki67 expression.  Refer to Table EV4 for full cancer type names and number of samples.                          

  2. Principal component analysis (PCA) of the relative anticodon abundances (RAA, see Materials and Methods) of all healthy samples of TCGA, where the color scale corresponds to  the mean tissue expression of Ki67. The Spearman correlations of Ki67  with the components are shown, as well as the samples of most extreme  tissues.                          

  3. Top positive and negative GO terms upon gene set enrichment analysis (GSEA) of the  correlations of the first PCA component against all genes.

tRNA repertoires determine tissue‐specific translational efficiency

Given that different tissues express distinct tRNA repertoires, we  wondered whether they could have an effect in protein translation  elongation. The so‐called translational efficiency is defined as the  rate of protein production from mRNA, and multiple indices and models  can be described to estimate it (Gingold & Pilpel, 2011). In this article, and based on previous studies underscoring the global  control role of codon usage as a competition for a limited tRNA pool  (Gingold et al, 2012; Pechmann & Frydman, 2013; Frumkin et al, 2018), we define the Supply‐to‐Demand Adaptation (SDA) as the balance between  the supply (i.e., the anticodon tRNA abundances) and demand (i.e., the  weighted codon usage based on the mRNA levels) for each of the 60 codons (excluding methionine and stop codons). Furthermore, we normalize both  the codon and anticodon abundances within each amino acid family (i.e.,  relative to the most abundant synonymous codon/anticodon), in order to  remove the effect of amino acid biases and get a cleaner measure of  codon optimality (Eraslan et al, 2019).            

To validate the suitability of SDA in determining the translational  efficiency, we correlate the SDA value of all proteins against the  available proteomics data of paired TCGA samples (Slebos et al, 2015; Mertins et al, 2016), which includes breast and colorectal tissues (tumor only, as no healthy samples are available). Although correlations are modest, both the  protein abundances and the protein‐to‐mRNA ratios correlate  significantly better with SDA than with the classical tRNA Adaptation  Index [tAI] (dos Reis et al, 2003, 2004) or with a relative tAI with normalized weights within each amino acid family [RtAI] (Figs 3A and EV4A and B). In consequence, including the mRNA codon demand into the SDA  metric outperforms other tRNA‐only metrics of translational efficiency.  Furthermore, the correlation of SDA with protein‐to‐mRNA ratio is  slightly but significantly higher than with protein levels alone, which  indicates that the first is a better proxy for the process of  translation (Fig 3A).            

                       tRNA repertoires determine tissue‐specific translational efficiency                        

  1. Three metrics of translational efficiency (the classical tAI, a relative tAI with  normalized weights within each amino acid family, and the  Supply‐to‐Demand Adaptation described in this article) are Spearman  correlated against two proxies of translation (protein abundance and  protein‐to‐mRNA ratio) for all samples for which proteomics data are  available (BRCA, COAD and READ). Center values represent the median.  Statistical differences are determined by sample‐paired two‐tailed  Wilcoxon rank‐sum test (n = 219).                          

  2. Principal  component analysis (PCA) of the SDAw of TCGA, where the color scale  corresponds to the mean tissue expression of Ki67. The Spearman  correlations of Ki67 with the components are shown, as well as the  samples of most extreme tissues. On the right, the top and bottom  proliferation‐ and differentiation‐related codons, as defined by Gingold et al (2014), ordered by their contribution to the first PCA component. Refer to Table EV4 for full cancer type names and number of samples.                          

  3. GSEA of the differential SDA between extreme tissues (ΔSDA = SDAColorectal ‐ SDABrain), showing the top five GO terms with high (left) and low (right) SDA in colorectal versus glial tissues.                          




该博文允许注册用户评论 请点击登录 评论 (0 个评论)



Archiver|手机版|科学网 ( 京ICP备07017567号-12 )

GMT+8, 2021-9-25 02:17

Powered by ScienceNet.cn

Copyright © 2007- 中国科学报社