部分蛋白编码基因起源于non-coding RNA

已有 3299 次阅读 2012-9-14 11:31 |系统分类:论文交流| 基因, 转录组, non-coding

新发表两篇恒河猴转录组医学论文, 发现部分蛋白编码基因起源于non-coding RNA, 建立恒河猴专家数据库; 附上文章链接, 欢迎讨论批评:
Hominoid-Specific De Novo Protein-Coding Genes Originating from Long Non-Coding RNAs
Chen Xie, Yong E. Zhang, Jia-Yu Chen, ..., Liping Wei and Chuan-Yun Li

Tinkering with pre-existing genes has long been known as a major way to create new genes. Recently, however, motherless protein-coding genes have been found to have emerged de novo from ancestral non-coding DNAs. How these genes originated is not well addressed to date. Here we identified 24 hominoid-specific de novo protein-coding genes with precise origination timing in vertebrate phylogeny. Strand-specific RNA–Seq analyses were performed in five rhesus macaque tissues (liver, prefrontal cortex, skeletal muscle, adipose, and testis), which were then integrated with public transcriptome data from human, chimpanzee, and rhesus macaque. On the basis of comparing the RNA expression profiles in the three species, we found that most of the hominoid-specific de novo protein-coding genes encoded polyadenylated non-coding RNAs in rhesus macaque or chimpanzee with a similar transcript structure and correlated tissue expression profile. According to the rule of parsimony, the majority of these hominoid-specific de novoprotein-coding genes appear to have acquired a regulated transcript structure and expression profile before acquiring coding potential. Interestingly, although the expression profile was largely correlated, the coding genes in human often showed higher transcriptional abundance than their non-coding counterparts in rhesus macaque. The major findings we report in this manuscript are robust and insensitive to the parameters used in the identification and analysis of de novogenes. Our results suggest that at least a portion of long non-coding RNAs, especially those with active and regulated transcription, may serve as a birth pool for protein-coding genes, which are then further optimized at the transcriptional level.
RhesusBase: a knowledgebase for the monkey research community
Shi-Jian Zhang, Chujun Liu ....Xiuqin Zhang and Chuan-Yun Li

Although the rhesus macaque is a unique model for the translational study of human diseases, currently its use in biomedical research is still in its infant stage due to error-prone gene structures and limited annotations. Here, we present RhesusBase for the monkey research community ( We performed strand-specific RNA-Seq studies in 10 macaque tissues and generated 1.2 billion 90-bp paired-end reads, covering >97.4% of the putative exon in macaque transcripts annotated by Ensembl. We found that at least 28.7% of the macaque transcripts were previously mis-annotated, mainly due to incorrect exon–intron boundaries, incomplete untranslated regions (UTRs) and missed exons. Compared with the previous gene models, the revised transcripts show clearer sequence motifs near splicing junctions and the end of UTRs, as well as cleaner patterns of exon–intron distribution for expression tags and cross-species conservation scores. Strikingly, 1292 exon–intron boundary revisions between coding exons corrected the previously mis-annotated open reading frames. The revised gene models were experimentally verified in randomly selected cases. We further integrated functional genomics annotations from >60 categories of public and in-house resources and developed an online accessible database. User-friendly interfaces were developed to update, retrieve, visualize and download the RhesusBase meta-data, providing a ‘one-stop’ resource for the monkey research community.

收藏 IP: 159.226.43.*| 热度|


该博文允许注册用户评论 请点击登录 评论 (0 个评论)


Archiver|手机版|科学网 ( 京ICP备07017567号-12 )

GMT+8, 2023-9-30 23:37

Powered by

Copyright © 2007- 中国科学报社