TickingClock's personal blog: http://blog.sciencenet.cn/u/TickingClock


Nature Genetics: High-throughput annotation of full-length lncRNAs with capture long-read sequencing

Posted 2017-11-8 08:45 | Personal category: Daily Abstract | System category: Paper Exchange

High-throughput annotation of full-length long noncoding RNAs with capture long-read sequencing


First author: Julien Lagarde; Affiliation: Centre for Genomic Regulation (CRG), Barcelona, Spain

Corresponding authors: Roderic Guigo & Rory Johnson


Accurate annotation of genes and their transcripts is a foundation of genomics, but currently no annotation technique combines throughput and accuracy. As a result, reference gene collections remain incomplete: many gene models are fragmentary, and thousands more remain uncataloged, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRNA annotation, the GENCODE consortium has developed RNA Capture Long Seq (CLS), which combines targeted RNA capture with third-generation long-read sequencing. Here we present an experimental reannotation of the GENCODE intergenic lncRNA populations in matched human and mouse tissues that resulted in novel transcript models for 3,574 and 561 gene loci, respectively. CLS approximately doubled the annotated complexity of targeted loci, outperforming existing short-read techniques. Full-length transcript models produced by CLS enabled us to definitively characterize the genomic features of lncRNAs, including promoter and gene structure, and protein-coding potential. Thus, CLS removes a long-standing bottleneck in transcriptome annotation and generates manual-quality full-length transcript models at high-throughput scales.




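In short: precise annotation of genes and their transcripts underpins genomics, yet no current technique delivers both throughput and accuracy. Reference gene sets are therefore still incomplete. Many gene models are fragmentary, and thousands of genes have not been cataloged at all, long noncoding RNAs (lncRNAs) in particular. To speed up lncRNA annotation, the GENCODE consortium developed RNA Capture Long Seq (CLS), which pairs targeted RNA capture with third-generation long-read sequencing. This paper experimentally reannotates the GENCODE intergenic lncRNAs in matched human and mouse tissues and reports novel transcript models for 3,574 and 561 gene loci, respectively. CLS roughly doubled the annotated complexity of the targeted loci and outperformed existing short-read techniques. The full-length transcript models it produces allow the genomic features of lncRNAs to be characterized definitively, including promoters, gene structure, and protein-coding potential. CLS thus removes a long-standing bottleneck in transcriptome annotation and yields full-length transcript models of manual-annotation quality at high throughput.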
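To make the idea of a "novel transcript model" concrete, here is a minimal Python sketch. It assumes that novelty is judged by comparing intron chains (the ordered list of introns) between read-derived models and existing annotations. This is a common convention, not necessarily the exact criterion used in the paper. The function names, coordinates, and transcript IDs are all hypothetical, and chromosome and strand are ignored for brevity.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive coordinates


def intron_chain(exons: List[Exon]) -> Tuple[Tuple[int, int], ...]:
    """Return the introns between consecutive exons as (start, end) pairs."""
    ordered = sorted(exons)
    return tuple(
        (a_end + 1, b_start - 1)
        for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])
    )


def novel_models(
    annotated: Dict[str, List[Exon]],
    candidates: Dict[str, List[Exon]],
) -> List[str]:
    """List candidate IDs whose intron chain matches no annotated transcript.

    Assumption for this sketch: single-exon candidates (empty intron chain)
    are skipped, and chromosome/strand are not considered.
    """
    known = {intron_chain(exons) for exons in annotated.values()}
    result = []
    for name, exons in candidates.items():
        chain = intron_chain(exons)
        if chain and chain not in known:
            result.append(name)
    return result


# Hypothetical toy data: one annotated transcript, two read-derived models.
annotated = {"tx_known": [(100, 200), (300, 400)]}
candidates = {
    # Same introns as tx_known, only the transcript ends differ.
    "read_model_1": [(90, 200), (300, 410)],
    # Contains an extra internal exon, so the intron chain is new.
    "read_model_2": [(100, 200), (250, 280), (300, 400)],
}

print(novel_models(annotated, candidates))  # ['read_model_2']
```

A real pipeline would read models from GTF/GFF files and account for strand, chromosome, and alignment error around splice sites; the sketch only illustrates the comparison step.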



Corresponding author: Roderic Guigo (http://www.crg.eu/en/roderic_guigo)


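Biography: 1988, PhD in statistics, University of Barcelona; 1988-1993, postdoctoral researcher, Molecular Biology Computer Research Resource, Dana-Farber Cancer Institute, Harvard University; 1994-present, researcher, IMIM, Barcelona; 2001-present, associate professor, Pompeu Fabra University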


Research interests: gene prediction, evolution of gene structure, phylogenetic reconstruction, and related topics.



doi: 10.1038/ng.3988


Journal: Nature Genetics
Published online: November 06, 2017.

P.S. You are welcome to follow the WeChat public account: Plant_Frontiers


https://blog.sciencenet.cn/blog-3158122-1084257.html
