High-throughput annotation of full-length long noncoding RNAs with capture long-read sequencing
Corresponding authors: Roderic Guigo and Rory Johnson
Accurate annotation of genes and their transcripts is a foundation of genomics, but currently no annotation technique combines throughput and accuracy. As a result, reference gene collections remain incomplete: many gene models are fragmentary, and thousands more remain uncataloged, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRNA annotation, the GENCODE consortium has developed RNA Capture Long Seq (CLS), which combines targeted RNA capture with third-generation long-read sequencing. Here we present an experimental reannotation of the GENCODE intergenic lncRNA populations in matched human and mouse tissues that resulted in novel transcript models for 3,574 and 561 gene loci, respectively. CLS approximately doubled the annotated complexity of targeted loci, outperforming existing short-read techniques. Full-length transcript models produced by CLS enabled us to definitively characterize the genomic features of lncRNAs, including promoter and gene structure, and protein-coding potential. Thus, CLS removes a long-standing bottleneck in transcriptome annotation and generates manual-quality full-length transcript models at high-throughput scales.
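Accurate annotation of genes and their transcripts is the foundation of genomics, yet current techniques do not combine throughput and accuracy well. As a result, today's reference gene sets are incomplete in many places: many gene models are fragmentary, and thousands of genes have not been cataloged at all, especially long noncoding RNAs (lncRNAs). To advance lncRNA annotation, the GENCODE consortium developed a technique called RNA Capture Long Seq (CLS), which combines targeted RNA capture with third-generation long-read sequencing. This paper presents an experimental reannotation of GENCODE intergenic lncRNAs in human and mouse tissues, which yielded new transcripts at 3,574 and 561 gene loci, respectively. CLS roughly doubled the annotated complexity of the targeted loci, surpassing existing short-read sequencing techniques. The full-length transcript models produced by CLS made it possible to characterize the genomic features of lncRNAs definitively, including promoters, gene structure, and protein-coding potential. CLS therefore removes a long-standing bottleneck in transcriptome annotation and delivers full-length transcript models of manual-annotation quality at high throughput.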
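Biography: 1988, PhD in Statistics, University of Barcelona; 1988-1993, postdoctoral researcher, Molecular Biology Computer Research Resource, Dana Farber Cancer Institute, Harvard University; 1994-present, researcher, IMIM, Barcelona; 2001-present, associate professor, Pompeu Fabra University.
Research interests: gene prediction, evolution of gene structure, phylogenetic reconstruction, and related topics.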
doi: 10.1038/ng.3988