Li Leiting's personal blog http://blog.sciencenet.cn/u/llt001


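Deep second-generation sequencing combined with third-generation sequencing yields an Arabidopsis thaliana Ler genome with an N50 of 12.8 Mb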

Posted 2016-7-10 10:48 | Category: Paper discussion | 7243 views

On June 27, 2016, PNAS published online the research article "Chromosome-level assembly of Arabidopsis thaliana Ler reveals the extent of translocation and inversion polymorphisms".


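Although genome sequences of many kinds of Arabidopsis thaliana have been widely reported and deciphered, most of these reconstructed genomes are based on comparing short DNA sequencing reads against a reference genome rather than being built independently. This approach limits what can be learned about genomic differences to local, mostly small-scale structural variation, because large rearrangements are overlooked by current methods. The paper uses de novo assembly to obtain the genome of a commonly used A. thaliana strain, Landsberg erecta (Ler), and reveals hundreds of rearranged regions. Some of these regions suppress meiotic recombination and thereby shape the haplotypes of a worldwide A. thaliana population. Beyond the sequence differences it found, this work is the first to compare two independently completed, chromosome-level assemblies of A. thaliana genomes, and it reveals hundreds of previously unknown, accession-specific genes.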


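Ler is the second most commonly used A. thaliana strain after the Columbia accession (Col-0). Its common name is Ler-0, an abbreviation that combines the accession code La-1 with a mutant of the erecta gene. This paper is not the first to report a de novo assembly of the Ler genome; two earlier papers did so. One, published in 2011, used second-generation sequencing and reached an N50 of 198 kb; the other, published in 2015, used third-generation sequencing and reached an N50 of 11.2 Mb. The Ler genome reported in this paper was produced by integrating newly generated second-generation sequencing data with previously published second- and third-generation sequencing data.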
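For readers unfamiliar with the N50 values quoted above, here is a minimal Python sketch of how the statistic is usually defined: sort the contig or scaffold lengths from longest to shortest and report the length at which the running total first reaches half of the total assembly size. The function name `n50` and the toy lengths are illustrative assumptions, not data or code from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig/scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


# Made-up contig lengths for illustration only (not from any Ler assembly).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # total is 300; 80 + 70 reaches 150, so N50 is 70
```

A larger N50 means that fewer, longer sequences make up half of the assembly, which is why the jump from 198 kb to more than 10 Mb indicates a far more contiguous genome.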


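Overall, this paper points to a new strategy for plant resequencing: revealing structural variation between genomes through de novo assembly. However, de novo assembly undoubtedly costs far more than the low-coverage resequencing commonly used today, which makes it hard to apply at large scale.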


Title: Chromosome-level assembly of Arabidopsis thaliana Ler reveals the extent of translocation and inversion polymorphisms

doi:10.1073/pnas.1607532113

Significance:

Despite widespread reports on deciphering the sequences of all kinds of genomes, most of these reconstructed genomes rely on a comparison of short DNA sequencing reads to a reference sequence, rather than being independently reconstructed. This method limits the insights on genomic differences to local, mostly small-scale variation, because large rearrangements are likely overlooked by current methods. We have de novo assembled the genome of a common strain of Arabidopsis thaliana Landsberg erecta and revealed hundreds of rearranged regions. Some of these differences suppress meiotic recombination, impacting the haplotypes of a worldwide population of A. thaliana. In addition to sequence changes, this work, which, to our knowledge is the first comparison of an independent, chromosome-level assembled A. thaliana genome, revealed hundreds of unknown, accession-specific genes.

Abstract:

Resequencing or reference-based assemblies reveal large parts of the small-scale sequence variation. However, they typically fail to separate such local variation into colinear and rearranged variation, because they usually do not recover the complement of large-scale rearrangements, including transpositions and inversions. Besides the availability of hundreds of genomes of diverse Arabidopsis thaliana accessions, there is so far only one full-length assembled genome: the reference sequence. We have assembled 117 Mb of the A. thaliana Landsberg erecta (Ler) genome into five chromosome-equivalent sequences using a combination of short Illumina reads, long PacBio reads, and linkage information. Whole-genome comparison against the reference sequence revealed 564 transpositions and 47 inversions comprising ~3.6 Mb, in addition to 4.1 Mb of nonreference sequence, mostly originating from duplications. Although rearranged regions are not different in local divergence from colinear regions, they are drastically depleted for meiotic recombination in heterozygotes. Using a 1.2-Mb inversion as an example, we show that such rearrangement-mediated reduction of meiotic recombination can lead to genetically isolated haplotypes in the worldwide population of A. thaliana. Moreover, we found 105 single-copy genes, which were only present in the reference sequence or the Ler assembly, and 334 single-copy orthologs, which showed an additional copy in only one of the genomes. To our knowledge, this work gives first insights into the degree and type of variation, which will be revealed once complete assemblies will replace resequencing or other reference-dependent methods.







https://blog.sciencenet.cn/blog-656335-989733.html
