First author: Shujun Ou
First affiliation: Iowa State University, USA
Corresponding author: Doreen Ware
Abstract
Background: Improvements in long-read data and scaffolding technologies have enabled rapid generation of reference-quality assemblies for complex genomes.
Question: Still, an assessment of critical sequence depth and read length is important for allocating limited resources.
Approach: To this end, we have generated eight assemblies for the complex genome of the maize inbred line NC358 using PacBio datasets ranging from 20× to 75× genomic depth and with N50 subread lengths of 11–21 kb.
Results: Assemblies with ≤30× depth and N50 subread length of 11 kb are highly fragmented, with even low-copy genic regions showing degradation at 20× depth. Distinct sequence-quality thresholds are observed for complete assembly of genes, transposable elements, and highly repetitive genomic features such as telomeres, heterochromatic knobs, and centromeres. In addition, we show high-quality optical maps can dramatically improve contiguity in even our most fragmented base assembly.
Conclusion: This study provides a useful resource allocation reference to the community as long-read technologies continue to mature.
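Summary
Advances in long-read data and scaffolding technologies have greatly improved the quality of assemblies for complex genomes. Even so, assessing the required sequencing depth and read length remains important for allocating resources sensibly. To that end, the authors sequenced the genome of the maize inbred line NC358 with PacBio long-read technology and produced eight assemblies from datasets ranging from 20× to 75× depth and from 11 kb to 21 kb in subread N50 length. Assemblies built from data with at most 30× depth and a subread N50 of 11 kb were highly fragmented, and at 20× depth even low-copy genic regions assembled poorly. Further analysis showed that genes, transposable elements, and highly repetitive regions such as telomeres, heterochromatic knobs, and centromeres each require different depth and read-length conditions for complete assembly. The authors also found that high-quality optical maps can markedly improve assembly contiguity. As long-read sequencing continues to mature, this study offers a reference for allocating resources.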
Discussion
Recent innovations in long-read and scaffolding technology have made highly contiguous assembly possible across a wide range of species. We have documented how both the completeness and contiguity of assemblies improve with increasing depth and read length. The biological aims of an investigation must be considered when determining the level of investment in depth of sequence. With long-read sequencing, the low-copy gene space (including tandem gene arrays) can be well assembled with as low as 30× genomic coverage across a range of read lengths. Complete characterization of TEs in complex genomes such as maize will require a greater depth of sequence (~40×) and should employ library preparation protocols that maximize read-length N50. Finally, complete assembly of highly repetitive genomic features such as heterochromatic knobs, telomeres, and centromeres will require substantially more data. In fact, complete assembly of these latter highly repetitive sequences will likely require innovations beyond current sequencing technology.
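Recent advances in long-read and scaffolding technologies have substantially improved assembly contiguity for many species. The authors' study shows how increasing sequencing depth and read length improves both the completeness and the contiguity of an assembly. The investment in sequencing depth has to be weighed against the biological aims of a study. With long-read sequencing, about 30× depth is enough to assemble the low-copy gene space well, including tandem gene arrays, across a range of read lengths. Comprehensive characterization of transposable elements in complex genomes such as maize requires roughly 40× depth and library preparation protocols that maximize read-length N50. Finally, complete assembly of highly repetitive regions such as heterochromatic knobs, telomeres, and centromeres requires substantially more data, and will likely require advances beyond current sequencing technology.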
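For readers less familiar with the two quantities the study varies, the following is a minimal sketch of how read-length N50 and mean sequencing depth are commonly computed from a list of read lengths. It is not the authors' pipeline. The function names `n50` and `depth`, the toy read lengths, and the genome size are all assumptions made for illustration.

```python
def n50(lengths):
    """Return the N50: the length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


def depth(lengths, genome_size):
    """Return mean sequencing depth: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(lengths) / genome_size


# Toy read lengths in base pairs (illustrative only, not data from the paper).
reads = [2_000, 3_000, 4_000, 5_000, 6_000]

# Hypothetical genome size in base pairs, chosen only to keep the numbers small.
toy_genome_size = 10_000

print("N50 (bp):", n50(reads))
print("Depth (x):", depth(reads, toy_genome_size))
```

In practice the same two numbers can be used in reverse: multiplying a target depth (for example the 30× or 40× thresholds discussed above) by the estimated genome size gives the total number of bases a sequencing run needs to deliver.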
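Corresponding author
**Doreen Ware**
Biography:
University of California San Diego, B.S.;
Ohio State University, Ph.D.;
Cold Spring Harbor Laboratory, postdoctoral fellow.
Research interests: plant genome evolution and crop genetic improvement.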
doi: 10.1038/s41467-020-16037-7
Journal: Nature Communications
Published date: May 08, 2020