Without small streams, there can be no rivers and seas! http://blog.sciencenet.cn/u/xiongchaoliang


An Illustrated Guide to the SOAPdenovo Assembly Process

Viewed 3445 times | 2014-11-30 00:10 | Personal category: [Transcriptome: mRNA analysis] | System category: Research notes

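As we all know, sequencing itself is not the hard part. The difficulty lies in the downstream genome assembly, which has to deal with many issues such as repeats, inversions, and coverage. Obtaining the final sequence, or at least meaningful scaffolds, is therefore a major challenge in any genome project. Different people doing the assembly will get different results, for example in N50, N90, and the number of scaffolds.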
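Since N50 and N90 are the metrics used to compare assemblies, a small Python sketch of how they are usually computed may help. N50 is the length L such that sequences of length at least L contain at least 50% of all assembled bases; N90 is the same with 90%. The function name `nxx` and the scaffold lengths are made up for illustration.

```python
def nxx(lengths, percent):
    """Return the Nxx value (e.g. percent=50 for N50) of a list of lengths.

    Sequences are sorted from longest to shortest and summed until the
    running total reaches `percent` of all bases; the length of the
    sequence that crosses the threshold is the Nxx value.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer arithmetic avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length
    return ordered[-1]


# Hypothetical scaffold lengths in bp (300 bp in total).
scaffold_lengths = [100, 80, 60, 40, 20]
n50 = nxx(scaffold_lengths, 50)  # 100 + 80 = 180 >= 150, so N50 = 80
n90 = nxx(scaffold_lengths, 90)  # 100 + 80 + 60 + 40 = 280 >= 270, so N90 = 40
print(n50, n90)
```

A larger N50 with fewer scaffolds generally indicates a more contiguous assembly, although it says nothing about correctness.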

The general SOAPdenovo assembly procedure is briefly outlined below:


 

Schematic overview of the assembly algorithm.
(A) Genomic DNA was fragmented randomly and sequenced using paired-end technology. Short clones with sizes between 150 and 500 bp were amplified and sequenced directly, while long-range (2–10 kb) paired-end libraries were constructed by circularizing DNA, fragmenting it, and then purifying fragments with sizes in the range of 400–600 bp for cluster formation.


(B) The raw or precorrected reads were then loaded into computer memory, and a de Bruijn graph data structure was used to represent the overlaps among the reads.
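To make step (B) concrete, here is a minimal Python sketch of building a de Bruijn graph from reads. It is not SOAPdenovo's implementation: the function name `build_de_bruijn`, the toy reads, and k = 4 are assumptions for illustration. Nodes are taken to be (k-1)-mers, and reverse complements and sequencing errors are ignored.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a simplified de Bruijn graph.

    Each k-mer in a read becomes an edge from its (k-1)-mer prefix to its
    (k-1)-mer suffix. The stored integer is the number of times that k-mer
    was seen, i.e. its coverage.
    """
    edges = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[kmer[:-1]][kmer[1:]] += 1
    return edges


# Three overlapping toy reads (hypothetical data).
reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
graph = build_de_bruijn(reads, k=4)

for node in sorted(graph):
    print(node, dict(graph[node]))
```

Overlapping reads share k-mers, so they end up on the same edges; the edge `GGC -> GCG` is seen twice here because the k-mer `GGCG` occurs in two reads.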

(C) The graph was simplified by removing erroneous connections (shown in red on the graph) and solving tiny repeats by read path:

(i) clipping the short tips,

(ii) removing low-coverage links,

(iii) solving tiny repeats by read path, and

(iv) merging the bubbles that were caused by repeats or heterozygotes of diploid chromosomes.
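Two of the simplification steps in (C), clipping short tips and removing low-coverage links, can be sketched in Python as follows. This is only an illustration under simplifying assumptions, not SOAPdenovo's code: the graph is a plain dict mapping each node to its successors and their coverage, the function names and thresholds are hypothetical, only forward-pointing tips are handled, and repeat resolution by read path and bubble merging are not shown.

```python
def remove_low_coverage(graph, min_cov):
    """Return a copy of the graph without edges whose coverage is below min_cov."""
    return {
        node: {nxt: cov for nxt, cov in succ.items() if cov >= min_cov}
        for node, succ in graph.items()
    }


def clip_tips(graph, max_tip_len):
    """Return a copy of the graph without short dead-end branches (tips).

    A tip is a linear path of at most max_tip_len edges that leaves a
    branching node and ends in a node with no successors.
    """
    graph = {node: dict(succ) for node, succ in graph.items()}
    indeg = {}
    for succ in graph.values():
        for nxt in succ:
            indeg[nxt] = indeg.get(nxt, 0) + 1
    for fork in list(graph):
        for start in list(graph.get(fork, {})):
            if len(graph[fork]) < 2:
                break  # never clip the last remaining branch
            path, node = [start], start
            # Follow the branch while it stays linear and short enough.
            while (len(path) <= max_tip_len
                   and indeg.get(node, 0) == 1
                   and len(graph.get(node, {})) == 1):
                node = next(iter(graph[node]))
                path.append(node)
            dead_end = not graph.get(node) and indeg.get(node, 0) == 1
            if dead_end and len(path) <= max_tip_len:
                del graph[fork][start]
                for tip_node in path:
                    graph.pop(tip_node, None)
    return graph


# Toy graph (hypothetical): main path A-B-C-D-E with two erroneous edges.
toy = {
    "A": {"B": 10},
    "B": {"C": 10, "X": 3},  # B -> X is a short dead-end tip
    "X": {},
    "C": {"D": 10, "E": 1},  # C -> E is a low-coverage link
    "D": {"E": 10},
    "E": {},
}
cleaned = clip_tips(remove_low_coverage(toy, min_cov=2), max_tip_len=2)
print(cleaned)
```

After both passes only the well-supported path A, B, C, D, E remains. In practice the thresholds depend on sequencing depth and k-mer size.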

(D) On the simplified graph, we broke the connections at repeat boundaries and output the unambiguous sequence fragments as contigs.
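Step (D) amounts to cutting the graph at branching nodes and spelling out each unambiguous path. A minimal Python sketch, assuming nodes are (k-1)-mers and the graph maps each node to its successors (the function name `extract_contigs` and the toy graph are illustrative, not taken from SOAPdenovo):

```python
def extract_contigs(graph):
    """Spell out maximal non-branching paths of a de Bruijn graph as contigs.

    A node is 'simple' when it has exactly one incoming and one outgoing
    edge. Contigs start at non-simple nodes and extend through simple ones.
    Isolated cycles made only of simple nodes are not reported here.
    """
    indeg = {}
    for succ in graph.values():
        for nxt in succ:
            indeg[nxt] = indeg.get(nxt, 0) + 1
    nodes = set(graph) | set(indeg)

    def is_simple(node):
        return indeg.get(node, 0) == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for node in sorted(nodes):
        if is_simple(node):
            continue  # interior node, reached from the start of its path
        for nxt in sorted(graph.get(node, ())):
            seq = node + nxt[-1]
            while is_simple(nxt):
                nxt = next(iter(graph[nxt]))
                seq += nxt[-1]
            contigs.append(seq)
    return contigs


# Toy graph (hypothetical): a unique region ATGGC that forks at node GGC.
toy_graph = {
    "ATG": {"TGG"},
    "TGG": {"GGC"},
    "GGC": {"GCA", "GCT"},
    "GCA": {"CAA"},
    "GCT": {"CTT"},
}
contigs = extract_contigs(toy_graph)
print(contigs)  # ['ATGGC', 'GGCAA', 'GGCTT']
```

The fork at `GGC` plays the role of a repeat boundary: the path is broken there, so each contig contains only sequence that the graph supports unambiguously.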

(E) We realigned the reads onto the contigs and used the paired-end information to join the unique contigs into scaffolds.
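The idea behind step (E) can be illustrated with a heavily simplified Python sketch. It assumes each read pair has already been aligned and reduced to a tuple (upstream contig, downstream contig), so orientation and gap-size estimation are ignored. The function name `build_scaffolds`, the `min_links` threshold, and the data are hypothetical.

```python
from collections import Counter


def build_scaffolds(contigs, pair_hits, min_links=3):
    """Chain contigs into scaffolds using paired-end links.

    A link between two contigs is trusted only if at least min_links read
    pairs support it and it is the single supported link on both sides.
    Circular chains are not reported by this simple version.
    """
    links = Counter((a, b) for a, b in pair_hits if a != b)
    strong = [pair for pair, n in links.items() if n >= min_links]
    out_count = Counter(a for a, _ in strong)
    in_count = Counter(b for _, b in strong)
    succ = {a: b for a, b in strong
            if out_count[a] == 1 and in_count[b] == 1}
    has_pred = set(succ.values())

    scaffolds = []
    for start in contigs:
        if start in has_pred:
            continue  # this contig is appended to an earlier chain
        chain = [start]
        while chain[-1] in succ:
            chain.append(succ[chain[-1]])
        scaffolds.append(chain)
    return scaffolds


# Hypothetical data: c1-c2 and c2-c3 are well supported, c3-c4 is not.
contig_names = ["c1", "c2", "c3", "c4"]
pair_hits = [("c1", "c2")] * 5 + [("c2", "c3")] * 4 + [("c3", "c4")]
scaffolds = build_scaffolds(contig_names, pair_hits)
print(scaffolds)  # [['c1', 'c2', 'c3'], ['c4']]
```

Requiring several supporting pairs per link is what keeps chimeric or misaligned pairs from joining unrelated contigs.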

(F) Finally, we filled in the intrascaffold gaps, which were most likely composed of repeats, using the paired-end extracted reads.



https://blog.sciencenet.cn/blog-1509670-847327.html
