ljxue's personal blog http://blog.sciencenet.cn/u/ljxue Liangjiao Xue, Bioinformatics is my favorite.

How to convert SNP files from MUMmer into VCF format

2016-9-1 05:47 | Personal category: Bioinformatics | System category: Research notes | Tags: Python, MUMmer, VCF

By Liangjiao Xue


MUMmer is a well-established software package for comparing two genomes or assemblies. Its show-snps tool generates a SNP table that also includes indel information. However, VCF is currently the more widely used format, so we need a tool to convert between the two formats.


Several points need to be considered during the conversion to get correct VCF files from MUMmer/snps output:


1) You need the reference sequence to rebuild insertions and deletions, because VCF records each indel together with a flanking reference base.
Instead of reading the original reference FASTA file, I used "show-snps -x 1", so that the surrounding nucleotides are also reported.
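The anchor-base lookup can be sketched in Python. This is a minimal sketch under an assumption: it supposes the reference context string requested with `show-snps -x` holds x bases on each side of the reported position (length 2x + 1) with the reported base in the middle. That layout and the helper names `flanking_bases` and `anchor_base` are hypothetical, not taken from MUMmerSNPs2VCF.py.

```python
from __future__ import annotations


def flanking_bases(ctx: str, x: int = 1) -> tuple[str, str, str]:
    """Split a reference context string into (left flank, centre base, right flank).

    Assumes (hypothetically) that the context holds x bases on each side of
    the reported position, i.e. a string of length 2 * x + 1.
    """
    if x < 1 or len(ctx) != 2 * x + 1:
        raise ValueError(f"expected x >= 1 and a context of length {2 * x + 1}, got {ctx!r}")
    return ctx[:x], ctx[x], ctx[x + 1:]


def anchor_base(ctx: str, is_deletion: bool, x: int = 1) -> str:
    """Return the reference base that VCF needs as the padding base of an indel.

    insertion: the reported position is the base before the insertion,
               so the centre base is the anchor.
    deletion:  the reported position is the first deleted base,
               so the anchor is the base immediately to its left.
    """
    left, centre, _right = flanking_bases(ctx, x)
    return left[-1] if is_deletion else centre


if __name__ == "__main__":
    print(anchor_base("TAC", is_deletion=True))   # T
    print(anchor_base("GCA", is_deletion=False))  # C
```

Under this assumed layout, x = 1 is already enough: the single left-flanking base is exactly the base that precedes a deletion.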


2) For insertions, if the query sequence maps to the reference in reverse orientation, the inserted nucleotides of the query are reported in reverse order.
So, they need to be concatenated in reverse order.
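The merging step for multi-base insertions might look like the following sketch. It assumes each inserted query base appears as its own row sharing the reference position P1 with its neighbours, and that rows arrive in show-snps output order. The `InsRow` record and `merge_insertions` function are hypothetical illustrations, not the author's code.

```python
from __future__ import annotations

from itertools import groupby
from typing import Iterable, NamedTuple


class InsRow(NamedTuple):
    """Hypothetical parsed show-snps row for a single inserted query base."""
    ref_pos: int      # P1: reference position of the base before the insertion
    query_pos: int    # P2: query position of the inserted base
    query_base: str   # inserted nucleotide as reported for the query
    reverse: bool     # True if the query aligns in reverse orientation


def merge_insertions(rows: Iterable[InsRow]) -> list[tuple[int, str]]:
    """Join adjacent rows at the same reference position into one insertion.

    For reverse-mapped queries the bases are reported in reverse order,
    so the list is reversed before joining.
    """
    merged: list[tuple[int, str]] = []
    # groupby only joins *adjacent* rows, so unrelated insertions that share
    # a reference position elsewhere in the file stay separate.
    for ref_pos, group in groupby(rows, key=lambda r: r.ref_pos):
        block = list(group)
        bases = [r.query_base for r in block]
        if block[0].reverse:
            bases.reverse()
        merged.append((ref_pos, "".join(bases)))
    return merged


if __name__ == "__main__":
    rows = [
        InsRow(100, 50, "A", False),
        InsRow(100, 51, "C", False),
        InsRow(300, 80, "T", True),
        InsRow(300, 79, "G", True),
    ]
    print(merge_insertions(rows))  # [(100, 'AC'), (300, 'GT')]
```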


3) The coordinates of insertions and deletions.
For insertions, MUMmer/snps reports the coordinate of the nucleotide before the insertion. This coordinate is kept unchanged in the VCF file.


For deletions, MUMmer/snps reports the coordinates of the deleted nucleotides themselves. The VCF coordinate should therefore be first_position_of_deletion_block - 1, i.e. the position of the reference base just before the deletion.
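The coordinate rules can be written out as a short sketch. It assumes one (reference position, deleted base) pair per deleted nucleotide, sorted by position, and takes the anchor base as an input (for example from the `-x 1` context). `VcfRecord`, `group_deletion_blocks`, `insertion_to_vcf` and `deletion_to_vcf` are hypothetical names for illustration only.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class VcfRecord(NamedTuple):
    chrom: str
    pos: int   # 1-based VCF POS
    ref: str
    alt: str

    def to_line(self) -> str:
        # Columns: CHROM POS ID REF ALT QUAL FILTER INFO ('.' = missing value)
        return "\t".join(
            [self.chrom, str(self.pos), ".", self.ref, self.alt, ".", ".", "."]
        )


def insertion_to_vcf(chrom: str, mummer_pos: int, anchor: str, inserted: str) -> VcfRecord:
    # MUMmer already reports the base *before* the insertion, so POS is unchanged.
    return VcfRecord(chrom, mummer_pos, anchor, anchor + inserted)


def group_deletion_blocks(rows: Iterable[tuple[int, str]]) -> list[tuple[int, str]]:
    """Merge (ref_pos, deleted_base) rows with consecutive positions into blocks."""
    blocks: list[tuple[int, str]] = []
    for pos, base in rows:
        if blocks and pos == blocks[-1][0] + len(blocks[-1][1]):
            start, seq = blocks[-1]
            blocks[-1] = (start, seq + base)
        else:
            blocks.append((pos, base))
    return blocks


def deletion_to_vcf(chrom: str, block_start: int, deleted: str, anchor: str) -> VcfRecord:
    # MUMmer reports the deleted bases themselves; VCF anchors on the base
    # before the block, so POS = first_position_of_deletion_block - 1.
    return VcfRecord(chrom, block_start - 1, anchor + deleted, anchor)


if __name__ == "__main__":
    # Hypothetical reference ...T(9) A(10) C(11) G(12)... with 10-12 deleted.
    for start, seq in group_deletion_blocks([(10, "A"), (11, "C"), (12, "G")]):
        print(deletion_to_vcf("chr1", start, seq, anchor="T").to_line())
    # Hypothetical insertion of "AC" after reference base G at position 100.
    print(insertion_to_vcf("chr1", 100, "G", "AC").to_line())
```

The deletion prints as POS 9, REF TACG, ALT T, and the insertion as POS 100, REF G, ALT GAC. Two simplifications are worth noting: adjacent deletion rows from different alignments would be merged into one block, and a deletion starting at reference position 1 has no preceding base (the VCF specification anchors such events on the following base instead, which this sketch does not handle).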


Here is my Python code:

https://github.com/liangjiaoxue/PythonNGSTools/blob/master/MUMmerSNPs2VCF.py



Notes on this code are also available here:

https://github.com/liangjiaoxue/PythonNGSTools




https://blog.sciencenet.cn/blog-285393-1000040.html

