How to convert a SNP file from MUMmer into VCF format
By Liangjiao Xue
MUMmer is a long-established tool for comparing two genomes or assemblies. Its show-snps tool generates a SNP table, which also includes indel information. VCF is the more widely used format today, so a tool is needed to convert MUMmer/snps output into VCF.
Several things need to be considered during the conversion to get correct VCF files from MUMmer/snps:
1) The reference sequence has to be checked to rebuild insertions and deletions, because VCF writes an indel together with the reference base that precedes it.
Instead of reading the original reference FASTA file, I used "show-snps -x 1", so that the surrounding nucleotides are also reported.
2) For insertions, if the query sequence is mapped to the reference in reverse orientation, the inserted nucleotides of the query are reported in reverse order.
So, they need to be concatenated in reverse order.
3) The coordinates of insertions and deletions.
For insertions, the coordinate in MUMmer/snps is that of the nucleotide before the insertion. It is kept unchanged in the VCF file.
For deletions, the coordinates in MUMmer/snps are those of the deleted nucleotides. The coordinate in VCF should be: first_position_of_deletion_block - 1.
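The rules in points 2) and 3) can be illustrated with a minimal Python sketch. This is not the code from the repository linked below. The function names, the argument layout, and the anchor_base argument (the reference base just before the indel, for example taken from the context reported by "show-snps -x 1") are assumptions made for illustration. It also assumes, as point 2) implies, that for reverse-mapped queries only the order of the inserted bases has to be reversed.

```python
def insertion_to_vcf(ref_pos, anchor_base, inserted_bases, query_reversed):
    """Return (POS, REF, ALT) for one insertion block (hypothetical helper).

    ref_pos        -- MUMmer/snps reference coordinate (base before the insertion)
    anchor_base    -- reference base at ref_pos (assumed to be known already)
    inserted_bases -- inserted query bases, in the order the rows were reported
    query_reversed -- True if the query is mapped in reverse orientation
    """
    bases = list(inserted_bases)
    if query_reversed:
        # Rows come in reverse order, so concatenate them back to front.
        bases.reverse()
    # The coordinate is kept as it is; ALT is the anchor plus the inserted bases.
    return ref_pos, anchor_base, anchor_base + "".join(bases)


def deletion_to_vcf(first_deleted_pos, anchor_base, deleted_bases):
    """Return (POS, REF, ALT) for one deletion block (hypothetical helper).

    first_deleted_pos -- MUMmer/snps coordinate of the first deleted base
    anchor_base       -- reference base at first_deleted_pos - 1
    deleted_bases     -- deleted reference bases, in reference order
    """
    # VCF position is first_position_of_deletion_block - 1.
    pos = first_deleted_pos - 1
    # REF is the anchor plus the deleted bases; ALT is the anchor alone.
    return pos, anchor_base + "".join(deleted_bases), anchor_base


if __name__ == "__main__":
    print(insertion_to_vcf(100, "A", ["G", "T"], query_reversed=False))
    print(insertion_to_vcf(100, "A", ["G", "T"], query_reversed=True))
    print(deletion_to_vcf(101, "A", ["C", "G"]))
```

In this sketch the insertion keeps coordinate 100, while a deletion whose first deleted base is at 101 is also written at position 100, because the VCF record starts at the base before the deleted block.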
Here is my Python code:
https://github.com/liangjiaoxue/PythonNGSTools/blob/master/MUMmerSNPs2VCF.py
The notes for this code are also listed here:
https://github.com/liangjiaoxue/PythonNGSTools