||
支持python3,可用来统计基因组上的gap位置,gap大小。输出格式是gff3。当然根据需要修改print那一行可以输出bed等格式
使用格式如下:
python3 getgaps.py genome.fasta > gaps.gff3
#!/usr/bin/env python3
# this code was changed based on biostar "https://www.biostars.org/p/152592/"
# Import necessary packages
import argparse
import re
from Bio import SeqIO
# Parse command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument("fasta")
args = parser.parse_args()
# Open FASTA, search for masked regions, print in GFF3 format
with open(args.fasta) as handle:
i = 0
for record in SeqIO.parse(handle, "fasta"):
for match in re.finditer('N+', str(record.seq)):
i = i+1
print (record.id, ".", "gap", match.start() + 1, match.end(), ".", ".", ".", "Name=gap" + str(i) + ";size=" + str(match.end()-match.start()), sep='\t')
#use the following at CMD: FILENAME.py FILENAME.fasta >> FILENAME.gff3 here
Archiver|手机版|科学网 ( 京ICP备07017567号-12 )
GMT+8, 2024-12-2 15:50
Powered by ScienceNet.cn
Copyright © 2007- 中国科学报社