mashengwei's personal blog http://blog.sciencenet.cn/u/mashengwei


Extracting a subset of sequences from a large FASTA file

5541 views 2016-4-7 23:34 | Category: Research Notes | Python, extraction, fasta

#!/usr/bin/env python

'''
contact: shengweima@icloud.com

  usage: python choose_seq.py test.fa id.txt output.fa
'''

import os
import sys

from Bio import SeqIO

# Show the usage line and stop if the three file arguments are missing.
if len(sys.argv) != 4:
    sys.exit("usage: python choose_seq.py test.fa id.txt output.fa")

input_file = os.path.abspath(sys.argv[1])
id_file = os.path.abspath(sys.argv[2])
output_file = os.path.abspath(sys.argv[3])

# Take the first whitespace-separated token of each non-blank line as an ID.
with open(id_file) as handle:
    wanted = set(line.split(None, 1)[0] for line in handle if line.strip())
print("Found %i unique identifiers in %s" % (len(wanted), id_file))

# Stream the FASTA file and keep only the records whose ID is wanted.
records = (r for r in SeqIO.parse(input_file, "fasta") if r.id in wanted)
count = SeqIO.write(records, output_file, "fasta")
print("Saved %i records from %s to %s" % (count, input_file, output_file))

if count < len(wanted):
    print("Warning %i IDs not found in %s" % (len(wanted) - count, input_file))
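The script only reports how many IDs were not found, and that number is based on how many records were written, so it can be misleading if the FASTA file contains the same ID more than once. The following is a minimal sketch, not part of the original script, that uses only the standard library instead of Biopython and records which IDs were actually seen, so the missing ones can be listed by name. The helper names read_ids and filter_fasta and the sample data are hypothetical. As in Biopython's "fasta" parser, the ID is assumed to be the first word after ">" on the header line.

```python
import io


def read_ids(lines):
    """Collect the first whitespace-separated token of every non-blank line."""
    return {line.split(None, 1)[0] for line in lines if line.strip()}


def filter_fasta(fasta_lines, wanted, out):
    """Copy records whose ID is in `wanted` to `out`; return the IDs that were seen."""
    found = set()
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # Assumption: the ID is the first word after ">" on the header line.
            fields = line[1:].split(None, 1)
            record_id = fields[0] if fields else ""
            keep = record_id in wanted
            if keep:
                found.add(record_id)
        if keep:
            out.write(line)
    return found


# Tiny in-memory demonstration with made-up data.
fasta = io.StringIO(">seq1 first\nACGT\nTTGA\n>seq2\nGGCC\n>seq3 third\nAAAA\n")
ids = io.StringIO("seq1\tnote\n\nseq3\nseq9\n")
out = io.StringIO()

wanted = read_ids(ids)
found = filter_fasta(fasta, wanted, out)
missing = sorted(wanted - found)

print(out.getvalue(), end="")
print("Missing IDs:", missing)
```

With real files, the io.StringIO objects would be replaced by open file handles. Because the FASTA file is read line by line, memory use stays small even for large inputs.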




https://blog.sciencenet.cn/blog-1094241-968584.html
