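Installing BLAST+ and Basic Usage (Ubuntu)
I. Installation
BLAST+ can be installed from the Ubuntu repositories with sudo apt-get install ncbi-blast+ (this package provides blastp, blastn, makeblastdb and the other BLAST+ programs).
Alternatively, install it from the downloaded package (see https://www.ncbi.nlm.nih.gov/books/NBK52640/):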
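1. Download the package from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/
2. Extract it (here, in $HOME): tar zxvpf ncbi-blast-2.2.29+-x64-linux.tar.gz (2.2.29 is the version used here; substitute the name of the file you actually downloaded)
3. Set the environment variable
export PATH=$PATH:$HOME/ncbi-blast-2.2.29+/bin
4. Create a directory for local databases: mkdir $HOME/blastdb
Databases must be downloaded and placed in this folder.
Database downloads: https://ftp.ncbi.nlm.nih.gov/blast/db/
Find refseq_rna.00.tar.gz, download and extract it, and put the extracted files in the blastdb folder.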
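To check that the installation succeeded, run blastp -version or blastn -version in a terminal; each should print its version.
Run blastp -help to view the usage instructions.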
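The PATH setup and the installation check above can be sketched as a small shell snippet. The helper names add_to_path and have_tool are illustrative only (they are not part of BLAST+), and the install directory is assumed to be the one from step 3.

```shell
#!/bin/sh
# Hypothetical helper: append a directory to PATH only if it is not already there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, do nothing
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

# Hypothetical helper: succeed if a program can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Assumed install location (from step 3 above); adjust to your version.
add_to_path "$HOME/ncbi-blast-2.2.29+/bin"

for tool in blastp blastn makeblastdb; do
    if have_tool "$tool"; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: not found on PATH"
    fi
done
```

To make the setting permanent, the export line from step 3 is usually added to ~/.bashrc; the guard in add_to_path simply avoids duplicate PATH entries when the file is sourced more than once.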
II. Usage
1. Formatting a database
makeblastdb -in db.fasta -dbtype prot -parse_seqids -out dbname
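-in: the sequence file to format
-dbtype: database type, prot or nucl
-parse_seqids: parse the sequence IDs in the FASTA deflines
-out: database name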
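As a minimal sketch of the step above, the following creates a tiny protein FASTA file and formats it. The two sequences are made up for illustration, db.fasta and dbname are the placeholder names from the command above, and makeblastdb is only called if it is found on PATH.

```shell
#!/bin/sh
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two made-up protein sequences, for illustration only.
cat > db.fasta <<'EOF'
>seq1 hypothetical example protein 1
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ
>seq2 hypothetical example protein 2
MSDNELQKLAEEAGVSRSTVSRVLNGK
EOF

# Count the FASTA records (lines starting with '>').
n_seqs=$(grep -c '^>' db.fasta)
echo "db.fasta contains $n_seqs sequences"

# Format the database only if makeblastdb is installed.
if command -v makeblastdb >/dev/null 2>&1; then
    makeblastdb -in db.fasta -dbtype prot -parse_seqids -out dbname
else
    echo "makeblastdb not found; skipping database creation"
fi
```

The resulting database is then referred to by its name (here dbname) in the -db option of the search programs, not by the individual files that makeblastdb writes.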
2. Sequence alignment
blastp -query test.fasta -db ./iedb/iedb -outfmt 5 -out "test.blastp@iedb.xml" -evalue 0.00001 -max_target_seqs 5 -num_threads 8
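-query: the input sequences to search with
-db: the database to search
-out: output file
-evalue: E-value cutoff
-max_target_seqs: maximum number of aligned (target) sequences to keep for each query
-num_threads: number of threads (CPUs) to use
-outfmt: output format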
-outfmt <String>
alignment view options:
0 = pairwise,
1 = query-anchored showing identities,
2 = query-anchored no identities,
3 = flat query-anchored, show identities,
4 = flat query-anchored, no identities,
5 = XML Blast output,
6 = tabular,
7 = tabular with comment lines,
8 = Text ASN.1,
9 = Binary ASN.1
10 = Comma-separated values
Options 6, 7, and 10 can be additionally configured to produce
a custom format specified by space delimited format specifiers
(the supported specifiers are listed under "Formatting options" below).
When not provided, the default value is:
'qseqid sseqid pident length mismatch gapopen qstart qend sstart send
evalue bitscore', which is equivalent to the keyword 'std'
Default = `0'
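With -outfmt 6 the result is a tab-separated file that is easy to post-process. The sketch below filters such a file by percent identity (column 3) and E-value (column 11), assuming the default 'std' column order. The helper name filter_hits and the sample rows are made up for illustration and are not real search results.

```shell
#!/bin/sh
# Hypothetical helper: keep hits from a tabular (-outfmt 6) file with
# percent identity >= $2 and E-value <= $3.
# Default columns: qseqid sseqid pident length mismatch gapopen
#                  qstart qend sstart send evalue bitscore
filter_hits() {
    awk -F '\t' -v min_id="$2" -v max_e="$3" \
        '$3 + 0 >= min_id + 0 && $11 + 0 <= max_e + 0' "$1"
}

# Made-up sample rows in the default tabular layout (not real results).
sample=$(mktemp)
printf 'q1\ts1\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t380\n'   > "$sample"
printf 'q1\ts2\t45.00\t180\t90\t4\t10\t190\t1\t178\t2e-03\t40.1\n' >> "$sample"
printf 'q2\ts3\t91.20\t150\t13\t0\t1\t150\t1\t150\t5e-70\t270\n'   >> "$sample"

# Keep hits with at least 90% identity and an E-value of at most 1e-5.
filter_hits "$sample" 90 1e-5
```

Adding 0 to each field forces awk to compare the values as numbers, which matters because E-values are written in scientific notation.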
For the other parameters, see the following blastn help output (reposted):
-query_loc <String>
Location on the query sequence (Format: start-stop)
-strand <String, `both', `minus', `plus'>
Query strand(s) to search against database/subject
Default = `both'
*** General search options
-task <String, Permissible values: 'blastn' 'blastn-short' 'dc-megablast'
'megablast' 'vecscreen' >
Task to execute
Default = `megablast'
-db <String>
BLAST database name
* Incompatible with: subject, subject_loc
-out <File_Out>
Output file name
Default = `-'
-evalue <Real>
Expectation value (E) threshold for saving hits
Default = `10'
-word_size <Integer, >=4>
Word size for wordfinder algorithm (length of best perfect match)
-gapopen <Integer>
Cost to open a gap
-gapextend <Integer>
Cost to extend a gap
-penalty <Integer, <=0>
Penalty for a nucleotide mismatch
-reward <Integer, >=0>
Reward for a nucleotide match
-use_index <Boolean>
Use MegaBLAST database index
-index_name <String>
MegaBLAST database index name
*** BLAST-2-Sequences options
-subject <File_In>
Subject sequence(s) to search
* Incompatible with: db, gilist, negative_gilist, db_soft_mask
-subject_loc <String>
Location on the subject sequence (Format: start-stop)
* Incompatible with: db, gilist, negative_gilist, db_soft_mask, remote
*** Formatting options
-outfmt <String>
alignment view options:
0 = pairwise,
1 = query-anchored showing identities,
2 = query-anchored no identities,
3 = flat query-anchored, show identities,
4 = flat query-anchored, no identities,
5 = XML Blast output,
6 = tabular,
7 = tabular with comment lines,
8 = Text ASN.1,
9 = Binary ASN.1
10 = Comma-separated values
Options 6, 7, and 10 can be additionally configured to produce
a custom format specified by space delimited format specifiers.
The supported format specifiers are:
qseqid means Query Seq-id
qgi means Query GI
qacc means Query accession
sseqid means Subject Seq-id
sallseqid means All subject Seq-id(s), separated by a ';'
sgi means Subject GI
sallgi means All subject GIs
sacc means Subject accession
sallacc means All subject accessions
qstart means Start of alignment in query
qend means End of alignment in query
sstart means Start of alignment in subject
send means End of alignment in subject
qseq means Aligned part of query sequence
sseq means Aligned part of subject sequence
evalue means Expect value
bitscore means Bit score
score means Raw score
length means Alignment length
pident means Percentage of identical matches
nident means Number of identical matches
mismatch means Number of mismatches
positive means Number of positive-scoring matches
gapopen means Number of gap openings
gaps means Total number of gaps
ppos means Percentage of positive-scoring matches
frames means Query and subject frames separated by a '/'
qframe means Query frame
sframe means Subject frame
When not provided, the default value is:
'qseqid sseqid pident length mismatch gapopen qstart qend sstart send
evalue bitscore', which is equivalent to the keyword 'std'
Default = `0'
-show_gis
Show NCBI GIs in deflines?
-num_descriptions <Integer, >=0>
Number of database sequences to show one-line descriptions for
Default = `500'
-num_alignments <Integer, >=0>
Number of database sequences to show alignments for
Default = `250'
-html
Produce HTML output?
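As an illustration of the format specifiers listed under -outfmt above, suppose a search was run with -outfmt "6 qseqid sseqid pident evalue bitscore", which writes five tab-separated columns. The sketch below keeps, for each query, the hit with the highest bit score. The helper name best_hits and the sample rows are hypothetical and are not real results.

```shell
#!/bin/sh
# Hypothetical helper: from a 5-column custom tabular file
# (qseqid sseqid pident evalue bitscore), print the hit with the
# highest bit score for each query, sorted by query ID.
best_hits() {
    awk -F '\t' '
        # Remember a row if its query is new or its bit score beats the stored one.
        !($1 in best) || $5 + 0 > score[$1] { best[$1] = $0; score[$1] = $5 + 0 }
        END { for (q in best) print best[q] }
    ' "$1" | sort
}

# Made-up sample rows in the assumed 5-column layout.
hits=$(mktemp)
printf 'q1\ts1\t98.50\t1e-100\t380\n'  > "$hits"
printf 'q1\ts2\t45.00\t2e-03\t40.1\n' >> "$hits"
printf 'q2\ts3\t91.20\t5e-70\t270\n'  >> "$hits"
printf 'q2\ts4\t93.00\t1e-75\t290\n'  >> "$hits"

best_hits "$hits"
```

The final sort is only there to make the output order deterministic, since awk does not guarantee the order in which array keys are visited.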
*** Query filtering options
-dust <String>
Filter query sequence with DUST (Format: 'yes', 'level window linker', or
'no' to disable)
Default = `20 64 1'
-filtering_db <String>
BLAST database containing filtering elements (i.e.: repeats)
-window_masker_taxid <Integer>
Enable WindowMasker filtering using a Taxonomic ID
-window_masker_db <String>
Enable WindowMasker filtering using this repeats database.
-soft_masking <Boolean>
Apply filtering locations as soft masks
Default = `true'
-lcase_masking
Use lower case filtering in query and subject sequence(s)?
*** Restrict search or results
-gilist <String>
Restrict search of database to list of GI's
* Incompatible with: negative_gilist, remote, subject, subject_loc
-negative_gilist <String>
Restrict search of database to everything except the listed GIs
* Incompatible with: gilist, remote, subject, subject_loc
-entrez_query <String>
Restrict search with the given Entrez query
* Requires: remote
-db_soft_mask <Integer>
Filtering algorithm ID to apply to the BLAST database as soft masking
* Incompatible with: subject, subject_loc
-perc_identity <Real, 0..100>
Percent identity
-culling_limit <Integer, >=0>
If the query range of a hit is enveloped by that of at least this many
higher-scoring hits, delete the hit
* Incompatible with: best_hit_overhang, best_hit_score_edge
-best_hit_overhang <Real, (>=0 and =<0.5)>
Best Hit algorithm overhang value (recommended value: 0.1)
* Incompatible with: culling_limit
-best_hit_score_edge <Real, (>=0 and =<0.5)>
Best Hit algorithm score edge value (recommended value: 0.1)
* Incompatible with: culling_limit
-max_target_seqs <Integer, >=1>
Maximum number of aligned sequences to keep
*** Discontiguous MegaBLAST options
-template_type <String, `coding', `coding_and_optimal', `optimal'>
Discontiguous MegaBLAST template type
* Requires: template_length
-template_length <Integer, Permissible values: '16' '18' '21' >
Discontiguous MegaBLAST template length
* Requires: template_type
*** Statistical options
-dbsize <Int8>
Effective length of the database
-searchsp <Int8, >=0>
Effective length of the search space
*** Search strategy options
-import_search_strategy <File_In>
Search strategy to use
* Incompatible with: export_search_strategy
-export_search_strategy <File_Out>
File name to record the search strategy used
* Incompatible with: import_search_strategy
*** Extension options
-xdrop_ungap <Real>
X-dropoff value (in bits) for ungapped extensions
-xdrop_gap <Real>
X-dropoff value (in bits) for preliminary gapped extensions
-xdrop_gap_final <Real>
X-dropoff value (in bits) for final gapped alignment
-no_greedy
Use non-greedy dynamic programming extension
-min_raw_gapped_score <Integer>
Minimum raw gapped score to keep an alignment in the preliminary gapped and
traceback stages
-ungapped
Perform ungapped alignment only?
-window_size <Integer, >=0>
Multiple hits window size, use 0 to specify 1-hit algorithm
-off_diagonal_range <Integer, >=0>
Number of off-diagonals to search for the 2nd hit, use 0 to turn off
Default = `0'
*** Miscellaneous options
-parse_deflines
Should the query and subject defline(s) be parsed?
-num_threads <Integer, >=1>
Number of threads to use in the BLAST search
Default = `1'
* Incompatible with: remote
-remote
Execute search remotely?
* Incompatible with: gilist, negative_gilist, subject_loc, num_threads
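A small sketch of choosing a value for -num_threads: it uses the number of online CPUs, limited by a user-chosen cap. The helper name pick_threads is made up, the file and database names are placeholders, and the search command is only printed, not run.

```shell
#!/bin/sh
# Hypothetical helper: pick a thread count no larger than the number of
# online CPUs and no larger than a user-chosen cap ($1).
pick_threads() {
    cap=$1
    # nproc is part of GNU coreutils; fall back to getconf, then to 1.
    cpus=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    if [ "$cpus" -lt "$cap" ]; then
        echo "$cpus"
    else
        echo "$cap"
    fi
}

threads=$(pick_threads 8)
# Print the command rather than running it; file and database names are placeholders.
echo "blastn -query test.fasta -db dbname -num_threads $threads -out result.txt"
```

As the help text above notes, -num_threads cannot be combined with -remote.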