Ivy286's personal blog http://blog.sciencenet.cn/u/Ivy286

BLAST+ installation and basic usage (Ubuntu)

2017-12-6 16:14 | Category: CADD | System category: Research notes

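I. Installation

On Ubuntu, BLAST+ can be installed from the package manager: sudo apt-get install ncbi-blast+ (the package provides blastp, blastn, makeblastdb, and the other BLAST+ programs).

Alternatively, install it from a downloaded package (see https://www.ncbi.nlm.nih.gov/books/NBK52640/):

1. Download the package from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/

2. Extract the archive in $HOME (use the file name of the version you downloaded):

tar zxvpf ncbi-blast-2.2.29+-x64-linux.tar.gz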

3. Add the bin directory to PATH (append the line to ~/.bashrc to make it permanent):

export PATH=$PATH:$HOME/ncbi-blast-2.2.29+/bin

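4. Create a directory for local databases:

mkdir $HOME/blastdb

Download a database and place it in this directory. Databases are available at https://ftp.ncbi.nlm.nih.gov/blast/db/

For example, find refseq_rna.00.tar.gz, download and extract it, and move the extracted files into the blastdb directory.


Run blastp or blastn in a terminal to check that the installation succeeded; blastp -version prints the installed version.

blastp -help shows the full usage information.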
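
If you drive BLAST+ from a script, the same check can be done programmatically. The snippet below is only a minimal sketch: it looks up the three programs used in this post on PATH with Python's standard shutil.which. The function name find_blast_programs is my own and not part of BLAST+.

```python
import shutil

# The three BLAST+ programs used in this post.
BLAST_PROGRAMS = ("blastp", "blastn", "makeblastdb")


def find_blast_programs(programs=BLAST_PROGRAMS):
    """Map each program name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in programs}


if __name__ == "__main__":
    for name, path in find_blast_programs().items():
        print(f"{name}: {path or 'NOT FOUND (check the PATH setting in step 3)'}")
```

A program reported as NOT FOUND usually means the export line from step 3 has not been applied in the current shell.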


II. Usage

1. Format the database

makeblastdb -in db.fasta -dbtype prot -parse_seqids -out dbname
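-in: the sequence file (FASTA) to format
-dbtype: database type, prot or nucl
-parse_seqids: parse the sequence IDs in the FASTA definition lines
-out: database name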
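
When many databases have to be built, it can help to assemble the makeblastdb call in a script. The following is a rough sketch under my own assumptions: the helper names makeblastdb_command and build_database are invented for illustration, and only the four options explained above are used.

```python
import subprocess


def makeblastdb_command(fasta, dbtype, out, parse_seqids=True):
    """Return the makeblastdb argument list for the given inputs."""
    if dbtype not in ("prot", "nucl"):
        raise ValueError("dbtype must be 'prot' or 'nucl'")
    cmd = ["makeblastdb", "-in", fasta, "-dbtype", dbtype, "-out", out]
    if parse_seqids:
        cmd.append("-parse_seqids")
    return cmd


def build_database(fasta, dbtype, out):
    """Run makeblastdb; raises CalledProcessError if it exits with an error."""
    subprocess.run(makeblastdb_command(fasta, dbtype, out), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without BLAST+.
    print(" ".join(makeblastdb_command("db.fasta", "prot", "dbname")))
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems when file names contain spaces.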

2. Sequence alignment

blastp -query test.fasta -db ./iedb/iedb -outfmt 5 -out "test.blastp@iedb.xml" -evalue 0.00001 -max_target_seqs 5 -num_threads 8

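-query: the input sequence(s) to search with

-db: the database to search against

-out: the output file

-evalue: the E-value cutoff

-max_target_seqs: the maximum number of aligned sequences to keep

-num_threads: the number of threads (CPUs) to use in the search

-outfmt: controls the output format (5 = XML in the command above)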

-outfmt <String>
  alignment view options:
    0 = pairwise,
    1 = query-anchored showing identities,
    2 = query-anchored no identities,
    3 = flat query-anchored, show identities,
    4 = flat query-anchored, no identities,
    5 = XML Blast output,
    6 = tabular,
    7 = tabular with comment lines,
    8 = Text ASN.1,
    9 = Binary ASN.1
   10 = Comma-separated values

  Options 6, 7, and 10 can be additionally configured to produce
  a custom format specified by space delimited format specifiers.
  The supported format specifiers are listed in full under
  "*** Formatting options" further down.
  When not provided, the default value is:
  'qseqid sseqid pident length mismatch gapopen qstart qend sstart send
  evalue bitscore', which is equivalent to the keyword 'std'
  Default = `0'
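
The tabular format (-outfmt 6, or 7 with comment lines) is the easiest one to post-process. Below is a minimal sketch of a parser for the default 'std' columns. It assumes tab-separated fields and comment lines starting with '#'; the function name parse_tabular and the sample line are made up for illustration and are not real search results.

```python
# Default columns of -outfmt 6/7, as given in the help text above.
STD_COLUMNS = ("qseqid sseqid pident length mismatch gapopen "
               "qstart qend sstart send evalue bitscore").split()

INT_COLUMNS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_COLUMNS = {"pident", "evalue", "bitscore"}


def parse_tabular(lines, columns=STD_COLUMNS):
    """Parse tabular BLAST output lines into a list of dicts, one per hit."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):  # skip blanks and outfmt 7 comments
            continue
        fields = line.split("\t")
        if len(fields) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
        hit = {}
        for name, value in zip(columns, fields):
            if name in INT_COLUMNS:
                hit[name] = int(value)
            elif name in FLOAT_COLUMNS:
                hit[name] = float(value)
            else:
                hit[name] = value
        hits.append(hit)
    return hits


if __name__ == "__main__":
    # Invented sample line, for illustration only.
    sample = ["# BLASTP comment line",
              "query1\tsubj1\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t380"]
    hits = parse_tabular(sample)
    significant = [h for h in hits if h["evalue"] <= 1e-5]
    print(len(significant), significant[0]["sseqid"])
```

To read a real result file, pass the open file object, for example parse_tabular(open("result.tsv")).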


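The remaining options are listed below (reposted). Note that this excerpt comes from the blastn help text, so nucleotide-specific options such as -strand, -penalty, -reward and -dust do not apply to blastp: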

-query_loc <String>
  Location on the query sequence (Format: start-stop)
-strand <String, `both', `minus', `plus'>
  Query strand(s) to search against database/subject
  Default = `both'

*** General search options
-task <String, Permissible values: 'blastn' 'blastn-short' 'dc-megablast'
               'megablast' 'vecscreen' >
  Task to execute
  Default = `megablast'
-db <String>
  BLAST database name
   * Incompatible with: subject, subject_loc
-out <File_Out>
  Output file name
  Default = `-'
-evalue <Real>
  Expectation value (E) threshold for saving hits
  Default = `10'
-word_size <Integer, >=4>
  Word size for wordfinder algorithm (length of best perfect match)
-gapopen <Integer>
  Cost to open a gap
-gapextend <Integer>
  Cost to extend a gap
-penalty <Integer, <=0>
  Penalty for a nucleotide mismatch
-reward <Integer, >=0>
  Reward for a nucleotide match
-use_index <Boolean>
  Use MegaBLAST database index
-index_name <String>
  MegaBLAST database index name

*** BLAST-2-Sequences options
-subject <File_In>
  Subject sequence(s) to search
   * Incompatible with: db, gilist, negative_gilist, db_soft_mask
-subject_loc <String>
  Location on the subject sequence (Format: start-stop)
   * Incompatible with: db, gilist, negative_gilist, db_soft_mask, remote

*** Formatting options
-outfmt <String>
  alignment view options:
    0 = pairwise,
    1 = query-anchored showing identities,
    2 = query-anchored no identities,
    3 = flat query-anchored, show identities,
    4 = flat query-anchored, no identities,
    5 = XML Blast output,
    6 = tabular,
    7 = tabular with comment lines,
    8 = Text ASN.1,
    9 = Binary ASN.1
   10 = Comma-separated values

  Options 6, 7, and 10 can be additionally configured to produce
  a custom format specified by space delimited format specifiers.
  The supported format specifiers are:
           qseqid means Query Seq-id
              qgi means Query GI
             qacc means Query accession
           sseqid means Subject Seq-id
        sallseqid means All subject Seq-id(s), separated by a ';'
              sgi means Subject GI
           sallgi means All subject GIs
             sacc means Subject accession
          sallacc means All subject accessions
           qstart means Start of alignment in query
             qend means End of alignment in query
           sstart means Start of alignment in subject
             send means End of alignment in subject
             qseq means Aligned part of query sequence
             sseq means Aligned part of subject sequence
           evalue means Expect value
         bitscore means Bit score
            score means Raw score
           length means Alignment length
           pident means Percentage of identical matches
           nident means Number of identical matches
         mismatch means Number of mismatches
         positive means Number of positive-scoring matches
          gapopen means Number of gap openings
             gaps means Total number of gaps
             ppos means Percentage of positive-scoring matches
           frames means Query and subject frames separated by a '/'
           qframe means Query frame
           sframe means Subject frame
  When not provided, the default value is:
  'qseqid sseqid pident length mismatch gapopen qstart qend sstart send
  evalue bitscore', which is equivalent to the keyword 'std'
  Default = `0'
-show_gis
  Show NCBI GIs in deflines?
-num_descriptions <Integer, >=0>
  Number of database sequences to show one-line descriptions for
  Default = `500'
-num_alignments <Integer, >=0>
  Number of database sequences to show alignments for
  Default = `250'
-html
  Produce HTML output?
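
A custom tabular format is passed as a single quoted argument, for example -outfmt "6 qseqid sseqid evalue". The sketch below builds that argument and rejects specifier names that are not in the list above. The function name outfmt_argument is my own, and the specifier set is copied from this (older) help text, so newer BLAST+ versions may accept more names.

```python
# Specifier names copied from the help text above, plus the 'std' keyword.
KNOWN_SPECIFIERS = set(
    "qseqid qgi qacc sseqid sallseqid sgi sallgi sacc sallacc "
    "qstart qend sstart send qseq sseq evalue bitscore score length "
    "pident nident mismatch positive gapopen gaps ppos frames qframe sframe "
    "std".split()
)


def outfmt_argument(fmt, specifiers=()):
    """Return the value to pass after -outfmt, e.g. '6 qseqid sseqid evalue'."""
    specifiers = list(specifiers)
    if specifiers and fmt not in (6, 7, 10):
        raise ValueError("only formats 6, 7 and 10 accept format specifiers")
    unknown = [s for s in specifiers if s not in KNOWN_SPECIFIERS]
    if unknown:
        raise ValueError(f"unknown format specifiers: {unknown}")
    return " ".join([str(fmt)] + specifiers)


if __name__ == "__main__":
    columns = ["qseqid", "sseqid", "pident", "evalue"]
    # As one element of an argument list, the value needs no extra shell quoting.
    print(["blastp", "-outfmt", outfmt_argument(6, columns)])
```

The same column list can then be reused when parsing the output, so the writer and the reader of the file cannot drift apart.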

*** Query filtering options
-dust <String>
  Filter query sequence with DUST (Format: 'yes', 'level window linker', or
  'no' to disable)
  Default = `20 64 1'
-filtering_db <String>
  BLAST database containing filtering elements (i.e.: repeats)
-window_masker_taxid <Integer>
  Enable WindowMasker filtering using a Taxonomic ID
-window_masker_db <String>
  Enable WindowMasker filtering using this repeats database.
-soft_masking <Boolean>
  Apply filtering locations as soft masks
  Default = `true'
-lcase_masking
  Use lower case filtering in query and subject sequence(s)?

*** Restrict search or results
-gilist <String>
  Restrict search of database to list of GI's
   * Incompatible with: negative_gilist, remote, subject, subject_loc
-negative_gilist <String>
  Restrict search of database to everything except the listed GIs
   * Incompatible with: gilist, remote, subject, subject_loc
-entrez_query <String>
  Restrict search with the given Entrez query
   * Requires: remote
-db_soft_mask <Integer>
  Filtering algorithm ID to apply to the BLAST database as soft masking
   * Incompatible with: subject, subject_loc
-perc_identity <Real, 0..100>
  Percent identity
-culling_limit <Integer, >=0>
  If the query range of a hit is enveloped by that of at least this many
  higher-scoring hits, delete the hit
   * Incompatible with: best_hit_overhang, best_hit_score_edge
-best_hit_overhang <Real, (>=0 and =<0.5)>
  Best Hit algorithm overhang value (recommended value: 0.1)
   * Incompatible with: culling_limit
-best_hit_score_edge <Real, (>=0 and =<0.5)>
  Best Hit algorithm score edge value (recommended value: 0.1)
   * Incompatible with: culling_limit
-max_target_seqs <Integer, >=1>
  Maximum number of aligned sequences to keep

*** Discontiguous MegaBLAST options
-template_type <String, `coding', `coding_and_optimal', `optimal'>
  Discontiguous MegaBLAST template type
   * Requires: template_length
-template_length <Integer, Permissible values: '16' '18' '21' >
  Discontiguous MegaBLAST template length
   * Requires: template_type

*** Statistical options
-dbsize <Int8>
  Effective length of the database
-searchsp <Int8, >=0>
  Effective length of the search space

*** Search strategy options
-import_search_strategy <File_In>
  Search strategy to use
   * Incompatible with: export_search_strategy
-export_search_strategy <File_Out>
  File name to record the search strategy used
   * Incompatible with: import_search_strategy

*** Extension options
-xdrop_ungap <Real>
  X-dropoff value (in bits) for ungapped extensions
-xdrop_gap <Real>
  X-dropoff value (in bits) for preliminary gapped extensions
-xdrop_gap_final <Real>
  X-dropoff value (in bits) for final gapped alignment
-no_greedy
  Use non-greedy dynamic programming extension
-min_raw_gapped_score <Integer>
  Minimum raw gapped score to keep an alignment in the preliminary gapped and
  traceback stages
-ungapped
  Perform ungapped alignment only?
-window_size <Integer, >=0>
  Multiple hits window size, use 0 to specify 1-hit algorithm
-off_diagonal_range <Integer, >=0>
  Number of off-diagonals to search for the 2nd hit, use 0 to turn off
  Default = `0'

*** Miscellaneous options
-parse_deflines
  Should the query and subject defline(s) be parsed?
-num_threads <Integer, >=1>
  Number of threads to use in the BLAST search
  Default = `1'
   * Incompatible with: remote
-remote
  Execute search remotely?
   * Incompatible with: gilist, negative_gilist, subject_loc, num_threads
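
Several options above carry an "Incompatible with" note. As a rough illustration, the sketch below checks a set of option names (written without the leading dash) against a few of those pairs before a command is launched. The table is a partial, hand-copied subset of the notes in this help text and the function name find_conflicts is invented; BLAST+ itself reports such conflicts when it starts.

```python
# A partial subset of the "Incompatible with" notes from the help text above.
INCOMPATIBLE = {
    "db": {"subject", "subject_loc"},
    "subject": {"db", "gilist", "negative_gilist", "db_soft_mask"},
    "gilist": {"negative_gilist", "remote", "subject", "subject_loc"},
    "culling_limit": {"best_hit_overhang", "best_hit_score_edge"},
    "num_threads": {"remote"},
    "import_search_strategy": {"export_search_strategy"},
}


def find_conflicts(options):
    """Return sorted (a, b) pairs of option names that may not be combined."""
    used = set(options)
    conflicts = set()
    for name in used:
        for other in INCOMPATIBLE.get(name, set()) & used:
            conflicts.add(tuple(sorted((name, other))))
    return sorted(conflicts)


if __name__ == "__main__":
    print(find_conflicts(["query", "db", "subject", "num_threads"]))
```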



https://blog.sciencenet.cn/blog-2506040-1088509.html
