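This is another Snakemake example. An earlier post introduced Snakemake with a simple workflow; here we use a second example to look at how Snakemake is used.
Snakemake requires Python 3; Python 2 is not supported.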
pip3 install --user snakemake pyaml
Here we create two paired-end RNA-seq samples as FASTQ files, then process them with the steps below.
Create the files:
touch genome.fa
# Make some fake data:
mkdir fastq
touch fastq/Sample1.R1.fastq.gz fastq/Sample1.R2.fastq.gz
touch fastq/Sample2.R1.fastq.gz fastq/Sample2.R2.fastq.gz
Check the result with tree:
(base) [dengfei@localhost test]$ tree
.
├── fastq
│   ├── Sample1.R1.fastq.gz
│   ├── Sample1.R2.fastq.gz
│   ├── Sample2.R1.fastq.gz
│   └── Sample2.R2.fastq.gz
└── genome.fa

1 directory, 5 files
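Save the code below in a file named Snakefile, the default file name that snakemake looks for in the working directory: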
SAMPLES = ['Sample1', 'Sample2']

rule all:
    input:
        expand('{sample}.txt', sample=SAMPLES)

rule quantify_genes:
    input:
        genome = 'genome.fa',
        r1 = 'fastq/{sample}.R1.fastq.gz',
        r2 = 'fastq/{sample}.R2.fastq.gz'
    output:
        '{sample}.txt'
    shell:
        'echo {input.genome} {input.r1} {input.r2} > {output}'
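Let's walk through the code.
The SAMPLES list: SAMPLES = ['Sample1', 'Sample2']
SAMPLES is a list with two elements, Sample1 and Sample2.
The expand function fills each element of the list into {sample}, so rule all requests Sample1.txt and Sample2.txt:
rule all:
    input:
        expand('{sample}.txt', sample=SAMPLES)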
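To make the expansion concrete, here is a minimal plain-Python sketch of what expand does for this pattern. The helper expand_sketch is a hypothetical stand-in written for illustration, not Snakemake's own implementation; it assumes every wildcard is passed as a list and that the lists are combined as a product.

```python
from itertools import product

def expand_sketch(pattern, **wildcards):
    """Fill every combination of wildcard values into pattern.

    Hypothetical helper for illustration only; it mimics the behaviour
    described above, not Snakemake's internal code.
    """
    names = list(wildcards)
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]

SAMPLES = ['Sample1', 'Sample2']
targets = expand_sketch('{sample}.txt', sample=SAMPLES)
print(targets)  # ['Sample1.txt', 'Sample2.txt']
```

The resulting list is what rule all asks for as its input, which is why Snakemake then looks for a rule able to produce each of these files.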
The quantify_genes rule has three parts: input, output, and shell. The {sample} wildcard takes its value from the file names requested in rule all:
rule quantify_genes:
    input:
        genome = 'genome.fa',
        r1 = 'fastq/{sample}.R1.fastq.gz',
        r2 = 'fastq/{sample}.R2.fastq.gz'
    output:
        '{sample}.txt'
    shell:
        'echo {input.genome} {input.r1} {input.r2} > {output}'
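The way a requested file name determines {sample} can be sketched in plain Python. This is a simplified illustration, not Snakemake's matching code: match_wildcards and resolve_inputs are hypothetical helpers, and the sketch assumes each wildcard name appears only once in the output pattern.

```python
import re

def match_wildcards(pattern, target):
    """Return the wildcard values that turn pattern into target, or None."""
    regex = ''
    pos = 0
    # Replace each {name} with a named regex group; escape the literal parts.
    for m in re.finditer(r'\{(\w+)\}', pattern):
        regex += re.escape(pattern[pos:m.start()])
        regex += '(?P<{}>.+)'.format(m.group(1))
        pos = m.end()
    regex += re.escape(pattern[pos:])
    found = re.fullmatch(regex, target)
    return found.groupdict() if found else None

# Patterns copied from the quantify_genes rule.
OUTPUT_PATTERN = '{sample}.txt'
INPUT_PATTERNS = {
    'genome': 'genome.fa',
    'r1': 'fastq/{sample}.R1.fastq.gz',
    'r2': 'fastq/{sample}.R2.fastq.gz',
}

def resolve_inputs(target):
    """Work out the input files needed to build target (hypothetical helper)."""
    wildcards = match_wildcards(OUTPUT_PATTERN, target)
    if wildcards is None:
        return None
    return {name: p.format(**wildcards) for name, p in INPUT_PATTERNS.items()}

print(resolve_inputs('Sample1.txt'))
```

Requesting Sample1.txt gives sample=Sample1, and the same value is then filled into the r1 and r2 input patterns.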
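Next, preview the workflow with a dry run:
snakemake -np
Parameters
-n or --dryrun: plan the jobs without executing anything, so you can preview what would be run.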
--dryrun, -n Do not execute anything, and display what would be
done. If you have a very large workflow, use --dryrun
--quiet to just print a summary of the DAG of jobs.
-p or --printshellcmds: print the generated shell commands.
--printshellcmds, -p Print out the shell commands that will be executed.
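Note:
-n: nothing is executed; the planned jobs are only listed.
-p: the jobs are executed, and their shell commands are printed as well.
Combined as -np, nothing is executed and the shell command of each planned job is shown.
In either case, jobs are only scheduled when the result files have not been generated yet.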
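The condition about result files can be illustrated with a small Python sketch. It is a deliberate simplification under stated assumptions: needs_run is a hypothetical helper that only checks whether the output exists and whether any input is newer, while Snakemake's real scheduling logic takes more into account.

```python
import os
import tempfile

def needs_run(output, inputs):
    """Simplified check: run if output is missing or older than an input.

    Hypothetical helper for illustration; not Snakemake's actual logic.
    """
    if not os.path.exists(output):
        return True
    out_mtime = os.path.getmtime(output)
    return any(os.path.getmtime(path) > out_mtime for path in inputs)

with tempfile.TemporaryDirectory() as workdir:
    genome = os.path.join(workdir, 'genome.fa')
    result = os.path.join(workdir, 'Sample1.txt')
    open(genome, 'w').close()
    print(needs_run(result, [genome]))  # True: Sample1.txt does not exist yet

    open(result, 'w').close()
    # Set explicit timestamps so the result is newer than its input.
    os.utime(genome, (1000, 1000))
    os.utime(result, (2000, 2000))
    print(needs_run(result, [genome]))  # False: result exists and is up to date
```

This is why running snakemake -np a second time, after the .txt files exist, reports no jobs for those files.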
Example:
(snake_test) [dengfei@localhost ex2]$ snakemake -np
Building DAG of jobs...
Job counts:
count jobs
1 all
2 quantify_genes
3
[Tue Apr 2 13:49:34 2019]
rule quantify_genes:
input: genome.fa, fastq/Sample1.R1.fastq.gz, fastq/Sample1.R2.fastq.gz
output: Sample1.txt
jobid: 1
wildcards: sample=Sample1
echo genome.fa fastq/Sample1.R1.fastq.gz fastq/Sample1.R2.fastq.gz > Sample1.txt
[Tue Apr 2 13:49:34 2019]
rule quantify_genes:
input: genome.fa, fastq/Sample2.R1.fastq.gz, fastq/Sample2.R2.fastq.gz
output: Sample2.txt
jobid: 2
wildcards: sample=Sample2
echo genome.fa fastq/Sample2.R1.fastq.gz fastq/Sample2.R2.fastq.gz > Sample2.txt
[Tue Apr 2 13:49:34 2019]
localrule all:
input: Sample1.txt, Sample2.txt
jobid: 0
Job counts:
count jobs
1 all
2 quantify_genes
3
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.