(1) Input file (fasta/fastq) format
The read header needs three fields separated by "/"; the second field has to be a unique number, and the difference between the two numbers in the third field (separated by "_") must equal the sequence length.
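This header convention can be sketched as a small relabeling function. A minimal example (the function name `relabel` and the movie name are my own, not from FALCON):

```python
def relabel(records):
    """Rewrite read names into FALCON-style headers:
    movie/uniqueNumber/start_end, where end - start == len(seq)."""
    out = []
    for i, (name, seq) in enumerate(records):
        header = "movie/{}/0_{}".format(i, len(seq))
        out.append((header, seq))
    return out

# The third field's two numbers differ by exactly the sequence length.
reads = relabel([("read1", "ACGT" * 10), ("read2", "ACGTACGT")])
```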
(2) The .cfg files
Several example cfg files are provided; the fc_run.cfg I currently use is as follows:
[General]
input_fofn = input.fofn
input_type = raw
#use_tmpdir = scratch
# length cutoff used for seed reads used for initial mapping (default length was 5000, -1 means determine from genome size and seed coverage)
genome_size = 117000000
seed_coverage = 20
length_cutoff = -1
# length cutoff used for seed reads used for pre-assembly
length_cutoff_pr = 1000
falcon_greedy = False
falcon_sense_greedy=False
# concurrency setting
default_concurrent_jobs = 288
pa_concurrent_jobs = 288
cns_concurrent_jobs = 288
ovlp_concurrent_jobs = 288
# overlapping options for Daligner
pa_HPCdaligner_option = -v -B128 -e0.75 -M24 -l1200 -k14 -h256 -w8 -s100 -t16
ovlp_HPCdaligner_option = -v -B128 -M24 -k24 -h600 -e.95 -l1800 -s100
pa_daligner_option = -e0.75 -l1200 -k14 -h256 -w8 -s100
ovlp_daligner_option = -k24 -h600 -e.95 -l1800 -s100
pa_HPCTANmask_option = -k18 -h480 -w8 -e.8 -s100
pa_HPCREPmask_option = -k18 -h480 -w8 -e.8 -s100
#pa_REPmask_code=1,20;10,15;50,10
pa_DBsplit_option = -x500 -s400
ovlp_DBsplit_option = -s400
# error correction consensus option
falcon_sense_option = --output-multi --min-idt 0.70 --min-cov 4 --max-n-read 200 --n-core 24
# overlap filtering options
overlap_filtering_setting = --max-diff 120 --max-cov 120 --min-cov 2 --n-core 12
# scheduler options (named sge_* for historical reasons; they apply to other schedulers too)
#sge_option_da = -pe smp 5 -q bigmem
#sge_option_la = -pe smp 20 -q bigmem
#sge_option_pda = -pe smp 6 -q bigmem
#sge_option_pla = -pe smp 16 -q bigmem
#sge_option_fc = -pe smp 24 -q bigmem
#sge_option_cns = -pe smp 8 -q bigmem
[job.defaults]
use_tmpdir = true
#job_type = local
#pwatcher_type = fs_based
#job_type = sge
job_name_style = 1
stop_all_jobs_on_failure = true
JOB_QUEUE=default7
submit = bash -c ${JOB_SCRIPT} >| ${JOB_STDOUT} 2>| ${JOB_STDERR}
njobs=16
pwatcher_type = blocking
#job_queue = production
[job.step.da]
NPROC=8
[job.step.pda]
NPROC=8
[job.step.la]
NPROC=2
[job.step.pla]
NPROC=2
[job.step.cns]
NPROC=8
[job.step.asm]
NPROC=24
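With length_cutoff = -1 above, FALCON derives the seed cutoff from genome_size and seed_coverage: roughly, it takes the longest reads until their total length reaches genome_size × seed_coverage. A sketch of that idea (my own simplification, not FALCON's actual code):

```python
def auto_length_cutoff(read_lengths, genome_size, seed_coverage):
    """Pick a seed-read length cutoff so that reads at or above it
    total roughly genome_size * seed_coverage bases."""
    target = genome_size * seed_coverage
    total = 0
    for length in sorted(read_lengths, reverse=True):
        total += length
        if total >= target:
            return length
    return min(read_lengths)  # not enough data: keep everything

# Toy data: a 1 kb "genome" at 2x seed coverage.
cutoff = auto_length_cutoff([500, 800, 1200, 300, 900],
                            genome_size=1000, seed_coverage=2)
```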
The fc_unzip.cfg I use is as follows:
[General]
max_n_open_files = 1000
[Unzip]
input_fofn= input.fofn
input_bam_fofn= input_bam.fofn
#sge_phasing= -pe smp 12 -q bigmem
#sge_quiver= -pe smp 12 -q sequel-farm
#sge_track_reads= -pe smp 12 -q default
#sge_blasr_aln= -pe smp 24 -q bigmem
#sge_hasm= -pe smp 48 -q bigmem
#unzip_concurrent_jobs = 64
#quiver_concurrent_jobs = 64
#unzip_concurrent_jobs = 12
#quiver_concurrent_jobs = 12
[job.defaults]
NPROC=4
njobs=7
#job_type = SGE
job_type = local
#use_tmpdir = /scratch
pwatcher_type = blocking
job_type = string
submit = bash -C ${CMD} >| ${STDOUT_FILE} 2>| ${STDERR_FILE}
#njobs=120
njobs=8
NPROC=4
#submit = qsub -S /bin/bash -sync y -V \
# -q ${JOB_QUEUE} \
# -N ${JOB_NAME} \
# -o "${JOB_STDOUT}" \
# -e "${JOB_STDERR}" \
# -pe smp ${NPROC} \
# -l h_vmem=${MB}M \
# "${JOB_SCRIPT}"
[job.step.unzip.track_reads]
njobs=1
NPROC=48
[job.step.unzip.blasr_aln]
njobs=2
NPROC=16
[job.step.unzip.phasing]
njobs=16
NPROC=2
[job.step.unzip.hasm]
njobs=1
NPROC=48
[job.step.unzip.quiver]
njobs=2
NPROC=12
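One thing worth checking in the [job.step.*] sections above is that njobs × NPROC stays within the machine's core count when job_type = local; otherwise steps oversubscribe the CPU. A small helper (all names here are mine):

```python
def oversubscribed(steps, total_cores):
    """Return step names whose njobs * NPROC exceed the core budget."""
    return [name for name, (njobs, nproc) in steps.items()
            if njobs * nproc > total_cores]

# (njobs, NPROC) pairs taken from the cfg above, on a 32-core box.
steps = {
    "unzip.track_reads": (1, 48),
    "unzip.blasr_aln": (2, 16),
    "unzip.phasing": (16, 2),
    "unzip.quiver": (2, 12),
}
bad = oversubscribed(steps, total_cores=32)
```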
(3) Problems encountered
(3.1)
[ERROR]Task Node(0-rawreads/build) failed with exit-code=1
[ERROR]Some tasks are recently_done but not satisfied: set([Node(0-rawreads/build)])
check /media/bio/yanbo/Kmer2Haplotype/realData/A.thaliana/Pacbio/SRX1715704_sra/Faclon-unzip/0-rawreads/build/run-P0_build_aaaa9d79df733a1e799f10354861351a.bash.stderr
-----------------------------------------------------------------------
+ rm -f raw_reads.db '.raw_reads.*'
#fc_fasta2fasta < my.input.fofn >| fc.fofn
while read fn; do cat ${fn} | python -m falcon_kit.mains.fasta_filter streamed-median - | fasta2DB -v raw_reads -i${fn##*/}; done < my.input.fofn
+ read fn
+ cat /media/bio/yanbo/Kmer2Haplotype/realData/A.thaliana/Pacbio/SRX1715704_sra/relable.fastq
+ python -m falcon_kit.mains.fasta_filter streamed-median -
+ fasta2DB -v raw_reads -irelable.fastq
-------------------------------------------------------------------
It appears fasta2DB only accepts FASTA input, so a fastq file fails at this step and must be converted first.
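Since fasta2DB only takes FASTA, one workaround is converting the fastq before building the database. A minimal converter sketch (record names are placeholders; real FASTQ may need a proper parser):

```python
def fastq_to_fasta(fastq_lines):
    """Convert FASTQ records (4 lines each) to FASTA lines,
    dropping the '+' separator and quality strings."""
    fasta = []
    for i in range(0, len(fastq_lines), 4):
        name = fastq_lines[i].lstrip("@").rstrip()
        seq = fastq_lines[i + 1].rstrip()
        fasta.append(">" + name)
        fasta.append(seq)
    return fasta

out = fastq_to_fasta(["@movie/1/0_4\n", "ACGT\n", "+\n", "!!!!\n"])
```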
(3.2) no 012822F_reads.fa.
https://github.com/PacificBiosciences/FALCON_unzip/issues/93
The fix is to remove that contig from the 3-unzip/reads/ctg_list file, which lets the pipeline proceed.
I hit this problem quite a while ago; it was probably a bug and may no longer exist in current versions.
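The workaround for (3.2) amounts to filtering one contig ID out of 3-unzip/reads/ctg_list. Sketched here (the contig name comes from the issue above; the helper is mine):

```python
def drop_contig(ctg_list_lines, bad_ctg):
    """Return the ctg_list contents with one contig ID removed."""
    return [line for line in ctg_list_lines if line.strip() != bad_ctg]

kept = drop_contig(["000000F\n", "012822F\n", "000001F\n"], "012822F")
```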
(3.3) quiver fails
You need to provide the input_bam_fofn fc_unzip.cfg option in order for this to work.
I did not know what this BAM file was until I searched for "bam" on the https://pb-falcon.readthedocs.io/en/latest/pipeline.html#falcon-unzip website:
The Falcon-unzip module requires both FASTA and PacBio BAM inputs for subreads.
It requires the raw BAM files from sequencing, so without them this step generally cannot be run.
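If the subread BAMs are available, input_bam.fofn is just a file of BAM paths, one per line. A sketch of selecting them from a list of files (paths are hypothetical):

```python
def bam_fofn_lines(candidate_paths):
    """Keep only the .bam files and return sorted lines
    suitable for writing to input_bam.fofn."""
    return sorted(p for p in candidate_paths if p.endswith(".bam"))

lines = bam_fofn_lines(["/data/m1.subreads.bam",
                        "/data/m1.fasta",
                        "/data/m0.subreads.bam"])
```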
(4)output
https://pb-falcon.readthedocs.io/en/latest/pipeline.html#falcon
FALCON and FALCON-unzip results:
2-asm-falcon/a_ctg_all.fa: all associated contigs
2-asm-falcon/a_ctg.fa: De-duplicated associated fasta file
2-asm-falcon/p_ctg.fa: Fasta file of all primary contigs
3-unzip/all_p_ctg.fa: partially phased primary contigs
3-unzip/all_h_ctg.fa: phased haplotigs
4-quiver/cns_output/*.fast[a|q]: final consensus output
step1 command: fc_run fc_run.cfg
step1 output:
0-rawreads/ # Raw read error correction directory
1-preads_ovl/ # Corrected read overlap detection
2-asm-falcon/ # String Graph Assembly
mypwatcher/ # Job scheduler logs
scripts/
sge_log/ # deprecated
step2 command : fc_unzip.py fc_unzip.cfg
3-unzip/
├── all_p_ctg.fa (step3) # partially phased primary contigs
├── all_h_ctg.fa (step3) # phased haplotigs
├── all_p_ctg_edges # primary contig edge list
├── all_h_ctg_edges # haplotig edge list
├── all_h_ctg_ids # haplotig id index
└── all_phased_reads # table of all phased raw reads
step3 command: fc_quiver.py fc_unzip.cfg (use quiver)
4-quiver/cns_output/*.fast[a|q]: the final falcon-unzip output
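A quick sanity check on p_ctg.fa or the final consensus output is the contig count and N50. A generic N50 sketch (not part of FALCON; feed it the contig lengths):

```python
def n50(lengths):
    """N50: the length at which the cumulative sum of lengths,
    sorted descending, first reaches half the total."""
    half = sum(lengths) / 2.0
    total = 0
    for length in sorted(lengths, reverse=True):
        total += length
        if total >= half:
            return length
    return 0

# Toy assembly: four contigs totalling 1000 bp.
value = n50([100, 200, 300, 400])
```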
Reference:
https://github.com/PacificBiosciences/FALCON/wiki/Manual