liyanbo's personal blog http://blog.sciencenet.cn/u/liyanbo


FALCON and FALCON-UNZIP (2): Running

3569 views 2020-10-10 16:28 | Personal category: Tools | System category: Research notes

(1) Input file (fasta/fastq) format

The header needs to have three fields separated by "/", and the 2nd field has to be a unique number. The third field holds two numbers separated by "_", and their difference needs to equal the sequence length.
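The header rule above can be checked before starting a run. The following is a minimal sketch, assuming headers look like `name/number/start_end` (for example `m000_000/1/0_5000`); the function name `check_record` and the regular expression are my own illustration and are not part of FALCON.

```python
import re

# Assumed header layout: name/number/start_end, e.g. "m000_000/1/0_5000".
HEADER_RE = re.compile(r"^([^/\s]+)/(\d+)/(\d+)_(\d+)$")


def check_record(header, sequence, seen_ids):
    """Return a list of problems found in one FASTA record (empty if none)."""
    fields = header.lstrip(">").split()
    match = HEADER_RE.match(fields[0]) if fields else None
    if match is None:
        return ["header does not have the form name/number/start_end"]
    _, read_id, start, end = match.groups()
    problems = []
    # The 2nd field has to be unique across the whole file.
    if read_id in seen_ids:
        problems.append("second field %s is not unique" % read_id)
    seen_ids.add(read_id)
    # The difference of the two numbers must equal the sequence length.
    if int(end) - int(start) != len(sequence):
        problems.append("end - start != sequence length")
    return problems
```

Call it once per record with a shared `seen_ids` set, so that repeated numbers in the second field are reported.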


(2) .cfg files

Some example cfg files are provided; the fc_run.cfg I currently use is as follows:

[General]

input_fofn = input.fofn

input_type = raw

#use_tmpdir = scratch

# length cutoff used for seed reads used for initial mapping (default length was 5000, -1 means determine from genome size and seed coverage)

genome_size = 117000000

seed_coverage = 20

length_cutoff = -1

# length cutoff used for seed reads used for pre-assembly

length_cutoff_pr = 1000

falcon_greedy = False

falcon_sense_greedy=False

# concurrency setting

default_concurrent_jobs = 288

pa_concurrent_jobs = 288

cns_concurrent_jobs = 288

ovlp_concurrent_jobs = 288

# overlapping options for Daligner

pa_HPCdaligner_option =  -v -B128 -e0.75 -M24 -l1200 -k14 -h256 -w8 -s100 -t16

ovlp_HPCdaligner_option = -v -B128 -M24 -k24 -h600 -e.95 -l1800 -s100

pa_daligner_option = -e0.75 -l1200 -k14 -h256 -w8 -s100

ovlp_daligner_option = -k24 -h600 -e.95 -l1800 -s100

pa_HPCTANmask_option = -k18 -h480 -w8 -e.8 -s100

pa_HPCREPmask_option = -k18 -h480 -w8 -e.8 -s100

#pa_REPmask_code=1,20;10,15;50,10

pa_DBsplit_option = -x500 -s400

ovlp_DBsplit_option = -s400

# error correction consensus option

falcon_sense_option = --output-multi --min-idt 0.70 --min-cov 4 --max-n-read 200 --n-core 24

# overlap filtering options

overlap_filtering_setting = --max-diff 120 --max-cov 120 --min-cov 2 --n-core 12

# slurm options (says sge but not for rEaLz)

#sge_option_da = -pe smp 5 -q bigmem

#sge_option_la = -pe smp 20 -q bigmem

#sge_option_pda = -pe smp 6 -q bigmem

#sge_option_pla = -pe smp 16 -q bigmem

#sge_option_fc = -pe smp 24 -q bigmem

#sge_option_cns = -pe smp 8 -q bigmem

[job.defaults]

use_tmpdir = true

#job_type = local

#pwatcher_type = fs_based

#job_type = sge

job_name_style = 1

stop_all_jobs_on_failure = true

JOB_QUEUE=default7

submit = bash -c ${JOB_SCRIPT} >| ${JOB_STDOUT} 2>| ${JOB_STDERR}

njobs=16

pwatcher_type = blocking

#job_queue = production

[job.step.da]

NPROC=8

[job.step.pda]

NPROC=8

[job.step.la]

NPROC=2

[job.step.pla]

NPROC=2

[job.step.cns]

NPROC=8

[job.step.asm]

NPROC=24
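The comment in the config says that length_cutoff = -1 means the cutoff is determined from genome_size and seed_coverage. A minimal sketch of that idea follows, assuming the cutoff is the read length at which the longest reads add up to genome_size * seed_coverage bases. I have not checked that FALCON computes it exactly this way, and `estimate_length_cutoff` is a hypothetical helper, not a FALCON function.

```python
def estimate_length_cutoff(read_lengths, genome_size, seed_coverage):
    """Pick the largest cutoff whose reads still total genome_size * seed_coverage bases."""
    target = genome_size * seed_coverage
    total = 0
    # Walk from the longest read down until enough bases are collected.
    for length in sorted(read_lengths, reverse=True):
        total += length
        if total >= target:
            return length
    raise ValueError("not enough bases to reach the requested seed coverage")
```

With the settings above (genome_size = 117000000, seed_coverage = 20) the target would be 2.34 Gb of seed reads.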


The fc_unzip.cfg is as follows:

[General]

max_n_open_files = 1000

[Unzip]

input_fofn= input.fofn

input_bam_fofn= input_bam.fofn

#sge_phasing= -pe smp 12 -q bigmem

#sge_quiver= -pe smp 12 -q sequel-farm

#sge_track_reads= -pe smp 12 -q default

#sge_blasr_aln=  -pe smp 24 -q bigmem

#sge_hasm=  -pe smp 48 -q bigmem

#unzip_concurrent_jobs = 64

#quiver_concurrent_jobs = 64

#unzip_concurrent_jobs = 12

#quiver_concurrent_jobs = 12

[job.defaults]

NPROC=4

njobs=7

#job_type = SGE

job_type = local

#use_tmpdir = /scratch

pwatcher_type = blocking

job_type = string

submit = bash -C ${CMD} >| ${STDOUT_FILE} 2>| ${STDERR_FILE}

#njobs=120

njobs=8

NPROC=4

#submit = qsub -S /bin/bash -sync y -V  \

#  -q ${JOB_QUEUE}    \

#  -N ${JOB_NAME}     \

#  -o "${JOB_STDOUT}" \

#  -e "${JOB_STDERR}" \

#  -pe smp ${NPROC}   \

#  -l h_vmem=${MB}M   \

#  "${JOB_SCRIPT}"

[job.step.unzip.track_reads]

njobs=1

NPROC=48

[job.step.unzip.blasr_aln]

njobs=2

NPROC=16

[job.step.unzip.phasing]

njobs=16

NPROC=2

[job.step.unzip.hasm]

njobs=1

NPROC=48

[job.step.unzip.quiver]

njobs=2

NPROC=12



(3) Problems encountered

(3.1)

[ERROR]Task Node(0-rawreads/build) failed with exit-code=1

[ERROR]Some tasks are recently_done but not satisfied: set([Node(0-rawreads/build)])


Check the stderr file /media/bio/yanbo/Kmer2Haplotype/realData/A.thaliana/Pacbio/SRX1715704_sra/Faclon-unzip/0-rawreads/build/run-P0_build_aaaa9d79df733a1e799f10354861351a.bash.stderr:

-----------------------------------------------------------------------

+ rm -f raw_reads.db '.raw_reads.*'

#fc_fasta2fasta < my.input.fofn >| fc.fofn

while read fn; do  cat  ${fn} | python -m falcon_kit.mains.fasta_filter streamed-median - | fasta2DB -v raw_reads -i${fn##*/}; done < my.input.fofn

+ read fn

+ cat /media/bio/yanbo/Kmer2Haplotype/realData/A.thaliana/Pacbio/SRX1715704_sra/relable.fastq

+ python -m falcon_kit.mains.fasta_filter streamed-median -

+ fasta2DB -v raw_reads -irelable.fastq

-------------------------------------------------------------------

It seems that only FASTA input is allowed here (my input was a fastq file).
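If the input really has to be FASTA, the fastq file can be converted first. The following is a minimal sketch, assuming every record occupies exactly four lines (no wrapped sequences); `fastq_to_fasta` is my own helper and not a FALCON tool. The headers still have to follow the format described in (1).

```python
def fastq_to_fasta(fastq_path, fasta_path):
    """Convert a 4-line-per-record FASTQ file to FASTA; return the record count."""
    count = 0
    with open(fastq_path) as src, open(fasta_path, "w") as dst:
        while True:
            header = src.readline().rstrip("\n")
            if not header:
                break  # end of file (or a trailing blank line)
            sequence = src.readline().rstrip("\n")
            src.readline()  # '+' separator line, ignored
            src.readline()  # quality line, ignored
            if not header.startswith("@"):
                raise ValueError("unexpected FASTQ header: %r" % header)
            dst.write(">%s\n%s\n" % (header[1:], sequence))
            count += 1
    return count
```

The resulting FASTA path is then what goes into input.fofn instead of the fastq file.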


(3.2) Missing 012822F_reads.fa

https://github.com/PacificBiosciences/FALCON_unzip/issues/93

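The suggestion there is to remove that contig from the 3-unzip/reads/ctg_list file.

That is, delete the corresponding contigs from this list so that the pipeline can proceed.

I ran into this problem a long time ago. It was probably a bug and may no longer exist in current versions.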
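Editing the list by hand is fine for one contig; for several, a small script helps. This is a minimal sketch, assuming ctg_list holds one contig ID per line (an assumption on my part; check the file first). `drop_contigs` is a hypothetical helper, and it rewrites the file in place, so keep a backup copy.

```python
def drop_contigs(list_path, contig_ids):
    """Rewrite list_path without the given contig IDs; return how many lines were dropped."""
    to_drop = set(contig_ids)
    with open(list_path) as handle:
        lines = handle.read().splitlines()
    # Keep every line whose (stripped) content is not one of the IDs to drop.
    kept = [line for line in lines if line.strip() not in to_drop]
    with open(list_path, "w") as handle:
        handle.write("".join(line + "\n" for line in kept))
    return len(lines) - len(kept)
```

For the case above, the call would be `drop_contigs("3-unzip/reads/ctg_list", ["012822F"])`.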


(3.3) Quiver fails

You need to provide the input_bam_fofn fc_unzip.cfg option in order for this to work.


I did not know what this BAM file was until I searched for "bam" on https://pb-falcon.readthedocs.io/en/latest/pipeline.html#falcon-unzip:

The Falcon-unzip module requires both FASTA and PacBio BAM inputs for subreads.


The original BAM files from sequencing are required, so in general this step cannot be run.
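Before launching the quiver step, it is possible to check whether the config actually points at BAM files. This is a minimal sketch using Python's standard configparser, assuming relative paths are resolved against the directory of the cfg file (my assumption; FALCON may resolve them against the working directory instead). `missing_bam_inputs` is a hypothetical helper.

```python
import configparser
import os


def missing_bam_inputs(cfg_path):
    """Return a list of problems with the input_bam_fofn setting of an fc_unzip.cfg."""
    # strict=False tolerates repeated keys; interpolation=None leaves ${...} untouched.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read(cfg_path)
    fofn = parser.get("Unzip", "input_bam_fofn", fallback="").strip()
    if not fofn:
        return ["[Unzip] input_bam_fofn is not set"]
    base_dir = os.path.dirname(os.path.abspath(cfg_path))
    fofn_path = os.path.join(base_dir, fofn)
    if not os.path.isfile(fofn_path):
        return ["fofn not found: %s" % fofn_path]
    problems = []
    with open(fofn_path) as handle:
        for line in handle:
            bam = line.strip()
            # Every non-empty line of the fofn should name an existing BAM file.
            if bam and not os.path.isfile(os.path.join(base_dir, bam)):
                problems.append("BAM not found: %s" % bam)
    return problems
```

An empty list means the fofn and every file listed in it were found; it does not prove that the BAM files are valid PacBio subread BAMs.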



(4) Output

https://pb-falcon.readthedocs.io/en/latest/pipeline.html#falcon

FALCON and FALCON-unzip result:

2-asm-falcon/a_ctg_all.fa: all associated contigs

2-asm-falcon/a_ctg.fa: De-duplicated associated fasta file

2-asm-falcon/p_ctg.fa: Fasta file of all primary contigs

3-unzip/all_p_ctg.fa: partially phased primary contigs

3-unzip/all_h_ctg.fa: phased haplotigs

4-quiver/cns_output/*.fast[a|q]: final consensus output


step1 command: fc_run fc_run.cfg

step1 output:

0-rawreads/     # Raw read error correction directory

1-preads_ovl/   # Corrected read overlap detection

2-asm-falcon/   # String Graph Assembly

mypwatcher/     # Job scheduler logs

scripts/ 

sge_log/        # deprecated


step2 command: fc_unzip.py fc_unzip.cfg

3-unzip/ 

├── all_p_ctg.fa (step3)               # partially phased primary contigs

├── all_h_ctg.fa (step3)        # phased haplotigs

├── all_p_ctg_edges             # primary contig edge list

├── all_h_ctg_edges             # haplotig edge list

├── all_h_ctg_ids               # haplotig id index

└── all_phased_reads            # table of all phased raw reads


step3 command: fc_quiver.py fc_unzip.cfg (use quiver)

4-quiver/cns_output/*.fast[a|q]: the final output of FALCON-unzip
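After the three steps, a quick way to see how far the pipeline got is to look for the result files listed above. This is a minimal sketch; the patterns are copied from the list in this section, `report_outputs` is a hypothetical helper, and actual file names may differ between FALCON versions.

```python
import glob
import os

# Result files named in this post; "[aq]" matches both .fasta and .fastq.
EXPECTED_OUTPUTS = [
    "2-asm-falcon/p_ctg.fa",
    "2-asm-falcon/a_ctg.fa",
    "3-unzip/all_p_ctg.fa",
    "3-unzip/all_h_ctg.fa",
    "4-quiver/cns_output/*.fast[aq]",
]


def report_outputs(run_dir):
    """Map each expected output pattern to the list of matching, non-empty files."""
    report = {}
    for pattern in EXPECTED_OUTPUTS:
        matches = sorted(glob.glob(os.path.join(run_dir, pattern)))
        report[pattern] = [path for path in matches if os.path.getsize(path) > 0]
    return report
```

A pattern that maps to an empty list marks the step that has not finished or has failed.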

 


Reference links:

https://github.com/PacificBiosciences/FALCON/wiki/Manual



