FALCON and FALCON-UNZIP (2): Running

3569 views | 2020-10-10 16:28 | Personal category: Tools | System category: Research notes

(1) Input file (fasta/fastq) format

The header needs to have three fields separated by "/", and the 2nd field has to be a unique number. The difference between the two numbers in the third field, which are separated by "_", needs to equal the sequence length.
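The header rule above can be checked before starting a run. The following is a minimal sketch, assuming headers of the form `movie/number/start_end` (for example `m1/1/0_4`); the function name `check_records` and the example header are made up for illustration and are not part of FALCON.

```python
import re

# Assumed header layout (illustrative): movie/unique_number/start_end
HEADER_RE = re.compile(r"^>?([^/\s]+)/(\d+)/(\d+)_(\d+)$")


def check_records(records):
    """Return a list of problems found in (header, sequence) pairs."""
    problems = []
    seen_numbers = set()
    for header, seq in records:
        # Ignore any description after the first whitespace.
        name = header.split(maxsplit=1)[0] if header.strip() else ""
        match = HEADER_RE.match(name)
        if match is None:
            problems.append(f"{header}: expected three '/'-separated fields")
            continue
        _movie, number, start, end = match.groups()
        if number in seen_numbers:
            problems.append(f"{header}: second field {number} is not unique")
        seen_numbers.add(number)
        if int(end) - int(start) != len(seq):
            problems.append(
                f"{header}: {end} - {start} does not equal length {len(seq)}"
            )
    return problems
```

An empty list means every record passed the three checks; anything else lists the offending headers.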

(2) .cfg files

Some example cfg files are provided. The fc_run.cfg I currently use is as follows:

[General]

input_fofn = input.fofn

input_type = raw

#use_tmpdir = scratch

# length cutoff used for seed reads used for initial mapping (default length was 5000, -1 means determine from genome size and seed coverage)

genome_size = 117000000

seed_coverage = 20

length_cutoff = -1

# length cutoff used for seed reads used for pre-assembly

length_cutoff_pr = 1000

falcon_greedy = False

falcon_sense_greedy=False

# concurrency setting

default_concurrent_jobs = 288

pa_concurrent_jobs = 288

cns_concurrent_jobs = 288

ovlp_concurrent_jobs = 288

# overlapping options for Daligner

pa_HPCdaligner_option = -v -B128 -e0.75 -M24 -l1200 -k14 -h256 -w8 -s100 -t16

ovlp_HPCdaligner_option = -v -B128 -M24 -k24 -h600 -e.95 -l1800 -s100

pa_daligner_option = -e0.75 -l1200 -k14 -h256 -w8 -s100

ovlp_daligner_option = -k24 -h600 -e.95 -l1800 -s100

pa_HPCTANmask_option = -k18 -h480 -w8 -e.8 -s100

pa_HPCREPmask_option = -k18 -h480 -w8 -e.8 -s100

#pa_REPmask_code=1,20;10,15;50,10

pa_DBsplit_option = -x500 -s400

ovlp_DBsplit_option = -s400

# error correction consensus option

falcon_sense_option = --output-multi --min-idt 0.70 --min-cov 4 --max-n-read 200 --n-core 24

# overlap filtering options

overlap_filtering_setting = --max-diff 120 --max-cov 120 --min-cov 2 --n-core 12

# slurm options (says sge but not for rEaLz)

#sge_option_da = -pe smp 5 -q bigmem

#sge_option_la = -pe smp 20 -q bigmem

#sge_option_pda = -pe smp 6 -q bigmem

#sge_option_pla = -pe smp 16 -q bigmem

#sge_option_fc = -pe smp 24 -q bigmem

#sge_option_cns = -pe smp 8 -q bigmem

[job.defaults]

use_tmpdir = true

#job_type = local

#pwatcher_type = fs_based

#job_type = sge

job_name_style = 1

stop_all_jobs_on_failure = true

JOB_QUEUE=default7

submit = bash -c ${JOB_SCRIPT} >| ${JOB_STDOUT} 2>| ${JOB_STDERR}

njobs=16

pwatcher_type = blocking

#job_queue = production

[job.step.da]

NPROC=8

[job.step.pda]

NPROC=8

[job.step.la]

NPROC=2

[job.step.pla]

NPROC=2

[job.step.cns]

NPROC=8

[job.step.asm]

NPROC=24
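The comment in the config above says that `length_cutoff = -1` means the cutoff is determined from `genome_size` and `seed_coverage`. With the values used here, the target is 117000000 × 20 = 2,340,000,000 bases of seed reads. The sketch below shows one way such a cutoff can be derived: take the longest reads first until the target is reached. This is my reading of the setting, not code taken from FALCON, and the function name `seed_length_cutoff` is hypothetical.

```python
def seed_length_cutoff(read_lengths, genome_size, seed_coverage):
    """Return the shortest read length needed to reach the seed coverage.

    Illustrative assumption: reads are taken from longest to shortest
    until their summed length reaches genome_size * seed_coverage.
    """
    target_bases = genome_size * seed_coverage
    total = 0
    for length in sorted(read_lengths, reverse=True):
        total += length
        if total >= target_bases:
            return length
    raise ValueError("not enough bases to reach the requested seed coverage")
```

If the data set is too small for the requested coverage, the sketch raises an error instead of silently returning a cutoff.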

The fc_unzip.cfg I use is as follows:

[General]

max_n_open_files = 1000

[Unzip]

input_fofn= input.fofn

input_bam_fofn= input_bam.fofn

#sge_phasing= -pe smp 12 -q bigmem

#sge_quiver= -pe smp 12 -q sequel-farm

#sge_track_reads= -pe smp 12 -q default

#sge_blasr_aln= -pe smp 24 -q bigmem

#sge_hasm= -pe smp 48 -q bigmem

#unzip_concurrent_jobs = 64

#quiver_concurrent_jobs = 64

#unzip_concurrent_jobs = 12

#quiver_concurrent_jobs = 12

[job.defaults]

NPROC=4

njobs=7

#job_type = SGE

job_type = local

#use_tmpdir = /scratch

pwatcher_type = blocking

job_type = string

submit = bash -C ${CMD} >| ${STDOUT_FILE} 2>| ${STDERR_FILE}

#njobs=120

njobs=8

NPROC=4

#submit = qsub -S /bin/bash -sync y -V \

# -q ${JOB_QUEUE} \

# -N ${JOB_NAME} \

# -o "${JOB_STDOUT}" \

# -e "${JOB_STDERR}" \

# -pe smp ${NPROC} \

# -l h_vmem=${MB}M \

# "${JOB_SCRIPT}"

[job.step.unzip.track_reads]

njobs=1

NPROC=48

[job.step.unzip.blasr_aln]

njobs=2

NPROC=16

[job.step.unzip.phasing]

njobs=16

NPROC=2

[job.step.unzip.hasm]

njobs=1

NPROC=48

[job.step.unzip.quiver]

njobs=2

NPROC=12
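Each `[job.step.*]` section above sets `njobs` and `NPROC`, so it is easy to ask for more cores than a machine has. Here is a small sketch that multiplies the two values per step. It assumes that a step may run `njobs` jobs at once with `NPROC` cores each, which is my interpretation rather than something the config states; `step_core_budget` is a hypothetical helper. The parser is created with `strict=False` because `[job.defaults]` repeats some keys, and the last value wins.

```python
import configparser


def step_core_budget(cfg_text):
    """Map each [job.step.*] section to njobs * NPROC (illustrative)."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep key case, so NPROC stays NPROC
    parser.read_string(cfg_text)
    if parser.has_section("job.defaults"):
        defaults = parser["job.defaults"]
    else:
        defaults = {}
    budget = {}
    for section in parser.sections():
        if not section.startswith("job.step."):
            continue
        # Fall back to [job.defaults], then to 1, when a key is missing.
        njobs = int(parser[section].get("njobs", defaults.get("njobs", 1)))
        nproc = int(parser[section].get("NPROC", defaults.get("NPROC", 1)))
        budget[section] = njobs * nproc
    return budget


SAMPLE = """
[job.defaults]
NPROC=4
njobs=7
njobs=8
[job.step.unzip.track_reads]
njobs=1
NPROC=48
[job.step.unzip.phasing]
njobs=16
NPROC=2
"""

for step, cores in step_core_budget(SAMPLE).items():
    print(step, cores)
```

For a real file, the text can be read with `open("fc_unzip.cfg").read()` and passed to the same function.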

(3) Problems encountered

(3.1)

[ERROR]Task Node(0-rawreads/build) failed with exit-code=1

[ERROR]Some tasks are recently_done but not satisfied: set([Node(0-rawreads/build)])

check /media/bio/yanbo/Kmer2Haplotype/realData/A.thaliana/Pacbio/SRX1715704_sra/Faclon-unzip/0-rawreads/build/run-P0_build_aaaa9d79df733a1e799f10354861351a.bash.stderr

-----------------------------------------------------------------------

+ rm -f raw_reads.db '.raw_reads.*'

#fc_fasta2fasta < my.input.fofn >| fc.fofn

while read fn; do cat ${fn} | python -m falcon_kit.mains.fasta_filter streamed-median - | fasta2DB -v raw_reads -i${fn##*/}; done < my.input.fofn

+ read fn

+ cat /media/bio/yanbo/Kmer2Haplotype/realData/A.thaliana/Pacbio/SRX1715704_sra/relable.fastq

+ python -m falcon_kit.mains.fasta_filter streamed-median -

+ fasta2DB -v raw_reads -irelable.fastq

-------------------------------------------------------------------

It seems that only FASTA input is allowed here: the input in this run was a fastq file (relable.fastq), and the build step failed.
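Since the log above shows a .fastq file being fed to fasta2DB, one possible workaround is to convert the reads to FASTA first. The sketch below assumes plain 4-line FASTQ records (no wrapped sequence lines); `fastq_to_fasta` is an illustrative helper, not a FALCON tool, and the converted headers still have to follow the naming rule from section (1).

```python
import itertools


def fastq_to_fasta(fastq_lines):
    """Yield FASTA lines from an iterable of 4-line FASTQ records."""
    lines = iter(fastq_lines)
    while True:
        record = list(itertools.islice(lines, 4))
        if not record:
            return  # clean end of input
        if (
            len(record) < 4
            or not record[0].startswith("@")
            or not record[2].startswith("+")
        ):
            raise ValueError("input does not look like 4-line FASTQ")
        yield ">" + record[0][1:].rstrip("\n")  # header without the '@'
        yield record[1].rstrip("\n")            # sequence; quality is dropped


# Usage with files (paths are placeholders):
# with open("reads.fastq") as src, open("reads.fasta", "w") as dst:
#     for line in fastq_to_fasta(src):
#         dst.write(line + "\n")
```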

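(3.2) Missing 012822F_reads.fa

https://github.com/PacificBiosciences/FALCON_unzip/issues/93

Remove that contig from the 3-unzip/reads/ctg_list file.

In other words, delete the corresponding contig from this list so that the pipeline can proceed.

I ran into this problem a long time ago. It was probably a bug and may no longer exist in current versions.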
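Editing the list by hand works, but for many contigs a small script is less error-prone. The sketch below assumes that ctg_list holds one contig ID per line and that the ID to remove is the part of the file name before `_reads.fa` (here `012822F`); both are assumptions on my side. `drop_contig` is a hypothetical helper, and it keeps a `.bak` copy of the original list.

```python
from pathlib import Path


def drop_contig(ctg_list_path, contig_id):
    """Remove contig_id from a one-ID-per-line list; return lines removed."""
    path = Path(ctg_list_path)
    lines = path.read_text().splitlines()
    kept = [line for line in lines if line.strip() != contig_id]
    # Keep a backup next to the original before overwriting it.
    backup = path.with_name(path.name + ".bak")
    backup.write_text("".join(line + "\n" for line in lines))
    path.write_text("".join(line + "\n" for line in kept))
    return len(lines) - len(kept)


# Usage (path taken from the issue above):
# drop_contig("3-unzip/reads/ctg_list", "012822F")
```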

(3.3) quiver fails

You need to provide the input_bam_fofn fc_unzip.cfg option in order for this to work.

I did not know what this BAM file was until I searched for "bam" on https://pb-falcon.readthedocs.io/en/latest/pipeline.html#falcon-unzip.

The Falcon-unzip module requires both FASTA and PacBio BAM inputs for subreads.

It needs the original BAM files from sequencing, so without them this step usually cannot be run.

(4) Output

https://pb-falcon.readthedocs.io/en/latest/pipeline.html#falcon

FALCON and FALCON-unzip result:

2-asm-falcon/a_ctg_all.fa: all associated contigs

2-asm-falcon/a_ctg.fa: De-duplicated associated fasta file

2-asm-falcon/p_ctg.fa: Fasta file of all primary contigs

3-unzip/all_p_ctg.fa: partially phased primary contigs

3-unzip/all_h_ctg.fa: phased haplotigs

4-quiver/cns_output/*.fast[a|q]: final consensus output

step1 command: fc_run fc_run.cfg

step1 output:

0-rawreads/ # Raw read error correction directory

1-preads_ovl/ # Corrected read overlap detection

2-asm-falcon/ # String Graph Assembly

mypwatcher/ # Job scheduler logs

scripts/

sge_log/ # deprecated

step2 command : fc_unzip.py fc_unzip.cfg

3-unzip/

├── all_p_ctg.fa (step3) # partially phased primary contigs

├── all_h_ctg.fa (step3) # phased haplotigs

├── all_p_ctg_edges # primary contig edge list

├── all_h_ctg_edges # haplotig edge list

├── all_h_ctg_ids # haplotig id index

└── all_phased_reads # table of all phased raw reads

step3 command: fc_quiver.py fc_unzip.cfg (uses quiver)

4-quiver/cns_output/*.fast[a|q]: the final output of FALCON-Unzip
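After each step it can help to confirm that the expected files were actually written. A minimal sketch, using only output paths listed in this section; `missing_outputs` and the `EXPECTED` list are illustrative names, and the list can be extended with whatever files a given run should produce.

```python
from pathlib import Path

# Output paths named in this post, relative to the run directory.
EXPECTED = [
    "2-asm-falcon/p_ctg.fa",
    "2-asm-falcon/a_ctg.fa",
    "3-unzip/all_p_ctg.fa",
    "3-unzip/all_h_ctg.fa",
]


def missing_outputs(run_dir, expected=EXPECTED):
    """Return the expected files that do not exist under run_dir."""
    root = Path(run_dir)
    return [rel for rel in expected if not (root / rel).is_file()]


# Usage: print(missing_outputs("."))  # run from the assembly directory
```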

Reference links:

https://github.com/PacificBiosciences/FALCON/wiki/Manual

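To repost this article, please contact the original author for permission and state that it comes from Li Yanbo's ScienceNet blog.
Link: https://blog.sciencenet.cn/blog-1515646-1253855.html

Previous post: FALCON and FALCON-UNZIP (1): An installation full of twists and turns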
