||
前面介绍的SMURF流程的运行以失败告终了,不过这个是这篇文章的参考方法,至于这篇文章改进过的方法,还没有试过,这就试一下,顺便考虑是否能把6区的移植过来,搞个6R呢,可能,算法上有略微的区别,毕竟这篇Science研究的是肿瘤中的含量很少的微生物,用了严格的去污染策略,不管怎样,试试吧!
类似上次那个流程,更加简单了些,只需要安装解压下。
# 安装MCR,这次是新版本的9.7,重要的事说三遍,必须有图形界面gui,否则会安装失败#必须有图形界面gui,必须有图形界面gui# 下载地址,速度还可以,毕竟大公司wget https://ssd.mathworks.com/supportfiles/downloads/R2019b/Release/4/deployment_files/installer/complete/glnxa64/MATLAB_Runtime_R2019b_Update_4_glnxa64.zip# 解压,然后安装,这次不需要加环境变量了,因为变成了脚本的一个参数# 建议新建一个文件夹解压,否则文件挺乱sudo ./install# 克隆脚本git clone https://github.com/NoamShental/5R.git# 开始运行示例还是出现了摄氏,发现是fastq压缩成了zip,解压就好啦,在这个文件夹example_fastq# 文件结构如下example_fastq/ ├── RDB123_ATGAGTGC │ ├── RDB123_ATGAGTGC_L006_R1_001.fastq │ └── RDB123_ATGAGTGC_L006_R2_001.fastq └── RDB1_TTGGTGCA ├── RDB1_TTGGTGCA_L001_R1_001.fastq └── RDB1_TTGGTGCA_L001_R2_001.fastq# 后面发现不建立一个样本一个文件夹也是可以的,脚本会自动复制文件到一个新的Samples文件夹# 结构如下: Samples └── RDB1_TTGGTGCA ├── RDB1_TTGGTGCA_L001_R1_001.fastq └── RDB1_TTGGTGCA_L001_R2_001.fastq# 运行,看结果啦,好像全程使用不超过双核呢,还是样本少,跑不起来./5R_linux/run_main_5R.sh /usr/local/MATLAB/MATLAB_Runtime/v97 ./example_fastq/RDB1_TTGGTGCA ./GG_5R ./example_results/5R_SMURF_example.txt 1261234567891011121314151617181920212223242526
运行过程比较顺畅,基本不报错,除了两个warning
------------------------------------------ Setting up environment variables --- LD_LIBRARY_PATH is .:/usr/local/MATLAB/MATLAB_Runtime/v97/runtime/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v97/bin/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v97/sys/os/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v97/sys/opengl/lib/glnxa64 Input params are: Working on files in directory: ./example_fastq/RDB1_TTGGTGCA Reconstruction using kmers of length: 126 WORKING ON SAMPLE : RDB1_TTGGTGCA Part 1/1 - Block 1/2 Part 1/1 - Block 2/2 Number of reads: 299346 Percent of long enough reads: 1 Percent of good reads: 0.95608 Counting fasta write: 1 历时 5.675209 秒。 Mapped to primers 88% of unique reads Mapped to primers 98% of read counts Loading bacterial DB for region 1 out of 5 from original region 1... Loading bacterial DB for region 5 out of 5 from original region 5 Region 1 out of 5 Keep high freq: 8% of reads Keep high freq: 89% of counts Building matrix M Building matrix A --------------------------------------------... -------------------------------------------- Region 5 out of 5 Keep high freq: 5% of reads Keep high freq: 90% of counts Building matrix M Building matrix A -------------------------------------------- 警告: Make sure PE is supported properly> In solve_iterative_noisy (line 4) In reconstruction_func (line 40) In main_multiple_regions (line 56) In main_5R (line 57)Region 1 out of 5 Keeping reads matched to DB: 95% of reads Keeping reads matched to DB: 97% of counts --------------------------------------------... -------------------------------------------- Region 5 out of 5 Keeping reads matched to DB: 98% of reads Keeping reads matched to DB: 99% of counts -------------------------------------------- Filter out columns (bacteria)警告: TAKE PROPER CARE OF NOT AMPLIFIED REGIONS> In solve_iterative_noisy (line 90) In reconstruction_func (line 40) In main_multiple_regions (line 56) In main_5R (line 57)Normalize frequency counts Build matrix A_L2 Making columns of A unique... Removing included bacterias... Removed 8844 out of 10189 警告: Found 721 bacterias with non even number of reads mapped> In solve_iterative_noisy (line 178) In reconstruction_func (line 40) In main_multiple_regions (line 56) In main_5R (line 57)Starting iterations... Total iterations time: 1.1721 Building the Scott files for level: species Loaded RDB1_TTGGTGCA123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869
从过程来看,6V区的运行过程几乎完全一致,看看结果:
Total # of reads 205605domain phylum class order family genus species RDB1_TTGGTGCA Bacteria Actinobacteria Acidimicrobiia Acidimicrobiales Unknown family Unknown genus271 Unknown species1 0.002213 Bacteria Actinobacteria Actinobacteria Actinomycetales Actinomycetaceae Actinomyces Actinomyces massiliensis 0.000603 Bacteria Actinobacteria Actinobacteria Actinomycetales Actinomycetaceae Actinomyces Actinomyces naeslundii 0.000746 Bacteria Actinobacteria Actinobacteria Actinomycetales Actinomycetaceae Actinomyces Actinomyces odontolyticus 0.0009391234567
终于见到物种分类结果啦,未分类的它进行了自编号。
# 复制一份出来,开始cp -r 5R 6Rcd 6R# 观察文件,替换6R需要的文件│ ├── GreenGenes_201305_unique_up_to_3_ambiguous_16S_ffpe5regions_2mm_RL160_region1.mat │ ├── GreenGenes_201305_unique_up_to_3_ambiguous_16S_ffpe5regions_2mm_RL160_region2.mat │ ├── GreenGenes_201305_unique_up_to_3_ambiguous_16S_ffpe5regions_2mm_RL160_region3.mat │ ├── GreenGenes_201305_unique_up_to_3_ambiguous_16S_ffpe5regions_2mm_RL160_region4.mat │ ├── GreenGenes_201305_unique_up_to_3_ambiguous_16S_ffpe5regions_2mm_RL160_region5.mat │ ├── GreenGenes_201305_unique_up_to_3_ambiguous_16S_headers.mat │ └── taxonomy_db.mat# 应该就是这几个文件啦,先删除,文件夹名字就不改啦,防止脚本不识别rm GG_5R/*# 物种注释文件吧?cp ../SMURF/Green_Genes_201305/unique_up_to_3_ambiguous_16S/gg_rdp_taxa_name_calls.mat GG_5R/taxonomy_db.mat# 序列文件cp ../SMURF/Green_Genes_201305/unique_up_to_3_ambiguous_16S_amp6Regions_2mm_RL130/* GG_5R/# 再把测序文件拉来试试,先删除原来的rm -r example_fastq/*cp ../SMURF/Example/* example_fastq/1234567891011121314151617181920
运行下,看看行不行啦
./5R_linux/run_main_5R.sh /usr/local/MATLAB/MATLAB_Runtime/v97 ./example_fastq ./GG_5R ./example_results/5R_SMURF_example.txt 126# 报少个文件,好像是总的,复制原来的过来试试cp ../5R/GG_5R/GreenGenes_201305_unique_up_to_3_ambiguous_16S_headers.mat GG_5R/# 因为没有解压而报错# WORKING ON SAMPLE : Samples# 索引超出数组元素的数目(0)。rm -r example_fastq/Samples/ gunzip example_fastq/*# 还是报错了,应该是要修改matlab代码才能解决,还是转向qiime2-slide这个插件吧Mapped to primers 0% of unique reads Mapped to primers 0% of read counts 索引超出数组元素的数目(0)。 出错 load_bact_DB (line 33)出错 reconstruction_func (line 9)出错 main_multiple_regions (line 56)出错 main_5R (line 57)MATLAB:badsubscript
Archiver|手机版|科学网 ( 京ICP备07017567号-12 )
GMT+8, 2025-1-5 08:18
Powered by ScienceNet.cn
Copyright © 2007- 中国科学报社