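Article: Best practices for analysing microbiomes
Journal: Nature Reviews Microbiology
Year: 2018
Nature Reviews Microbiology (2017 IF: 31.851) published this review of research methods in the microbiome field online on 23 May 2018. It was written by Rob Knight himself (first and corresponding author). A detailed introduction to the paper is available at: http://blog.sciencenet.cn/blog-3334560-1122300.html
This post only records my own understanding of the data analysis part.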
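I. Marker gene analyses
1) The choice of primers plays an important role in marker gene analysis. To avoid methodological bias, the protocol needs to be validated.
2) How sequencing errors are handled is critical to the results. The paper stresses that the first step in analysing marker gene amplicon data is to remove sequencing errors, and that this step is needed despite very low sequencing error rates.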
3) OTU picking tolerates a certain amount of sequence difference, but because it tolerates that difference, some real differences are ignored. OTU picking consolidates similar sequences (usually with a 97% similarity threshold) into single features, merging sequence variants, including those introduced by sequence error, into a single OTU that can be used in subsequent analysis. However, this method misses subtle and real biological sequence variation, such as SNPs that would be consolidated into single OTUs.
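To make the 97% threshold concrete, here is a minimal sketch of greedy centroid-based OTU picking. It assumes the sequences are already aligned to equal length and uses simple per-position identity. The function names `identity` and `greedy_otu_picking` are made up for this illustration and are not the API of any real OTU picker.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otu_picking(seqs, threshold=0.97):
    """Assign each sequence to the first OTU whose centroid it matches at >= threshold."""
    centroids = []   # one representative sequence per OTU
    assignment = []  # OTU index for each input sequence, in input order
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                assignment.append(i)
                break
        else:
            # No existing centroid is close enough: this sequence founds a new OTU.
            centroids.append(s)
            assignment.append(len(centroids) - 1)
    return centroids, assignment


base = "ACGT" * 25                # 100 nt reference sequence
snp = "T" + base[1:]              # one real SNP: 99% identical to base
distant = "T" * 10 + base[10:]    # 8 mismatches: 92% identical to base
centroids, assignment = greedy_otu_picking([base, snp, distant])
print(assignment)  # [0, 0, 1]
```

The SNP variant lands in the same OTU as `base`, which is exactly the loss of real variation described above.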
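4) Oligotyping was proposed to address the problems of the OTU picking strategy. Oligotyping improves upon traditional OTU picking by including position-specific information from 16S rRNA sequencing to identify subtle nucleotide variation and by discriminating between closely related but distinct taxa. The paper recommends this kind of sequence-variant strategy for data analysis, except for some special purposes. QIIME 2 currently follows the same direction (through denoising plugins such as DADA2 and Deblur rather than oligotyping itself), and QIIME 1 is no longer maintained.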
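The idea of position-specific information can be sketched with per-column Shannon entropy: columns of an alignment that vary across reads carry high entropy, and reads can be labelled by the bases they carry at those columns. This is a simplified illustration only. The function names and the `min_entropy` cut-off are assumptions of this sketch, and the real oligotyping procedure is iterative and handles noise, which this code does not.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the bases observed in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def oligotypes(aligned_reads, min_entropy=0.5):
    """Label each read by its bases at the high-entropy columns and count the labels."""
    length = len(aligned_reads[0])
    entropies = [
        column_entropy([read[i] for read in aligned_reads]) for i in range(length)
    ]
    # Keep only columns whose variation exceeds the (hypothetical) cut-off.
    positions = [i for i, h in enumerate(entropies) if h >= min_entropy]
    labels = ["".join(read[i] for i in positions) for read in aligned_reads]
    return positions, Counter(labels)


reads = ["ACGTA"] * 5 + ["ACCTA"] * 5   # two variants differing at column 2
positions, counts = oligotypes(reads)
print(positions)     # [2]
print(dict(counts))  # {'G': 5, 'C': 5}
```

At a 97% threshold on longer reads, two variants like these would fall into one OTU; labelling by the informative column keeps them apart.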
II. Metagenome and metatranscriptome analyses
1) There are three strategies: read-based profiling, marker gene methods, and assembly.
Read-based profiling (tools include Kraken and Bowtie2) depends mainly on how comprehensive the sequence databases are. Read-based profiling takes the unassembled DNA or mRNA sequence reads and compares them against reference databases to assign taxonomy or annotate genes. With the ever-increasing size of modern query data sets and databases, methods are continually being refined to improve the speed of read-based profiling.
Marker gene methods (such as MetaPhlAn2 and TIPP) use specific genomic regions for taxonomy assignment, focusing on universal, single-copy elements.
Assembly builds contigs from the reads and then bins them. Another method for analysing metagenome and metatranscriptome sequencing reads is to assemble the short reads into longer sequences (contigs). These contigs can be further sorted or binned by similarity to assemble partial to full genomes of microorganisms. The assembly strategy is affected by the diversity of the community: assembly-based analyses are not universally applicable; higher biodiversity, the presence of many related strains in samples, or low coverage yields fragmented assemblies and can obscure taxa from downstream analyses. Assembly tools include metaSPAdes and MEGAHIT; binning tools include MaxBin2 and CONCOCT. Because assembly-based methods are complex, the authors highly recommend integrated workflow tools that automate data processing, such as Anvi'o, ATLAS or MetAMOS.
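The read-based idea of comparing unassembled reads against a reference database can be illustrated with a toy k-mer lookup. This is only a sketch under strong assumptions: the reference names and sequences are invented, the functions are hypothetical, and real tools such as Kraken use far more elaborate indexing and taxonomy-aware assignment.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_index(references, k):
    """Map each k-mer to the set of reference names that contain it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for km in kmers(seq, k):
            index[km].add(name)
    return index


def classify_read(read, index, k):
    """Return the reference with the most k-mer hits, or None if no hit or a tie."""
    hits = Counter()
    for km in kmers(read, k):
        for name in index.get(km, ()):
            hits[name] += 1
    ranked = hits.most_common(2)
    if not ranked:
        return None  # the read shares no k-mer with the database
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # ambiguous: two references tie for the best score
    return ranked[0][0]


# Invented toy database; real databases hold whole genomes or gene catalogues.
references = {
    "taxon_A": "ACGTACGTTGCAAGGCTA",
    "taxon_B": "TTTTGGGGCCCCAAAATTTT",
}
index = build_index(references, k=5)
print(classify_read("CGTTGCAAG", index, k=5))  # taxon_A
print(classify_read("NNNNNNNN", index, k=5))   # None
```

The second read returns `None` because nothing in the database resembles it, which is the practical meaning of the point that read-based profiling depends on database coverage.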
2) The choice of database plays a key role in taxonomic annotation. It is important to note that as taxonomic or functional assignment depends on homology between the single read and a reference, database choice is crucial. Examples: PHASTER for bacteriophages, Resfams for antibiotic resistance genes, FOAM for environmental samples, Tara for ocean samples, the BGI catalogue for mouse gut samples, and MetaHIT for human gut samples.
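3) To avoid differences between samples that are caused only by sequencing depth, some normalization strategies and methods borrow ideas from RNA-seq, such as TPM, RPKM, edgeR and DESeq2.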
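For reference, the two length-aware measures mentioned above are usually defined as $\mathrm{RPKM}_i = 10^9 \cdot c_i / (L_i \cdot \sum_j c_j)$ and $\mathrm{TPM}_i = 10^6 \cdot (c_i/L_i) / \sum_j (c_j/L_j)$, where $c_i$ is the read count and $L_i$ the length in bp of feature $i$. A minimal sketch follows, with invented counts and lengths. edgeR and DESeq2 are R packages with their own model-based normalization and are not reproduced here.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of feature per million mapped reads."""
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to sum to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    scale = sum(rates)
    return [r / scale * 1e6 for r in rates]


# Invented example: three genes whose counts are proportional to their lengths.
counts = [100, 200, 700]
lengths_bp = [1000, 2000, 7000]
print(rpkm(counts, lengths_bp))
print(tpm(counts, lengths_bp))
```

In this example the three genes have the same reads-per-base rate, so both measures give them equal values even though the raw counts differ sevenfold. TPM values always sum to one million within a sample, which makes them easier to compare across samples than RPKM.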
4) For beta-diversity there are quantitative metrics and qualitative metrics, and the choice of metric has a large influence on how the results are interpreted. Quantitative metrics (Bray–Curtis, Canberra and weighted UniFrac) use feature abundance data in calculations, whereas qualitative metrics (binary-Jaccard and unweighted UniFrac) only consider the presence or absence of features.
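The difference between the two families is easy to see on a small example. Below is a minimal sketch of one metric from each family, using the standard definitions: Bray–Curtis dissimilarity $\sum_i |a_i - b_i| / \sum_i (a_i + b_i)$ and binary Jaccard distance $1 - |A \cap B| / |A \cup B|$ on the sets of features present. The sample vectors are invented, and returning 0.0 for two empty samples is a convention chosen for this sketch.

```python
def bray_curtis(a, b):
    """Quantitative: Bray-Curtis dissimilarity between two abundance vectors."""
    denom = sum(x + y for x, y in zip(a, b))
    if denom == 0:
        return 0.0  # convention for this sketch: two empty samples are identical
    return sum(abs(x - y) for x, y in zip(a, b)) / denom


def binary_jaccard(a, b):
    """Qualitative: Jaccard distance on presence/absence of each feature."""
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, x in enumerate(b) if x > 0}
    union = present_a | present_b
    if not union:
        return 0.0  # same convention as above
    return 1.0 - len(present_a & present_b) / len(union)


# Invented samples: same features present, very different abundances.
sample_1 = [10, 0, 5]
sample_2 = [1, 0, 50]
print(round(bray_curtis(sample_1, sample_2), 3))  # 0.818
print(binary_jaccard(sample_1, sample_2))         # 0.0
```

The two samples look identical to the qualitative metric and very different to the quantitative one, which is why the choice of metric changes the conclusions.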
5) Differential analysis is strongly affected by the sparsity of microbiome data matrices, and the fact that the data are relative abundances (compositionality) is one of the key issues. The authors recommend several strategies in the paper. One approach is to force strong biological assumptions on the statistical test: for example, Lovell's proportionality metric detects only positive correlations. Other tools that are widely applicable and have been optimized for microbiome data, such as SparCC and SPIEC-EASI, assume that few species are correlated, so most correlation coefficients are zero. BAnOCC is another tool for addressing the compositionality problem that makes no assumptions about the data. We recommend another approach that does not assume few species are correlated, which is to test for differences between microbial communities using the isometric log ratio transform (ilr).
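As a rough illustration of the ilr transform, the sketch below uses one common orthonormal basis, in which coordinate $i$ is $z_i = \sqrt{i/(i+1)} \, \ln\big(g(x_1,\dots,x_i)/x_{i+1}\big)$ with $g$ the geometric mean. The choice of basis is an assumption of this sketch (the paper does not fix one here), and zeros must be replaced beforehand, for example with a pseudocount, because the logarithm of zero is undefined.

```python
import math


def clr(x):
    """Centered log-ratio: log of each part minus the mean log of all parts."""
    logs = [math.log(v) for v in x]
    mean = sum(logs) / len(logs)
    return [value - mean for value in logs]


def ilr(x):
    """Isometric log-ratio coordinates of a D-part composition (D - 1 values)."""
    logs = [math.log(v) for v in x]  # raises ValueError if any part is zero
    coords = []
    for i in range(1, len(logs)):
        # Log of the geometric mean of the first i parts.
        mean_first_i = sum(logs[:i]) / i
        coords.append(math.sqrt(i / (i + 1)) * (mean_first_i - logs[i]))
    return coords


counts = [1.0, 2.0, 4.0, 8.0]            # invented abundances, no zeros
deeper = [10 * v for v in counts]        # same community, 10x sequencing depth
print(ilr(counts))
print(ilr(deeper))
```

Both calls print the same coordinates up to floating-point rounding, since only ratios between parts enter the transform. That is the property that makes it suitable for relative abundance data. The transform is also isometric: the squared length of the ilr vector equals that of the clr vector.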
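6) Machine learning strategies can be used to build models that discriminate between groups of samples, which in turn allows feature selection.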