xbinbzy's personal blog http://blog.sciencenet.cn/u/xbinbzy


Understanding the principles behind Metastats

8732 views | 2015-8-22 10:51 | Personal category: Research articles | System category: Research notes | Metastats

Article: Statistical Methods for Detecting Differentially Abundant Features in Clinical Metagenomic Samples

Journal: PLoS Comput Biol.

Year: 2009


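Raw data:

   The input is a table whose cells can hold any quantity to be analyzed. The paper describes them as "the relative abundance of specific features within each sample", for example the "number of 16S rRNA clones assigned to a specific taxon", or the number of reads mapping to a given pathway. Presumably one such table is produced for each group, with rows representing features and columns representing samples.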

   


Data normalization:

   The values in the table are expressed as relative abundances within each sample, "where the cell in the ith row and the jth column (which we shall denote $f_{ij}$) is the proportion of taxon i observed in individual j."
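
The normalization above can be sketched in a few lines of Python. This is only an illustration: the function name `normalize_columns` and the toy count table are assumptions made here, not part of Metastats. Each raw count is divided by the total of its column (sample), so every column sums to 1.

```python
def normalize_columns(counts):
    """Turn a features-by-samples count table into per-sample proportions f_ij."""
    n_rows = len(counts)
    n_cols = len(counts[0])
    # Total count observed in each sample (column).
    totals = [sum(counts[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [
        [counts[i][j] / totals[j] if totals[j] else 0.0 for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Hypothetical table: 3 taxa (rows) observed in 3 samples (columns).
counts = [
    [10, 20, 0],
    [30, 20, 5],
    [60, 60, 5],
]
freqs = normalize_columns(counts)
for row in freqs:
    print(row)
```

Columns with a zero total are mapped to zeros here to avoid a division by zero; that choice is a convenience of this sketch.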


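Analysis of differential abundance:

   For each group, the mean and standard deviation of each taxon are computed.

   A t statistic is then computed for each taxon to test for a difference between the groups.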
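
As a rough sketch of this step, the code below computes the per-group mean and sample variance of one feature and combines them into a two-sample t statistic. The unpooled-variance form t = (mean1 - mean2) / sqrt(var1/n1 + var2/n2) is an assumption of this example, since the equations of the post did not survive; the function names and the input values are hypothetical.

```python
import math


def mean_and_variance(values):
    """Sample mean and unbiased sample variance of one feature in one group."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var


def t_statistic(group1, group2):
    """Two-sample t statistic with unpooled variances (assumed form)."""
    m1, v1 = mean_and_variance(group1)
    m2, v2 = mean_and_variance(group2)
    se = math.sqrt(v1 / len(group1) + v2 / len(group2))
    if se == 0.0:
        # Both groups are constant: no evidence if the means agree, infinite otherwise.
        return 0.0 if m1 == m2 else math.copysign(math.inf, m1 - m2)
    return (m1 - m2) / se


# Hypothetical proportions of one taxon in two groups of three subjects.
t = t_statistic([0.1, 0.2, 0.3], [0.4, 0.5, 0.6])
print(t)
```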


Assessing significance:

   A p-value could be derived from the t statistic to decide whether a difference exists, but doing so relies on the data being normally distributed. Metastats does not assume normality and instead approximates the null distribution by permutation: "We do not make this assumption, but rather estimate the null distribution of $t_i$ non-parametrically using a permutation method." "This procedure, also known as the nonparametric t-test, has been shown to provide accurate estimates of significance when the underlying distributions are non-normal."

   The permutation step: "we randomly permute the treatment labels of the columns of the abundance matrix and recalculate the t statistics. Note that the permutation maintains that there are n1 replicates for treatment 1 and n2 replicates for treatment 2. Repeating this procedure for B trials, we obtain B sets of t statistics: $t_1^{0b}, \ldots, t_M^{0b}$, b = 1, …, B, where M is the number of rows in the matrix. For each row (feature), the p-value associated with the observed t statistic is calculated as the fraction of permuted tests with a t statistic greater than or equal to the observed $t_i$":

   $p_i = \#\{\, b : |t_i^{0b}| \ge |t_i|,\ b = 1, \ldots, B \,\} / B$
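
The permutation procedure might be sketched as follows. This is a minimal illustration, not the Metastats implementation: the function names, the toy matrix, the fixed random seed, the unpooled-variance t statistic, and the comparison of absolute values (a two-sided test) are all assumptions of this example. Shuffling the column order once per trial and splitting at n1 keeps the group sizes n1 and n2 fixed, as the quoted passage requires.

```python
import math
import random


def t_stat(xs, ys):
    """Two-sample t statistic with unpooled variances (assumed form)."""
    m1, m2 = sum(xs) / len(xs), sum(ys) / len(ys)
    v1 = sum((x - m1) ** 2 for x in xs) / (len(xs) - 1)
    v2 = sum((y - m2) ** 2 for y in ys) / (len(ys) - 1)
    se = math.sqrt(v1 / len(xs) + v2 / len(ys))
    if se == 0.0:
        return 0.0 if m1 == m2 else math.copysign(math.inf, m1 - m2)
    return (m1 - m2) / se


def permutation_pvalues(matrix, n1, n_perm=1000, seed=0):
    """Per-feature permutation p-values; columns 0..n1-1 belong to treatment 1."""
    rng = random.Random(seed)
    observed = [t_stat(row[:n1], row[n1:]) for row in matrix]
    cols = list(range(len(matrix[0])))
    exceed = [0] * len(matrix)
    for _ in range(n_perm):
        rng.shuffle(cols)  # one relabelling of the columns, shared by all rows
        for i, row in enumerate(matrix):
            permuted = [row[c] for c in cols]
            t0 = t_stat(permuted[:n1], permuted[n1:])
            # Small tolerance so that exact ties are not lost to rounding error.
            if abs(t0) >= abs(observed[i]) - 1e-12:
                exceed[i] += 1
    return [count / n_perm for count in exceed]


# Hypothetical abundance matrix: 2 features, 5 + 5 subjects.
matrix = [
    [0.10, 0.12, 0.11, 0.13, 0.10, 0.50, 0.52, 0.51, 0.49, 0.50],  # differs by group
    [0.20, 0.30, 0.20, 0.30, 0.20, 0.30, 0.20, 0.30, 0.20, 0.30],  # no group effect
]
pvals = permutation_pvalues(matrix, n1=5)
print(pvals)
```

The first feature should receive a small p-value and the second a large one; the exact numbers depend on the random permutations drawn.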


   When the sample size is small: "This approach is inadequate for small sample sizes in which there are a limited number of possible permutations of all columns. As a heuristic, if less than 8 subjects are used in either treatment, we pool all permuted t statistics together into one null distribution and estimate p-values as":

   $p_i = \frac{1}{BM} \sum_{b=1}^{B} \#\{\, j : |t_j^{0b}| \ge |t_i|,\ j = 1, \ldots, M \,\}$
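
A small sketch of the pooled estimate, assuming the permuted statistics have already been collected as B lists of M values. The function name `pooled_pvalues`, the use of absolute values, and the numbers below are illustrative assumptions. All B × M permuted statistics form one null distribution, and each observed statistic is compared against the whole pool.

```python
import bisect


def pooled_pvalues(observed, null_stats):
    """observed: M observed t statistics; null_stats: B lists of M permuted ones."""
    pooled = sorted(abs(t) for trial in null_stats for t in trial)
    total = len(pooled)  # equals B * M
    pvals = []
    for t in observed:
        # Index of the first pooled value that is >= |t|; everything after counts.
        first = bisect.bisect_left(pooled, abs(t))
        pvals.append((total - first) / total)
    return pvals


# Hypothetical values: M = 2 features, B = 4 permutation trials.
observed = [3.0, 0.5]
null_stats = [[0.1, -2.0], [1.0, 3.5], [-0.4, 0.6], [2.5, -0.2]]
pvals = pooled_pvalues(observed, null_stats)
print(pvals)  # [0.125, 0.625]
```

Pooling gives B × M null values instead of B, which is what makes the estimate usable when few distinct column permutations exist.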


   The cutoff of 8 was chosen empirically: "Note that the choice of 8 for the cutoff is simply heuristic based on experiments during the implementation of our method." For small data sets the authors note: "Our approach is specifically targeted at datasets comprising multiple subjects; for small data sets, approaches such as that proposed by Rodriguez-Brito et al. might be more appropriate."

   The software uses 1000 permutations. The number of permutations is tied to the significance threshold, because it determines the smallest p-value that can be estimated. "Unless explicitly stated, all experiments described below used 1000 permutations. In general, the number of permutations should be chosen as a function of the significance threshold used in the experiment. Specifically, a permutation test with B permutations can only estimate p-values as low as 1/B (in our case $10^{-3}$)."
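
The relationship between B and the threshold is simple arithmetic, shown here as a short check. The helper names are hypothetical and only illustrate the 1/B bound quoted above.

```python
def smallest_estimable_pvalue(n_perm):
    """Resolution of a permutation test with n_perm permutations: 1/B."""
    return 1.0 / n_perm


def can_resolve(alpha, n_perm):
    """True if p-values as small as the threshold alpha can be estimated."""
    return smallest_estimable_pvalue(n_perm) <= alpha


print(smallest_estimable_pvalue(1000))
print(can_resolve(0.001, 1000))
print(can_resolve(0.001, 100))
```

With 1000 permutations a threshold of 0.001 is just reachable, whereas 100 permutations cannot resolve anything below 0.01.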


Multiple hypothesis testing correction:

   The paper does not use the Bonferroni correction; it controls the FDR (false discovery rate) instead. "In this context, the significance of a test is measured by a q-value, an individual measure of the FDR for each test."


   Given an ordered list of p-values, $p_{(1)} \le p_{(2)} \le \ldots \le p_{(m)}$ (where m is the total number of features), and a range of values λ = 0, 0.01, 0.02, …, 0.90, the proportion of true null hypotheses is estimated at each λ as $\hat{\pi}_0(\lambda) = \#\{p_j > \lambda\} / (m(1 - \lambda))$. Next, $\hat{\pi}_0(\lambda)$ is fitted with a cubic spline with 3 degrees of freedom, denoted $\hat{f}$, and $\hat{\pi}_0 = \hat{f}(1)$.

   Finally, the q-value corresponding to each ordered p-value is estimated. First, $\hat{q}(p_{(m)}) = \hat{\pi}_0 \cdot p_{(m)}$. Then for i = m-1, m-2, …, 1:

   $\hat{q}(p_{(i)}) = \min\left( \frac{\hat{\pi}_0 \, m \, p_{(i)}}{i},\ \hat{q}(p_{(i+1)}) \right)$

   The hypothesis test with p-value $p_{(i)}$ has a corresponding q-value of $\hat{q}(p_{(i)})$. Note that this method yields conservative estimates of the true q-values, i.e. $\hat{q}(p_{(i)}) \ge q(p_{(i)})$. "Our software provides users with the option to use either p-value or q-value thresholds, irrespective of the complexity of the data."
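
The q-value recursion can be sketched as below. Two simplifications are assumptions of this example rather than part of the method: the cubic-spline fit is left out, so the estimate of the null proportion (`pi0`) is passed in as an argument, and the function names are hypothetical. Setting `pi0 = 1.0` reduces the recursion to Benjamini-Hochberg adjusted p-values, which is what the usage example does.

```python
def pi0_estimates(pvalues, lambdas):
    """Estimate of the null proportion at each lambda: #{p_j > lambda} / (m (1 - lambda))."""
    m = len(pvalues)
    return [sum(p > lam for p in pvalues) / (m * (1.0 - lam)) for lam in lambdas]


def qvalues(pvalues, pi0):
    """Step-down q-value recursion for a given pi0, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])  # indices by increasing p
    q = [0.0] * m
    # Largest p-value: q(p_(m)) = pi0 * p_(m).
    running = pi0 * pvalues[order[-1]]
    q[order[-1]] = running
    # Ranks i = m-1, ..., 1: q(p_(i)) = min(pi0 * m * p_(i) / i, q(p_(i+1))).
    for rank in range(m - 1, 0, -1):
        idx = order[rank - 1]
        running = min(pi0 * m * pvalues[idx] / rank, running)
        q[idx] = running
    return q


pvalues = [0.01, 0.04, 0.03, 0.20]
lambdas = [k / 100 for k in range(91)]  # 0, 0.01, ..., 0.90
pi0_curve = pi0_estimates(pvalues, lambdas)  # the curve a spline would be fitted to
qs = qvalues(pvalues, pi0=1.0)
print(qs)
```

The running minimum is what makes the q-values monotone in the p-values: a test can never receive a larger q-value than one with a larger p-value.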


Handling sparse counts:

   Fisher's exact test is used for these features: "We compare the differential abundance of sparsely-sampled (rare) features using Fisher's exact test."
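
For illustration, a two-sided Fisher's exact test on a 2×2 table can be written with the standard library alone. The layout of the table (counts of the feature versus counts of all other features, in group 1 versus group 2) and the function name are assumptions of this sketch; the post does not spell out how the table is built.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical rare feature: 1 of 1000 counts in group 1, 9 of 1000 in group 2.
p_rare = fisher_exact_two_sided(1, 9, 999, 991)
print(p_rare)
```

Because the test works directly on counts, it remains valid when a feature is observed only a handful of times, which is where a t statistic on proportions becomes unreliable.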


   

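   The core idea is to route each feature to either a t-test or Fisher's exact test. For the t-test, the null distribution is estimated by permutation and used to compute p-values; the FDR is then applied to those p-values to set the threshold for calling a difference significant.

   The software mainly targets comparisons between two groups, and it covers both widely distributed taxa (permutation-based t-test) and sparsely distributed taxa (Fisher's exact test).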



https://blog.sciencenet.cn/blog-306699-914886.html
