xbinbzy's personal blog http://blog.sciencenet.cn/u/xbinbzy


PICRUSt: a tool for functional annotation based on 16S rDNA

5905 views | 2015-12-30 20:53 | Personal category: Research articles | System category: Research notes | Tags: functional annotation, PICRUSt

   PICRUSt (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States) is "a computational approach to predict the functional composition of a metagenome using marker gene data and a database of reference genomes." In other words, it predicts the functions present in a sample from 16S rDNA data and a database of reference sequences.

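   The tool rests on the observation that "phylogeny and biomolecular function are strongly, if imperfectly, correlated": an organism's position in the phylogenetic tree is closely tied to the biomolecular functions it carries. "Phylogenetic trees based on 16S closely resemble clusters obtained on the basis of shared gene content", so a tree built from 16S looks much like a clustering by shared gene content. This lets researchers make inferences from phylogenetic relationships, and indeed "researchers often infer properties of uncultured organisms from cultured relatives."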

   Building on this idea, some functional analyses work by mapping sequences to the genomes of closely related species: "some 16S studies have extended these intuitions to infer the functional contribution of particular community members by mapping a subset of abundant 16S sequences to their nearest sequenced reference genome."

   PICRUSt works as follows:


   The workflow has two main steps: ‘gene content inference’ and ‘metagenome inference’.

   1) In the initial ‘gene content inference’ step, gene content is precomputed for each organism in a reference phylogenetic tree. The tree is built from the reference database, gene copy numbers are taken into account, and the step only needs to be run once.

   2) The subsequent ‘metagenome inference’ step combines the resulting gene content predictions for all microbial taxa with the relative abundance of 16S rRNA genes in one or more microbial community samples. That is, the functions present in a sample are predicted from the tree and the 16S rDNA abundances.

   In the first step, PICRUSt "predicts what genes are present in organisms that have not yet been sequenced based on the genes observed in their sequenced evolutionary relatives"; that is, it uses organisms whose functions are known to predict those whose functions are not. To do this, PICRUSt relies on existing annotation: it "uses existing annotations of gene content and 16S copy number from reference bacterial and archaeal genomes in the IMG database." The first thing this prediction requires is the following: "Prediction of a microbe’s gene content starts by inferring the content of the organism’s last phylogenetic common ancestor with one or more sequenced genomes." The prediction goes through the nearest ancestor in the tree, so that common ancestor has to be found and its content inferred, which calls for an ancestral state reconstruction method. In this way, the organisms in the tree that lack functional annotation receive one: "The gene contents of each reference genome and inferred ancestral genomes are then used to predict the gene contents of all microorganisms present in the reference phylogenetic tree." The copy number of the marker gene is also computed in this step.
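
To make the idea of predicting an unsequenced organism through its last common ancestor more concrete, here is a minimal Python sketch on a toy tree. It is not PICRUSt's implementation. It swaps in a deliberately simple technique: each ancestor is estimated as the inverse-branch-length-weighted average of its sequenced descendants (in the spirit of a Brownian-motion model), and each unsequenced tip then inherits the estimate of its nearest reconstructed ancestor. The `Node` class, the function names, the tree shape, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    branch_length: float = 0.0           # length of the branch leading to the parent
    gene_count: Optional[float] = None   # known only for sequenced tips
    children: List["Node"] = field(default_factory=list)


def reconstruct(node: Node) -> Optional[Tuple[float, float]]:
    """Post-order pass that estimates ancestral gene counts.

    Returns (estimate, extra_length) for the subtree rooted at `node`, or
    None when the subtree contains no sequenced tip. `extra_length` is added
    to the branch above the node to reflect the uncertainty of the estimate.
    """
    if not node.children:
        return None if node.gene_count is None else (node.gene_count, 0.0)

    weighted_sum = 0.0
    weight_total = 0.0
    for child in node.children:
        result = reconstruct(child)
        if result is None:
            continue  # nothing sequenced below this child
        estimate, extra_length = result
        # Assumption: branch lengths are strictly positive.
        weight = 1.0 / (child.branch_length + extra_length)
        weighted_sum += weight * estimate
        weight_total += weight

    if weight_total == 0.0:
        return None
    node.gene_count = weighted_sum / weight_total
    return node.gene_count, 1.0 / weight_total


def fill_unsequenced_tips(node: Node, nearest_estimate: Optional[float] = None) -> None:
    """Pre-order pass: unsequenced tips inherit the nearest ancestral estimate."""
    if not node.children:
        if node.gene_count is None:
            node.gene_count = nearest_estimate
        return
    if node.gene_count is not None:
        nearest_estimate = node.gene_count
    for child in node.children:
        fill_unsequenced_tips(child, nearest_estimate)


# Toy tree ((A:0.1, X:0.2):0.1, B:0.3); A and B are sequenced, X is not.
tip_a = Node("A", 0.1, gene_count=2.0)
tip_x = Node("X", 0.2)
tip_b = Node("B", 0.3, gene_count=4.0)
ancestor = Node("anc_AX", 0.1, children=[tip_a, tip_x])
root = Node("root", children=[ancestor, tip_b])

reconstruct(root)
fill_unsequenced_tips(root)
print("predicted gene count for X:", tip_x.gene_count)
print("reconstructed gene count at root:", root.gene_count)
```

In this toy tree, X shares its last common ancestor only with A, so it inherits A's count of 2, while the root lands between A and B at roughly 2.8 because A's lineage is closer to the root than B. This one-pass scheme ignores information from outside each subtree and treats counts as continuous values, so it should be read as an illustration of the principle only.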

   The second step takes an OTU table picked against Greengenes: "The metagenome inference step relies on a user-provided table of operational taxonomic units (OTUs) for each sample with associated Greengenes identifiers." The OTU table is normalized for the following reason: "Because 16S rRNA copy number varies greatly among different bacteria and archaea, the user’s table of OTUs is normalized by dividing the abundance of each organism by its predicted 16S copy number." Then, "Normalized OTU abundances are then multiplied by the set of gene family abundances calculated for each taxon during the gene content inference step." The end result: "The final output from metagenome prediction is thus an annotated table of predicted gene family counts for each sample, where gene families can be orthologous groups or other identifiers such as KOs, COGs or Pfams."
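
The arithmetic of this step (divide each OTU's abundance by its predicted 16S copy number, then multiply by the predicted gene family counts) can be sketched in a few lines of NumPy. This is a toy illustration rather than PICRUSt's own code: the function name `predict_metagenome`, the OTU, sample, and gene family labels, and every number below are made up for the example.

```python
import numpy as np

# Hypothetical labels, for illustration only.
otu_ids = ["otu_a", "otu_b", "otu_c"]
sample_ids = ["sample_1", "sample_2"]
gene_family_ids = ["family_1", "family_2"]

# OTU table: rows = OTUs, columns = samples (16S abundance per OTU).
otu_table = np.array([[10.0, 0.0],
                      [20.0, 8.0],
                      [6.0, 12.0]])

# Predicted 16S copy number for each OTU.
copy_number = np.array([2.0, 4.0, 1.0])

# Predicted gene family counts per genome: rows = OTUs, columns = gene families.
gene_content = np.array([[1.0, 0.0],
                         [2.0, 1.0],
                         [0.0, 3.0]])


def predict_metagenome(otu_table, copy_number, gene_content):
    """Return a gene-family-by-sample table of predicted counts."""
    # Normalize: divide each OTU row by that OTU's 16S copy number.
    normalized = otu_table / copy_number[:, None]
    # Sum, over OTUs, of (normalized abundance x gene family count).
    return gene_content.T @ normalized


predicted = predict_metagenome(otu_table, copy_number, gene_content)
for family, row in zip(gene_family_ids, predicted):
    print(family, dict(zip(sample_ids, row.tolist())))
```

With these numbers, `otu_b` contributes 20 / 4 = 5 normalized units to `sample_1`, and `family_1` in `sample_1` comes out as 1 × 5 + 2 × 5 + 0 × 6 = 15. Without the copy-number normalization, `otu_b` would be weighted four times too heavily.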


Reference: Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences



https://blog.sciencenet.cn/blog-306699-946934.html
