First author: Anushua Biswas
First affiliation: National Chemical Laboratory, India
Corresponding author: Leelavati Narlikar
Abstract
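Background: High-throughput sequencing-based assays measure different biochemical activities pertaining to gene regulation, genome-wide. These activities include transcription factor (TF)-DNA binding, enhancer activity, open chromatin, and more. A major goal is to understand underlying sequence components, or motifs, that can explain the measured activity. It is usually not one motif, but a combination of motifs bound by cooperatively acting proteins that confers activity to such regions. Furthermore, regions can be diverse, governed by different combinations of TFs/motifs.
Problem: Current approaches do not take into account this issue of combinatorial diversity.
Main study: We present a new statistical framework cisDiversity, which models regions as diverse modules characterized by combinations of motifs, while simultaneously learning the motifs themselves.
Key feature: Because cisDiversity does not rely on knowledge of motifs, modules, cell type, or organism, it is general enough to be applied to regions reported by most high-throughput assays.
Application 1 (chromatin structure data): For example, in enhancer predictions resulting from different assays - GRO-cap, STARR-seq, and those measuring chromatin structure - cisDiversity discovers distinct modules and combinations of TF binding sites, some specific to the assay.
Application 2 (protein-DNA binding and ATAC data): From protein-DNA binding data, cisDiversity identifies potential cofactors of the profiled TF, while from ATAC-seq data it identifies tissue-specific regulatory modules.
Application 3 (single-cell ATAC data): Finally, analysis of single-cell ATAC-seq data suggests that regions open in one cell state encode information about future states, with certain modules staying open and others closing down in the next time point.
Summary
High-throughput sequencing-based assays can measure, genome-wide, a range of biochemical activities related to gene regulation. These activities include transcription factor-DNA binding, enhancer activity, open chromatin, and others. A major goal is to identify the underlying sequence components, or motifs, that help explain the activity of interest. Typically, activity in these regions is not conferred by a single motif but by a combination of motifs bound by cooperatively acting proteins. In addition, the regions can be diverse, controlled by different combinations of transcription factors/motifs. None of the existing methods takes this combinatorial diversity into account. In this paper, the authors propose a new statistical framework called "cisDiversity", which models regions as distinct modules characterized by motif combinations while simultaneously learning the motifs themselves. Because cisDiversity does not depend on prior knowledge of motifs, modules, cell type, or organism, it is general and can be applied to the regions reported by most high-throughput assays. For example, in enhancer predictions obtained from different assays, such as GRO-cap, STARR-seq, and those measuring chromatin structure, cisDiversity found distinct modules and combinations of transcription factor binding sites, some of them specific to particular assays. From protein-DNA binding data, cisDiversity identified potential cofactors of the profiled transcription factor; from ATAC-seq data, it identified tissue-specific regulatory modules. Finally, analysis of single-cell ATAC-seq data indicated that chromatin regions open in one cell state encode information about the cell's subsequent states, with some modules remaining open and others closing at the next time point.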
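To make the idea of "regions as modules characterized by combinations of motifs" concrete, here is a toy Python sketch. It is not cisDiversity's algorithm: cisDiversity learns the motifs and the modules jointly within a statistical framework, whereas this sketch assumes the motifs are already known as exact consensus strings and simply groups regions by which motifs they contain. The motif names, motif sequences, region IDs, and helper functions are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical motif consensus strings (assumption: cisDiversity would
# learn its motifs from the data instead of taking them as input).
MOTIFS = {"A": "TGACTCA", "B": "CACGTG"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(base, "N") for base in reversed(seq))


def motifs_present(region, motifs):
    """Return the set of motif names with an exact match on either strand."""
    found = set()
    for name, site in motifs.items():
        if site in region or reverse_complement(site) in region:
            found.add(name)
    return frozenset(found)


def group_into_modules(regions, motifs):
    """Group region IDs by the combination of motifs each region contains."""
    modules = defaultdict(list)
    for region_id, seq in regions.items():
        modules[motifs_present(seq.upper(), motifs)].append(region_id)
    return dict(modules)


# Hypothetical toy regions.
regions = {
    "r1": "GGTGACTCATTT",         # contains motif A only
    "r2": "AACACGTGAA",           # contains motif B only
    "r3": "TTTGACTCACCCACGTGTT",  # contains both A and B
    "r4": "ACACACACAC",           # contains neither motif
}

modules = group_into_modules(regions, MOTIFS)
for combo, members in modules.items():
    print(sorted(combo), members)
```

Each distinct motif combination plays the role of one "module" here. A real analysis would have to deal with degenerate motifs, unknown motifs, and noise, which is where a statistical model like the one described in the paper comes in.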
Corresponding author
**Leelavati Narlikar**
Research interests: computational biology; machine learning.
doi: https://doi.org/10.1101/gr.274563.120
Journal: Genome Research
Published date: July 19, 2021