zhlingl's personal blog http://blog.sciencenet.cn/u/zhlingl


2020-10-22=scRNA-Seq-CNN

Posted 2020-11-4 20:54 | Personal category: Literature reading | System category: Research notes

Deep learning for inferring gene relationships from single-cell expression data

PNAS December 26, 2019

Significance

Accurate inference of gene interactions and causality is required for pathway reconstruction, which remains a major goal for many studies. Here, we take advantage of two recent technological developments, single-cell RNA sequencing and deep learning, to propose an encoding scheme for gene expression data. We use this encoding in a supervised framework to perform several different types of analysis using minimal assumptions. Our method, convolutional neural network for coexpression (CNNC), first transforms expression data lacking locality to an image-like object on which convolutional neural networks (CNNs) work very well. We then utilize CNNs for learning relationships between genes, causality inferences, functional assignments, and disease gene predictions. For all of these tasks, CNNC significantly outperforms all prior task-specific methods.

Abstract

Several methods were developed to mine gene–gene relationships from expression data. Examples include correlation and mutual information methods for coexpression analysis, clustering and undirected graphical models for functional assignments, and directed graphical models for pathway reconstruction. Using an encoding for gene expression data, followed by deep neural networks analysis, we present a framework that can successfully address all of these diverse tasks. We show that our method, convolutional neural network for coexpression (CNNC), improves upon prior methods in tasks ranging from predicting transcription factor targets to identifying disease-related genes to causality inference. CNNC’s encoding provides insights about some of the decisions it makes and their biological basis. CNNC is flexible and can easily be extended to integrate additional types of genomics data, leading to further improvements in its performance.

Several computational methods have been developed to infer relationships between genes based on gene expression data. These range from methods for inferring coexpression relationships between pairs of genes (1) to methods for inferring a biological or disease process for a gene based on other genes [either using clustering or guilt by association (2)] to causality inferences (3, 4) and pathway reconstruction methods (5). To date, each of these tasks was handled by a different computational framework. For example, gene coexpression analysis is usually performed using Pearson correlation (PC) or mutual information (MI) (6). Functional assignment of genes is often performed using clustering (7) or undirected graphical models including Markov random fields (8), while pathway reconstruction is often based on directed probabilistic graphical models (4). These methods also serve as an initial step in some of the most widely used tools for the analysis of genomics data, including network inference and reconstruction approaches (3, 9, 10), methods for classification based on gene expression (11), and many more.
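As a rough illustration of the two coexpression scores named above, the following sketch computes Pearson correlation and a histogram-based mutual information estimate for one pair of genes. The equal-width binning, the default of 4 bins, and the toy expression vectors are assumptions made for this example; they are not taken from the paper.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def discretize(values, bins):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid division by zero for constant genes
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=4):
    """Plug-in MI estimate (in nats) from a joint histogram of binned values."""
    bx, by = discretize(x, bins), discretize(y, bins)
    n = len(x)
    joint = Counter(zip(bx, by))
    count_x, count_y = Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log(c * n / (count_x[i] * count_y[j]))
        for (i, j), c in joint.items()
    )


# Toy data (hypothetical): gene_b is a linear function of gene_a.
gene_a = [0, 1, 2, 3, 4, 5, 6, 7]
gene_b = [2 * v + 1 for v in gene_a]

print(pearson(gene_a, gene_b))
print(mutual_information(gene_a, gene_b))
```

Both functions return the same value when the two genes are swapped, so neither score on its own says anything about the direction of a relationship.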

While successful and widely used, these methods also suffer from serious drawbacks. First, most of these methods are unsupervised. Given the large number of genes that are profiled, and the often relatively small (at least in comparison) number of samples, several genes that are determined to be coexpressed or cofunctional may only reflect chance or noise in the data (12). In addition, most of the widely used methods are symmetric, which means that each pair has only one relationship value. While this is advantageous for some applications (e.g., clustering), it may be problematic for methods that aim at inferring causality (e.g., network reconstruction tasks).

To address these issues, we developed a method, convolutional neural network for coexpression (CNNC), which provides a supervised way (that can be tailored to the condition/question of interest) to perform gene relationship inference. CNNC utilizes a representation of the input data specifically suitable for deep learning. It represents each pair of genes as an image (histogram) and uses convolutional neural networks (CNNs) to infer relationships between different expression levels encoded in the image. The network is trained with positive and negative examples for the specific domain of interest (e.g., known targets of a transcription factor [TF], known pathways for a specific biological process, known disease genes, etc.), and the output can be either binary or multinomial.
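The pair-to-image encoding described above can be sketched as a normalized 2D histogram over cells. This is a minimal illustration and not the authors' implementation: the `log1p` transform, the 8 x 8 grid, the per-gene min/max bin ranges, and the toy counts are all assumptions chosen for the example.

```python
import math


def pair_histogram(expr_a, expr_b, bins=8):
    """Encode two genes' per-cell expression as a bins x bins normalized histogram.

    Rows index the expression level of gene a, columns that of gene b.
    Each entry is the fraction of cells falling into that (a, b) bin.
    """
    if len(expr_a) != len(expr_b):
        raise ValueError("both genes must be measured in the same cells")

    # Assumed preprocessing: log-transform to compress the dynamic range.
    log_a = [math.log1p(v) for v in expr_a]
    log_b = [math.log1p(v) for v in expr_b]

    def bin_index(value, lo, hi):
        if hi == lo:  # constant gene: everything lands in the first bin
            return 0
        return min(int((value - lo) / (hi - lo) * bins), bins - 1)

    lo_a, hi_a = min(log_a), max(log_a)
    lo_b, hi_b = min(log_b), max(log_b)

    grid = [[0.0] * bins for _ in range(bins)]
    for a, b in zip(log_a, log_b):
        grid[bin_index(a, lo_a, hi_a)][bin_index(b, lo_b, hi_b)] += 1.0

    n_cells = len(log_a)
    return [[count / n_cells for count in row] for row in grid]


# Toy counts (hypothetical) for two genes across eight cells.
gene_a = [0, 0, 1, 3, 5, 8, 12, 20]
gene_b = [0, 1, 0, 2, 9, 4, 30, 25]

image = pair_histogram(gene_a, gene_b)
for row in image:
    print(" ".join(f"{p:.3f}" for p in row))
```

Swapping the two genes transposes the resulting grid rather than leaving it unchanged, so the ordered pair (a, b) and the ordered pair (b, a) are presented to a downstream classifier as different inputs.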

We applied CNNC using a large cohort of single-cell (SC) expression data and tested it on several inference tasks. We show that CNNC outperforms prior methods for inferring interactions (including TF–gene and protein–protein interactions), causality inference, and functional assignments (including biological processes and diseases).

Results

We developed CNNC, a general computational framework for supervised gene relationship inference (Fig. 1). CNNC is based on a CNN, which is used to analyze summarized co-occurrence histograms from pairs of genes in single-cell RNA-sequencing (scRNA-seq) data. Given a relatively small labeled set of positive pairs (with either known negative pairs or random pairs serving as negatives), CNNC learns to discriminate between interacting pairs, causal pairs, negative pairs, or any other gene relationship types that can be defined.
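One way to assemble such a labeled set when only positive pairs are known is to draw random gene pairs as negatives. The sketch below assumes a balanced 1:1 ratio, treats pairs as ordered, and uses placeholder gene names; the function name `build_training_pairs` and all of these choices are hypothetical and are not described in the paper.

```python
import random


def build_training_pairs(positive_pairs, all_genes, seed=0):
    """Return shuffled ((gene_a, gene_b), label) examples with random negatives.

    Label 1 marks a known positive pair; label 0 marks a randomly drawn
    ordered pair that is not in the positive set.
    """
    rng = random.Random(seed)
    known = set(positive_pairs)

    # Guard against an endless loop when too few candidate pairs exist.
    available = len(all_genes) * (len(all_genes) - 1) - len(known)
    if available < len(positive_pairs):
        raise ValueError("not enough non-positive pairs to sample negatives")

    negatives = set()
    while len(negatives) < len(positive_pairs):
        a, b = rng.sample(all_genes, 2)  # two distinct genes, order matters
        if (a, b) not in known:
            negatives.add((a, b))

    examples = [(pair, 1) for pair in positive_pairs]
    examples += [(pair, 0) for pair in sorted(negatives)]
    rng.shuffle(examples)
    return examples


# Placeholder gene identifiers and positive pairs (hypothetical).
genes = ["g0", "g1", "g2", "g3", "g4", "g5"]
positives = [("g0", "g1"), ("g0", "g2"), ("g3", "g4")]

examples = build_training_pairs(positives, genes)
for pair, label in examples:
    print(pair, label)
```

Because pairs are ordered here, the reverse of a positive pair can be drawn as a negative. Whether that is appropriate depends on the task: it may suit a directional question and be wrong for a symmetric one.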

https://blog.sciencenet.cn/blog-565558-1257107.html
