
Bulk sequencing for QTL analysis pipeline 8-popoolation2

Posted 2017-5-12 19:52 | Category: Research notes

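PoPoolation2 is a pipeline that compares allele frequencies between two pooled populations.

In practice, the two pools can be built either from a bi-parental population or from the extreme individuals of a natural population, in which case the result can be used for a GWAS-style analysis. I usually run PoPoolation2 as soon as I receive the data to get a first look at the mapping result, because among the BSA methods I consider it the fastest, and its accuracy is not bad either.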

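The developer of PoPoolation2 is also a leading figure in population genetics; his publication list is here: https://scholar.google.ca/citations?user=QfLnM80AAAAJ. He has also written a review on the analysis of pooled populations:

Sequencing pools of individuals — mining genome-wide polymorphism data without big funding. If you are interested, it is easy to find by searching for the title.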

A detailed PoPoolation2 tutorial is available at: https://code.google.com/archive/p/popoolation2/wikis/Tutorial.wiki#Data

Here I only give a brief outline of the workflow:

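Step 1: index the reference

How you do this depends on the software you will use to produce the BAM files later; I usually produce them with bwa.

bwa index ref.fa

Step 2: map the reads to the reference genome.

Step 3: remove duplicate and incorrectly mapped reads.

In practice I usually use the realigned BAM files produced by the GATK pipeline.

Step 4: create the synchronized file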

samtools mpileup -B H.recal.bam L.recal.bam > HL.mpileup

java -ea -Xmx128g -jar mpileup2sync.jar --input HL.mpileup --output HL_java.sync --fastq-type sanger --min-qual 20 --threads 20

Sample of a synchronized file:

2R 2302 N 0:7:0:0:0:0 0:7:0:0:0:0
2R 2303 N 0:8:0:0:0:0 0:8:0:0:0:0
2R 2304 N 0:0:9:0:0:0 0:0:9:0:0:0
2R 2305 N 1:0:9:0:0:0 0:0:9:1:0:0

col1: reference contig
col2: position within the reference contig
col3: reference character
col4: allele counts of population number 1, in the order A:T:C:G:N:del
col5: allele counts of population number 2, same order
coln: allele counts of population number n-3, same order
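To make the column layout concrete, here is a minimal Python sketch that splits one sync line into its parts. The function name `parse_sync_line` and the `ALLELES` tuple are my own illustration and are not part of PoPoolation2; the sketch assumes whitespace-separated columns and the A:T:C:G:N:del order of the count fields.

```python
from typing import Dict, List, Tuple

# Order of the colon-separated counts in every population column.
ALLELES = ("A", "T", "C", "G", "N", "del")


def parse_sync_line(line: str) -> Tuple[str, int, str, List[Dict[str, int]]]:
    """Split one sync line into contig, position, reference base and per-pool counts."""
    fields = line.split()
    contig, pos, ref = fields[0], int(fields[1]), fields[2]
    # One dict per population column, e.g. {"A": 1, "T": 0, "C": 9, ...}
    pools = [dict(zip(ALLELES, map(int, f.split(":")))) for f in fields[3:]]
    return contig, pos, ref, pools


# Last line of the sample above: pool 1 has one A read, pool 2 has one G read.
contig, pos, ref, pools = parse_sync_line("2R 2305 N 1:0:9:0:0:0 0:0:9:1:0:0")
print(contig, pos, ref, pools[0], pools[1])
```

Keeping the counts in a dict per pool makes the later steps (frequency differences, Fst, Fisher's test) easy to follow by hand for a single site.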

Step 5: calculate allele frequency differences

perl snp-frequency-diff.pl --input HL_java.sync --output-prefix HL --min-count 1 --min-coverage 50 --max-coverage 1000
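As a rough idea of what a frequency difference means at one site, the sketch below takes the two count fields of a sync line and compares the frequency of the overall major allele between the pools. This is a simplified illustration under my own assumptions (only A/T/C/G are counted, no coverage filters are applied), not the exact logic of snp-frequency-diff.pl; the function name and the example counts are hypothetical.

```python
def major_allele_freq_diff(field1: str, field2: str) -> float:
    """Absolute between-pool difference in the frequency of the overall major allele."""
    # Keep only the A, T, C, G counts; N and deletions are ignored in this sketch.
    c1 = [int(x) for x in field1.split(":")[:4]]
    c2 = [int(x) for x in field2.split(":")[:4]]
    n1, n2 = sum(c1), sum(c2)
    if n1 == 0 or n2 == 0:
        raise ValueError("a pool has zero coverage at this site")
    # The major allele is the one with the highest count summed over both pools.
    totals = [a + b for a, b in zip(c1, c2)]
    major = totals.index(max(totals))
    return abs(c1[major] / n1 - c2[major] / n2)


# Hypothetical site with 60 reads per pool: C is rare in pool 1 and common in pool 2.
print(round(major_allele_freq_diff("45:0:15:0:0:0", "10:0:50:0:0:0"), 3))
```

A site linked to the QTL should show a large difference like this one, while unlinked sites should stay close to zero.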

Step 6: calculate Fst values

perl fst-sliding.pl --input HL_java.sync --output HL.fst --suppress-noninformative --min-count 1 --min-coverage 50 --max-coverage 1000 --min-covered-fraction 1 --window-size 1 --step-size 1 --pool-size 180
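With a window size and step size of 1, the command above gives one Fst value per SNP. The sketch below shows the basic heterozygosity-based idea for a single site, Fst = (H_total - H_within) / H_total. It is a deliberately simplified illustration: it does not reproduce whatever pool-size or coverage corrections fst-sliding.pl applies (the --pool-size option suggests there is one), and the function name is my own.

```python
from typing import List


def simple_fst(field1: str, field2: str) -> float:
    """Uncorrected per-site Fst between two pools from their sync count fields."""

    def freqs(field: str) -> List[float]:
        counts = [int(x) for x in field.split(":")[:4]]  # A, T, C, G only
        total = sum(counts)
        if total == 0:
            raise ValueError("a pool has zero coverage at this site")
        return [c / total for c in counts]

    p1, p2 = freqs(field1), freqs(field2)
    # Expected heterozygosity within each pool: 1 - sum of squared frequencies.
    h1 = 1.0 - sum(p * p for p in p1)
    h2 = 1.0 - sum(p * p for p in p2)
    h_within = (h1 + h2) / 2.0
    # Heterozygosity of the combined population, weighting both pools equally.
    p_mean = [(a + b) / 2.0 for a, b in zip(p1, p2)]
    h_total = 1.0 - sum(p * p for p in p_mean)
    if h_total == 0.0:
        return 0.0  # monomorphic site, no differentiation to measure
    return (h_total - h_within) / h_total


print(simple_fst("50:0:0:0:0:0", "0:50:0:0:0:0"))    # fixed difference between pools
print(simple_fst("25:25:0:0:0:0", "25:25:0:0:0:0"))  # identical pools
```

The two calls mark the extremes of the scale: pools fixed for different alleles versus pools with identical frequencies.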

Step 7: run Fisher's exact test to estimate the significance of allele frequency differences

perl fisher-test.pl --input HL_java.sync --output HL.fet --min-count 1 --min-coverage 50 --max-coverage 1000 --suppress-noninformative
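For a single biallelic site, the test boils down to a 2x2 table of major and minor allele counts in the two pools. The sketch below computes a two-sided Fisher's exact p-value with the standard library only; it is my own minimal illustration rather than the code inside fisher-test.pl. The -log10 transform at the end is, as far as I understand the tutorial, how the values in the .fet file are scaled; the example counts are hypothetical.

```python
from math import comb, log10


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are as likely or less likely than the observed one;
    # the small tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(total, 1.0)


# Hypothetical counts: rows are the H and L pools, columns are major and minor allele.
p_value = fisher_exact_two_sided(45, 15, 10, 50)
print(f"p = {p_value:.3g}, -log10(p) = {-log10(p_value):.2f}")
```

Because the test works on raw read counts, very deep coverage will make even small frequency differences look significant, so it helps to read the result together with the frequency difference itself.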

Step 8: plot the results
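A minimal sketch of one way to prepare a genome-wide plot in Python. It assumes the output layout I recall from the tutorial: contig, position, three summary columns, then pairwise values such as `1:2=0.01873725`, with `na` where no value could be computed. Check this against your own HL.fst or HL.fet before relying on it. All function names here are hypothetical, and matplotlib is only imported inside the plotting function.

```python
from typing import Dict, List, Tuple


def parse_window_line(line: str) -> Tuple[str, int, Dict[str, float]]:
    """Parse one output line into contig, position and {pair: value}."""
    fields = line.split()
    contig, pos = fields[0], int(fields[1])
    values: Dict[str, float] = {}
    for item in fields[5:]:  # assumed: pairwise values start at column 6
        pair, value = item.split("=")
        if value != "na":
            values[pair] = float(value)
    return contig, pos, values


def genome_coordinates(records: List[Tuple[str, int, float]]) -> List[Tuple[int, float]]:
    """Lay contigs end to end on one x-axis, in order of first appearance."""
    ends: Dict[str, int] = {}
    for contig, pos, _ in records:
        ends[contig] = max(ends.get(contig, 0), pos)
    offsets: Dict[str, int] = {}
    running = 0
    for contig, end in ends.items():
        offsets[contig] = running
        running += end
    return [(offsets[contig] + pos, value) for contig, pos, value in records]


def plot_genome_scan(points: List[Tuple[int, float]], out_png: str) -> None:
    """Scatter plot of the statistic along the concatenated genome."""
    import matplotlib.pyplot as plt  # imported here so parsing works without it

    xs, ys = zip(*points)
    plt.figure(figsize=(12, 3))
    plt.scatter(xs, ys, s=2)
    plt.xlabel("genome position (contigs concatenated)")
    plt.ylabel("Fst or -log10(p)")
    plt.savefig(out_png, dpi=150)
```

A typical use would be to read HL.fst line by line, keep the `1:2` value of each record, pass the list through `genome_coordinates`, and hand the result to `plot_genome_scan`.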

References

Robert Kofler, Ram Vinay Pandey, Christian Schlötterer; PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq). Bioinformatics 2011; 27 (24): 3435-3436. doi: 10.1093/bioinformatics/btr589

To repost this article, please contact the original author for permission and state that it comes from 付福友's ScienceNet blog.
Link: https://blog.sciencenet.cn/blog-247610-1054605.html


Blog: 农田守望者分享 http://blog.sciencenet.cn/u/sunnycqcn
