Personal blog of 李雷廷 http://blog.sciencenet.cn/u/llt001


Overlapping variants called by GATK HaplotypeCaller

2489 reads 2019-1-31 14:03 | Category: Research notes

Recently, I found overlapping variants in the results called by GATK HaplotypeCaller (version 4.0.1.2). I noticed this because I wanted to build consensus sequences using bcftools consensus, which produced warnings like these:

The site Chr01:597519 overlaps with another variant, skipping...
The site Chr01:600176 overlaps with another variant, skipping...
The site Chr01:914371 overlaps with another variant, skipping...
...

Then I checked the VCF file, which confirmed that these weird variants exist:

Chr01   597511  .       CTTCTTCTCTTTTTTTTT      C       ...
Chr01   597519  .       CT      C       ...

Chr01   600174  .       CTCTTT  C       ...
Chr01   600176  .       CT      C       ...

Chr01   914369  .       GTT     G       ...
Chr01   914371  .       T       TGGGGG        ...

These results are very strange; I don't know why GATK HaplotypeCaller produced these overlapping variants.
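A minimal sketch for listing such sites before running bcftools consensus, assuming a position-sorted, tab-delimited VCF. The function name find_overlapping_variants is hypothetical, and the rule it applies (a record overlaps when its POS lies within POS..POS+len(REF)-1 of an earlier record on the same chromosome) is a simple span check, not necessarily the exact test bcftools uses internally. For example, GTT at 914369 covers 914369 to 914371, so the record at 914371 falls inside it.

```shell
# Hypothetical helper: report VCF records whose POS falls inside the
# reference span of an earlier record on the same chromosome.
# Assumes the input is sorted by chromosome and position.
find_overlapping_variants() {
    awk -F'\t' '
        /^#/ { next }                       # skip VCF header lines
        {
            chrom = $1
            pos   = $2 + 0
            end   = pos + length($4) - 1    # last reference base covered by REF
            if (chrom == prev_chrom && pos <= prev_end)
                printf "%s\t%d\toverlaps\t%s:%d-%d\n", chrom, pos, prev_chrom, prev_start, prev_end
            # remember the record that reaches furthest to the right
            if (chrom != prev_chrom || end > prev_end) {
                prev_chrom = chrom; prev_start = pos; prev_end = end
            }
        }
    ' "$@"
}
```

With no file argument the function reads standard input, so a compressed file can be checked with something like: zcat input.vcf.gz | find_overlapping_variants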

Furthermore, to update the corresponding GFF/GTF files, I first created a CHAIN format file along with the consensus sequence:

bcftools consensus -c out.chain -f ref.fasta -s sample_name -o out.fasta input.vcf.gz

Then I converted the original GTF file to a new GTF file based on the consensus sequences using gtfToGenePred, liftOver, and genePredToGtf, which are part of the UCSC Genome Browser utilities.

gtfToGenePred old.gtf old.gp
liftOver -genePred old.gp out.chain out.gp out.unMapped
genePredToGtf file out.gp out.gtf

Another weird thing is that liftOver produced a bunch of unmapped genes (in out.unMapped).
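A rough way to see how many genes failed and why is to tally the unmapped file. This is only a sketch: the function name summarize_unmapped is hypothetical, and it assumes that liftOver writes each failure reason as a line starting with "#" followed by the affected record, as it does for BED input. If out.unMapped is laid out differently, the counts will not be meaningful.

```shell
# Hypothetical helper: summarize a liftOver unmapped file.
# Assumption: reason lines start with "#"; every other non-empty line is one record.
summarize_unmapped() {
    # number of records that could not be lifted over
    printf 'unmapped records: %s\n' "$(grep -v '^#' "$1" | grep -c .)"
    # tally of the reason lines, most frequent first
    grep '^#' "$1" | sort | uniq -c | sort -rn
}
```

It would be called as: summarize_unmapped out.unMapped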



https://blog.sciencenet.cn/blog-656335-1160082.html
