Recently, I found overlapping variants in the results called by GATK HaplotypeCaller (version 4.0.1.2). I noticed this because I wanted to build consensus sequences using bcftools consensus, which produced warnings like these:
The site Chr01:597519 overlaps with another variant, skipping...
The site Chr01:600176 overlaps with another variant, skipping...
The site Chr01:914371 overlaps with another variant, skipping...
...
Then I checked the VCF file, which confirmed that these weird variants exist. In each pair below, the second record starts inside the reference span of the first:
Chr01 597511 . CTTCTTCTCTTTTTTTTT C ...
Chr01 597519 . CT C ...
Chr01 600174 . CTCTTT C ...
Chr01 600176 . CT C ...
Chr01 914369 . GTT G ...
Chr01 914371 . T TGGGGG ...
The results are very strange; I don't know why GATK HaplotypeCaller produced these overlapping variants.
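To list such records before running bcftools consensus, a simple span check is enough: a record at POS with reference allele REF covers positions POS through POS + len(REF) - 1, and any later record on the same chromosome that starts inside that span overlaps it. The sketch below is only an illustration using the records above; the function name `find_overlaps` is my own, and the check that bcftools performs internally may differ in detail.

```python
def find_overlaps(records):
    """Return (chrom, pos) for every record that starts inside the
    reference span of an earlier record on the same chromosome.

    records: iterable of (chrom, pos, ref) tuples, sorted by position
    within each chromosome, with 1-based pos as in VCF.
    """
    overlaps = []
    last_end = {}  # chrom -> rightmost reference base covered so far
    for chrom, pos, ref in records:
        end = pos + len(ref) - 1
        if chrom in last_end and pos <= last_end[chrom]:
            overlaps.append((chrom, pos))
        last_end[chrom] = max(end, last_end.get(chrom, 0))
    return overlaps


# CHROM, POS and REF of the six records shown above
records = [
    ("Chr01", 597511, "CTTCTTCTCTTTTTTTTT"),
    ("Chr01", 597519, "CT"),
    ("Chr01", 600174, "CTCTTT"),
    ("Chr01", 600176, "CT"),
    ("Chr01", 914369, "GTT"),
    ("Chr01", 914371, "T"),
]

for chrom, pos in find_overlaps(records):
    print(f"{chrom}:{pos} starts inside an earlier record")
```

On these six records the check flags Chr01:597519, Chr01:600176 and Chr01:914371, the same three sites named in the bcftools warnings.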
Furthermore, to update the corresponding GFF/GTF files, I first created a CHAIN format file together with the consensus sequence:
bcftools consensus -c out.chain -f ref.fasta -s sample_name -o out.fasta input.vcf.gz
Then I converted the original GTF file to a new GTF file based on the created consensus sequences, using gtfToGenePred, liftOver, and genePredToGtf, which are part of the UCSC Genome Browser utilities.
gtfToGenePred old.gtf old.gp
liftOver -genePred old.gp out.chain out.gp out.unMapped
genePredToGtf file out.gp out.gtf
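To make the role of the chain file more concrete: each chain consists of ungapped blocks written as "size dt dq", where dt and dq are the gaps that follow the block on the old and the new sequence. The sketch below maps a single position through such blocks. It is a simplified illustration, not what liftOver actually runs; the function name `lift_position` and the example blocks are made up, and I assume 0-based coordinates with both sequences on the + strand.

```python
def lift_position(blocks, t_start, q_start, pos):
    """Map a 0-based position on the old sequence to the new sequence.

    blocks: list of (size, dt, dq) tuples taken from the alignment lines
            of one chain; the last block has dt = dq = 0.
    Returns the new position, or None if pos lies in a gap or outside
    the chain.
    """
    t, q = t_start, q_start
    for size, dt, dq in blocks:
        if t <= pos < t + size:
            return q + (pos - t)
        # jump over this block and the gap that follows it
        t += size + dt
        q += size + dq
    return None


# Made-up chain: 100 aligned bases, a 2-base deletion, then 50 aligned bases
blocks = [(100, 2, 0), (50, 0, 0)]
print(lift_position(blocks, 0, 0, 99))   # last base before the deletion
print(lift_position(blocks, 0, 0, 100))  # inside the deleted bases
print(lift_position(blocks, 0, 0, 102))  # first base after the deletion
```

In this example, position 99 stays at 99, position 100 has no counterpart on the new sequence, and position 102 shifts to 100.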
Another weird thing is that liftOver produced a bunch of unmapped genes (in out.unMapped).
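A first step in looking into this could be to count the reasons that liftOver writes to the unmapped file. The sketch below assumes that each failed record is preceded by a comment line starting with "#" that states the reason; I have not verified the exact wording for genePred input, so the example lines and the function name `summarize_unmapped` are hypothetical.

```python
from collections import Counter


def summarize_unmapped(lines):
    """Count the '#' comment lines in liftOver's unmapped output by reason."""
    reasons = Counter()
    for line in lines:
        if line.startswith("#"):
            reasons[line.lstrip("#").strip()] += 1
    return reasons


# Made-up example lines, only to show the assumed layout
example = [
    "# Deleted in new\n",
    "geneA\tChr01\t+\t100\t900\n",
    "# Partially deleted in new\n",
    "geneB\tChr01\t-\t2000\t3500\n",
    "# Deleted in new\n",
    "geneC\tChr01\t+\t5000\t6200\n",
]

for reason, count in summarize_unmapped(example).most_common():
    print(f"{count}\t{reason}")
```

For a real file, the same function can be applied to an open file handle, for example `summarize_unmapped(open("out.unMapped"))`.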