ljxue's personal blog http://blog.sciencenet.cn/u/ljxue Liangjiao Xue, Bioinformatics is my favorite.


Vcf.pm and tabix

Posted 2014-10-14 23:09 | Personal category: Bioinformatics | System category: Research notes | Tags: vcftools, tabix

Yesterday, I wrote a script to load a VCF file with 20 million lines into memory.

The result was a huge hash that took more than 5 GB of memory.

Today, I checked out the Perl module Vcf.pm. It is very fast and memory-efficient.


In general, I believe their code has been tested better than mine. I vote for this tool.
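
The memory problem described above comes from holding every record in one big hash at the same time. As a rough illustration of the alternative, here is a minimal Python sketch (not Vcf.pm itself, and not the script I wrote) that streams through VCF lines one at a time and keeps only a small summary. The names `Variant`, `iter_variants`, `count_per_chrom`, and the example data are made up for this sketch.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def iter_variants(handle: TextIO) -> Iterator[Variant]:
    """Yield one Variant per data line, skipping '#' header lines."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        # VCF fixed columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
        fields = line.rstrip("\n").split("\t")
        yield Variant(fields[0], int(fields[1]), fields[3], fields[4])


def count_per_chrom(handle: TextIO) -> dict:
    """Keep only a per-chromosome counter, never the full record set."""
    counts: dict = {}
    for v in iter_variants(handle):
        counts[v.chrom] = counts.get(v.chrom, 0) + 1
    return counts


# Tiny made-up example data, for illustration only.
EXAMPLE_VCF = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n"
    "chr1\t250\t.\tC\tT\t60\tPASS\t.\n"
    "chr2\t75\t.\tG\tA\t40\tPASS\t.\n"
)

if __name__ == "__main__":
    print(count_per_chrom(io.StringIO(EXAMPLE_VCF)))
```

Because `iter_variants` is a generator, memory use depends on what you choose to keep (here, one counter per chromosome), not on the number of lines in the file.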


BTW, this module is based on tabix, which works on a compressed (bgzip) genomic feature file and creates an index file for it.

You can imagine the compressed file as thousands of small files packed together. Because they are well indexed, a query can jump straight to the relevant information instead of reading from the very beginning.
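
The "many small files plus an index" picture can be sketched in a few lines of Python. This is only a toy model of the idea, assuming position-sorted records; the real bgzip and tabix file formats are different and more elaborate. The names `build`, `query`, `BLOCK_SIZE`, and the example records are hypothetical.

```python
import io
import zlib
from bisect import bisect_left

BLOCK_SIZE = 2  # records per block; tiny so the example is easy to follow


def build(records):
    """Compress position-sorted (pos, text) records in independent blocks.

    Returns (blob, index), where index is a list of
    (first_pos_in_block, byte_offset, byte_length).
    """
    blob = io.BytesIO()
    index = []
    for i in range(0, len(records), BLOCK_SIZE):
        chunk = records[i:i + BLOCK_SIZE]
        raw = "".join(f"{pos}\t{text}\n" for pos, text in chunk).encode()
        packed = zlib.compress(raw)
        index.append((chunk[0][0], blob.tell(), len(packed)))
        blob.write(packed)
    return blob.getvalue(), index


def query(blob, index, start, end):
    """Return records with start <= pos <= end, decompressing only the
    blocks that can overlap the requested range."""
    firsts = [first for first, _, _ in index]
    # Begin one block before the first block whose first position is
    # >= start, because that earlier block may still hold matches.
    i = max(bisect_left(firsts, start) - 1, 0)
    hits = []
    while i < len(index) and index[i][0] <= end:
        _, offset, length = index[i]
        raw = zlib.decompress(blob[offset:offset + length]).decode()
        for line in raw.splitlines():
            pos, text = line.split("\t", 1)
            if start <= int(pos) <= end:
                hits.append((int(pos), text))
        i += 1
    return hits


# Made-up records, for illustration only.
RECORDS = [(100, "A>G"), (250, "C>T"), (300, "G>A"), (475, "T>C"), (900, "A>T")]

if __name__ == "__main__":
    blob, index = build(RECORDS)
    print(query(blob, index, 250, 300))
```

Each block is compressed on its own, so a range query only has to decompress the few blocks the index points to, which is why lookups do not need to start from the beginning of the file.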

This tool supports many formats, such as gff, bed, sam, vcf and psltab, and you can also define your own format.


I will compress and index all my genomic feature files whenever I get the chance.


http://samtools.sourceforge.net/tabix.shtml


 






https://blog.sciencenet.cn/blog-285393-835701.html
