Yesterday, I wrote a script to load a VCF file with 20 million lines into memory.
It built one huge hash that took more than 5 GB of memory.
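For comparison, here is a rough Python sketch of the two ways to read a VCF. My original script was not in Python, so this only illustrates the idea. `load_all` puts every record into one hash (a dict), so memory grows with the file; `stream` hands back one record at a time, so memory stays roughly flat. The function names and the tiny toy VCF are made up for this example.

```python
import io
from typing import Dict, Iterator, List, TextIO, Tuple

def load_all(handle: TextIO) -> Dict[Tuple[str, int], List[str]]:
    """Hash every data line by (CHROM, POS); memory grows with the file size."""
    table: Dict[Tuple[str, int], List[str]] = {}
    for line in handle:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        # Note: a later record at the same (CHROM, POS) overwrites an earlier one.
        table[(fields[0], int(fields[1]))] = fields
    return table

def stream(handle: TextIO) -> Iterator[List[str]]:
    """Yield one data line at a time; only the current record is held in memory."""
    for line in handle:
        if not line.startswith("#"):
            yield line.rstrip("\n").split("\t")

# A tiny toy VCF (only the 8 fixed columns).
vcf_text = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\t.\tA\tG\t50\tPASS\t.\n"
    "1\t250\t.\tC\tT\t50\tPASS\t.\n"
    "2\t40\t.\tG\tA\t50\tPASS\t.\n"
)
table = load_all(io.StringIO(vcf_text))
print(len(table))                                         # 3
print(sum(1 for _ in stream(io.StringIO(vcf_text))))      # 3
```

With 20 million lines, the dict version keeps every record alive at once, which is where the gigabytes go.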
Today, I tried the Perl module Vcf.pm. It is very fast and memory-efficient.
In general, I believe its code has been better tested than mine, so I vote for this tool.
BTW, this module is based on tabix, which compresses a position-sorted gene-feature file (with its companion tool bgzip) and creates an index file for it.
You can imagine the compressed file as thousands of small compressed blocks. Because the blocks are indexed, looking up a region only requires reading the few blocks that cover it, instead of reading the whole file from the very beginning.
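To make the "small indexed blocks" picture concrete, here is a toy Python sketch. It is not the real tabix/BGZF format (tabix uses BGZF blocks plus a binning and linear index); it simply compresses position-sorted records in small zlib blocks, keeps an index entry of (chromosome, first position, last position, offset, length) per block, and decompresses only the blocks that can overlap a query. The names `build` and `query` are hypothetical, and each record is treated as a single position.

```python
import zlib
from typing import List, Tuple

Record = Tuple[str, int, str]                # (chrom, pos, rest of the line)
IndexEntry = Tuple[str, int, int, int, int]  # (chrom, first_pos, last_pos, offset, length)

def build(records: List[Record], block_size: int) -> Tuple[bytes, List[IndexEntry]]:
    """Compress records (assumed sorted by chrom, then pos) into small zlib blocks."""
    data = bytearray()
    index: List[IndexEntry] = []
    i = 0
    while i < len(records):
        chrom = records[i][0]
        j = i
        # A block holds at most block_size records from a single chromosome.
        while j < len(records) and j - i < block_size and records[j][0] == chrom:
            j += 1
        block = records[i:j]
        text = "".join(f"{c}\t{p}\t{rest}\n" for c, p, rest in block)
        packed = zlib.compress(text.encode())
        index.append((chrom, block[0][1], block[-1][1], len(data), len(packed)))
        data.extend(packed)
        i = j
    return bytes(data), index

def query(data: bytes, index: List[IndexEntry],
          chrom: str, start: int, end: int) -> Tuple[List[str], int]:
    """Return lines with start <= pos <= end, plus how many blocks were decompressed."""
    hits: List[str] = []
    blocks_read = 0
    for c, first, last, offset, length in index:
        if c != chrom or last < start or first > end:
            continue  # the index rules this block out; it is never decompressed
        blocks_read += 1
        text = zlib.decompress(data[offset:offset + length]).decode()
        for line in text.splitlines():
            pos = int(line.split("\t")[1])
            if start <= pos <= end:
                hits.append(line)
    return hits, blocks_read

records = [("1", p, f"var{p}") for p in range(0, 1000, 10)]
records += [("2", p, f"var{p}") for p in range(0, 500, 10)]
data, index = build(records, block_size=10)
hits, blocks_read = query(data, index, "1", 205, 234)
print(hits)         # ['1\t210\tvar210', '1\t220\tvar220', '1\t230\tvar230']
print(blocks_read)  # 1
```

In this toy run only one of the 15 blocks is decompressed; the rest are skipped by looking at the index alone.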
The tool supports many formats, such as GFF, BED, SAM, VCF and psltab, and you can also define your own format by telling it which columns hold the sequence name, start and end.
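Defining your own format boils down to naming the sequence, start and end columns (tabix takes these as its -s, -b and -e options, with -0 for 0-based starts). The Python sketch below only mimics that idea for filtering lines by region; `ColumnSpec`, `SPECS`, `interval` and `overlapping` are hypothetical names, not tabix's API, and VCF is simplified to a one-base interval at POS.

```python
from typing import Dict, List, NamedTuple, Tuple

class ColumnSpec(NamedTuple):
    seq: int          # 1-based column with the sequence (chromosome) name
    begin: int        # 1-based column with the start coordinate
    end: int          # 1-based column with the end coordinate
    zero_based: bool  # True if start is 0-based, half-open (BED style)

# Column layouts of the standard formats (VCF simplified to a 1-base interval at POS).
SPECS: Dict[str, ColumnSpec] = {
    "gff": ColumnSpec(seq=1, begin=4, end=5, zero_based=False),
    "bed": ColumnSpec(seq=1, begin=2, end=3, zero_based=True),
    "vcf": ColumnSpec(seq=1, begin=2, end=2, zero_based=False),
}

def interval(line: str, spec: ColumnSpec) -> Tuple[str, int, int]:
    """Extract (seq, start, end) as 1-based closed coordinates."""
    f = line.rstrip("\n").split("\t")
    start, end = int(f[spec.begin - 1]), int(f[spec.end - 1])
    if spec.zero_based:
        start += 1  # 0-based half-open [s, e) equals 1-based closed [s+1, e]
    return f[spec.seq - 1], start, end

def overlapping(lines: List[str], spec: ColumnSpec,
                chrom: str, start: int, end: int) -> List[str]:
    """Keep data lines whose interval overlaps chrom:start-end (1-based, closed)."""
    out = []
    for line in lines:
        if line.startswith("#"):
            continue  # header/comment line
        c, s, e = interval(line, spec)
        if c == chrom and s <= end and e >= start:
            out.append(line)
    return out

bed = ["chr1\t99\t200\tfeatA", "chr1\t300\t400\tfeatB", "chr2\t99\t200\tfeatC"]
print(overlapping(bed, SPECS["bed"], "chr1", 150, 350))  # the featA and featB lines
print(overlapping(bed, SPECS["bed"], "chr1", 1, 99))     # [] (featA starts at base 100)

# A made-up custom layout: name, chromosome, start, end.
custom = ColumnSpec(seq=2, begin=3, end=4, zero_based=False)
print(overlapping(["geneX\tchr3\t10\t20"], custom, "chr3", 15, 15))
```

Keeping the column layout in a small spec object means a new format only needs a new `ColumnSpec`, not new parsing code.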
From now on, I will compress and index all my gene files whenever I get the chance.
http://samtools.sourceforge.net/tabix.shtml
Powered by ScienceNet.cn
Copyright © 2007- China Science Daily (中国科学报社)