Personal blog of 李雷廷 (Li Leiting) http://blog.sciencenet.cn/u/llt001


Picard MarkDuplicates running slower with more CPUs

2515 views | 2019-1-31 14:01 | Category: Research notes

Generally, for a program that supports multi-threading, the elapsed time decreases as the number of CPUs used increases. However, I found a strange case: Picard MarkDuplicates runs slower with more CPUs. On a node with 160 CPUs it takes 1 to 2 hours, but on a node with 32 CPUs it takes only about half an hour.

Picard MarkDuplicates is a Java-based program with no option to set the number of threads. I found that it automatically detects the available CPUs and tries to use them all. As for why Picard MarkDuplicates runs slower with more CPUs, I assume it spends too much time in loops that split up the jobs and collect the data back.
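One way to test this assumption would be to cap the number of processors the JVM sees and compare run times on the same node. The sketch below only builds such a command line; it does not run it. `-XX:ActiveProcessorCount` and `-XX:ParallelGCThreads` are standard HotSpot JVM flags (the former needs a reasonably recent JDK), but whether they remove the slowdown described here is untested. The file names and the `build_picard_cmd` helper are hypothetical.

```python
import os


def build_picard_cmd(jar, input_bam, output_bam, metrics, cpus):
    """Build a MarkDuplicates command with the JVM limited to `cpus` processors.

    Assumption: limiting the processors reported to the JVM (and its
    parallel GC threads) is enough to limit what Picard tries to use.
    """
    return [
        "java",
        f"-XX:ActiveProcessorCount={cpus}",  # CPU count the JVM reports
        f"-XX:ParallelGCThreads={cpus}",     # threads used by the parallel GC
        "-jar", jar,
        "MarkDuplicates",
        f"I={input_bam}",
        f"O={output_bam}",
        f"M={metrics}",
    ]


# Hypothetical file names, for illustration only.
cmd = build_picard_cmd(
    "picard.jar", "sample.bam", "sample.marked.bam", "sample.metrics.txt", 8
)
print("CPUs visible to this process:", os.cpu_count())
print(" ".join(cmd))
```

Running the printed command with a few different values of `cpus` on the 160-CPU node would show whether the slowdown really tracks the number of CPUs the JVM detects.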

Another case where running time has a non-linear relationship with the number of CPUs is RAxML (unlike Picard MarkDuplicates, RAxML allows you to set the number of threads). A few years ago, I tested the elapsed time with 2 to 20 CPUs and found that RAxML ran fastest with the number of threads set to 8 to 10.
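A test like this can be scripted. The following is a minimal sketch of a timing harness, not the script used for the original test: it runs a command once per thread count and reports the fastest setting. `-T` is the thread option of the RAxML PTHREADS build; the binary name, alignment file, and other arguments in `raxml_cmd` are placeholders to adapt.

```python
import subprocess
import time


def time_command(cmd):
    """Run `cmd` once and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(
        cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start


def benchmark(make_cmd, thread_counts):
    """Map each thread count to the elapsed time of make_cmd(n)."""
    return {n: time_command(make_cmd(n)) for n in thread_counts}


def fastest(timings):
    """Return the thread count with the smallest elapsed time."""
    return min(timings, key=timings.get)


def raxml_cmd(n):
    """Hypothetical RAxML invocation; file names and model are placeholders."""
    return [
        "raxmlHPC-PTHREADS", "-T", str(n),
        "-s", "alignment.phy", "-n", f"T{n}",
        "-m", "GTRGAMMA", "-p", "12345",
    ]
```

Calling `benchmark(raxml_cmd, range(2, 21, 2))` and then `fastest(...)` on the result would reproduce the kind of comparison described above. A single run per setting is noisy, so repeating each measurement a few times gives a more reliable picture.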




https://blog.sciencenet.cn/blog-656335-1160081.html
