VASP Optimization: Setting NCORE, NPAR, and KPAR

Read 29718 times | 2020-5-21 10:20 | Personal category: Structure Optimization | System category: Research Notes

Topics:

1) Parallel computing parameter settings

2) Reading the PBS file

1. Reading the PBS file

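#!/bin/bash
# Job name, resources (1 node, 8 cores, 8 MPI ranks), wall-time limit, queue;
# "-j oe" merges stderr into stdout.
#PBS -N test_vasp
#PBS -l select=1:ncpus=8:mpiprocs=8
#PBS -l walltime=720:00:00
#PBS -q workq
#PBS -j oe

# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"

VASP_EXEC=/data/apps/vasp-5.4.4/vasp
# Number of MPI ranks = number of lines in the PBS node file
NP=$(cat "$PBS_NODEFILE" | wc -l)

# Record the start time
echo -n "start time " > log.$PBS_JOBNAME.$PBS_JOBID
date >> log.$PBS_JOBNAME.$PBS_JOBID

# for GK & YW users
touch WAVECAR PCDAT IBZKPT XDATCAR OSZICAR CONTCAR CHG PROCAR OUTCAR EIGENVAL DOSCAR CHGCAR
sleep 10

# One OpenMP thread per MPI rank
export OMP_NUM_THREADS=1

# This job launches calypso.x; the direct VASP launch is left commented out
./calypso.x > log 2>&1
#mpirun -machinefile $PBS_NODEFILE -np $NP $VASP_EXEC

# Record the end time
echo -n "end time " >> log.$PBS_JOBNAME.$PBS_JOBID
date >> log.$PBS_JOBNAME.$PBS_JOBID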
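The NP= line in the script counts the lines of the PBS node file. As a rough Python sketch of the same idea, assuming the node file holds one host name per MPI rank (the host name `node01` and the helper `count_ranks` are made up for illustration):

```python
from collections import Counter


def count_ranks(nodefile_text):
    """Return (total ranks, ranks per host) from the contents of a PBS node file."""
    hosts = [line.strip() for line in nodefile_text.splitlines() if line.strip()]
    return len(hosts), dict(Counter(hosts))


# Hypothetical node file contents for select=1:ncpus=8:mpiprocs=8
sample_nodefile = "node01\n" * 8
total_ranks, ranks_per_host = count_ranks(sample_nodefile)
print(total_ranks, ranks_per_host)
```

The total is what the script passes to mpirun as -np, and it is also the "number of cores" that NCORE and NPAR divide up.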

2. The NCORE warning

Output of a run on 8 cores with the default NCORE=1 (one band per core, 8 band groups), which triggers the NCORE warning:

running on 8 total cores
distrk: each k-point on 8 cores, 1 groups
distr: one band on 1 cores, 8 groups
using from now: INCAR
vasp.5.4.4.18Apr17-6-g9f103f2a35 (build Mar 11 2020 21:12:28) complex
POSCAR found type information on POSCAR Li Y H
POSCAR found : 3 types and 48 ions
scaLAPACK will be used
 -----------------------------------------------------------------------------
|                                                                             |
|           W    W    AA    RRRRR   N    N  II  N    N   GGGG   !!!           |
|           W    W   A  A   R    R  NN   N  II  NN   N  G    G  !!!           |
|           W    W  A    A  R    R  N N  N  II  N N  N  G       !!!           |
|           W WW W  AAAAAA  RRRRR   N  N N  II  N  N N  G  GGG   !            |
|           WW  WW  A    A  R   R   N   NN  II  N   NN  G    G                |
|           W    W  A    A  R    R  N    N  II  N    N   GGGG   !!!           |
|                                                                             |
|      For optimal performance we recommend to set                            |
|        NCORE= 4 - approx SQRT( number of cores)                             |
|      NCORE specifies how many cores store one orbital (NPAR=cpu/NCORE).     |
|      This setting can greatly improve the performance of VASP for DFT.      |
|      The default, NCORE=1 might be grossly inefficient                      |
|      on modern multi-core architectures or massively parallel machines.     |
|      Do your own testing !!!!                                               |
|      Unfortunately you need to use the default for GW and RPA calculations. |
|      (for HF NCORE is supported but not extensively tested yet)             |
|                                                                             |
 -----------------------------------------------------------------------------
|                                                                             |
|  ADVICE TO THIS USER RUNNING 'VASP/VAMP' (HEAR YOUR MASTER'S VOICE ...):    |
|                                                                             |
|      You have a (more or less) 'large supercell' and for larger cells       |
|      it might be more efficient to use real space projection opertators     |
|      So try LREAL= Auto in the INCAR file.                                  |
|      Mind: For very accurate calculation you might also keep the            |
|      reciprocal projection scheme (i.e. LREAL=.FALSE.)                      |
|                                                                             |
 -----------------------------------------------------------------------------

Output of a run on 16 cores with 4 cores per band and 4 band groups (NCORE=4, NPAR=4); the NCORE warning no longer appears:

running on 16 total cores
distrk: each k-point on 16 cores, 1 groups
distr: one band on 4 cores, 4 groups
using from now: INCAR
vasp.5.4.4.18Apr17-6-g9f103f2a35 (build Mar 11 2020 21:12:28) complex
POSCAR found : 3 types and 48 ions
scaLAPACK will be used
 -----------------------------------------------------------------------------
|                                                                             |
|  ADVICE TO THIS USER RUNNING 'VASP/VAMP' (HEAR YOUR MASTER'S VOICE ...):    |
|                                                                             |
|      You have a (more or less) 'large supercell' and for larger cells       |
|      it might be more efficient to use real space projection opertators     |
|      So try LREAL= Auto in the INCAR file.                                  |
|      Mind: For very accurate calculation you might also keep the            |
|      reciprocal projection scheme (i.e. LREAL=.FALSE.)                      |
|                                                                             |
 -----------------------------------------------------------------------------
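The first three lines of each log already show the layout VASP actually used: the group count on the "distrk" line is KPAR, and the "distr" line gives NCORE (cores per band) and NPAR (band groups). A small sketch for pulling these out of a log, assuming the line format shown in the two logs here (the function name `parse_layout` is my own):

```python
import re


def parse_layout(log_text):
    """Extract total cores, KPAR, NCORE and NPAR from VASP's startup lines."""
    total = re.search(r"running on\s+(\d+)\s+total cores", log_text)
    kline = re.search(r"distrk:\s+each k-point on\s+(\d+)\s+cores,\s+(\d+)\s+groups", log_text)
    bline = re.search(r"distr:\s+one band on\s+(\d+)\s+cores,\s+(\d+)\s+groups", log_text)
    return {
        "total_cores": int(total.group(1)),
        "kpar": int(kline.group(2)),   # number of k-point groups
        "ncore": int(bline.group(1)),  # cores sharing one band
        "npar": int(bline.group(2)),   # bands treated in parallel
    }


log_8_cores = (
    "running on 8 total cores\n"
    "distrk: each k-point on 8 cores, 1 groups\n"
    "distr: one band on 1 cores, 8 groups\n"
)
log_16_cores = (
    "running on 16 total cores\n"
    "distrk: each k-point on 16 cores, 1 groups\n"
    "distr: one band on 4 cores, 4 groups\n"
)
print(parse_layout(log_8_cores))
print(parse_layout(log_16_cores))
```

Checking these lines at the start of a run is a quick way to confirm that the INCAR settings were picked up as intended.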

3. Excerpts and commentary

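In a parallel environment, KPAR and NPAR matter most, because they affect the run time.
If NPAR is set, NCORE is unnecessary because NPAR takes precedence; NCORE then equals the total number of cores divided by NPAR.

To choose NPAR, run the job for a few seconds, kill it, and run grep NBANDS OUTCAR. Then pick a moderate value: usually a number near the square root of the total core count that also divides NBANDS evenly.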

k-points NKPTS = 4 k-points in BZ NKDIM = 4 number of bands NBANDS= 22
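The grep step can also be scripted. The sketch below reads NKPTS and NBANDS from an OUTCAR line like the one above and lists the divisors of NBANDS, which are the NPAR values that divide it evenly. The helper names are my own, and the regular expressions assume the line format shown here.

```python
import re


def read_counts(outcar_text):
    """Return (NKPTS, NBANDS) from the summary line of an OUTCAR."""
    nkpts = int(re.search(r"NKPTS\s*=\s*(\d+)", outcar_text).group(1))
    nbands = int(re.search(r"NBANDS\s*=\s*(\d+)", outcar_text).group(1))
    return nkpts, nbands


def divisors(n):
    """All positive integers that divide n evenly."""
    return [d for d in range(1, n + 1) if n % d == 0]


outcar_line = "k-points NKPTS = 4 k-points in BZ NKDIM = 4 number of bands NBANDS= 22"
nkpts, nbands = read_counts(outcar_line)
print(nkpts, nbands)
print(divisors(nbands))
```

For NBANDS = 22 the divisors are 1, 2, 11 and 22, so on 8 cores NPAR = 2 is the only value near √8 that divides both the core count and NBANDS.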

NPAR ≈ SQRT(number of cores), chosen so that it divides NBANDS evenly

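KPAR can be chosen the same way as NPAR, except that what you grep for is the number of k-points (NKPTS).

To be more rigorous: if your calculations are expensive, you need to test these parameters just as you would test ENCUT and the k-point mesh, except that the metric is the total core-hours (core·h) one job consumes under each setting.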
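Comparing settings by core-hours comes down to multiplying the core count by the elapsed time of each test run. A minimal sketch; the timings below are made-up placeholders, not measurements, and should be replaced with the elapsed times of your own short runs:

```python
def core_hours(cores, elapsed_seconds):
    """Total core-hours consumed by one run."""
    return cores * elapsed_seconds / 3600.0


# Hypothetical benchmark results, for illustration only
runs = [
    {"npar": 2, "cores": 16, "elapsed_s": 1800.0},
    {"npar": 4, "cores": 16, "elapsed_s": 1200.0},
    {"npar": 8, "cores": 16, "elapsed_s": 1500.0},
]

for run in runs:
    run["core_h"] = core_hours(run["cores"], run["elapsed_s"])

best = min(runs, key=lambda run: run["core_h"])
print(best["npar"], round(best["core_h"], 2))
```

Using core-hours rather than wall time keeps the comparison fair when the test runs use different numbers of cores.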

NPAR ≈ SQRT(number of cores)

NCORE = number of cores / NPAR

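I used to pay no attention to NPAR in my calculations; I only knew that it had something to do with parallel execution, and I usually left it out. Later I learned that a well-chosen NPAR can speed up parallel runs.

NCORE (cores working on one band) × NPAR = N (total number of cores)

The manual recommends setting NPAR to about √N. For example, on our computing resources, 4 nodes with 12 cores each give 48 cores in total; √48 ≈ 6.9, so NPAR can be set to 6 or 8. The more nodes you use, the greater the effect of NPAR.

NBANDS should also be an integer multiple of NPAR; if it is not, VASP automatically increases it to the next multiple of NPAR.

Before starting a calculation, it is best to know the number of bands in your system, because using more cores than NBANDS wastes resources.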
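The 48-core example can be checked with a few lines of Python. This is only a sketch of the rule of thumb, assuming KPAR = 1 so that all cores work on the same k-point; `npar_candidates` and `padded_nbands` are names I made up.

```python
import math


def npar_candidates(total_cores, how_many=2):
    """Divisors of total_cores, ordered by distance from sqrt(total_cores)."""
    root = math.sqrt(total_cores)
    divisors = [d for d in range(1, total_cores + 1) if total_cores % d == 0]
    return sorted(divisors, key=lambda d: (abs(d - root), d))[:how_many]


def padded_nbands(nbands, npar):
    """Smallest multiple of npar that is not below nbands."""
    return -(-nbands // npar) * npar


total_cores = 4 * 12  # 4 nodes with 12 cores each
for npar in npar_candidates(total_cores):
    ncore = total_cores // npar
    print(npar, ncore, padded_nbands(22, npar))
```

For 48 cores the two divisors closest to √48 are 6 and 8, matching the text, and a request for 22 bands would be rounded up to 24 with either choice.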

https://www.vasp.at/wiki/index.php/NCORE

NCORE

NCORE = [integer]
Default: NCORE = 1

Description: NCORE determines the number of compute cores that work on an individual orbital (available as of VASP.5.2.13).

VASP currently offers parallelization and data distribution over bands and/or over plane wave coefficients, and as of VASP.5.3.2, parallelization over k-points (no data distribution, see KPAR). To obtain high efficiency on massively parallel systems or modern multi-core machines, it is strongly recommended to use all at the same time.

Most algorithms work with any data distribution (except for the single band conjugated gradient, which is considered to be obsolete).

NCORE determines how many cores share the work on an individual orbital. The current default is NCORE=1, meaning that one orbital is treated by one core.

NPAR is then set to the total number of cores. If NCORE equals the total number of cores, NPAR is set to 1. This implies data distribution over plane wave coefficients only: all cores will work together on every individual band, i.e., the plane wave coefficients of each band are distributed over all cores. This is usually very slow and should be avoided.

NCORE=1 is the optimal setting for platforms with a small communication bandwidth and is a good choice for up to 8 cores, as well as for machines with a single core per node and a Gigabit network. However, this mode substantially increases the memory requirements, because the non-local projector functions must be stored entirely on each core. In addition, substantial all-to-all communications are required to orthogonalize the bands.

On massively parallel systems and modern multi-core machines we strongly urge to set

$\textrm{NPAR} \approx \sqrt{\textrm{number-of-cores}}$

$\textrm{NCORE} = \textrm{number-of-cores-per-node}$

In selected cases, we found that this improves the performance by a factor of up to four compared to the default, and it also significantly improves the stability of the code due to reduced memory requirements.

NCORE is available from VASP.5.2.13 on, and is more handy than the previous parameter NPAR. The user should either specify NCORE or NPAR, where NPAR takes a higher preference. The relation between both parameters is

$\textrm{NCORE} = \textrm{number-of-cores} / \textrm{NPAR}$
[Is NPAR like the number of nodes?]
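The relation between the two tags, including the rule that NPAR wins when both are given, can be written as a small function. This is a sketch of the arithmetic only, assuming KPAR = 1; raising an error for values that do not divide the core count is my own simplification, not a statement about how VASP handles that case.

```python
def resolve_band_parallelism(total_cores, ncore=None, npar=None):
    """Return (NCORE, NPAR) for a given core count and optional INCAR tags."""
    if npar is not None:
        # NPAR takes precedence over NCORE when both are specified
        if total_cores % npar != 0:
            raise ValueError("NPAR must divide the total number of cores")
        return total_cores // npar, npar
    if ncore is None:
        ncore = 1  # default: one core per orbital
    if total_cores % ncore != 0:
        raise ValueError("NCORE must divide the total number of cores")
    return ncore, total_cores // ncore


print(resolve_band_parallelism(16))                    # defaults
print(resolve_band_parallelism(16, ncore=4))           # NCORE only
print(resolve_band_parallelism(48, ncore=12, npar=8))  # NPAR overrides NCORE
print(resolve_band_parallelism(48, ncore=12))          # NCORE = cores per node
```

The last call bears on the question in the note above: with NCORE set to the 12 cores of one node, NPAR comes out as 4, which is the number of nodes in the 48-core example.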

The optimum settings for NPAR and LPLANE depend strongly on the type of machine you are using. Some recommended setups:

LINUX cluster linked by Infiniband, modern multicore machines:

On a LINUX cluster with multicore machines linked by a fast network we recommend to set

LPLANE = .TRUE.
NCORE  = number of cores per node (e.g. 4 or 8)
LSCALU = .FALSE.
NSIM   = 4

If very many nodes are used, it might be necessary to set LPLANE = .FALSE., but usually this offers very little advantage. For long runs (e.g. molecular dynamics), we recommend optimizing NPAR by trying short runs with different settings.

LINUX cluster linked by 1 Gbit Ethernet, and LINUX clusters with single cores:

On a LINUX cluster linked by a relatively slow network, LPLANE must be set to .TRUE., and the NPAR flag should be equal to the number of cores:

LPLANE = .TRUE.
NCORE  = 1
LSCALU = .FALSE.
NSIM   = 4

Mind that you need at least a 100 Mbit full duplex network, with a fast switch offering at least 2 Gbit switch capacity, to find useful speedups. Multi-core machines should always be linked by Infiniband, since Gbit Ethernet is too slow for multi-core machines.

Massively parallel machines (Cray, Blue Gene):

On many massively parallel machines one is forced to use a huge number of cores. In this case load balancing problems and problems with the communication bandwidth are likely to be experienced. In addition the local memory is fairly small on some massively parallel machines; too small to keep the real space projectors in the cache with any setting. Therefore, we recommend to set NPAR on these machines to √# of cores (explicit timing can be helpful to find the optimum value). The use of LPLANE=.TRUE. is only recommended if the number of nodes is significantly smaller than NGX, NGY and NGZ.

In summary, the following setting is recommended

LPLANE = .FALSE.
NPAR   = sqrt(number of cores)
NSIM   = 1

NPAR

NPAR = [integer]
Default: NPAR = number of cores

Description: NPAR determines the number of bands that are treated in parallel.

NPAR determines how many bands are treated in parallel. The current default is NPAR=number of cores, meaning that one orbital is treated by one core. NCORE is then set to 1. If NPAR=1, NCORE is set to the number of cores. This implies data distribution over plane wave coefficients only: all cores will work together on every individual band, i.e., the plane wave coefficients of each band are distributed over all cores. This is usually very slow and should be avoided.

NPAR=number of cores is the optimal setting for platforms with a small communication bandwidth and is a good choice for up to 8 cores, as well as for machines with a single core per node and a Gigabit network. However, this mode substantially increases the memory requirements, because the non-local projector functions must be stored entirely on each core. In addition, substantial all-to-all communications are required to orthogonalize the bands. On massively parallel systems and modern multi-core machines we strongly u

KPAR

KPAR = [integer]
Default: KPAR = 1

Description: KPAR determines the number of k-points that are to be treated in parallel (available as of VASP.5.3.2). Also, KPAR is used as parallelization tag for Laplace transformed MP2 calculations.

VASP currently offers parallelization and data distribution over bands and/or over plane wave coefficients (see NCORE and NPAR), and as of VASP.5.3.2, parallelization over k-points. To obtain high efficiency on massively parallel systems or modern multi-core machines, it is strongly recommended to use all at the same time. Most algorithms work with any data distribution (except for the single band conjugated gradient, which is considered to be obsolete).

The set of k-points is distributed over KPAR groups of compute cores, in a round-robin fashion. This means that a group of N=(# of cores/KPAR) compute cores together work on an individual k-point (choose KPAR such that it is an integer divisor of the total number of cores). Within this group of N cores that share the work on an individual k-point, the usual parallelism over bands and/or plane wave coefficients applies (as set by means of the NCORE and NPAR tags).
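The round-robin distribution described above can be illustrated in a few lines. This is a sketch under stated assumptions: k-points are numbered from 0 here, the exact ordering VASP uses internally is not taken from the source, and `distribute_kpoints` is a made-up name.

```python
def distribute_kpoints(nkpts, total_cores, kpar):
    """Assign k-point indices to KPAR groups in round-robin order.

    Returns (cores per group, list of k-point indices for each group).
    """
    if total_cores % kpar != 0:
        raise ValueError("KPAR should be an integer divisor of the total number of cores")
    cores_per_group = total_cores // kpar
    groups = [[] for _ in range(kpar)]
    for k in range(nkpts):
        groups[k % kpar].append(k)
    return cores_per_group, groups


# 4 k-points (as in the OUTCAR line quoted earlier) on 16 cores with KPAR = 2
cores_per_group, groups = distribute_kpoints(nkpts=4, total_cores=16, kpar=2)
print(cores_per_group, groups)
```

Each group of 8 cores then handles two k-points, and NCORE or NPAR apply within those 8 cores rather than across all 16. Choosing KPAR as a divisor of NKPTS as well keeps the groups evenly loaded.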

Note: the data is not distributed additionally over k-points.

Note: KPAR becomes obsolete if LMP2LT or LSMP2LT are set, and then specifies the number of plane-waves treated in parallel.

To repost this article, please contact the original author for permission and credit it to Ye Xiaoqiu's (叶小球) ScienceNet blog.
Link: https://blog.sciencenet.cn/blog-567091-1234210.html
