|||
VASP.5.2.12编译: Intel Fortran+MPI+MKL
2014–05–19 09:56:26
并行版本VASP编译
编译器: Intel Fortran
并行库: Intel MPI
数学库: Intel MKL
准备工作
1 . Fortran编译器, MPI库与MKL库安装好. 若系统使用module, 只须load即可.
module show intel/13.1.0
-------------------------------------------------------------------/share/apps/modules/Modules/modulefiles/intel/13.1.0:module-whatis Intel Compiler module-whatis Version: 13.1.0 module-whatis Category: compiler, runtime support module-whatis Description: Intel Compiler Family (C/C++/Fortran for x86_64) module-whatis URL: http://www.intel.com/cd/software/products/asmo-na/eng/compilers/284132.htm prepend-path PATH /share/apps/intel/composer_xe_2013.2.146/bin prepend-path MANPATH /share/apps/intel/composer_xe_2013.2.146/man/en_US prepend-path INCLUDE /share/apps/intel/composer_xe_2013.2.146/mkl/include:/share/apps/intel/composer_xe_2013.2.146/ipp/include prepend-path LD_LIBRARY_PATH /share/apps/intel/composer_xe_2013.2.146/lib/intel64 prepend-path LIBRARY_PATH /share/apps/intel/composer_xe_2013.2.146/lib/intel64 prepend-path NLS_PATH /share/apps/intel/composer_xe_2013.2.146/lib/intel64/locale/%l_%t/%N setenv COMPILER_TYPE intel setenv COMPILER_VERSION 13.1.0 setenv INTEL_LICENSE_FILE 28518@192.168.100.1 -------------------------------------------------------------------
module show impi/4.1.0
-------------------------------------------------------------------/share/apps/modules/Modules/modulefiles/impi/4.1.0:module-whatis Intel Compiler module-whatis Version: 4.1.0 module-whatis Category: compiler, runtime support module-whatis Description: Intel Compiler Family (C/C++/Fortran for x86_64) module-whatis URL: http://www.intel.com/cd/software/products/asmo-na/eng/compilers/284132.htm setenv version 4.1.0 setenv Intel_FC_Home /share/apps/intel/impi/4.1.0.030 prepend-path PATH /share/apps/intel/impi/4.1.0.030/intel64/bin prepend-path MANPATH /share/apps/intel/impi/4.1.0.030/man prepend-path LD_LIBRARY_PATH /share/apps/intel/impi/4.1.0.030/intel64/lib setenv I_MPI_ROOT /share/apps/intel/impi/4.1.0.030 setenv I_MPI_FABRICS shm:tmi setenv TMI_CONFIG /share/apps/intel/impi/4.1.0.030/intel64/etc/tmi.conf setenv INTEL_LICENSE_FILE 28518@192.168.100.1 -------------------------------------------------------------------
module show mkl/13.1.0
-------------------------------------------------------------------/share/apps/modules/Modules/modulefiles/mkl/13.1.0:module-whatis Intel MKL module-whatis Version: 13.1.0 module-whatis Category: compiler, runtime support module-whatis Description: Intel Compiler Family (C/C++/Fortran for x86_64) module-whatis URL: http://www.intel.com/cd/software/products/asmo-na/eng/compilers/284132.htm setenv MKL_ROOT /share/apps/intel/composer_xe_2013.2.146/composer_xe_2013.2.146/mkl -------------------------------------------------------------------
2 . 检查编译器, 运行库, 路径无误
which ifort 给出
/share/apps/intel/composer_xe_2013.2.146/bin/ifort
which mpiifort 给出
/share/apps/intel/impi/4.1.0.030/intel64/bin/mpiifort
which mpirun 给出
/share/apps/intel/impi/4.1.0.030/intel64/bin/mpirun
编译
下载VASP源码, vasp.5.2.12.tar.gz与vasp.5.lib.tar.gz
解压
tar -xzvf vasp.5.2.12.tar.gz, 得文件夹 vasp.5.2
tar -xzvf vasp.5.lib.tar.gz, 得文件夹 vasp.5.lib
编译库文件, 简单, 直接使用makefile.linux_ifc_P4
将19行FC=ifc改为FC=mpiifort
make -f makefile.linux_ifc_P4
得libdmy.a和linpack_double.o, 即成功
编译主程序, 复杂, 牵涉到数学库, FFT库, 并行库的选择, 需要修改makefile.linux_ifc_P4.
原则是尽可能使用Intel自家的东西, 简单且效率好, 故使用MKL及其自带的FFTW, 并行库使用IntelMPI
编译器选项可在Intel官网查询
一份修改好的makefile及其简单注释如下
将其保存为makefile
make
得vasp即成功, 编译中有警告, 但不致命
# MKL及其FFTW路径
MKLROOT=/share/apps/intel/composer_xe_2013.2.146/composer_xe_2013.2.146/mkl
FFTWROOT=${MKLROOT}/include/fftw
# 扩展名
.SUFFIXES: .inc .f .f90 .F
# 预处理扩展名
SUFFIX=.f90
# MPI Fortran编译器, 链接器, 可使用绝对路径
# 增加FFTW路径, 以便使用MKL自带的FFTW
FC=mpiifort -I${FFTWROOT}
FCL=$(FC)
# fpp预处理选项
CPP_=fpp -f_com=no -free -w0 $*.F $*$(SUFFIX)
# 编译选项. 注意
# 1. 行尾必须有空格, 数目不限
# 2. byterecl必须使用, 否则WAVECAR文件极大
FFLAGS = -FR -lowercase -assume byterecl
# 优化选项
# 增加 -xHost -axAVX
OFLAG=-O2 -ip -ftz -xHost -axAVX
# 其他编译选项
OFLAG_HIGH = $(OFLAG)
OBJ_HIGH =
OBJ_NOOPT =
DEBUG = -FR -O0
INLINE = $(OFLAG)
# MKL, BLAS, LAPACK使用选项
# 1. CPP选项中必须设置 -DRPROMU_DGEMV -DRACCMU_DGEMV
# 2. 使用静态库, 速度可能稍快
# 3. 可参考https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor
BLAS=$(MKLROOT)/lib/intel64/libmkl_blas95_lp64.a
LAPACK=$(MKLROOT)/lib/intel64/libmkl_lapack95_lp64.a
# 链接选项, 使用静态库
LINK=-Wl,--start-group \
$(MKLROOT)/lib/intel64/libmkl_intel_lp64.a \
$(MKLROOT)/lib/intel64/libmkl_core.a \
$(MKLROOT)/lib/intel64/libmkl_intel_thread.a \
-Wl,--end-group \
-lpthread -liomp5 -lmpi -lm
# CPP并行选项
# NGZhalf charge density reduced in Z direction
# wNGZhalf gamma point only reduced in Z direction
# scaLAPACK use scaLAPACK (usually slower on 100 Mbit Net)
# avoidalloc avoid ALLOCATE if possible
# PGF90 work around some for some PGF90 / IFC bugs
# CACHE_SIZE 1000 for PII,PIII, 5000 for Athlon, 8000-12000 P4, PD
# RPROMU_DGEMV use DGEMV instead of DGEMM in RPRO (depends on used BLAS)
# RACCMU_DGEMV use DGEMV instead of DGEMM in RACC (depends on used BLAS)
# tbdyn MD package of Tomas Bucko
CPP = $(CPP_) -DMPI -DHOST=\"LinuxIFC\" -DIFC \
-DCACHE_SIZE=4000 -DPGF90 -Davoidalloc -DNGZhalf \
-DMPI_BLOCK=8000 \
-DRPROMU_DGEMV -DRACCMU_DGEMV
# MPI库
LIB = -L../vasp.5.lib -ldmy \
../vasp.5.lib/linpack_double.o $(LAPACK) $(BLAS)
# 使用MKL自带的FFTW
FFT3D = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o
# 或使用VASP自带的FFTW
#FFT3D = fftmpi.o fftmpi_map.o fft3dfurth.o fft3dlib.o
# 一般规则, 编译命令行, 以下不可修改
BASIC= symmetry.o symlib.o lattlib.o random.o
SOURCE= base.o mpi.o smart_allocate.o xml.o \
constant.o jacobi.o main_mpi.o scala.o \
asa.o lattice.o poscar.o ini.o mgrid.o xclib.o vdw_nl.o xclib_grad.o \
radial.o pseudo.o gridq.o ebs.o \
mkpoints.o wave.o wave_mpi.o wave_high.o \
$(BASIC) nonl.o nonlr.o nonl_high.o dfast.o choleski2.o \
mix.o hamil.o xcgrad.o xcspin.o potex1.o potex2.o \
constrmag.o cl_shift.o relativistic.o LDApU.o \
paw_base.o metagga.o egrad.o pawsym.o pawfock.o pawlhf.o rhfatm.o paw.o \
mkpoints_full.o charge.o Lebedev-Laikov.o stockholder.o dipol.o pot.o \
dos.o elf.o tet.o tetweight.o hamil_rot.o \
steep.o chain.o dyna.o sphpro.o us.o core_rel.o \
aedens.o wavpre.o wavpre_noio.o broyden.o \
dynbr.o rmm-diis.o reader.o writer.o tutor.o xml_writer.o \
brent.o stufak.o fileio.o opergrid.o stepver.o \
chgloc.o fast_aug.o fock.o mkpoints_change.o sym_grad.o \
mymath.o internals.o dynconstr.o dimer_heyden.o dvvtrajectory.o vdwforcefield.o \
hamil_high.o nmr.o pead.o mlwf.o subrot.o subrot_scf.o \
force.o pwlhf.o gw_model.o optreal.o davidson.o david_inner.o \
electron.o rot.o electron_all.o shm.o pardens.o paircorrection.o \
optics.o constr_cell_relax.o stm.o finite_diff.o elpol.o \
hamil_lr.o rmm-diis_lr.o subrot_cluster.o subrot_lr.o \
lr_helper.o hamil_lrf.o elinear_response.o ilinear_response.o \
linear_optics.o linear_response.o \
setlocalpp.o wannier.o electron_OEP.o electron_lhf.o twoelectron4o.o \
ratpol.o screened_2e.o wave_cacher.o chi_base.o wpot.o local_field.o \
ump2.o bse_te.o bse.o acfdt.o chi.o sydmat.o dmft.o \
rmm-diis_mlr.o linear_response_NMR.o
vasp: $(SOURCE) $(FFT3D) $(INC) main.o
rm -f vasp
$(FCL) -o vasp main.o $(SOURCE) $(FFT3D) $(LIB) $(LINK)
makeparam: $(SOURCE) $(FFT3D) makeparam.o main.F $(INC)
$(FCL) -o makeparam $(LINK) makeparam.o $(SOURCE) $(FFT3D) $(LIB)
zgemmtest: zgemmtest.o base.o random.o $(INC)
$(FCL) -o zgemmtest $(LINK) zgemmtest.o random.o base.o $(LIB)
dgemmtest: dgemmtest.o base.o random.o $(INC)
$(FCL) -o dgemmtest $(LINK) dgemmtest.o random.o base.o $(LIB)
ffttest: base.o smart_allocate.o mpi.o mgrid.o random.o ffttest.o $(FFT3D) $(INC)
$(FCL) -o ffttest $(LINK) ffttest.o mpi.o mgrid.o random.o smart_allocate.o base.o $(FFT3D) $(LIB)
kpoints: $(SOURCE) $(FFT3D) makekpoints.o main.F $(INC)
$(FCL) -o kpoints $(LINK) makekpoints.o $(SOURCE) $(FFT3D) $(LIB)
clean:
-rm -f *.g *.f *.o *.L *.mod *.f90; touch *.F
main.o: main$(SUFFIX)
$(FC) $(FFLAGS)$(DEBUG) $(INCS) -c main$(SUFFIX)
xcgrad.o: xcgrad$(SUFFIX)
$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcgrad$(SUFFIX)
xcspin.o: xcspin$(SUFFIX)
$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcspin$(SUFFIX)
makeparam.o: makeparam$(SUFFIX)
$(FC) $(FFLAGS)$(DEBUG) $(INCS) -c makeparam$(SUFFIX)
makeparam$(SUFFIX): makeparam.F main.F
#
# MIND: I do not have a full dependency list for the include
# and MODULES: here are only the minimal basic dependencies
# if one strucuture is changed then touch_dep must be called
# with the corresponding name of the structure
#
base.o: base.inc base.F
mgrid.o: mgrid.inc mgrid.F
constant.o: constant.inc constant.F
lattice.o: lattice.inc lattice.F
setex.o: setexm.inc setex.F
pseudo.o: pseudo.inc pseudo.F
poscar.o: poscar.inc poscar.F
mkpoints.o: mkpoints.inc mkpoints.F
wave.o: wave.F
nonl.o: nonl.inc nonl.F
nonlr.o: nonlr.inc nonlr.F
$(OBJ_HIGH):
$(CPP)
$(FC) $(FFLAGS) $(OFLAG_HIGH) $(INCS) -c $*$(SUFFIX)
$(OBJ_NOOPT):
$(CPP)
$(FC) $(FFLAGS) $(INCS) -c $*$(SUFFIX)
fft3dlib_f77.o: fft3dlib_f77.F
$(CPP)
$(F77) $(FFLAGS_F77) -c $*$(SUFFIX)
.F.o:
$(CPP)
$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
.F$(SUFFIX):
$(CPP)
$(SUFFIX).o:
$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
# special rules
#-----------------------------------------------------------------------
# these special rules are cummulative (that is once failed
# in one compiler version, stays in the list forever)
# -tpp5|6|7 P, PII-PIII, PIV
# -xW use SIMD (does not pay of on PII, since fft3d uses double prec)
# all other options do no affect the code performance since -O1 is used
fft3dlib.o : fft3dlib.F
$(CPP)
$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)
fft3dfurth.o : fft3dfurth.F
$(CPP)
$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
fftw3d.o : fftw3d.F
$(CPP)
$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
wave_high.o : wave_high.F
$(CPP)
$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
radial.o : radial.F
$(CPP)
$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
symlib.o : symlib.F
$(CPP)
$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
symmetry.o : symmetry.F
$(CPP)
$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
wave_mpi.o : wave_mpi.F
$(CPP)
$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
wave.o : wave.F
$(CPP)
$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
dynbr.o : dynbr.F
$(CPP)
$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
asa.o : asa.F
$(CPP)
$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
broyden.o : broyden.F
$(CPP)
$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)
us.o : us.F
$(CPP)
$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
LDApU.o : LDApU.F
$(CPP)
$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)
运行测试
利用VASP自带的bench.Hg.tar.gz进行测试
1 . 解压 tar -xzvf bench.Hg.tar.gz
2 . 复制INCAR, KPOINTS, POSCAR, POTCAR四个文件到vasp.5.2文件夹下
3 . 单核运行 ./vasp, 耗时45.221s, 屏幕输出
running on 1 nodesdistr: one band on 1 nodes, 1 groupsvasp.5.2.12 11Nov11 complex......entering main loop N E dE d eps ncg rms rms(c)RMM: 1 -0.514507058760E+05 -0.51451E+05 -0.13177E+05 316 0.780E+02RMM: 2 -0.527604338595E+05 -0.13097E+04 -0.23675E+04 316 0.234E+02RMM: 3 -0.529743353776E+05 -0.21390E+03 -0.41254E+03 316 0.116E+02RMM: 4 -0.531145169975E+05 -0.14018E+03 -0.15769E+03 316 0.784E+01RMM: 5 -0.531789029672E+05 -0.64386E+02 -0.67142E+02 316 0.452E+01RMM: 6 -0.532264453365E+05 -0.47542E+02 -0.47991E+02 720 0.309E+01RMM: 7 -0.532330334403E+05 -0.65881E+01 -0.94371E+01 762 0.919E+00 0.871E+00RMM: 8 -0.532322794427E+05 0.75400E+00 -0.37182E+01 697 0.816E+00 0.265E+00RMM: 9 -0.532327283030E+05 -0.44886E+00 -0.88476E+00 702 0.383E+00 0.129E+00RMM: 10 -0.532327148448E+05 0.13458E-01 -0.69686E-01 695 0.120E+00 0.550E-01RMM: 11 -0.532327089541E+05 0.58908E-02 -0.18550E-01 693 0.501E-01 0.247E-01RMM: 12 -0.532327075118E+05 0.14423E-02 -0.34613E-02 691 0.226E-01 0.756E-02RMM: 13 -0.532327075990E+05 -0.87187E-04 -0.65477E-03 688 0.823E-02 1 F= -.53232708E+05 E0= -.53232710E+05 d E =0.749678E-02
4 . 多核并行 mpirun -np 12 ./vasp, 耗时7.931s, 屏幕输出
running on 12 nodesdistr: one band on 3 nodes, 4 groupsvasp.5.2.12 11Nov11 complex......entering main loop N E dE d eps ncg rms rms(c)RMM: 1 -0.514507058760E+05 -0.51451E+05 -0.13177E+05 316 0.780E+02RMM: 2 -0.527604338595E+05 -0.13097E+04 -0.23675E+04 316 0.234E+02RMM: 3 -0.529743353776E+05 -0.21390E+03 -0.41254E+03 316 0.116E+02RMM: 4 -0.531145169975E+05 -0.14018E+03 -0.15769E+03 316 0.784E+01RMM: 5 -0.531789029672E+05 -0.64386E+02 -0.67142E+02 316 0.452E+01RMM: 6 -0.532264453365E+05 -0.47542E+02 -0.47991E+02 720 0.309E+01RMM: 7 -0.532330334403E+05 -0.65881E+01 -0.94371E+01 762 0.919E+00 0.871E+00RMM: 8 -0.532322794427E+05 0.75400E+00 -0.37182E+01 697 0.816E+00 0.265E+00RMM: 9 -0.532327283030E+05 -0.44886E+00 -0.88476E+00 702 0.383E+00 0.129E+00RMM: 10 -0.532327148448E+05 0.13458E-01 -0.69686E-01 695 0.120E+00 0.550E-01RMM: 11 -0.532327089541E+05 0.58908E-02 -0.18550E-01 693 0.501E-01 0.247E-01RMM: 12 -0.532327075118E+05 0.14423E-02 -0.34613E-02 691 0.226E-01 0.756E-02RMM: 13 -0.532327075990E+05 -0.87187E-04 -0.65477E-03 688 0.823E-02 1 F= -.53232708E+05 E0= -.53232710E+05 d E =0.749678E-02
多核与单核结果一致, 说明并行无误
5 . 将OSZICAR与OSZICAR.ref, OUTCAR与OUTCAR.ref做比较, 所得结果有所不同, 原因在于OSZICAR.ref与OUTCAR.ref所用版本为 vasp.4.4.4 10Jan99
说明
Windows下面的编译, 原则方法相同, 但需要注意的地方更多, 暂不推荐
Archiver|手机版|科学网 ( 京ICP备07017567号-12 )
GMT+8, 2024-11-22 19:53
Powered by ScienceNet.cn
Copyright © 2007- 中国科学报社