VASP abnormal termination and how to fix it
1. Error message:
Image PC Routine Line Source
vasp.5.4.4 0000000004EA0CCA Unknown Unknown Unknown
libpthread-2.12.s 00000036CB60F710 Unknown Unknown Unknown
libmpi.so.12.0 00002AE947CF2C18 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
vasp.5.4.4 0000000004EA0CCA Unknown Unknown Unknown
libpthread-2.12.s 00000036CB60F710 Unknown Unknown Unknown
libmpi.so.12.0 00002B4B72541407 Unknown Unknown Unknown
libmpi.so.12 00002B4B722B9B65 PMPIDI_CH3I_Progr Unknown Unknown
libmpi.so.12.0 00002B4B7245C243 Unknown Unknown Unknown
libmpi.so.12.0 00002B4B722654B8 Unknown Unknown Unknown
libmpi.so.12 00002B4B722696E6 PMPI_Allreduce Unknown Unknown
libmpifort.so.12. 00002B4B71E17FF1 mpi_allreduce_ Unknown Unknown
vasp.5.4.4 0000000000446975 Unknown Unknown Unknown
vasp.5.4.4 00000000007F4426 Unknown Unknown Unknown
vasp.5.4.4 0000000000D9D2BE Unknown Unknown Unknown
vasp.5.4.4 00000000013917BD Unknown Unknown Unknown
vasp.5.4.4 000000000136E8A1 Unknown Unknown Unknown
vasp.5.4.4 0000000000438D1E Unknown Unknown Unknown
libc-2.12.so 00000036CB21ED5D __libc_start_main Unknown Unknown
vasp.5.4.4 0000000000438C29 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
2. Solution
http://muchong.com/html/201005/2084704.html
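The error output above suggests that some shared libraries cannot be found. In general, a parallel program such as VASP is compiled on the head node but runs on the compute nodes. When a cluster is set up, some people (certain vendors in particular) install all software on the head node and export it to the compute nodes over a network share (NFS); others install everything on every compute node. Either way, you need to check whether the shared libraries can be found on each compute node. Use the following command to see which shared library locations are currently set: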
echo $LD_LIBRARY_PATH
Then check whether the directories that hold the required shared libraries appear in the output of that command.
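Because the output is a single colon-separated line, it can be hard to scan by eye. The following is a minimal sketch, not part of the original post, that prints one entry per line and marks directories that do not exist on the node where it is run. The helper name list_ld_path is hypothetical.

```shell
# list_ld_path: print each LD_LIBRARY_PATH entry on its own line and
# mark whether the directory exists on this node.
# Hypothetical helper for illustration; not part of VASP or any MPI package.
list_ld_path() {
    printf '%s\n' "${LD_LIBRARY_PATH:-}" | tr ':' '\n' | while IFS= read -r entry; do
        # Skip empty entries such as the one produced by "::".
        [ -z "$entry" ] && continue
        if [ -d "$entry" ]; then
            printf 'ok %s\n' "$entry"
        else
            printf 'missing %s\n' "$entry"
        fi
    done
}

list_ld_path
```

Running it on a compute node rather than on the head node matters here, since the two may not see the same directories.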
For example, for the first error, look up where libmpi.so.12.0 is installed:
locate libmpi.so.12.0
/usr/mpi/gcc/openmpi-1.10.3rc4/lib64/libmpi.so.12.0.3
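The file libmpi.so.12.0.3 is in /usr/mpi/gcc/openmpi-1.10.3rc4/lib64/, but your shared library path contains only /mpi/lib, so the compute nodes clearly cannot find it and you have to add the directory by hand. To do so, add the following to the .bashrc (or .bash_profile) file in your home directory: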
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/mpi/gcc/openmpi-1.10.3rc4/lib64
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/mpi/gcc/openmpi-1.10.3rc4/lib64:/lib:/lib/i686/nosegneg:/lib64
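Since .bashrc is read by every new shell, a plain export appends the same directory again each time, and if LD_LIBRARY_PATH starts out empty the result begins with an empty entry, which the dynamic loader treats as the current directory. A small sketch that avoids both, assuming a POSIX-compatible shell; the helper name append_ld_path is hypothetical:

```shell
# append_ld_path DIR: add DIR to LD_LIBRARY_PATH only if it is not
# already one of its colon-separated entries.
# Hypothetical helper for illustration.
append_ld_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*)
            ;;  # already present, nothing to do
        *)
            # ${VAR:+...} inserts the separator only when VAR is non-empty.
            LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$1"
            ;;
    esac
    export LD_LIBRARY_PATH
}

# Directory reported by locate in this post.
append_ld_path /usr/mpi/gcc/openmpi-1.10.3rc4/lib64
```

Calling the function a second time with the same directory leaves the variable unchanged.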
Repeat the steps above until every shared library that produced an error can be found by the system, then check the result:
echo $LD_LIBRARY_PATH
/public/home/users/application/compiler/parallel_studio_xe_2017/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64:/public/home/users/application/compiler/parallel_studio_xe_2017/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin:/public/home/users/application/compiler/parallel_studio_xe_2017/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib:/public/home/users/application/compiler/parallel_studio_xe_2017/compilers_and_libraries_2017.4.196/linux/mpi/mic/lib:/public/home/users/application/compiler/parallel_studio_xe_2017/compilers_and_libraries_2017.4.196/linux/ipp/lib/intel64:/public/home/users/application/compiler/parallel_studio_xe_2017/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin:/public/home/users/application/compiler/parallel_studio_xe_2017/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin:/public/home/users/application/compiler/parallel_studio_xe_2017/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64/gcc4.4:/public/home/users/application/compiler/parallel_studio_xe_2017/debugger_2017/iga/lib:/public/home/users/application/compiler/parallel_studio_xe_2017/debugger_2017/libipt/intel64/lib:/public/home/users/application/compiler/parallel_studio_xe_2017/compilers_and_libraries_2017.4.196/linux/daal/lib/intel64_lin:/public/home/users/application/compiler/parallel_studio_xe_2017/compilers_and_libraries_2017.4.196/linux/daal/../tbb/lib/intel64_lin/gcc4.4:/public/software/mpi/openmpi/1.6.5/intel/lib:/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64:/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64::/opt/gridview//pbs/dispatcher/lib:/usr/local/lib64:/usr/local/lib
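Rather than reading the path by eye, one way to confirm that nothing is still unresolved is to ask the dynamic loader directly with ldd, which prints "not found" next to any library it cannot locate. The sketch below is an illustration only: the helper names are hypothetical, and the executable name vasp_std is an assumption about how VASP was installed, so substitute the actual path of your binary.

```shell
# filter_not_found: read ldd output on stdin and print only the names
# of libraries reported as "not found". Hypothetical helper.
filter_not_found() {
    awk '/not found/ { print $1 }'
}

# missing_libs BINARY: list shared libraries that cannot be resolved
# for BINARY with the current LD_LIBRARY_PATH. Prints nothing when
# every library is found.
missing_libs() {
    ldd "$1" 2>/dev/null | filter_not_found
}

# Assumption: the VASP executable is on PATH under the name vasp_std.
vasp_bin="$(command -v vasp_std || true)"
if [ -n "$vasp_bin" ]; then
    missing_libs "$vasp_bin"
fi
```

An empty result on a compute node, for example when the check is run from inside a job script, indicates that every library the binary needs can be located there.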