User:Manoj

From UFRC

VASP BENCHMARKING

Intel Machine (E5-2643 @ 3.30GHz)

VASP Native FFT Library

The following libraries and flags were used:

MKLDIR    = $(HPC_MKL_DIR)
MKLLIBS   = -lmkl_intel_lp64 -lmkl_sequential -lmkl_core
FFTLIB = -lfftw3xf
INCS = -I$(MKLDIR)/include/fftw
FFT_OBJS = fftmpi.o fftmpi_map.o fftw3d.o fft3dlib.o
FFLAGS =  -free -names lowercase -assume byterecl
OFLAG  = -O2 -xsse2 -unroll-aggressive -warn general

As a first check, the SIMD instruction set targeted by the compiler (the -x option in OFLAG) was varied. The results for the MgMOS test case are below (for the input file, please ask Charles Taylor or Manoj Srivastava):

SIMD instruction set   Run time (s)
sse2                   158
sse4.1                 156
sse4.2                 155
avx                    155
ssse3                  156
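To make "the SIMD set barely matters" concrete, here is a small sketch that rebuilds the OFLAG line for each SIMD target and computes the spread of the native-FFT run times from the table above. It assumes that only the -x value in OFLAG was changed between runs; the names `BASE_OFLAG`, `oflag_for` and `spread_percent` are hypothetical and not part of the VASP build.

```python
# Hypothetical helper: assumes the SIMD sweep only varied the -x option of the
# OFLAG line used on this page; everything else in the flags stays fixed.
BASE_OFLAG = "-O2 -x{isa} -unroll-aggressive -warn general"

# MgMOS run times (s) with VASP's native FFT, copied from the table above.
native_times = {"sse2": 158, "sse4.1": 156, "sse4.2": 155, "avx": 155, "ssse3": 156}


def oflag_for(isa: str) -> str:
    """Return the OFLAG value for one SIMD instruction set."""
    return BASE_OFLAG.format(isa=isa)


def spread_percent(times: dict) -> float:
    """Gap between slowest and fastest run, as a percentage of the slowest."""
    slowest, fastest = max(times.values()), min(times.values())
    return 100.0 * (slowest - fastest) / slowest


if __name__ == "__main__":
    for isa, t in native_times.items():
        print(f"OFLAG = {oflag_for(isa):45s} -> {t} s")
    print(f"spread across SIMD sets: {spread_percent(native_times):.1f}%")
```

With these numbers the spread is about 1.9% (158 s vs 155 s), which is small enough to treat as run-to-run noise rather than a real SIMD effect.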

MKL FFTs (via FFTW wrappers)

Profiling showed that the code spends most of its time in the FFT routines, so the next step was to switch the FFT library. The following changes were made:

FFT_OBJS = fftmpi_map.o fftmpiw.o fftw3d.o fft3dlib.o

(The change here is the replacement of "fftmpi.o" from the original VASP makefile with "fftmpiw.o".)
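A quick way to confirm that the object list differs only in that one file is to compare the two FFT_OBJS values as sets. This is a minimal sketch using the strings copied from the two makefile excerpts on this page; `diff_objects` is a hypothetical helper name.

```python
# FFT_OBJS values copied from the native-FFT and MKL-FFT makefile excerpts.
native_objs = "fftmpi.o fftmpi_map.o fftw3d.o fft3dlib.o".split()
mkl_objs = "fftmpi_map.o fftmpiw.o fftw3d.o fft3dlib.o".split()


def diff_objects(old, new):
    """Return (removed, added) object files, ignoring their order in the list."""
    removed = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    return removed, added


if __name__ == "__main__":
    removed, added = diff_objects(native_objs, mkl_objs)
    print("removed:", removed)  # removed: ['fftmpi.o']
    print("added:  ", added)    # added:   ['fftmpiw.o']
```

Order is ignored on purpose: the two lines list the objects in a different sequence, but only membership matters for which FFT driver gets linked.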

MKLDIR    = $(HPC_MKL_DIR)
MKLLIBS   = -lmkl_intel_lp64 -lmkl_sequential -lmkl_core
FFTLIB = -lfftw3xf
INCS = -I$(MKLDIR)/include/fftw
FFLAGS = -free -names lowercase -assume byterecl
OFLAG  = -O2 -xsse2 -unroll-aggressive -warn general

After these changes, the run time on the same Intel machine (E5-2643 @ 3.30GHz) dropped from 155 to 158 s to 94 to 97 s, i.e. the code ran roughly 60% faster (a speedup of about 1.6x, or about 39% less wall time). The following table shows the run time for each SIMD instruction set:

SIMD instruction set   Run time (s)
sse2                   97
sse4.1                 95
sse4.2                 94
avx                    94
ssse3                  94
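Because "60% improvement" can be read either as a speedup or as a cut in run time, the sketch below computes both, per SIMD set, from the two tables on this page. The dictionary and function names are illustrative only.

```python
# MgMOS run times (s), copied from the native-FFT and MKL-FFT tables above.
native = {"sse2": 158, "sse4.1": 156, "sse4.2": 155, "avx": 155, "ssse3": 156}
mkl_fft = {"sse2": 97, "sse4.1": 95, "sse4.2": 94, "avx": 94, "ssse3": 94}


def compare(before, after):
    """Per SIMD set: (speedup factor, percentage of run time saved)."""
    result = {}
    for isa in before:
        speedup = before[isa] / after[isa]
        saved = 100.0 * (before[isa] - after[isa]) / before[isa]
        result[isa] = (speedup, saved)
    return result


if __name__ == "__main__":
    for isa, (speedup, saved) in compare(native, mkl_fft).items():
        print(f"{isa:7s} speedup {speedup:.2f}x, run time reduced by {saved:.1f}%")
```

For sse2 this gives a speedup of 1.63x and a 38.6% shorter run; the other sets land between 1.64x and 1.66x, so the gain comes from the FFT library, not the instruction set.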

FFTW FFTs (via FFTW wrappers)

We then compiled VASP against the FFT library from FFTW itself, using the following flags:

MKLDIR    = $(HPC_MKL_DIR)
MKLLIBS   = -lmkl_intel_lp64 -lmkl_sequential -lmkl_core
FFTLIB = -lfftw3xf
INCS = -I$(MKLDIR)/include/fftw
FFT_OBJS = fftmpi.o fftmpi_map.o fftw3d.o fft3dlib.o
FFLAGS =  -free -names lowercase -assume byterecl
OFLAG  = -O2 -xsse2 -unroll-aggressive -warn general

Since the choice of SIMD instruction set made little difference to performance, we tried only one set. The result follows: