How to build Molcas-7.4 on Intel Westmere with Infiniband network
About MOLCAS
MOLCAS is a quantum chemistry software developed by scientists to be used by scientists. It is not primarily a commercial product and it is not sold in order to produce a fortune for its owner (the Lund University). The authors have tried in MOLCAS to assemble their collected experience and knowledge in computational quantum chemistry. MOLCAS is a research product and it is used as a platform by the Lund quantum chemistry group in their work to develop new and improved computational tools in quantum chemistry. Most of the codes in the software have newly developed features and the user should not be surprised if a bug is found now and then.
Official website : http://www.teokem.lu.se/molcas/
This document explains how to build Molcas-7.4 on Intel Westmere with an Infiniband network using the following software:
- Intel Compiler Suite 11.1.072 (also includes MKL)
- GlobalArrays 1.4.1
- OpenMPI-1.4.2*
* OpenMPI was built with ICS-11.1.072, BLCR-0.8.2, OFED-1.5.1 and with SGE flags, and GlobalArrays was built with OpenMPI-1.4.2.
The performance results for OpenMPI and Intel-MPI are almost the same. In the end we chose OpenMPI because of its easy SGE integration and its checkpoint and restart options.
Keep in mind that this build is highly optimised for our environment. If you have a different network or architecture, you will have to investigate which compilers, libraries and parallel environments give you the best performance.
Preliminary notes
After evaluating other compilers, libraries and compile options, we obtained the best performance with the procedure described here.
Environment Set Up
First of all, we load the modules needed to build this software. We usually encode the dependencies inside the module files; in this case, loading the OpenMPI module also loads the Intel Compiler Suite, BLCR and OFED modules.
# module load OpenMPI/1.4.2_ics-11.1.072_ofed-1.5.1_blcr-8.2
# module load intel_mkl/11.1.072
# module list
Currently Loaded Modulefiles:
1) intel_compiler_suite/11.1.072
2) blcr/0.8.2
3) OFED/1.5.1
4) OpenMPI/1.4.2_ics-11.1.072_ofed-1.5.1_blcr-8.2
5) intel_mkl/11.1.072
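Before building, it can be worth confirming that the loaded modules really put the compilers and MPI wrappers on the PATH. The following is a minimal sketch and not part of the original procedure: the helper name check_tools is hypothetical, and the tool names are simply the ones used later in this document.

```shell
# Report which of the given commands are missing from PATH.
# Returns 0 when every command is found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call with the tools this build relies on; it only reports, it does not abort.
check_tools mpif77 mpicc ifort icc gmake || echo "load the modules above and retry"
```

If any tool is reported as missing, the module environment is probably incomplete and the later gmake and configure steps are likely to fail.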
Molcas can perform several classes of calculations, and their scalability can be improved with either MPI or OpenMP. That is why we build Molcas twice: once for MPI only and once for OpenMP only.
MPI version
Global Arrays
First, we build Global Arrays:
# tar -xvf molcas74.tar
# mv molcas74 molcas74_ompi
# cd molcas74_ompi/
# cd g
# gmake TARGET=LINUX64 FC=mpif77 CC=mpicc | tee -a make_nehalem-64.log
At the end, you will see this output:
gmake[1]: Leaving directory `/scratch/jblasco/MOLCAS/molcas74_ompi/g/global/testing'
------------------------------------------------------------
An executable test program for GA is ./global/testing/test.x
There are also other test programs in that directory.
------------------------------------------------------------
Also, to test your GA programs, suggested compiler/linker
options are as follows.
GA libraries are built in /scratch/jblasco/MOLCAS/molcas74_ompi/g/lib/LINUX64
INCLUDES = -I/scratch/jblasco/MOLCAS/molcas74_ompi/g/include
For Fortran Programs:
FLAGS = -g -Vaxlib -O3 -w -cm -xW -tpp7 -i8
LIBS = -L/scratch/jblasco/MOLCAS/molcas74_ompi/g/lib/LINUX64 -lglobal -lma -llinalg -larmci -ltcgmsg -lm
For C Programs:
LIBS = -L/scratch/jblasco/MOLCAS/molcas74_ompi/g/lib/LINUX64 -lglobal -lma -llinalg -larmci -ltcgmsg -lm -lm
To verify the build, we run ./global/testing/test.x:
# ./global/testing/test.x
GA Statistics for process 0
------------------------------
                              create   destroy       get       put       acc   scatter    gather  read&inc
calls:                            11        10  1.45e+04      1563      1565        42        40       100
number of processes/call                        1.00e+00  1.00e+00  1.00e+00  9.52e-01  1.00e+00
bytes total:                                    3.19e+06  2.17e+06  2.58e+05  5.87e+04  5.73e+04  8.00e+02
bytes remote:                                   0.00e+00  0.00e+00  0.00e+00  0.00e+00  0.00e+00  0.00e+00
Max memory consumed for GA by this process: 676192 bytes
All tests successful
After about 0.002 seconds, the output ends with the message that all tests were successful.
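When the test is run from a script rather than by hand, the success message can be checked automatically. This is a minimal sketch under the assumption that the test output has been saved to a file; the function name ga_tests_passed and the log file name are hypothetical.

```shell
# Return 0 if the captured GA test output reports success, 1 otherwise.
ga_tests_passed() {
    grep -q "All tests successful" "$1"
}

# Typical use from the g/ directory (hypothetical log file name):
#   ./global/testing/test.x > ga_test.log 2>&1
#   ga_tests_passed ga_test.log && echo "GA OK"
```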
The standard procedure is to run ./configure -setup, but Molcas-7.4 is older than the ICS release we use, so its configure script has no entry for this MKL version.
We could either patch every makefile with the correct MKL include and link settings or, better, modify the configure file itself.
We suggest adding the following lines to your configure file, as a new branch of the case statement that handles the -blas value:
# Workaround MKL 11.1.072 @ XRQTC
mkl11.1.072 )
    if [ ! -f "src/blas_util/PACKAGE" ]; then
        sbin/uninstall blas_util
        needModule=1
    fi
    if [ ! -f "src/essl_util/PACKAGE" ]; then
        sbin/uninstall essl_util
        needModule=1
    fi
    if [ ! -f "src/lapack_util/PACKAGE" ]; then
        sbin/uninstall lapack_util
        needModule=1
    fi
    MKLINCLUDE=/opt/intel/Compiler/11.1/072/mkl/include
    MKLPATH=/opt/intel/Compiler/11.1/072/mkl/lib/em64t
    # MPI Only
    XLIB="-L${MKLPATH} -I${MKLINCLUDE} -I${MKLINCLUDE}/em64t/lp64 -lmkl_lapack95_lp64 -Wl,--start-group ${MKLPATH}/libmkl_intel_lp64.a ${MKLPATH}/libmkl_sequential.a ${MKLPATH}/libmkl_core.a -Wl,--end-group -lpthread"
    # OpenMP Only
    #XLIB="-L$MKLPATH -I$MKLINCLUDE -I$MKLINCLUDE/em64t/lp64 -lmkl_lapack95_lp64 -Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread"
    # Hybrid MPI + OpenMP
    #XLIB="-L$MKLPATH -I$MKLINCLUDE -I$MKLINCLUDE/em64t/lp64 -lmkl_lapack95_lp64 -Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread"
    ;;
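Because XLIB links the MKL static libraries by full path, a wrong MKLPATH only shows up late, at link time. The following sketch checks up front that the named archives exist; it is an illustration rather than part of the Molcas configure script, and the function name check_mkl_libs is hypothetical.

```shell
# Check that each named library file exists in the given directory.
# Usage: check_mkl_libs DIR LIB [LIB...]
check_mkl_libs() {
    dir=$1
    shift
    rc=0
    for lib in "$@"; do
        if [ ! -f "$dir/$lib" ]; then
            echo "not found: $dir/$lib"
            rc=1
        fi
    done
    return "$rc"
}

# Example for the MPI-only link line shown above:
#   check_mkl_libs /opt/intel/Compiler/11.1/072/mkl/lib/em64t \
#       libmkl_intel_lp64.a libmkl_sequential.a libmkl_core.a
```

For the OpenMP link line, libmkl_sequential.a would be replaced by libmkl_intel_thread.a in the list.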
Note that the following configure line passes -blas mkl11.1.072, and that -par_args is left empty because OpenMPI was compiled with SGE support (--with-sge).
# ./configure -compiler intel -parallel mpi -par_root /aplic/MPI/OpenMPI/1.4.2_ics-11.1.072_ofed-1.5.1_blcr-8.2 -par_run /aplic/MPI/OpenMPI/1.4.2_ics-11.1.072_ofed-1.5.1_blcr-8.2/bin/mpirun -par_type distributed -par_args " " -blas mkl11.1.072 -blas_dir /opt/intel/Compiler/11.1/072/mkl/lib/em64t -speed fast
Configuration of MOLCAS version 7.4 patch level 045.
OS type is Linux-x86_64
Creating system-specific information using following
./configure parameters :
OS ........ Linux-x86_64
COMPILER .. intel
SPEED ..... fast
PARALLEL .. yes
MSGPASS ... mpi
ADRMODE ... default
USEOMP ....
USEDFLAGS.. -compiler intel -parallel mpi -par_root /aplic/MPI/OpenMPI/1.4.2_ics-11.1.072_ofed-1.5.1_blcr-8.2 -par_run /aplic/MPI/OpenMPI/1.4.2_ics-11.1.072_ofed-1.5.1_blcr-8.2/bin/mpirun -par_type distributed -par_args -blas mkl11.1.072 -blas_dir /opt/intel/Compiler/11.1/072/mkl/lib/em64t -speed fast
Generate cfg/lists.cfg
Generate doc/manual/users.guide/programs.tex
Generate data/prgms.cntrl
Found system-specific configuration file cfg/Linux-x86_64.cfg
Including cfg/Linux-x86_64.cfg (first pass)
Locating standard commands :
SH = /usr/bin/sh
CP = /bin/cp
MV = /bin/mv
RM = /bin/rm
LS = /bin/ls
TR = /usr/bin/tr
AWK = /usr/bin/awk
SED = /usr/bin/sed
GREP = /usr/bin/grep
HEAD = /usr/bin/head
MORE = /bin/more
CHMOD = /bin/chmod
CPP = /usr/bin/cpp
FIND = /usr/bin/find
MKDIR = /bin/mkdir
MAKE = /usr/bin/gmake
LN = /bin/ln
WC = /usr/bin/wc
CAT = /bin/cat
AR = /usr/bin/ar
TIME = /usr/bin/time
UUENCODE = /usr/bin/uuencode
MD5SUM = /usr/bin/md5sum
PERL = /usr/bin/perl
RANLIB = /usr/bin/ranlib
Including cfg/Linux-x86_64.cfg (second pass)
Locating compilers
F77 = /opt/intel/Compiler/11.1/072/bin/intel64/ifort
F90 = /opt/intel/Compiler/11.1/072/bin/intel64/ifort
CC = /opt/intel/Compiler/11.1/072/bin/intel64/icc
***
*** Warning: configuring parallel build (mpi) with the following hardwired parameters, please check!
***
GAOPTIONS=FC=ifort MPI_LIB=/aplic/MPI/OpenMPI/1.4.2_ics-11.1.072_ofed-1.5.1_blcr-8.2/lib MPI_INCLUDE= LIBMPI="-lmpi" USE_MPI=yes
GALIB= -L../../g/lib/LINUX64 -ltcgmsg-mpi -lglobal -larmci -lpario -lma -L/aplic/MPI/OpenMPI/1.4.2_ics-11.1.072_ofed-1.5.1_blcr-8.2/lib -lmpi
RUNBINARY=/aplic/MPI/OpenMPI/1.4.2_ics-11.1.072_ofed-1.5.1_blcr-8.2/bin/mpirun -np $CPUS $program < /dev/null
*** Uninstallation blas_util complete
*** Uninstallation essl_util complete
*** Uninstallation lapack_util complete
Generate cfg/lists.cfg
Generate doc/manual/users.guide/programs.tex
Generate data/prgms.cntrl
Creating file : molcas
Type full path to install driver script
or choose a directory from the list
(Enter - accept, n - next, q - quit)
install driver script to [/usr/local/bin] /aplic/molcas/7.4_patch045/ics-11.1.072/ompi-1.4.2_only
molcas driver will be installed to /aplic/molcas/7.4_patch045/ics-11.1.072/ompi-1.4.2_only
Creating file : Symbols
Creating file: molcas.rte
Creating file : src/Include/configinfo.fh
Generation of Makefiles completed
Configuration completed.
To download GUI for molcas, type: molcas getextra
To complete installation you must run make in this directory
We suggest reviewing the generated molcas.rte:
# runtime environment for molcas
OS='Linux-x86_64'
PAR_TYPE='distributed'
DEFMOLCASMEM='256'
DEFMOLCASDISK='20000'
RUNSCRIPT='$program $input'
RUNBINARY='/aplic/MPI/OpenMPI/1.4.2_ics-11.1.072_ofed-1.5.1_blcr-8.2/bin/mpirun -np $CPUS $program < /dev/null'
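A quick way to review individual settings is to extract them from the file. This is a small sketch that assumes every setting has the form KEY='value' on one line, as in the listing above; the function name rte_value is hypothetical.

```shell
# Print the value assigned to KEY in a molcas.rte-style file
# (lines of the form KEY='value').
rte_value() {
    key=$1
    file=$2
    sed -n "s/^${key}='\(.*\)'\$/\1/p" "$file"
}

# Example: show the launcher line that Molcas will use for parallel runs.
#   rte_value RUNBINARY molcas.rte
```

The first word of the RUNBINARY value should be the mpirun of the OpenMPI build that was used for Global Arrays.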
Then just run make:
# make | tee -a make_molcas74_ompi_only_westmere6.log
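One caveat with piping make into tee is that the shell reports the exit status of tee, so a failed build can go unnoticed in a script. The sketch below keeps the log but returns the status of the command itself; the function name run_logged is hypothetical, and unlike the plain pipe above it also sends stderr to the log.

```shell
# Run a command, append its output (stdout and stderr) to a log via tee,
# and return the command's own exit status instead of tee's.
run_logged() {
    log=$1
    shift
    status_file=$(mktemp)
    { "$@" 2>&1; echo "$?" > "$status_file"; } | tee -a "$log"
    status=$(cat "$status_file")
    rm -f "$status_file"
    return "$status"
}

# Example:
#   run_logged make_molcas74_ompi_only_westmere6.log make || echo "build failed"
```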
To verify the performance and the results, run molcas verify performance and molcas timing.
Failed tests are collected in ./Test/Failed_Tests/; if all goes well, this directory will be empty.
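Checking the failed-tests directory can be scripted. This is a minimal sketch that assumes each failed test leaves at least one file under that directory; the function name count_failed_tests is hypothetical.

```shell
# Count the files left in a failed-tests directory; prints 0 if the
# directory is empty or does not exist.
count_failed_tests() {
    dir=$1
    if [ -d "$dir" ]; then
        find "$dir" -type f | wc -l | tr -d ' '
    else
        echo 0
    fi
}

# Example:
#   count_failed_tests ./Test/Failed_Tests
```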
OpenMP Only version
To build the OpenMP version, repeat the first steps and modify the configure file as follows:
# Workaround MKL 11.1.072 @ XRQTC
mkl11.1.072 )
    if [ ! -f "src/blas_util/PACKAGE" ]; then
        sbin/uninstall blas_util
        needModule=1
    fi
    if [ ! -f "src/essl_util/PACKAGE" ]; then
        sbin/uninstall essl_util
        needModule=1
    fi
    if [ ! -f "src/lapack_util/PACKAGE" ]; then
        sbin/uninstall lapack_util
        needModule=1
    fi
    MKLINCLUDE=/opt/intel/Compiler/11.1/072/mkl/include
    MKLPATH=/opt/intel/Compiler/11.1/072/mkl/lib/em64t
    # MPI Only
    #XLIB="-L${MKLPATH} -I${MKLINCLUDE} -I${MKLINCLUDE}/em64t/lp64 -lmkl_lapack95_lp64 -Wl,--start-group ${MKLPATH}/libmkl_intel_lp64.a ${MKLPATH}/libmkl_sequential.a ${MKLPATH}/libmkl_core.a -Wl,--end-group -lpthread"
    # OpenMP Only
    XLIB="-L$MKLPATH -I$MKLINCLUDE -I$MKLINCLUDE/em64t/lp64 -lmkl_lapack95_lp64 -Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread"
    # Hybrid MPI + OpenMP
    #XLIB="-L$MKLPATH -I$MKLINCLUDE -I$MKLINCLUDE/em64t/lp64 -lmkl_lapack95_lp64 -Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread"
    ;;
Your configure command will then look like this:
# ./configure -compiler intel -par_type smp -blas mkl11.1.072 -blas_dir /opt/intel/Compiler/11.1/072/mkl/lib/em64t -speed fast -OMP
Then just run make, and that is all:
# make | tee -a make_molcas74_smp_westmere6.log
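An OpenMP build only uses more than one core if the thread count is set at run time. The submit scripts are not shown in this document, so the following is an assumption about how that could be done under SGE: NSLOTS is the variable SGE sets to the number of granted slots, OMP_NUM_THREADS is the standard OpenMP variable, and the function name threads_for_job is hypothetical.

```shell
# Print the thread count for the current job: the SGE slot count if it is
# set, otherwise 1 (for example when running interactively).
threads_for_job() {
    echo "${NSLOTS:-1}"
}

# Export the thread count so that OpenMP regions (and threaded MKL) pick it up.
OMP_NUM_THREADS=$(threads_for_job)
export OMP_NUM_THREADS
```

Placed near the top of a job script submitted with -pe smp, this would make the thread count follow the number of slots requested.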
Now it is time to verify the performance and scalability of the MPI and OpenMP versions. With the following scripts you can submit several jobs to your batch queue and measure the performance and scalability on your architecture.
molcas-iqtc04_smp.sh
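#!/bin/bash
# Submit the SMP (OpenMP) verification jobs for each core count, chained so
# that each job waits for the previous one.
SFILE=SubmitScripts/molcas-verify-performance-smp.sub
for j in iqtc04
do
    # Fill in the cluster name in the submit template.
    TFILE=molcas-${j}
    sed "s/CLUSTER/$j/g" "$SFILE" > "$TFILE"
    HOLD=""
    for i in 1 2 4 6 8 12
    do
        # Fill in the core count and submit. $HOLD is left unquoted on purpose
        # so that it expands to nothing for the first job.
        sed "s/ncores/$i/g" "$TFILE" > "OUT/$TFILE-${i}.sub"
        qsub $HOLD -q "$j.q@g1noden10" -pe "smp" "$i" "OUT/$TFILE-${i}.sub" > "OUT/$TFILE-${i}.jobid"
        sleep 2
        # The job name must match the one set inside the submit template.
        JOB="MOLCAS-smp-$i"
        HOLD="-hold_jid $JOB"
    done
    rm "$TFILE"
done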
molcas-iqtc04_ompi.sh
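#!/bin/bash
# Submit the MPI-only verification jobs for each core count. The first job
# waits for the last SMP job (MOLCAS-smp-12); each later job waits for the
# previous MPI job.
SFILE=SubmitScripts/molcas-verify-performance-ompi_only.sub
for j in iqtc04
do
    # Fill in the cluster name in the submit template.
    TFILE=molcas-${j}
    sed "s/CLUSTER/$j/g" "$SFILE" > "$TFILE"
    HOLD="-hold_jid MOLCAS-smp-12"
    for i in 1 2 4 6 8 12
    do
        # Fill in the core count and submit; $HOLD is left unquoted on purpose
        # so that it splits into the option and its argument.
        sed "s/ncores/$i/g" "$TFILE" > "OUT/$TFILE-${i}.sub"
        qsub $HOLD -q "$j.q@g1noden10" -pe "ompi_*" "$i" "OUT/$TFILE-${i}.sub" > "OUT/$TFILE-${i}.jobid"
        sleep 2
        # The job name must match the one set inside the submit template.
        JOB="MOLCAS-mpi-$i"
        HOLD="-hold_jid $JOB"
    done
    rm "$TFILE"
done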
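Once the jobs have finished, scalability is usually summarised as speedup (reference time divided by the time on n cores) and parallel efficiency (speedup divided by the increase in core count). The sketch below computes both from a list of "cores seconds" pairs. How the timings are collected from the job output is not covered here, the function name scaling_table is hypothetical, and the numbers in the example are placeholders, not measurements.

```shell
# Read "cores seconds" pairs on stdin and print: cores speedup efficiency.
# The first line is taken as the reference run (normally the 1-core job).
scaling_table() {
    awk 'NR == 1 { ref_cores = $1; ref_time = $2 }
         { speedup = ref_time / $2
           efficiency = speedup * ref_cores / $1
           printf "%d %.2f %.2f\n", $1, speedup, efficiency }'
}

# Example with placeholder timings (not measured values):
#   printf '1 100\n2 50\n4 50\n' | scaling_table
```

An efficiency close to 1.00 means the extra cores are well used; a value that drops sharply at some core count marks the point where adding cores stops paying off for that calculation.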



