Sailfish supports running simulations across multiple hosts. Distributed simulations currently require either SSH access to each machine or a cluster with a GPU-aware PBS scheduler (e.g. Torque). In both cases, all node machines need to be able to connect to each other using TCP connections.
A distributed Sailfish simulation is started just like a local simulation, the only difference being the presence of the cluster_spec config parameter, whose value is a path to a cluster definition file. If the path is relative, the controller will look for it in the current working directory as well as in the .sailfish directory in the user’s home.
Optionally, the cluster_sync argument can also be specified to automatically sync files from the controller to the nodes. A common value to use here is $PWD:., which means “sync the contents of the current directory ($PWD) to the working directory on every node (.)”.
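For example, assuming a cluster definition file named cluster.py in the current directory (see the sample below), a distributed run of one of the bundled examples could be started roughly as follows; the example script and the --name=value form of the two parameters follow the command-line conventions used elsewhere in this document:

python ./examples/lbm_cylinder_multi.py --cluster_spec=cluster.py --cluster_sync=$PWD:.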
A cluster definition file is a Python script that contains a global nodes list of MachineSpec instances. Each MachineSpec defines a cluster node. Here is a sample file defining two nodes:
from sailfish.config import MachineSpec

nodes = [
    MachineSpec('ssh=user1@hosta//chdir=/home/user1/sailfish-cluster',
                'hosta', cuda_nvcc='/usr/local/cuda/bin/nvcc',
                gpus=[0, 1]),
    MachineSpec('ssh=user2@hostb//chdir=/home/user2/sailfish-cluster',
                'hostb', cuda_nvcc='/usr/local/cuda/bin/nvcc',
                gpus=[0, 2], block_size=256)
]
The two nodes are located on hosta and hostb respectively, and a different user account is used to access each host. The cuda_nvcc and block_size parameters are standard Sailfish config parameters that can be specified from the command line for single-host simulations. Here we use them to provide the location of the NVCC compiler and to use a larger CUDA block size on hostb (e.g. because it has Fermi-generation GPUs). The first argument of MachineSpec is an execnet address string; see the execnet documentation for more information about all supported features. For common configurations, simply modifying the above example should work fine.
Note that each node definition contains a list of available GPUs. This information is used for load-balancing purposes. By default, only the first available GPU ([0]) will be used, even on a multi-GPU machine.
Running a Sailfish simulation on a PBS cluster requires you to create two script files: one to set up the environment on each node, and a standard PBS job description script. The first script will be used internally by the Sailfish controller, and for consistency it is probably a good idea to use it in the job description script as well. The location of the script can be specified using the --cluster_pbs_initscript command line option, which defaults to sailfish-init.sh. Below we give example scripts for the user myuser on the ZEUS cluster, which is part of the PLGRID infrastructure. Some details such as paths will probably differ on your cluster, so you will need to adjust them accordingly.
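If the environment setup script is kept somewhere other than the default location, its path can be passed explicitly when starting the simulation; a sketch (the path is illustrative):

python ./examples/lbm_cylinder_multi.py --cluster_pbs_initscript=$HOME/scripts/sailfish-init.sh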
Environment setup script ($HOME/sailfish-init.sh):
#!/bin/bash
# Add paths for Python modules that are not present by default on the cluster and
# were installed manually for this user.
export PATH="$PATH:/storage/myuser/bin/"
export PYTHONPATH="/storage/myuser/lib/python2.7/site-packages:$PWD:$PBS_O_WORKDIR:$PYTHONPATH"
# Assume that the job was submitted from a directory containing a Sailfish installation.
cd $PBS_O_WORKDIR
# The ZEUS cluster uses the Modules system to install additional software. The
# settings below will normally be set automatically by PBS for the _first_ task started
# within a job. We need them to be available also on later tasks started by Sailfish,
# so we copy the settings into this script. You can see the environment set up
# for your job by starting an interactive job with: qsub -I -q <queue> -l nodes=1:ppn=1
# and then running the env command.
export MODULE_VERSION=3.2.7
export MODULE_VERSION_STACK=3.2.7
export MODULEPATH=/software/local/Modules/versions:/software/local/Modules/$MODULE_VERSION/modulefiles:/software/local/Modules/modulefiles
export MODULESHOME=/software/local/Modules/3.2.7
function module {
    eval `/software/local/Modules/$MODULE_VERSION/bin/modulecmd bash $*`
}
# Load additional modules and make sure python points to python2.7, for which
# the Python modules installed by myuser were previously configured.
module add python/current
module add numpy/1.6.1
module add cuda/4.0.17
alias python=python2.7
PBS job script ($HOME/sailfish-test.pbs):
#!/bin/sh
#PBS -l nodes=2:ppn=1:gpus=1
#PBS -N test_sailfish
#PBS -q gpgpu
. $HOME/sailfish-init.sh
python ./examples/lbm_cylinder_multi.py --lat_nx=2046 --lat_ny=30000 --block_size=256 --mode=benchmark --vertical \
--every=500 --max_iters=2000 --blocks=2 --log=/mnt/lustre/scratch/people/myuser/test.log --verbose
Once you have both scripts in place and a Sailfish installation in $HOME/mysailfish, you can submit the job by running:
cd $HOME/mysailfish
qsub ../sailfish-test.pbs
Sailfish currently does not support InfiniBand (IB) explicitly. It is, however, possible to utilize IB interconnects via the libsdp library, which transparently replaces TCP connections with SDP ones. To do so, you need to:
- make sure the ib_sdp kernel module is loaded on the computational nodes,
- add export LD_PRELOAD=libsdp.so to your sailfish-init.sh script,
- run your simulation with the --cluster_pbs_interface=ib0 parameter (this assumes the IB interface on the computational nodes is ib0), as shown in the sketch below.
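Putting the last two steps together, a minimal sketch (the interface name and paths follow the examples above and may differ on your cluster):

# In $HOME/sailfish-init.sh: preload libsdp so that TCP sockets are
# transparently replaced with SDP ones.
export LD_PRELOAD=libsdp.so

# In the PBS job script, point Sailfish at the InfiniBand interface:
python ./examples/lbm_cylinder_multi.py --mode=benchmark --cluster_pbs_interface=ib0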
If the --cluster_pbs option is set to true (the default) and the $PBS_GPUFILE environment variable is set, Sailfish will assume it is running on a PBS cluster. A cluster specification will be dynamically built from the contents of $PBS_GPUFILE. For each machine listed in this file, pbsdsh will be used to execute the script specified by --cluster_pbs_initscript (sailfish-init.sh in the previous section), followed by python sailfish/socketserver.py (with a random port). The socket server is then used to establish an execnet channel to the node and to start a local machine master.
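For reference, on Torque $PBS_GPUFILE typically contains one line per allocated GPU in the form hostname-gpuN; for a two-node, one-GPU-per-node job such as the one above it might look roughly like this (host names are hypothetical):

$ cat $PBS_GPUFILE
node01-gpu0
node02-gpu0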