Distributed simulations on a cluster
====================================

Sailfish supports running simulations across multiple hosts. Distributed
simulations currently require either SSH access to each machine or a cluster
with a GPU-aware PBS scheduler (e.g. Torque). In both cases, all node machines
need to be able to connect to each other using TCP connections.

Using direct SSH connections
----------------------------

A distributed Sailfish simulation is started just like a local simulation,
the only difference being the presence of the ``cluster_spec`` config
parameter, whose value is a path to a cluster definition file. If the path is
relative, the controller will look for it in the current working directory as
well as in the ``.sailfish`` directory in the user's home. Optionally, the
``cluster_sync`` argument can also be specified to automatically sync files
from the controller to the nodes. A common value to use here is ``$PWD:.``,
which means "sync the contents of the current directory (``$PWD``) to the
working directory on every node (``.``)".

Cluster definition files
^^^^^^^^^^^^^^^^^^^^^^^^

A cluster definition file is a Python script that contains a global ``nodes``
list of :class:`MachineSpec` instances. Each :class:`MachineSpec` defines a
cluster node. Here is a sample file defining two nodes::

    from sailfish.config import MachineSpec

    nodes = [
        MachineSpec('ssh=user1@hosta//chdir=/home/user1/sailfish-cluster', 'hosta',
                    cuda_nvcc='/usr/local/cuda/bin/nvcc', gpus=[0, 1]),
        MachineSpec('ssh=user2@hostb//chdir=/home/user2/sailfish-cluster', 'hostb',
                    cuda_nvcc='/usr/local/cuda/bin/nvcc', gpus=[0, 2], block_size=256)
    ]

The two nodes are located on ``hosta`` and ``hostb`` respectively, and
different user accounts are used to access the two hosts. The ``cuda_nvcc``
and ``block_size`` parameters are standard Sailfish config parameters that can
be specified on the command line for single-host simulations. Here we use them
to provide the location of the NVCC compiler and to use a larger CUDA block
size on ``hostb`` (e.g. because it has Fermi-generation GPUs).

The first argument of :class:`MachineSpec` is an execnet address string. See
the execnet documentation for more information about all supported features.
For common configurations, simply modifying the above example should work
fine.

Note that each node definition contains a list of available GPUs. This
information is used for load-balancing purposes. By default, only the first
available GPU (``[0]``) will be used, even on a multi-GPU machine.
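As a quick illustration, assuming the definition file above is saved as
``cluster.py`` (an arbitrary name) in the current directory or in
``~/.sailfish``, a distributed run could be started roughly as follows. This
is a minimal sketch; the config parameters are passed here in their
command-line form and the example script is the one used later in this
chapter::

    # Run the cylinder example on both nodes defined in cluster.py,
    # syncing the contents of the current directory to the working
    # directory on every node before the simulation starts.
    python ./examples/lbm_cylinder_multi.py --cluster_spec=cluster.py --cluster_sync=$PWD:.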
Using a PBS cluster
-------------------

Running a Sailfish simulation on a PBS cluster requires you to create two
script files -- one to set up the environment on each node, and a standard
PBS job description script. The first script will be used internally by the
Sailfish controller, and for consistency it is probably a good idea to use it
in the job description script as well. The location of the script can be
specified using the ``--cluster_pbs_initscript`` command line option, which
has a default value of ``sailfish-init.sh``.

Below we give example scripts for the user ``myuser`` on the ZEUS cluster,
which is part of the PLGRID infrastructure. Some details such as paths will
probably differ on your cluster, so you will need to adjust them accordingly.

Environment setup script (``$HOME/sailfish-init.sh``)::

    #!/bin/bash

    # Add paths for Python modules that are not present by default on the cluster and
    # were installed manually for this user.
    export PATH="$PATH:/storage/myuser/bin/"
    export PYTHONPATH="/storage/myuser/lib/python2.7/site-packages:$PWD:$PBS_O_WORKDIR:$PYTHONPATH"

    # Assume that the job was submitted from a directory containing a Sailfish installation.
    cd $PBS_O_WORKDIR

    # The ZEUS cluster uses the Modules system to install additional software. The
    # settings below will normally be set automatically by PBS for the _first_ task started
    # within a job. We need them to be available also on later tasks started by Sailfish,
    # so we copy the settings into this script. You can see the environment set up
    # for your job by starting an interactive job with: qsub -I -q gpgpu -l nodes=1:ppn=1
    # and then running the env command.
    export MODULE_VERSION=3.2.7
    export MODULE_VERSION_STACK=3.2.7
    export MODULEPATH=/software/local/Modules/versions:/software/local/Modules/$MODULE_VERSION/modulefiles:/software/local/Modules/modulefiles
    export MODULESHOME=/software/local/Modules/3.2.7

    function module {
        eval `/software/local/Modules/$MODULE_VERSION/bin/modulecmd bash $*`
    }

    # Load additional modules and make sure python points to python2.7, for which the
    # Python modules installed by myuser were previously configured.
    module add python/current
    module add numpy/1.6.1
    module add cuda/4.0.17
    alias python=python2.7

PBS job script (``$HOME/sailfish-test.pbs``)::

    #!/bin/sh
    #PBS -l nodes=2:ppn=1:gpus=1
    #PBS -N test_sailfish
    #PBS -q gpgpu

    . $HOME/sailfish-init.sh
    python ./examples/lbm_cylinder_multi.py --lat_nx=2046 --lat_ny=30000 --block_size=256 --mode=benchmark --vertical \
        --every=500 --max_iters=2000 --blocks=2 --log=/mnt/lustre/scratch/people/myuser/test.log --verbose

Once you have both scripts in place and a Sailfish installation in
``$HOME/mysailfish``, you can submit the job by running::

    cd $HOME/mysailfish
    qsub ../sailfish-test.pbs

Using InfiniBand
^^^^^^^^^^^^^^^^

Sailfish currently does not support InfiniBand (IB) explicitly. It is however
possible to utilize IB interconnects by using the ``libsdp`` library, which
makes it possible to transparently replace TCP connections with SDP ones. To
do so, you need to:

* make sure the ``ib_sdp`` kernel module is loaded on the computational nodes,
* add ``export LD_PRELOAD=libsdp.so`` to your ``sailfish-init.sh`` script,
* run your simulation with the ``--cluster_pbs_interface=ib0`` parameter (this
  assumes the IB interface on the computational nodes is ``ib0``).

How it works behind the scenes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the ``--cluster_pbs`` option is set to true (the default) and the
``$PBS_GPUFILE`` environment variable is set, Sailfish will assume it is
running on a PBS cluster. A cluster specification will be built dynamically
from the contents of ``$PBS_GPUFILE``. For each machine listed in this file,
``pbsdsh`` will be used to execute ``--cluster_pbs_initscript``
(``sailfish-init.sh`` in the previous section), followed by
``python sailfish/socketserver.py`` (with a random port). The socket server
will then be used to establish an execnet channel to the node and to start a
local machine master.
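For reference, on a Torque cluster ``$PBS_GPUFILE`` typically lists one
allocated GPU per line in a ``<hostname>-gpu<N>`` form, so for the two-node
job described by the PBS script above its contents might look similar to the
following (the exact format and hostnames depend on the scheduler
configuration)::

    n0123-gpu0
    n0124-gpu0

Sailfish groups these entries by hostname to determine which nodes to contact
through ``pbsdsh`` and which GPUs to use on each of them.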