Multi-Node Usage

ultraFluidX can generally be used on an arbitrary number of compute nodes.

However, be aware of typical strong-scaling behavior: for each problem size there is a maximum reasonable number of GPU devices. If the problem size per GPU device becomes too small, the communication between devices starts to dominate and the overall compute performance will not increase any further.1

To start ultraFluidX on multiple compute nodes, it is recommended to use one of the two MPI options --hostfile or --host. Just like on a single machine, the main rank 0 handles the preprocessing and the I/O, whereas each additional secondary rank 1...n drives one GPU device. Therefore, to run ultraFluidX on, for example, three machines with two GPU devices each, the simulation must be started with a total of 7 MPI ranks (main rank 0 + 3×2 GPU ranks).
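The rank arithmetic above can be sketched as a short shell calculation; NODES and GPUS_PER_NODE are illustrative variables for this example, not ultraFluidX parameters:

```shell
#!/bin/sh
# Example values for the three-node, two-GPU-per-node setup described above.
NODES=3
GPUS_PER_NODE=2

# One MPI rank per GPU device, plus the main rank 0 for preprocessing and I/O.
NP=$((NODES * GPUS_PER_NODE + 1))

echo "Total MPI ranks: $NP"   # Total MPI ranks: 7
```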

If the machines from the example above were named node1, node2, and node3, with node1 hosting the main rank 0, the Bash command using the --host option would be (-np is not needed in this case):
mpirun --host node1,node1,node1,node2,node2,node3,node3 ultraFluidX case.xml
Alternatively, the same behavior can be achieved using a so-called “hostfile”, named, for example, uFX_hosts, containing the following lines:
node1 slots=3
node2 slots=2
node3 slots=2
The BASH command to start ultraFluidX with this option would then be:
mpirun --hostfile uFX_hosts -np 7 ultraFluidX case.xml
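As a sketch, the hostfile and the matching rank count can be generated from a node list. The node names and GPU count below are just the example values from this section; the only logic is that the first node receives one extra slot for the main rank 0:

```shell
#!/bin/sh
# Sketch: generate uFX_hosts and the mpirun command for the example above.
# Node names and GPU counts are illustrative, not required values.
HOSTS="node1 node2 node3"
GPUS_PER_NODE=2
HOSTFILE=uFX_hosts

: > "$HOSTFILE"   # start with an empty hostfile
FIRST=1
NP=0
for h in $HOSTS; do
  SLOTS=$GPUS_PER_NODE
  # The first node gets one extra slot for the main rank 0.
  if [ "$FIRST" -eq 1 ]; then
    SLOTS=$((SLOTS + 1))
    FIRST=0
  fi
  echo "$h slots=$SLOTS" >> "$HOSTFILE"
  NP=$((NP + SLOTS))
done

echo "mpirun --hostfile $HOSTFILE -np $NP ultraFluidX case.xml"
```

For the example values, this writes the same three hostfile lines shown above and prints the launch command with -np 7.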
Note: ultraFluidX must be able to access the same locations, folders, and so on, on all compute nodes, which is typically ensured by a shared network file system.

For further documentation on multi-node usage, as well as general handling and run-time tuning of Open MPI, refer to the respective FAQ on the Open MPI web page.

1 Niedermeier, C. A., Janßen, C. F., & Indinger, T. (2018). Massively-parallel multi-GPU simulations for fast and accurate automotive aerodynamics. In Proceedings of the 7th European Conference on Computational Fluid Dynamics.