Platform and Hardware Requirements

Platforms, operating systems, and processors supported by nanoFluidX, along with required and recommended hardware.

Supported Platforms

Linux

nanoFluidX is available for most common Linux distributions that ship GCC and GLIBC system libraries at version 4.8.5 and 2.17 or newer, respectively. Data on the GCC and GLIBC versions shipped by each distribution can be found on DistroWatch.

nanoFluidX for Linux is distributed with OpenMPI 4.1.2 and NVIDIA CUDA 11.6.2 libraries. A compatible NVIDIA graphics driver (version 450.80.02 or newer) must be installed on the system.
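
To confirm that a Linux system meets these requirements, the installed GLIBC, GCC, and NVIDIA driver versions can be checked from a terminal. The commands below are a minimal sketch; the exact output format varies by distribution and driver version:
getconf GNU_LIBC_VERSION
gcc --version
nvidia-smi --query-gpu=driver_version --format=csv,noheader
The first two commands report the GLIBC and GCC versions, respectively (the second requires GCC to be installed); the last command reports the installed NVIDIA driver version.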

Table 1. Supported Operating Systems

Distribution    GCC      GLIBC
RHEL 8.x        8.4.1    2.28
CentOS 8.x      8.4.1    2.28
CentOS 7.x      4.8.5    2.17
SLES 12 SP4     4.8.5    2.22
Ubuntu 18.04    7.3.0    2.27
Ubuntu 16.04    5.3.1    2.23

Windows

nanoFluidX is available for Windows 10 and is distributed with NVIDIA CUDA 11.6.2 libraries. A compatible NVIDIA graphics driver (version 452.39 or newer) must be installed on the system.
Due to the lack of a CUDA-aware MPI implementation for Windows, multiple GPUs are not supported on Windows; nanoFluidX for Windows is restricted to a single GPU.
Note: nanoFluidX tandem interpolation with nanoFluidX[c] is not supported for Windows.
Important: Windows-style paths with the backslash “\” are not supported. This is a C++-related limitation, as the backslash serves as an escape character. In most cases, simply replacing each backslash with a forward slash “/” is sufficient. Absolute paths including drive letters are supported, but mixing drive letters within one .cfg file is not recommended. In general, the use of relative paths is encouraged.
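
For example, a file referenced inside a .cfg file should be written with forward slashes, as in the purely illustrative path below (the directory and file names are hypothetical):
C:/nfx_cases/gearbox/case_setup.cfg
rather than the equivalent Windows-style form C:\nfx_cases\gearbox\case_setup.cfg.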

In WDDM driver mode, the GPU is a shared resource, and heavy GPU usage for display output (for example, during pre- and postprocessing) can impair solver performance. For more information, refer to Windows Driver Mode on this page.

Windows Driver Mode

Microsoft and NVIDIA offer two driver modes for Windows.
  • WDDM: On workstations and laptops, this is usually the default mode. This driver mode allows shared usage of the NVIDIA GPU for display output and GPGPU computing.
  • TCC: This driver mode dedicates the NVIDIA GPU exclusively to GPGPU computing. It is only available on NVIDIA Tesla, NVIDIA Quadro, or NVIDIA GTX Titan GPUs and is typically the default on most recent NVIDIA Tesla GPUs. As the GPU is not available for display output in this mode, the machine must either be headless or have a second GPU (typically an onboard GPU) available for display purposes.
To switch between WDDM and TCC driver modes, type the following command in a PowerShell or CMD window with administrator privileges:
nvidia-smi -g {GPU_ID} -dm {0|1} 
Here, GPU_ID is usually 0 on systems with a single GPU and can be identified by running nvidia-smi without any arguments. Pass 0 to the -dm option for WDDM mode and 1 for TCC mode. For example, to switch to TCC mode on a single-GPU system, use the following command:
nvidia-smi -g 0 -dm 1
Important: A system restart is required after each modification.
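
To confirm which driver model is active after the restart, recent driver versions allow nvidia-smi to report the current and pending driver model directly. This is a suggested check; older drivers may not support these query fields:
nvidia-smi --query-gpu=driver_model.current,driver_model.pending --format=csv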

Although TCC driver mode is geared toward GPGPU computing, its benefit depends on the hardware configuration, and our limited tests have not shown a distinct advantage to using TCC mode as of version 2021.2.

A simple case run on an NVIDIA Quadro P2000 (mobile GPU) in TCC mode required approximately three times the wall-clock time of the same run in WDDM mode, even though GPU utilization in TCC mode was twice that of WDDM mode. This may suggest that TCC mode is not suitable for mobile devices and could lead to throttling. A test on an NVIDIA Quadro RTX 6000 (desktop GPU) showed no statistically significant difference between the WDDM and TCC driver modes.

Hardware Requirements

Minimum Requirements

Hardware requirements are strongly dependent on the use case. For production-level industrial cases, the following is suggested as a minimum configuration.
  • CUDA-enabled GPU
  • The number of CPU cores should at least equal the number of GPU devices. Ideally, the number of CPU cores slightly exceeds the number of available GPU devices to leave some headroom for system operations.
  • System RAM should be at least equal to the combined memory of all GPUs (see the query example after this list).
  • 3 TB of HDD space for long-term storage, or 500 GB for an operational drive.
  • Common nanoFluidX output can vary from 20 to 400 GB, depending on the size of the case, the desired output quantities, and the output frequency.
  • High-speed interconnect for multi-node systems, for example, InfiniBand.
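
To size system RAM against the installed GPUs, the total memory of each GPU can be listed with nvidia-smi, as sketched below; the combined total across all listed devices gives the suggested lower bound for system RAM:
nvidia-smi --query-gpu=name,memory.total --format=csv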

GPU Requirements and Information

To run nanoFluidX, your system must contain a CUDA-enabled GPU.
Linux
GPU must support Compute Capability 3.5 or higher
Windows
GPU must support Compute Capability 6.0 or higher
For more information, refer to Your GPU Compute Capability in the NVIDIA Developer Documentation.
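As a quick check, recent NVIDIA drivers allow the compute capability of the installed GPUs to be queried directly with nvidia-smi (the compute_cap field is not available on older drivers; in that case, look up the value for your GPU model in the NVIDIA documentation linked above):
nvidia-smi --query-gpu=name,compute_cap --format=csv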
Load Balancing
To make full use of the GPUs at all times, a dynamic load balancing (DLB) scheme has been developed. The DLB implementation was successfully tested with hundreds of GPUs on Tokyo Tech’s TSUBAME 2.5 GPU supercomputer, and it allows for optimal utilization of all GPUs at any given time during the simulation. Load balancing is turned on by default for all nanoFluidX simulations.
GPU Recommendation
NVIDIA Enterprise GPUs, primarily the Tesla, Quadro, and RTX series, are recommended: they are well-established GPU cards for HPC applications, and nanoFluidX has been thoroughly tested on them.
Note: Despite good performance in single precision, the Quadro and RTX series effectively have no double precision capability. Ensure that single precision is sufficient for your needs. For more information, refer to Single versus Double Precision binaries.
Important: The NVIDIA GeForce line of GPU cards is CUDA-enabled and capable of running nanoFluidX; however, Altair does not guarantee the accuracy, stability, or overall performance of nanoFluidX on these cards. Be aware that the current NVIDIA EULA prohibits using non-Tesla series cards as a computational resource in bundles of four GPUs or more.
The quoted single precision performance (GFLOPS) of a GPU can be used for approximate performance comparisons between GPUs. A full list of NVIDIA GPUs and their official performance numbers can be found on Wikipedia. nanoFluidX performance tends to scale with FLOPS, which can be used as a rule of thumb when comparing GPUs.

