Deployment and Usage Guide for Running AI Workloads on Red Hat OpenShift and NVIDIA DGX Systems with IBM Spectrum Scale

This Redpaper helps companies address the challenges of running large-scale workloads on orchestration platforms for containerized applications. Such orchestration is essential to guarantee performance, high availability, and efficient horizontal scaling across compute resources. The proof of concept (PoC) described in this chapter, and explained in detail in the Redpaper, provides guidance on how to configure a Red Hat OpenShift 4.4.3 cluster for multi-GPU, multi-node deep learning (DL) workloads. As an example, it describes how to run a real-world automotive industry training workload on a public dataset provided by Audi.