
Kubernetes is a popular open-source container orchestration platform for deploying, managing, and scaling containerized applications. It provides a reliable, scalable, and resilient infrastructure for running applications in production. However, running AI workloads on Kubernetes can be challenging due to their resource-intensive nature: they demand large amounts of compute, memory, and storage, which can drive up infrastructure costs and hurt performance. To address these challenges, Nebuly developed an open-source module for Kubernetes called NOS. In this blog post, we’ll take a closer look at NOS, its features, and how it can be used to run AI workloads on Kubernetes more efficiently.

What is NOS?

NOS is an open-source Kubernetes module that optimizes the scheduling of AI workloads on Kubernetes clusters. It provides a set of custom resources and controllers that enable GPU-aware scheduling. NOS builds on top of the NVIDIA GPU Operator, a Kubernetes operator that manages NVIDIA GPUs on cluster nodes. NOS can be used with any Kubernetes distribution that supports custom resource definitions (CRDs), including Red Hat OpenShift, Amazon EKS, Google Kubernetes Engine (GKE), and Microsoft Azure Kubernetes Service (AKS).
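Before installing, it can be useful to confirm that your cluster supports CRDs and that nodes actually advertise NVIDIA GPUs through the device plugin. The commands below are an illustrative sanity check, assuming you have kubectl access to the cluster:

```shell
# List the CRDs already registered in the cluster (NOS adds its own on install).
kubectl get crds

# Show which nodes advertise allocatable NVIDIA GPUs via the device plugin.
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
```

Nodes showing `<none>` in the GPU column either have no GPUs or are missing the device plugin that the GPU Operator installs.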

NOS Features

NOS provides several features that make it easier to run AI workloads on Kubernetes clusters. Some of the key features include:

  1. Efficient GPU Utilization: NOS ensures that each GPU is fully utilized before workloads are scheduled onto additional GPUs, minimizing the number of GPUs needed to run AI workloads and thus reducing infrastructure costs.
  2. GPU-Aware Scheduling: NOS schedules workloads onto nodes with available GPUs, and scheduling decisions take each workload’s GPU requirements into account. This improves workload performance by ensuring workloads run on nodes with the GPU resources they need.
  3. Resource Management: NOS provides resource management for AI workloads, including memory and CPU management, helping ensure workloads have the resources to run efficiently and preventing resource contention.
  4. Automatic Scaling: NOS scales AI workloads with demand: additional nodes can be added to the Kubernetes cluster when workload demand increases, and removed when demand decreases, improving performance while reducing cost.
  5. Metrics and Monitoring: NOS exposes metrics and monitoring for AI workloads running on the cluster, helping operators identify and troubleshoot performance issues and confirm that workloads are running efficiently.
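As a concrete illustration of the first two points, efficient utilization means a workload can request a fraction of a GPU (for example a MIG profile) rather than a whole device, so several small workloads can share one card. The sketch below assumes MIG-capable GPUs with dynamic partitioning enabled; the profile name and image are illustrative, not taken from the NOS documentation:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: small-inference
spec:
  containers:
  - name: inference
    image: my-inference-image       # illustrative image name
    resources:
      limits:
        nvidia.com/mig-1g.10gb: 1   # request one MIG slice instead of a full GPU
```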

Getting Started



You can install nos using Helm 3 (recommended). You can find all the available configuration values in the Chart documentation.

helm install oci:// \
  --version 0.1.0 \
  --namespace nebuly-nos \
  --generate-name

Alternatively, you can use Kustomize by cloning the repository and running make deploy.
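Either way, once installation completes you can verify that the NOS components came up. The commands below assume the nebuly-nos namespace used in the Helm command above:

```shell
# List the NOS pods in the release namespace; they should reach Running status.
kubectl get pods -n nebuly-nos

# Show the auto-generated Helm release and its deployment status.
helm list -n nebuly-nos
```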

Here’s a sample YAML configuration file that demonstrates how to use NOS to schedule an AI workload on a Kubernetes cluster:

apiVersion: v1
kind: Pod
metadata:
  name: my-ai-workload
  labels:
    app: my-ai-app
spec:
  containers:
  - name: my-ai-container
    image: my-ai-image
    resources:
      limits:
        nvidia.com/gpu: "1"
      requests:
        cpu: 1
        memory: 4Gi
    command: ["python", ""]

This configuration file specifies a Pod named “my-ai-workload” that runs a container based on the “my-ai-image” Docker image. The container has a single GPU resource limit, as well as CPU and memory requests, and runs Python (the script name is left blank in this example).
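To try it out, save the manifest to a file (the filename below is illustrative) and apply it:

```shell
# Create the Pod, then check which GPU node it was scheduled onto.
kubectl apply -f my-ai-workload.yaml
kubectl get pod my-ai-workload -o wide
```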


Running AI workloads on Kubernetes can be challenging due to their resource-intensive nature, but the NOS module helps overcome these challenges by optimizing GPU utilization, enabling GPU-aware scheduling, providing resource management, supporting automatic scaling, and exposing metrics and monitoring. With NOS, organizations can improve the performance of AI workloads on Kubernetes clusters while also reducing infrastructure costs. The example YAML configuration above shows how little is needed to schedule an AI workload with NOS. If you’re interested in learning more, check out the NOS GitHub repository, which includes documentation and examples.


