Configure SR-IOV Network Device Plugin for Kubernetes

The SR-IOV network device plugin exposes the Scalable Functions (SFs) on the DPU node to Kubernetes as allocatable resources. To enable this, you must create an SF ConfigMap resource on the host node.
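
Before creating the ConfigMap, you can confirm that the SFs are present on the DPU node. This is a minimal check, assuming the SFs were already created during DPU provisioning; SF ports are reported by devlink with the pcisf flavour, and the exact port and netdev names vary per system.

    # List devlink ports; SF ports appear with "flavour pcisf" (names vary per system)
    devlink port show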

Create an SF ConfigMap on the Host Node

To configure the SR-IOV network device plugin, create an SF ConfigMap resource on the host with the following settings.

  1. Create an sf-cm.yaml file with the example contents below.

    vi sf-cm.yaml
    

    Example Contents:

    Note: Make sure to update the values in the example content with the actual values for your environment (see the lookup commands after step 2).

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: sriovdp-config
      namespace: kube-system
    data:
      config.json: |
        {
            "resourceList": [
                {
                    "resourceName": "bf3_p0_sf",
                    "resourcePrefix": "nvidia.com",
                    "deviceType": "auxNetDevice",
                    "selectors": [{
                        "vendors": ["15b3"],
                        "devices": ["a2dc"],
                        "pciAddresses": ["0000:03:00.0"],
                        "pfNames": ["p0#1"],
                        "auxTypes": ["sf"]
                    }]
                },
                {
                    "resourceName": "bf3_p1_sf",
                    "resourcePrefix": "nvidia.com",
                    "deviceType": "auxNetDevice",
                    "selectors": [{
                        "vendors": ["15b3"],
                        "devices": ["a2dc"],
                        "pciAddresses": ["0000:03:00.1"],
                        "pfNames": ["p1#1"],
                        "auxTypes": ["sf"]
                    }]
                }
            ]
        }
    
  2. Apply the SF ConfigMap on the host node.

    kubectl apply -f sf-cm.yaml
    
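
The selector values in the ConfigMap (vendors, devices, pciAddresses, pfNames) must match your environment. The commands below can help confirm that the ConfigMap was applied and look up the PCI values; 15b3 is the NVIDIA/Mellanox PCI vendor ID, while the device IDs and PCI addresses shown in the example will differ on your platform. Run the lspci lookup on the DPU node.

    # Confirm the ConfigMap exists in the kube-system namespace
    kubectl -n kube-system get configmap sriovdp-config

    # List NVIDIA/Mellanox PCI devices to verify vendor/device IDs and PCI addresses
    lspci -nn -d 15b3: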

Install SR-IOV Network Device Plugin for Kubernetes

  1. Download the sriovdp-daemonset.yaml.

    wget https://raw.github.com/k8snetworkplumbingwg/sriov-network-device-plugin/master/deployments/sriovdp-daemonset.yaml 
    
  2. Add tolerations in the sriovdp-daemonset.yaml file to allow the sriovdp pods to run on the DPU nodes. Add the section below under the pod template spec (spec.template.spec).

    tolerations:         
      - key: "dpu"       
        value: "true"    
        operator: "Equal"
    
  3. Apply the SR-IOV network device plugin DaemonSet.

    kubectl create -f sriovdp-daemonset.yaml
    
  4. Verify that an SR-IOV device plugin pod was created for each node in the cluster.

    kubectl get pods -owide -n kube-system
    

    Sample Response:

    kube-sriov-device-plugin-nstrs    1/1     Running   0              2d2h    <IP address>   localhost.localdomain   <none>           <none>
    kube-sriov-device-plugin-p8mv5    1/1     Running   0              2d22h   <IP address>    sm-hgx1                 <none>           <none>
    
  5. Check the logs of the device plugin pod deployed on the DPU node. You should see the SF ConfigMap read by the pod and a resource pool created for each SF. The pod iterates through all PCI devices but should eventually locate the correct ones.

    kubectl logs pod/kube-sriov-device-plugin-nstrs -n kube-system
    

    In the example logs below, look for the "New resource server is created" messages for the bf3_p0_sf and bf3_p1_sf ResourcePools.

    I0814 15:33:51.566759       1 manager.go:57] Using Kubelet Plugin Registry Mode
    I0814 15:33:51.567877       1 main.go:46] resource manager reading configs
    I0814 15:33:51.568002       1 manager.go:86] raw ResourceList: {
        "resourceList": [
            {
                "resourceName": "bf3_p0_sf",
                "resourcePrefix": "nvidia.com",
                "deviceType": "auxNetDevice",
                "selectors": [{
                    "vendors": ["15b3"],
                    "devices": ["a2dc"],
                    "pciAddresses": ["0000:03:00.0"],
                    "pfNames": ["p0#1"],
                    "auxTypes": ["sf"]
                }]
            },
            {
                "resourceName": "bf3_p1_sf",
                "resourcePrefix": "nvidia.com",
                "deviceType": "auxNetDevice",
                "selectors": [{
                    "vendors": ["15b3"],
                    "devices": ["a2dc"],
                    "pciAddresses": ["0000:03:00.1"],
                    "pfNames": ["p1#1"],
                    "auxTypes": ["sf"]
                }]
            }
        ]
    }
    I0814 15:33:51.569064       1 factory.go:211] *types.AuxNetDeviceSelectors for resource bf3_p0_sf is [0x40000d24e0]
    I0814 15:33:51.569131       1 factory.go:211] *types.AuxNetDeviceSelectors for resource bf3_p1_sf is [0x40000d2680]
    ...
    
    I0814 15:33:51.641690       1 manager.go:156] New resource server is created for bf3_p0_sf ResourcePool
    I0814 15:33:51.641701       1 manager.go:121] Creating new
    ...
    
    I0814 15:33:51.692891       1 factory.go:124] device added: [identifier: mlx5_core.sf.5, vendor: 15b3, device: a2dc, driver: mlx5_core]
    I0814 15:33:51.692939       1 manager.go:156] New resource server is created for bf3_p1_sf ResourcePool
    
  6. Examine the DPU node to ensure that it has the correct resources available.

    kubectl describe node localhost.localdomain
    

    Sample Response:

    Name:               localhost.localdomain
    Capacity:
      nvidia.com/bf3_p0_sf:  1
      nvidia.com/bf3_p1_sf:  1
    Allocatable:
      nvidia.com/bf3_p0_sf:  1
      nvidia.com/bf3_p1_sf:  1
    
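
Once the resources are advertised, a workload can request them like any other extended resource. The sketch below is illustrative only: the pod, image, and network names are hypothetical, and attaching the SF to the pod additionally requires a NetworkAttachmentDefinition (typically created as part of the next steps) whose k8s.v1.cni.cncf.io/resourceName annotation points at nvidia.com/bf3_p0_sf.

    apiVersion: v1
    kind: Pod
    metadata:
      name: sf-test-pod                          # hypothetical name
      annotations:
        # Hypothetical NetworkAttachmentDefinition mapped to nvidia.com/bf3_p0_sf
        k8s.v1.cni.cncf.io/networks: bf3-p0-sf-net
    spec:
      tolerations:                               # same taint used for the DPU nodes above
      - key: "dpu"
        value: "true"
        operator: "Equal"
      containers:
      - name: app
        image: busybox                           # placeholder image
        command: ["sleep", "86400"]
        resources:
          requests:
            nvidia.com/bf3_p0_sf: '1'
          limits:
            nvidia.com/bf3_p0_sf: '1'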

Next Steps: