Service Proxy for Kubernetes (SPK) install

Service Proxy for Kubernetes (SPK) is a cloud-native application traffic management solution designed for communication service provider (CoSP) 5G networks. SPK integrates F5’s containerized Traffic Management Microkernel (TMM) and Custom Resource Definitions (CRDs) into the OpenShift container platform to proxy and load balance low-latency 5G workloads.

The deployment includes at a minimum two pods: one runs the TMM and the other manages the TMM Pod or Pods. F5 has created a custom Kubernetes operator called the F5Ingress Controller. The Operator pattern applied to the F5Ingress Controller follows the Kubernetes standard for managing repeatable tasks, in this case configuring the F5 tmm container.

Environment design

This environment was created for testing SPK functionality using three hardware hosts. One host provides the control plane, one host provides the SR-IOV interfaces, and the last host provides the workload resources. The control plane node runs KVM to support virtualization and exposes a virtual three-node control plane cluster for OpenShift. This host is also used for administration of the cluster and SPK.

Clients will connect to the external SR-IOV capable interface on Node2 and will be routed out the internal SR-IOV capable interface connecting to the switch and the workload on Node3.

Base environment

Environment requirements

SPK relies on Red Hat OpenShift 4.7 and later. OpenShift is an opinionated enterprise version of Kubernetes which requires a control plane cluster and dedicated worker nodes.

In this design node1 is running Red Hat Enterprise Linux 8 with Kernel-based Virtual Machine (KVM) installed. Using KVM and QEMU images, three virtual servers have been created on the bastion host to provide the OpenShift control plane cluster.
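If you are using a similar KVM-based design, a quick check on the bastion host should show the control plane guests. This is a minimal sketch; the domain name below is a placeholder for whatever names were used when the guests were created.

virsh list --all
virsh dominfo {control plane VM name here}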

SPK leverages Single Root I/O Virtualization (SR-IOV) and the Open Virtual Network with Kubernetes (OVN-Kubernetes) CNI. In addition, OpenShift provides support for Multus, a CNCF project, with iCNI 2.0. Multus is considered a meta plugin because it supports adding multiple CNI plugins to the same pod. This functionality is required for SPK because the TMM requires two interfaces: one connected to the F5Ingress Pod network and the other to the SR-IOV enabled network.
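You can confirm the cluster network type and that Multus is running with a couple of quick checks; these use the standard OpenShift network configuration resource and the default openshift-multus namespace.

oc get network.config/cluster -o jsonpath='{.status.networkType}'
oc get pods -n openshift-multus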

SR-IOV hardware support for SPK is limited by OpenShift; Mellanox and Intel are currently the two supported hardware vendors.

You will also need to install the Performance Addon Operator (PAO) and enable Topology Manager. Topology Manager is a component of the kubelet that works with the CPU and Device Managers to allocate the resources assigned to a pod or container. When configuring Topology Manager, keep in mind that the CPU resources assigned to the f5-tmm pod must be on a NUMA node that has an SR-IOV card. In some environments the server may have multiple interfaces, but they may not all be SR-IOV compatible, such as the onboard interfaces in the server.

While working with PAO you will also need to configure huge pages. Huge pages are memory pages that support chunks of data larger than the standard 4Ki page. Common huge page sizes are 2Mi and 1Gi.
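Both Topology Manager and huge pages can be configured through a PAO PerformanceProfile. The sketch below is illustrative only, assuming the v2 PerformanceProfile API: the profile name, CPU ranges, page count, NUMA node, and node selector are all assumptions that must be adjusted to your hardware and F5's sizing guidance. PAO generates a RuntimeClass named performance-{profile name}, which is referenced later in the SPK override file.

oc apply -f - <<'EOF'
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: profile-spk                     # hypothetical; yields RuntimeClass performance-profile-spk
spec:
  cpu:
    isolated: "2-15"                    # assumption: CPUs on the NUMA node with the SR-IOV card
    reserved: "0-1"
  hugepages:
    defaultHugepagesSize: "2M"
    pages:
      - size: "2M"
        count: 1536                     # roughly 3Gi of 2Mi pages for one SPK instance
        node: 0                         # NUMA node hosting the SR-IOV card
  numa:
    topologyPolicy: "single-numa-node"
  nodeSelector:
    node-role.kubernetes.io/worker: ""  # assumption: select your SPK worker nodes
EOF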

Any nodes that will support SPK must have the iptable_raw kernel module enabled.
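A quick way to confirm the module is loaded on a node, assuming you have ssh access as the core user:

ssh core@{node name here}
lsmod | grep -i iptable_raw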

You will also need to provide a Persistent Volume. This can be something local and simple such as the HostPath type or a deployed application like OpenEBS.
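Before choosing, you can check what storage classes and persistent volumes already exist in the cluster:

oc get storageclass
oc get pv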

Once you have an OpenShift 4.7 or later cluster up and running, you should be able to deploy SPK to the cluster.

Environment validation

If you are using an existing OpenShift environment you may want to review the environment and validate the settings in place. You will need direct access to all the nodes in your cluster and access to a working OpenShift client to complete these tasks.

  1. Check the nodes

Start by discovering all the nodes in your cluster and confirming that they are healthy and show a status of Ready. The output will also include the OpenShift node ROLE, which typically defaults to master or worker. In this environment, the nodes with the master ROLE are named leader01-ocp-spk7, leader02-ocp-spk7, and leader03-ocp-spk7; these nodes make up the control plane for OpenShift. The worker nodes run the other workloads in your environment. Additional node roles can be defined using labels, and these labels can be used for other requirements such as configuring huge pages on the SPK node.

oc get nodes 
oc get nodes -o wide
oc get nodes --show-labels
  2. Check Cluster Operators status

The Cluster Operators are a group of components that support the base OpenShift environment. For example, the Cluster Network Operator manages the network components and the Authentication Operator manages authentication to and within the cluster. Problems with any of these Operators may impact SPK. Review the output from the clusteroperators command and ensure all the services in the AVAILABLE column are set to True and none indicate a DEGRADED status.

oc get clusteroperators
oc describe clusteroperators.config.openshift.io authentication
  3. Review system events

When OpenShift displays events, they are not presented in chronological order by default. You can control this to some extent using shell pipes and filters. Alternatively, you can use --sort-by= to order the output based on the creation time stamp. It can be helpful to review all the namespaces for events and then sort them by the number of issues reported. You can use the -A option to display all namespaces or the -n option to select a particular namespace. Using this information you can quickly focus on the namespaces with the most events and sort those events by creation time stamp. The command examples below should provide these details for your environment.

oc get events
oc get events -A
oc get events -A | awk '{print $1}' | grep -iv namespace | sort | uniq -c
oc get events -A --sort-by=.metadata.creationTimestamp
oc get events -n {NAMESPACE value from previous output} --sort-by=.metadata.creationTimestamp
  4. Validate huge pages resources

By default, each instance of SPK starts two TMM threads, each allocating approximately 1.5GB of memory per thread for a total of 3GB per SPK instance. SPK takes advantage of huge pages, which should have been configured at 2MB per page. As a result, each instance of SPK will by default consume roughly 1500 HugePages, or approximately 3GB of RAM. In the following examples you should be able to see a system with huge pages enabled and one without. The leader nodes typically do not require this feature, but your SPK node should have huge pages enabled. Replace the node names in the examples with your environment's node names. These values can also be confirmed locally on the host, assuming you have ssh access to those nodes.

Note: The number of TMM instances deployed is defined in the SPK override file.

oc describe nodes {insert your worker node name here} | grep huge
oc describe nodes {insert your leader node name here} | grep huge

You can also log in directly to the nodes and view the huge page settings, available CPUs, and memory.

ssh core@{node name here}
grep -i huge /sys/devices/system/node/node0/meminfo
grep -i huge /sys/devices/system/node/node1/meminfo
lscpu
free -h
  5. NUMA, CPUs, RAM and SR-IOV

It is likely that the nodes in your cluster are Non-Uniform Memory Access (NUMA) systems with two or more processors. If those systems also have non-SR-IOV capable network cards, then you should confirm that Topology Manager is only allocating CPUs from the correct NUMA node. The network card is connected via the bus to one of the CPU sockets. Memory on the system is arranged into zones, which are allocated to specific CPU sockets. Only the memory connected to a CPU socket that is also connected to an SR-IOV enabled network interface should have huge pages enabled.

The following commands provide details about the NUMA nodes and the CPUs. When you sign in to your nodes, you should only expect to see huge pages consumed on hosts with SPK installed, and only for memory zones associated with an SR-IOV capable CPU socket.

Start by reviewing the PCI details using lspci for a system with Mellanox cards installed. You can increase the detail from lspci using -v, -vv, and -vvv, and then increase it further by elevating your rights using sudo or the root account directly.

ssh core@{node name here}
lspci -vv
lspci -vv | grep -i numa
lspci -vv | grep -A 5 Mellanox
grep -i huge /sys/devices/system/node/node0/meminfo
grep -i huge /sys/devices/system/node/node1/meminfo

Next, review the CPU details using lscpu.

lscpu
lscpu --all --extended=NODE,CPU

The following iterator will display the device-to-NUMA-node associations.

for nic in $(ls /sys/class/net/*/device/numa_node|cut -d/ -f5|sed 's/v[0-9]*$//'|sort -u); do echo "$nic NUMA=$(cat /sys/class/net/$nic/device/numa_node)"; done

You can also review the virtual function details for your interfaces, if SR-IOV is enabled. These interface details will not show on a host that does not have an SR-IOV capable card installed.

ls -l /sys/class/net/ens3f1/device/virtfn*

Next, list the device-to-CPU associations.

for nic in $(ls /sys/class/net/*/device/numa_node|cut -d/ -f5|sed 's/v[0-9]*$//'|sort -u); do echo "$nic CPUs=$(cat /sys/class/net/$nic/device/local_cpulist)"; done

Environment preparation for SPK

The SPK pod interacts with the TMM pod over gRPC using an encrypted TLS tunnel. The SSL/TLS keys and certificates must be created and installed into your cluster. In addition, you will need to add the default service account to your desired SPK target namespace. The online F5 Service Proxy for Kubernetes documentation covers this in detail.
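The exact certificate and secret requirements are defined in the F5 documentation, so the following is only a rough sketch of the shape of these steps. The namespace matches the dev-spk-ingress example used later in this article; the secret name and certificate file names are hypothetical.

oc new-project dev-spk-ingress
# hypothetical secret and file names; use the names required by the F5 documentation
oc create secret tls grpc-svc-cert --cert=grpc-svc.crt --key=grpc-svc.key -n dev-spk-ingress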

If you intend to work with the NAT64 solution, you will also need to create, configure, and add secrets for that solution and for DSSM. This process is also covered in detail in the online F5 Service Proxy for Kubernetes documentation.

You should already be aware of your SR-IOV compatible hardware and which interfaces are associated with those cards. The OpenShift administrator should have configured these cards already. These are part of a set of configuration objects that define the network configuration used by SPK. This collection starts with the SR-IOV Network Node Policy, which controls the virtual functions provided by the SR-IOV capable network card. The number of virtual functions that a card can support is defined by the vendor and enabled at the BIOS level of the host system. Validating and updating this setting will be unique to each hardware vendor.
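For reference, an SR-IOV network node policy might look something like the sketch below. This is an assumption-laden example, not the exact policy used in this environment: the policy name, interface name (ens3f0), node label, and VF count are placeholders, while the resourceName matches the sriovens3f0int value referenced later in this article.

oc apply -f - <<'EOF'
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: dev-spk-ens3f0-internal        # hypothetical name
  namespace: openshift-sriov-network-operator
spec:
  resourceName: sriovens3f0int         # referenced by the SriovNetwork object and the SPK override file
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"   # assumption: use your own node label
  numVfs: 8                            # must not exceed the VF count enabled in the BIOS/firmware
  nicSelector:
    pfNames: ["ens3f0"]                # physical function on the SR-IOV capable card
  deviceType: vfio-pci                 # the TMM consumes VFs as vfio devices
  isRdma: false
EOF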

Once this has been enabled and the system rebooted, you should be able to confirm which nodes will support SPK. The following shell iterator should print these details for you, but it assumes your nodes include either worker or spk in the node name or role: awk '/ worker | spk /{print $1}'. You may need to change those values for your environment.

nvl="$(oc get pod -A -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName|awk '/^f5-tmm-/{print $2}'|sort|uniq -c|awk '{print $2 ":" $1}')"; for node in $(oc get nodes|awk '/ worker || spk /{print $1}'); do vfm="$(oc describe sriovnetworknodestates.sriovnetwork.openshift.io -n openshift-sriov-network-operator $node 2>/dev/null|grep -m1 'Num Vfs:'|awk '{print $3}')"; [ -n "$vfm" ] && echo "$node: VFs $((echo "$nvl"|grep "^$node:" || echo ':0')|cut -d: -f2)/$vfm" || echo "$node: [SR-IOV DISABLED]"; done

Next, review the SR-IOV network node policies that are installed. When using SR-IOV with SPK, you should have an internal and an external SR-IOV network node policy. They should exist in the openshift-sriov-network-operator namespace, but you can check all namespaces using the -A option. Once you find the policies, review their details using the -o yaml option. The nodeSelector should be defined for the SR-IOV capable devices; this can be based on a Kubernetes role or label.

oc get sriovnetworknodepolicies.sriovnetwork.openshift.io -A
oc get sriovnetworknodepolicies -n openshift-sriov-network-operator
oc get sriovnetworknodepolicies -n openshift-sriov-network-operator {INSERT POLICY NAME HERE} -o yaml

Matching network attachment definitions for the internal and external networks are generated automatically; they are created when the SR-IOV network objects described below are added.

Now we need to create the SR-IOV network objects. These objects use the SR-IOV custom resource definitions installed with the SR-IOV Network Operator. They should have unique, descriptive names and be associated with the target SPK namespace. These YAML files should also reference the SR-IOV resource name, which is defined in your SR-IOV network node policy object.

Here is one way to find your SR-IOV resource name:

oc get sriovnetworknodepolicies -n openshift-sriov-network-operator -o "custom-columns=SR-IOV Resource Name:.spec.resourceName"

Using these values you should be able to create your sriovnetwork objects, which might look something like the following example. The metadata name value is intended to describe the object: this is a development SPK ingress (dev-spk-ingress) instance using the interface ens3f0 on worker node 5 (w5) for the internal connection.

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: dev-spk-ingress-ens3f0-w5-internal
  namespace: openshift-sriov-network-operator
spec:
  networkNamespace: dev-spk-ingress
  resourceName: sriovens3f0int
  spoofChk: "off"
  trust: "on"
  capabilities: '{"mac": true, "ips": true}'

Installing and verifying objects required to install SPK

The SPK software images and installation Helm charts are provided as a single tape archive (TAR) file. Detailed steps on how to validate, extract, and upload these image files can be found in the F5 Service Proxy for Kubernetes documentation.
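As a rough sketch, extracting the archive should leave you with the Helm chart and CRD files referenced below; the archive name here is a placeholder for the file you downloaded.

tar xvf {SPK tar file name here}.tgz
ls f5ingress/crds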

Once you have the images uploaded and the tar file extracted, you will need to install the VLAN CRD in order to create IngressRouteVlan objects, which are part of the F5 CRD set. These YAML files should be found in the f5ingress/crds folder. You only need to install the ingressroutevlan.yaml CRD.

oc get customresourcedefinitions.apiextensions.k8s.io | grep -i f5
oc create -f f5ingress/crds/ingressroutevlan.yaml
oc get customresourcedefinitions.apiextensions.k8s.io | grep -i f5

Once this YAML file is installed you can create your internal and external VLAN objects. The namespace should reflect the namespace where you intend to deploy SPK. The self IP should be an IP address on the subnet associated with the internal or external network, based on which VLAN you are configuring. If you are going to scale the TMM, then add additional self IP values as shown in the example below. If you are using VLAN tagging, the tag should also be defined in the VLAN object, as shown in the sample below where VLAN tag 2001 is in use.

apiVersion: "k8s.f5net.com/v1"
kind: IngressRouteVlan
metadata:
  namespace: dev-spk-ingress
  name: "vlan-external"
spec:
  name: external
  # vlan tag for external traffic
  tag: 2001
  interfaces:
    - "1.2"
  selfip_v4s:
    - 192.168.20.88
    # to support tmm scaling, additional IP addresses must be listed here
    #- 192.168.20.89
  prefixlen_v4: 24

Duplicate this configuration for your other SPK VLAN interface. At this point you should have the following objects defined; the commands after this list can be used to confirm each of them.

  • sriovnetworknodepolicies - external (likely created by the OpenShift administrator)

  • sriovnetworknodepolicies - internal (likely created by the OpenShift administrator)

  • ingressroutevlans - external (Created by you, after manually installing the F5 CRD for ingress route vlans)

  • ingressroutevlans - internal (Created by you, after manually installing the F5 CRD for ingress route vlans)

  • sriovnetwork - external (Created by you, requires sriov network node policy values)

  • sriovnetwork - internal (Created by you, requires sriov network node policy values)

  • network-attachment-definitions - external (Automatically generated when the SR-IOV network objects are created)

  • network-attachment-definitions - internal (Automatically generated when the SR-IOV network objects are created)
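The following commands should list each of these objects; they assume the example namespaces used in this article (openshift-sriov-network-operator and dev-spk-ingress).

oc get sriovnetworknodepolicies -n openshift-sriov-network-operator
oc get ingressroutevlans -n dev-spk-ingress
oc get sriovnetwork -n openshift-sriov-network-operator
oc get network-attachment-definitions -n dev-spk-ingress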

Preparing the SPK override file

The SPK Helm package includes default settings that can be overridden at deployment. In this example the custom values, or override values, will be stored in a file called spk-override.yaml. First, notice topologyManager. Topology Manager functionality is provided as a result of installing the Performance Addon Operator; it should be enabled and your performance profile configured. The value for runtimeClassName can be confirmed using oc get runtimeclass.
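The runtime class generated by your performance profile can be listed directly; the name shown should match the runtimeClassName value in your override file.

oc get runtimeclass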

Next, notice that the cniNetworks values are pulled from the sriovnetwork objects. This is how we define the interfaces available to TMM when it is configured by the f5ingress Helm chart. The values here point to the network-attachment-definitions that were created as a result of the sriovnetwork objects being created.

Just below this, notice grpc and that it is enabled. In early versions of SPK the configuration of TMM was pushed from SPK via TCL. Now this is all done using gRPC and secured using the certificates you created earlier for the SPK ingress namespace. If you scan the sample configuration you will notice another gRPC setting, vlan_grpc, which is also set to true. In earlier versions of SPK only the initial configurations used gRPC and the VLAN configs still used TCL, but that is no longer the case when this value is set to true.

icni2 is the Red Hat OpenShift Intelligent Container Network Interface version 2.0.

Now notice the customEnvVars; these define the interfaces 1.1 and 1.2 in the base TMM configuration. The actual names here do not matter, as SPK creates a simple index list which always starts with 1; the order is important but not the names.

Under resources you can manage the CPU and memory allocations. The ratio of huge pages to CPUs should be 1.5Gi per CPU, so in this case we have 2 CPUs and 3Gi of memory and huge pages allocated.

This sample is not enabling logging, so all the logs will go to standard out in the Pod; this is controlled by the f5-toda-logging block. In this example this instance of SPK will be monitoring the namespace dev-apps. There is a one-to-one relationship between namespaces and SPK deployments. Do not deploy your applications to the same namespace that SPK is deployed to, and do not list more than one namespace here.

Note: It is recommended to ONLY keep values in the override file that differ from the defaults.

tmm:
  topologyManager: true
  runtimeClassName: performance-profile-spk
  # these are the network attachment definitions
  cniNetworks: dev-spk-ingress/dev-spk-ingress-ens3f0-w5-internal,dev-spk-ingress/dev-spk-ingress-ens3f1-w5-external
  grpc:
    enabled: true
  icni2:
    enabled: true
  customEnvVars:
    # name_1 equals interface 1.1, note the number does not reflect the interface name, this is a simple index list, it always starts with 1
    - name: OPENSHIFT_VFIO_RESOURCE_1
      value: sriovens3f0int
    - name: OPENSHIFT_VFIO_RESOURCE_2
      value: sriovens3f1ext
  # You can force the tmm pod to use 1 CPU to conserve resources and for functionality testing, but this should not be used in a production environment.
  resources:
    limits:
      cpu: 2
      hugepages-2Mi: 3Gi
      memory: 3Gi
    requests:
      cpu: 2
      hugepages-2Mi: 3Gi
      memory: 3Gi
# controls logging for the F5 TMM POD, when enabled SPK will deploy a fluentd container
f5-toda-logging:
  enabled: false
  fluentd:
    host: value from installing fluentd
    port: value from installing fluentd
# controls logging for the f5ingress POD, when enabled SPK will deploy a fluent-bit sidecar container
controller:
  watchNamespace: dev-apps
  fluentbit_sidecar:
    enabled: false
  vlan_grpc:
    enabled: true
pccd:
  enabled: false

At this point you should be ready to start the SPK install.

Installing F5ingress

The preferred method to install the SPK F5ingress package is to use Helm. Helm is a single binary that assists in installing, updating, and removing Kubernetes deployments. You should have the F5ingress tar package downloaded to your OpenShift client. In this example we are deploying SPK to the dev-spk-ingress namespace, and it will be monitoring the dev-apps namespace.

helm install f5ingress f5ingress-1.0.23.tgz -f spk-override.yaml -n dev-spk-ingress

Once your install is complete you will want to check the SPK namespace, which in this example is dev-spk-ingress. Next, you may want to review the deployment details and check for any unexpected events.

oc get pods -n dev-spk-ingress
oc get deployments -n dev-spk-ingress
oc describe deployment -n dev-spk-ingress f5-tmm
oc get events -n dev-spk-ingress

Now that you have SPK deployed, you can log in to the f5-tmm pod and review some of the configurations, scripts, and environment variables.

oc exec -it deployment/f5-tmm -c f5-tmm -n dev-spk-ingress -- bash
cd /config
cat tmm_init.tcl
cat tmm_args.sh
env
exit

Uninstall F5ingress

In the event that you want to uninstall your SPK instance, first remove any Custom Resource Definitions (CRDs) you have created. Next, remove the SPK ingress VLAN definitions and the network attachment objects. Once that is complete, remove the certificate objects and then remove the SPK instance using Helm.
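A hedged sketch of those clean-up steps, assuming the example object names and namespaces used earlier in this article; the internal VLAN name and the secret name are placeholders, and deleting the SriovNetwork objects also removes the generated network attachment definitions.

oc delete ingressroutevlans vlan-external vlan-internal -n dev-spk-ingress
oc delete sriovnetwork dev-spk-ingress-ens3f0-w5-internal dev-spk-ingress-ens3f1-w5-external -n openshift-sriov-network-operator
oc delete secret {SPK certificate secret name here} -n dev-spk-ingress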

Remove SPK using helm once the other objects have been deleted.

helm uninstall f5ingress -n dev-spk-ingress