Distributed Toda for Stats Aggregation

Overview

BIG-IP Next for Kubernetes generates terabytes of statistics per second. To manage this load, the system uses three primary components: the receiver, the observer (or aggregator), and the operator. TMM provides the statistics, and the metrics scraper containers stream them.

Distributed Toda Pod Roles

This section describes the key responsibilities of each Distributed Toda pod:

  • Receiver: The receiver runs as a StatefulSet. It gathers metrics from TMM targets, persists them, and securely transmits them to the Observer over gRPC with mutual TLS (mTLS) for aggregation. The receiver also retains the metrics it collected from TMMs that have since gone down, so those metrics are not lost.

  • Observer: The observer runs as a StatefulSet. Observers support scraping metrics from multiple components beyond TMM. They also scrape and export their own telemetry to OTEL, including gRPC metrics, aggregation metrics, and storage-related metrics, which lets you monitor observer performance.

  • Operator: The operator runs as a Deployment and orchestrates the end-to-end metrics collection and aggregation lifecycle. It discovers available TMM Scrapers, Receivers, and Aggregators, then load-balances TMMs across Receivers. Related topic: Aggregation Mode.

  • TMM Scraper: TMM Scraper is an observer container that runs inside each TMM pod, replacing the tmstatsd tool. It directly serves metrics from tmctl over a gRPC response stream upon receiving requests from the Receiver.
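The operator's load-balancing of TMMs across receivers can be pictured with a small round-robin sketch. This is illustrative only; the actual balancing algorithm used by the operator is not specified here, and the pod names are hypothetical:

```python
def assign_tmms(tmms, receivers):
    """Round-robin assignment of TMM scrape targets to receivers (conceptual sketch)."""
    assignment = {r: [] for r in receivers}
    for i, tmm in enumerate(tmms):
        # Each TMM goes to the next receiver in rotation.
        assignment[receivers[i % len(receivers)]].append(tmm)
    return assignment

tmms = [f"tmm-{i}" for i in range(5)]
receivers = ["receiver-0", "receiver-1"]
print(assign_tmms(tmms, receivers))
# {'receiver-0': ['tmm-0', 'tmm-2', 'tmm-4'], 'receiver-1': ['tmm-1', 'tmm-3']}
```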

Metrics Flow Architecture

This section describes the metrics flow architecture.

Example:

The following is an example of the virtual server clientside.received.bytes metric collected from OTEL:

{
  "resourceMetrics": [
    {
      "resource": {},
      "scopeMetrics": [
        {
          "scope": {
            "name": "io.f5.toda.observer",
            "version": "5.23.0"
          },
          "metrics": [
            {
              "name": "f5.virtual_server.clientside.received.bytes",
              "unit": "By",
              "sum": {
                "dataPoints": [
                  {
                    "attributes": [
                      {
                        "key": "f5.virtual_server.destination",
                        "value": {
                          "stringValue": "55.55.55.1"
                        }
                      },
                      {
                        "key": "f5.virtual_server.name",
                        "value": {
                          "stringValue": "spk-app-1-spk-app-tcp-8050-f5ing-testapp-virtual-server"
                        }
                      },
                      {
                        "key": "f5.virtual_server.source",
                        "value": {
                          "stringValue": "0.0.0.0"
                        }
                      },
                      {
                        "key": "k8s.namespace.name",
                        "value": {
                          "stringValue": "spk-app-1"
                        }
                      },
                      {
                        "key": "target.namespace",
                        "value": {
                          "stringValue": "default"
                        }
                      },
                      {
                        "key": "observer.job.mode",
                        "value": {
                          "stringValue": "aggregated"
                        }
                      }
                    ],
                    "startTimeUnixNano": "1765878400960332314",
                    "timeUnixNano": "1765878400962968310",
                    "asInt": "0"
                  }
                ],
                "aggregationTemporality": 2,
                "isMonotonic": true
              }
            }
          ]
        }
      ]
    }
  ]
}
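To pull specific fields out of an OTLP JSON export like the one above, a short script can walk the resourceMetrics structure. The payload here is a trimmed version of the example (only two attributes kept); field names follow the OTLP JSON encoding:

```python
import json

# Trimmed version of the OTLP payload shown above.
payload = json.loads("""
{
  "resourceMetrics": [{
    "resource": {},
    "scopeMetrics": [{
      "scope": {"name": "io.f5.toda.observer", "version": "5.23.0"},
      "metrics": [{
        "name": "f5.virtual_server.clientside.received.bytes",
        "unit": "By",
        "sum": {
          "dataPoints": [{
            "attributes": [
              {"key": "f5.virtual_server.name",
               "value": {"stringValue": "spk-app-1-spk-app-tcp-8050-f5ing-testapp-virtual-server"}},
              {"key": "k8s.namespace.name", "value": {"stringValue": "spk-app-1"}}
            ],
            "timeUnixNano": "1765878400962968310",
            "asInt": "0"
          }],
          "aggregationTemporality": 2,
          "isMonotonic": true
        }
      }]
    }]
  }]
}
""")

def extract_sums(doc):
    """Yield (metric_name, attributes_dict, value) for every sum data point."""
    for rm in doc["resourceMetrics"]:
        for sm in rm["scopeMetrics"]:
            for metric in sm["metrics"]:
                for dp in metric.get("sum", {}).get("dataPoints", []):
                    attrs = {a["key"]: a["value"]["stringValue"]
                             for a in dp.get("attributes", [])}
                    yield metric["name"], attrs, int(dp["asInt"])

for name, attrs, value in extract_sums(payload):
    print(name, attrs.get("k8s.namespace.name"), value)
```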

Distributed Toda Pods Installation

For the installation of the Receiver, Observer, and Coordinator (Operator), refer to the FLO section.

Metrics Export Intervals

Note: By default, the metrics export interval from the Receiver and Observer to OTEL is fixed at 2 minutes in the Coordinator (Operator) ConfigMap and cannot be changed.

OTEL Statistics

The full list of OTEL statistics can be reviewed here.

Prometheus and Grafana

Note: Prometheus and Grafana are not yet integrated with the FLO installation. However, they can be installed manually to view metrics.

The examples below show how the metrics can be used in Prometheus and Grafana.

Prometheus

For the metric f5.virtual_server.clientside.received.bytes, you can view the metrics in Prometheus using the following query:

f5_tmm_f5_virtual_server_clientside_received_bytes_total 

Note: f5_tmm is a prefix applied to all metrics by OTEL.
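The mapping from the OTEL metric name to the Prometheus query name can be sketched as a simple string transformation, assuming the standard OTLP-to-Prometheus conventions: dots become underscores, the f5_tmm prefix noted above is prepended, and monotonic counters gain a _total suffix:

```python
def prometheus_name(otel_name: str, monotonic: bool = True) -> str:
    """Translate an OTEL metric name to its Prometheus query name.

    Assumes the f5_tmm prefix applied by OTEL (see note above) and the
    standard _total suffix that Prometheus adds to monotonic counters.
    """
    name = "f5_tmm_" + otel_name.replace(".", "_")
    if monotonic:
        name += "_total"
    return name

print(prometheus_name("f5.virtual_server.clientside.received.bytes"))
# f5_tmm_f5_virtual_server_clientside_received_bytes_total
```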


Prometheus Integration for BIG-IP Next for Kubernetes Metrics

This section describes the procedure to expose BIG-IP Next for Kubernetes Metrics to Prometheus. The commands and Custom Resources (CRs) provided in this section are for the BIG-IP Next for Kubernetes OpenTelemetry (OTEL) running in the default namespace. If OTEL is running in a different namespace, modify the commands and CRs accordingly.

Note: Complete steps 1 to 3 even if OpenTelemetry is already running. Skip them only if you have completed them previously.

  1. Create a valid certificate using cert-manager. Copy the following data into an otel-certs.yaml file, and replace arm-ca-cluster-issuer with your CA issuer’s name. You may adjust other fields as needed, but keep the name and secretName unchanged.

    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: external-otelsvr
    spec:
      subject:
        countries:
          - US
        provinces:
          - Washington
        localities:
          - Seattle
        organizations:
          - F5 Networks
        organizationalUnits:
          - PD
      emailAddresses:
        - clientcert@f5net.com
      commonName: f5net.com
      dnsNames:
        - otel-collector-svc.default.svc.cluster.local
      secretName: external-otelsvr-secret
      issuerRef:
        name: arm-ca-cluster-issuer
        group: cert-manager.io
        kind: ClusterIssuer
      duration: 2160h
      privateKey:
        rotationPolicy: Always
        encoding: PKCS1
        algorithm: RSA
        size: 4096
    
  2. Apply the certificate manifest using the kubectl command.

    kubectl apply -f otel-certs.yaml
    
  3. Delete the existing pod(s) and restart the OpenTelemetry collector for the new certificates to apply.

    kubectl get pods --no-headers -o custom-columns=":metadata.name" | grep otel-collector | xargs -r kubectl delete pod
    
  4. Create a dedicated namespace for Prometheus.

    kubectl create namespace prometheus
    
  5. Create a valid certificate for Prometheus using cert-manager.

    a. Copy the following data into a prom-certs.yaml file and replace arm-ca-cluster-issuer with the name of your CA issuer. You can modify other fields as needed.

    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: prometheus-client
      namespace: prometheus
    spec:
      subject:
        countries:
          - US
        provinces:
          - Washington
        localities:
          - Seattle
        organizations:
          - F5 Networks
        organizationalUnits:
          - PD
      emailAddresses:
        - clientcert@f5net.com
      commonName: f5net.com
      secretName: prometheus-client-secret
      issuerRef:
        name: arm-ca-cluster-issuer
        group: cert-manager.io
        kind: ClusterIssuer
      duration: 2160h
      privateKey:
        rotationPolicy: Always
        encoding: PKCS1
        algorithm: RSA
        size: 4096
    

    b. Apply the certificate manifest using the kubectl command.

    kubectl apply -f prom-certs.yaml
    
  6. Create a custom values.yaml file to configure and customize the Prometheus installation using Helm. Copy the following data into the new file.

    prometheus-pushgateway:
      enabled: false
    
    prometheus-node-exporter:
      enabled: false 
    
    kube-state-metrics:
      enabled: false 
    
    alertmanager:
      enabled: false 
    
    configmapReload:
      prometheus: 
        enabled: false 
    serverFiles:
      prometheus.yml:
        scrape_configs:
          - job_name: bnk-otel
            scheme: https
            static_configs:
              - targets:
                  - otel-collector-svc.default.svc.cluster.local:9090
            tls_config:
              cert_file: /etc/prometheus/certs/tls.crt
              key_file: /etc/prometheus/certs/tls.key
              ca_file: /etc/prometheus/certs/ca.crt
              insecure_skip_verify: false
    server:
      extraVolumes:
        - name: prometheus-tls
          secret:
            secretName: prometheus-client-secret
      extraVolumeMounts:
        - name: prometheus-tls
          mountPath: /etc/prometheus/certs
          readOnly: true
      global:
        scrape_interval: 10s
      service:
        type: "NodePort"
        nodePort: 31929
      configmapReload:
        enabled: false
      persistentVolume:
        enabled: false
    
  7. Deploy Prometheus with Helm.

    helm install prometheus oci://ghcr.io/prometheus-community/charts/prometheus -n prometheus --atomic -f values.yaml
    
  8. Ensure the BIG-IP Next for Kubernetes scrape configuration is active, status is healthy, and there are no errors.

    curl http://172.18.0.4:31929/api/v1/targets | jq
    

    Note: Replace 172.18.0.4 with the IP address of the node where Prometheus is running.
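If you prefer to check target health programmatically rather than reading the raw JSON, a short script can filter the /api/v1/targets response for anything that is not up. The sample response below is illustrative, not actual output; the field names (activeTargets, health, lastError) follow the standard Prometheus HTTP API:

```python
import json

# Illustrative /api/v1/targets response; real output contains more fields.
sample = json.loads("""
{
  "status": "success",
  "data": {
    "activeTargets": [
      {"labels": {"job": "bnk-otel"}, "health": "up", "lastError": ""},
      {"labels": {"job": "bnk-otel"}, "health": "down",
       "lastError": "context deadline exceeded"}
    ]
  }
}
""")

def unhealthy_targets(resp):
    """Return (job, lastError) for every active target that is not up."""
    return [(t["labels"].get("job", ""), t.get("lastError", ""))
            for t in resp["data"]["activeTargets"]
            if t.get("health") != "up"]

for job, err in unhealthy_targets(sample):
    print(f"target in job {job!r} is unhealthy: {err}")
```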

  9. Run the following command to list all BIG-IP Next for Kubernetes metrics currently ingested by Prometheus.

    curl http://172.18.0.4:31929/api/v1/label/__name__/values | jq
    

    Note: Replace 172.18.0.4 with the IP address of the node where Prometheus is running.

  10. Get total server-side connections for all pool members.

    curl "http://172.18.0.4:31929/api/v1/query?query=f5_tmm_f5_pool_member_serverside_connections_count_total" | jq
    

    Note: Replace 172.18.0.4 with the IP address of the node where Prometheus is running.
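The JSON returned by /api/v1/query can likewise be reduced to usable numbers with a few lines of code. The sample response below is illustrative and follows the standard Prometheus instant-query format; the label set shown is an assumption about what the metric carries:

```python
import json

# Illustrative instant-query response in the standard Prometheus vector format.
sample = json.loads("""
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"__name__": "f5_tmm_f5_pool_member_serverside_connections_count_total",
                  "instance": "otel-collector-svc.default.svc.cluster.local:9090"},
       "value": [1765878400.0, "42"]}
    ]
  }
}
""")

def vector_values(resp):
    """Return {labels-as-frozenset: float value} for an instant vector result."""
    return {frozenset(r["metric"].items()): float(r["value"][1])
            for r in resp["data"]["result"]}

# Total connections across all returned series.
print(sum(vector_values(sample).values()))
```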

Viewing Metrics

You can view the aggregated TMM metrics (including those from TMM processes that have died) inside the Kubernetes pod that collects them.

  1. Open an interactive shell session. For example, run kubectl exec -it <receiver-pod-name> -n <namespace> -- sh, where <receiver-pod-name> is the actual pod name and <namespace> is the namespace of the pod.

  2. Execute the following commands inside the shell to view the arrays of values:

    1. mdb --list | grep merged

    2. mdb --segment <name_of_table_segment_from_list>

Note: The metrics are displayed as a raw array, which is not easy to read. However, you can use the values for debugging purposes.

Grafana

Grafana can be connected to Prometheus to display dashboards based on the metrics sent to Prometheus. Here is an example of a Grafana dashboard for Virtual Server metrics.


Viewing Observer metrics

You can use a Grafana dashboard to view observer metrics.

To view the dashboard:

  • Download this JSON file.

  • Open the Grafana web UI.

  • Import the file to Grafana using the Import dashboard screen.

The Observer dashboard has three main sections:

  • GRPC metrics - Enables you to monitor gRPC communication performance between observer containers, including call latency and request counts.

  • Go Runtime metrics - Enables you to monitor resource consumption of observer pods, including goroutine counts, heap memory, and object allocation rates.

  • Storage/aggregation metrics - Enables you to monitor storage efficiency and the merge operations that preserve cumulative metrics from terminated TMM pods. Merge operations consolidate dead pod metrics into a single storage location to prevent data loss and optimize disk usage.

Sample panels of Grafana Dashboard

  • GRPC metrics

  • Go Runtime metrics

  • Storage/aggregation metrics