Stats Collection and Aggregation

Overview

The BIG-IP Next for Kubernetes (Host) generates terabytes of statistics per second. To manage this load and provide effective telemetry, the V2 metrics architecture uses three primary components: the receiver, the observer (or aggregator), and the operator. TMM provides the statistics, and the metrics scraper containers stream them to OTeL. The statistics can then be exposed to third-party vendors such as Prometheus and Grafana.

This page describes the V2 metrics architecture (with aggregation and standardization) and its installation procedure, as well as those for the legacy V1 metrics.

Metric Pods Roles (V2)

  • Receiver: The receiver runs as a StatefulSet. It gathers metrics from TMM targets, persists them, and securely transmits them to the observer over gRPC with mutual TLS (mTLS) for aggregation. The receiver also preserves the metrics of dead TMMs: if it collected metrics from a TMM before that TMM went down, those metrics remain available for aggregation.

  • Observer: The observer runs as a StatefulSet. Observers support scraping metrics from multiple components beyond TMM. They also scrape and export their own telemetry to OTeL, including gRPC metrics, aggregation metrics, and storage-related metrics. These capabilities let you monitor observer performance.

  • Operator: The operator runs as a Deployment and orchestrates the end-to-end metrics collection and aggregation lifecycle. It discovers available TMM scrapers, receivers, and aggregators, then load-balances TMMs across receivers. For more information, see Aggregation Mode (Applies only for V2).

  • TMM Scraper: The TMM scraper is an observer container that runs inside each TMM pod, replacing the tmstatsd tool. It serves metrics directly from tmctl over a gRPC response stream when the receiver requests them.
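The operator's load-balancing of TMMs across receivers can be pictured with a toy round-robin sketch. This is illustrative only; the function and pod names below are ours, not the operator's actual algorithm:

```python
# Toy sketch: distribute discovered TMM scrapers across available receivers
# round-robin, the way the operator load-balances TMMs across receivers.
def assign_tmms(tmms, receivers):
    assignment = {r: [] for r in receivers}
    for i, tmm in enumerate(tmms):
        assignment[receivers[i % len(receivers)]].append(tmm)
    return assignment

print(assign_tmms(["f5-tmm-0", "f5-tmm-1", "f5-tmm-2"],
                  ["receiver-0", "receiver-1"]))
# → {'receiver-0': ['f5-tmm-0', 'f5-tmm-2'], 'receiver-1': ['f5-tmm-1']}
```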

Metrics Flow Architecture

This section highlights the Metrics flow architecture of V2 and V1 Metrics.

V2 Metrics

The following image depicts the V2 metrics flow architecture:

Example:

Following is an example of the virtual server clientside.received.bytes metric (V2) collected from OTeL.

{
  "resourceMetrics": [
    {
      "resource": {},
      "scopeMetrics": [
        {
          "scope": {
            "name": "io.f5.toda.observer",
            "version": "5.23.0"
          },
          "metrics": [
            {
              "name": "f5.virtual_server.clientside.received.bytes",
              "unit": "By",
              "sum": {
                "dataPoints": [
                  {
                    "attributes": [
                      {
                        "key": "f5.virtual_server.destination",
                        "value": {
                          "stringValue": "55.55.55.1"
                        }
                      },
                      {
                        "key": "f5.virtual_server.name",
                        "value": {
                          "stringValue": "spk-app-1-spk-app-tcp-8050-f5ing-testapp-virtual-server"
                        }
                      },
                      {
                        "key": "f5.virtual_server.source",
                        "value": {
                          "stringValue": "0.0.0.0"
                        }
                      },
                      {
                        "key": "k8s.namespace.name",
                        "value": {
                          "stringValue": "spk-app-1"
                        }
                      },
                      {
                        "key": "target.namespace",
                        "value": {
                          "stringValue": "default"
                        }
                      },
                      {
                        "key": "observer.job.mode",
                        "value": {
                          "stringValue": "aggregated"
                        }
                      }
                    ],
                    "startTimeUnixNano": "1765878400960332314",
                    "timeUnixNano": "1765878400962968310",
                    "asInt": "0"
                  }
                ],
                "aggregationTemporality": 2,
                "isMonotonic": true
              }
            }
          ]
        }
      ]
    }
  ]
}
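A payload in this shape can be flattened into rows with a few lines of Python. The sketch below is ours (the flatten helper is not part of the product), and the embedded payload is a trimmed subset of the example above:

```python
import json

# Trimmed OTLP/JSON payload in the shape shown above (illustrative subset).
payload = """
{
  "resourceMetrics": [{
    "resource": {},
    "scopeMetrics": [{
      "scope": {"name": "io.f5.toda.observer", "version": "5.23.0"},
      "metrics": [{
        "name": "f5.virtual_server.clientside.received.bytes",
        "unit": "By",
        "sum": {
          "dataPoints": [{
            "attributes": [
              {"key": "f5.virtual_server.name",
               "value": {"stringValue": "spk-app-1-spk-app-tcp-8050-f5ing-testapp-virtual-server"}},
              {"key": "observer.job.mode", "value": {"stringValue": "aggregated"}}
            ],
            "timeUnixNano": "1765878400962968310",
            "asInt": "0"
          }],
          "aggregationTemporality": 2,
          "isMonotonic": true
        }
      }]
    }]
  }]
}
"""

def flatten(otlp_json):
    """Yield (metric_name, unit, attributes, value) for each sum data point."""
    doc = json.loads(otlp_json)
    for rm in doc["resourceMetrics"]:
        for sm in rm["scopeMetrics"]:
            for metric in sm["metrics"]:
                for dp in metric.get("sum", {}).get("dataPoints", []):
                    attrs = {a["key"]: a["value"].get("stringValue")
                             for a in dp.get("attributes", [])}
                    yield metric["name"], metric.get("unit"), attrs, int(dp["asInt"])

for name, unit, attrs, value in flatten(payload):
    print(name, unit, attrs["observer.job.mode"], value)
```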

V1 Metrics

In V1 metric flow architecture, the metrics are streamed directly from TMM by tmstatsd to OTeL, without aggregation.

The following image depicts the V1 metrics flow architecture:

Example:

Following is an example of the same virtual server client-side received-bytes metric (clientside.bytes_out) collected from OTeL in V1.


{
    "resourceMetrics": [
        {
            "resource": {
                "attributes": [
                    {
                        "key": "host.name",
                        "value": {
                            "stringValue": "f5-tmm-fb54985cc-6nbl2"
                        }
                    }
                ]
            },
            "scopeMetrics": [
                {
                    "scope": {
                        "name": "demo-client-meter"
                    },
                    "metrics": [
                        {
                            "name": "virtual_server_stat/spk-app-1-spk-app-tcp-8050-f5ing-testapp-virtual-server/clientside.bytes_out",
                            "description": "TMM tmstatsd: table[virtual_server_stat] row[spk-app-1-spk-app-tcp-8050-f5ing-testapp-virtual-server] column[clientside.bytes_out] type:[Gauge] metric[virtual_server_stat/spk-app-1-spk-app-tcp-8050-f5ing-testapp-virtual-server/clientside.bytes_out]",
                            "gauge": {
                                "dataPoints": [
                                    {
                                        "attributes": [
                                            {
                                                "key": "column",
                                                "value": {
                                                    "stringValue": "clientside.bytes_out"
                                                }
                                            },
                                            {
                                                "key": "name",
                                                "value": {
                                                    "stringValue": "spk-app-1-spk-app-tcp-8050-f5ing-testapp-virtual-server"
                                                }
                                            },
                                            {
                                                "key": "tableName",
                                                "value": {
                                                    "stringValue": "virtual_server_stat"
                                                }
                                            },
                                            {
                                                "key": "tmmID",
                                                "value": {
                                                    "stringValue": "f5-tmm-fb54985cc-6nbl2"
                                                }
                                            }
                                        ],
                                        "startTimeUnixNano": "11651379494838206464",
                                        "timeUnixNano": "1751891173853319814",
                                        "asInt": "0"
                                    }
                                ]
                            }
                        }
                    ]
                }
            ],
            "schemaUrl": "https://opentelemetry.io/schemas/1.17.0"
        }
    ]
}
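Note how the V1 metric name above embeds the table, row, and column as table/row/column, so consumers must parse names to recover what V2 exposes as standardized attributes. A small illustration of that parsing (the helper name is ours, not part of the product):

```python
def split_v1_name(metric_name):
    """Split a V1 metric name of the form table/row/column into its parts.

    The row (a virtual server name here) cannot contain '/', so a plain
    3-way split suffices for this illustration.
    """
    table, row, column = metric_name.split("/", 2)
    return {"tableName": table, "name": row, "column": column}

parts = split_v1_name(
    "virtual_server_stat/"
    "spk-app-1-spk-app-tcp-8050-f5ing-testapp-virtual-server/"
    "clientside.bytes_out"
)
print(parts)
```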

Advantages of V2 Metrics over V1 Metrics

The V2 metrics system introduces several enhancements compared to the V1 metrics system, improving standardization, aggregation, and descriptive capabilities across the telemetry data. A few of the advantages are listed here:

  • V2 metrics are aggregated across all TMMs, providing a unified view of performance and resource usage across the entire system.

  • Each metric in V2 is standardized with fixed names, ensuring consistency and simplifying interpretation across different platforms and tools.

  • V2 metrics include additional labels (also known as OpenTelemetry (OTeL) attributes) to provide detailed descriptions for each metric, enabling better context and improved observability.

Installation of V2 (helm charts)

  1. Enable the V2 Metrics on TMM during the installation of f5ingress pod. For information on how to enable V2 Metrics, see step 11 in the TMM Values of BIG-IP Controller.

  2. Change into the directory containing the latest SPK Software, and obtain the f5-toda-observer Helm chart version.

    In this example, the SPK files are in the spkinstall directory:

    cd spkinstall
    
    ls -1 tar | grep observer
    

    The example output should appear similar to the following:

    f5-toda-observer-v4.56.4-0.0.15.tgz
    
  3. Create a Helm values file named observer_values.yaml and set the image.repository and fluentbit_sidecar.image.repository parameters.

    image:
      repository: registry.com
    
    persistence:
      storageClassName: ""
      accessMode: ReadWriteOnce
      # size: 3Gi

    platformType: "openshift"

    fluentbit_sidecar:
      image:
        repository: registry.com
      fluentbit:
        tls:
          enabled: true
      fluentd:
        host: f5-toda-fluentd.spk-utilities.svc.cluster.local.
    
      # - `targets_namespaces`: List of namespaces where the operator **actively monitors target pods**.
      #   The operator scrapes metrics from target pods in these namespaces and consumes `Scrape` custom resources only from them.

      # - `watchNamespaces`: List of Kubernetes namespaces where F5 CRs are installed; the observer uses it **only to label or correlate metrics**.
      #   The operator uses this list to determine which namespace each metric belongs to and adds that namespace as a label.

      # Key difference:
      #   - `targets_namespaces` → where to look and scrape metrics.
      #   - `watchNamespaces` → which namespaces to label/associate the metrics with.
      
    
      # Example: 
      # targets_namespaces: 
      #  - f5-alpha 
      #  - f5-beta 
      # watchNamespaces: 
      #  - spk-tcp-app 
      #  - spk-udp-app  
    

    Note

    • If the persistence profile is not defined (default) or explicitly set to null, the storageClassName specification will not be set, and the default provisioner will be used.

    • The observer can watch all targets or only specific targets. Refer to the targets_namespaces and watchNamespaces parameters in the values file for more details.

  4. Install the observer using helm.

    helm install observer f5-toda-observer-<VERSION>.tgz -f observer_values.yaml
    

    Note

    Run the helm show values <observer-chart> command to view advanced options.

Aggregation Mode (Applies only for V2)

Aggregation mode controls how the observer combines and exports metrics per table. For each table, you choose a mode—Aggregated (combine metrics across all TMMs), Semi-Aggregated (combine but exclude dead TMMs), or Diagnostic (no aggregation; export raw per‑TMM metrics)—and set a per‑table export interval. The operator applies these settings via Jobs in the f5-observer-operator-config ConfigMap.

Note

By default, the operator creates a Job for every table with a 1-minute interval and Aggregated mode. You can define multiple Jobs for the same table with different intervals and modes.

The YAML below configures multiple Jobs for the same table (profile_http_stat), each with a different interval and mode, enabling flexible per-table metric collection.

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: f5-observer-operator-config
      data:
        config.yaml: |-
          plugins:
            io.observer.insights.scrape: {}
            io.observer.operator.operator:
              default_collect_interval: 60s
              namespace: default
              namespaces: ["gamma", "delta"]
              targets_namespaces: ["alpha", "beta", "default"]
              jobs:
                - table: profile_http_stat
                  mode: diagnostic
                  interval: 3s
                - table: profile_http_stat
                  mode: semi-aggregated
                  interval: 10s
                - table: profile_http_stat   # no mode/interval set; defaults apply (aggregated mode, 1-minute interval)
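
The three modes can be pictured with a toy aggregation over per-TMM samples. This sketch is illustrative only (the sample data and function are ours, not the observer's implementation):

```python
# Per-TMM samples for one counter; 'alive' marks whether the TMM pod still runs.
samples = [
    {"tmm": "f5-tmm-0", "alive": True,  "value": 120},
    {"tmm": "f5-tmm-1", "alive": True,  "value": 80},
    {"tmm": "f5-tmm-2", "alive": False, "value": 40},  # dead TMM, metrics preserved
]

def export(samples, mode):
    if mode == "aggregated":        # combine across all TMMs, dead included
        return sum(s["value"] for s in samples)
    if mode == "semi-aggregated":   # combine, but exclude dead TMMs
        return sum(s["value"] for s in samples if s["alive"])
    if mode == "diagnostic":        # no aggregation: raw per-TMM values
        return {s["tmm"]: s["value"] for s in samples}
    raise ValueError(mode)

print(export(samples, "aggregated"))       # 240
print(export(samples, "semi-aggregated"))  # 200
print(export(samples, "diagnostic"))
```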

Configuring Aggregation mode

  1. Configure the observer.

  2. Set the interval and mode by modifying the operator ConfigMap file (for example, f5-observer-operator-config.yaml).

    Note

    You can use the kubectl edit cm f5-observer-operator-config command to edit the ConfigMap when the observer is configured and in the running state.

  3. Apply the operator config map file.

  4. Restart the operator pod.

  5. Refresh the Grafana dashboard to see the updated frequency changes.

Installation of V1 (Helm charts)

Enable the V1 Metrics on TMM during the installation of f5ingress pod. For information on how to enable V1 Metrics, see step 11 in the TMM Values of BIG-IP Controller.

OTeL Statistics

The full list of OTeL statistics can be reviewed here.

Prometheus and Grafana

Following are a few examples of how to view metrics using Prometheus and Grafana:

Prometheus Integration for Metrics

This section describes the procedure to expose BNK Metrics to Prometheus. The commands and Custom Resources (CRs) provided in this section are for the BNK OpenTelemetry (OTeL) running in the default namespace. If OTeL is running in a different namespace, modify the commands and CRs accordingly.

Note

To create the certificates required for OTeL to communicate with third-party applications such as Prometheus, see the OTeL Collectors section.

  1. Create a dedicated namespace for Prometheus.

    oc create namespace prometheus
    
  2. Create a valid certificate for Prometheus using cert-manager.

    a. Copy the following data into a file named prom-certs.yaml and replace arm-ca-cluster-issuer with the name of your CA issuer. You can modify other fields as needed.

    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: prometheus-client
      namespace: prometheus
    spec:
      subject:
        countries:
          - US
        provinces:
          - Washington
        localities:
          - Seattle
        organizations:
          - F5 Networks
        organizationalUnits:
          - PD
      emailAddresses:
        - clientcert@f5net.com
      commonName: f5net.com
      secretName: prometheus-client-secret
      issuerRef:
        name: arm-ca-cluster-issuer
        group: cert-manager.io
        kind: ClusterIssuer
      # Certificate lifetime (2160h = 90 days)
      duration: 2160h
      privateKey:
        rotationPolicy: Always
        encoding: PKCS1
        algorithm: RSA
        size: 4096
    

    b. Apply the certificate manifest using oc command.

    oc apply -f prom-certs.yaml
    
  3. Create a custom values.yaml file to configure and customize the Prometheus installation using Helm. Copy the following data into the file created.

    prometheus-pushgateway:
      enabled: false
    
    prometheus-node-exporter:
      enabled: false 
    
    kube-state-metrics:
      enabled: false 
    
    alertmanager:
      enabled: false 
    
    configmapReload:
      prometheus: 
        enabled: false 
    serverFiles:
      prometheus.yml:
        scrape_configs:
          - job_name: bnk-otel
            scheme: https
            static_configs:
              - targets:
                  - otel-collector-svc.default.svc.cluster.local:9090
            tls_config:
              cert_file: /etc/prometheus/certs/tls.crt
              key_file: /etc/prometheus/certs/tls.key
              ca_file: /etc/prometheus/certs/ca.crt
              insecure_skip_verify: false
    server:
      extraVolumes:
        - name: prometheus-tls
          secret:
            secretName: prometheus-client-secret
      extraVolumeMounts:
        - name: prometheus-tls
          mountPath: /etc/prometheus/certs
          readOnly: true
      global:
        scrape_interval: 10s
      service:
        type: "NodePort"
        nodePort: 31929
      configmapReload:
        enabled: false
      persistentVolume:
        enabled: false
    
  4. Deploy Prometheus with Helm.

    helm install prometheus oci://ghcr.io/prometheus-community/charts/prometheus -n prometheus --atomic -f values.yaml
    
  5. Ensure the BNK scrape configuration is active, its status is healthy, and there are no errors.

    curl http://172.18.0.4:31929/api/v1/targets | jq
    

    Note

    Replace 172.18.0.4 with the IP address of the node where Prometheus is running.

  6. Run the following command to list all BNK metrics currently ingested by Prometheus.

    curl http://172.18.0.4:31929/api/v1/label/__name__/values | jq
    

    Note

    Replace 172.18.0.4 with the IP address of the node where Prometheus is running.

  7. Get total server-side connections for all pool members.

    curl "http://172.18.0.4:31929/api/v1/query?query=f5_tmm_f5_pool_member_serverside_connections_count_total" | jq
    

    Note

    Replace 172.18.0.4 with the IP address of the node where Prometheus is running.
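
PromQL expressions must be URL-encoded when sent to the Prometheus HTTP API, as in the curl commands above. A small helper for building such query URLs; the helper name is ours, the node IP and port are the placeholders from the steps above, and wrapping the metric in sum() is just an example expression:

```python
from urllib.parse import urlencode

def prom_query_url(host, port, promql):
    """Build a Prometheus HTTP API instant-query URL with the PromQL expression URL-encoded."""
    return f"http://{host}:{port}/api/v1/query?" + urlencode({"query": promql})

url = prom_query_url(
    "172.18.0.4", 31929,
    "sum(f5_tmm_f5_pool_member_serverside_connections_count_total)"
)
print(url)
```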

Viewing Metrics

You can view the aggregated TMM metrics (including those from TMM processes that have died) inside the Kubernetes pod that collects them.

  1. Open an interactive shell session. For example, use the command kubectl exec -it <receiver-pod-name> -n <namespace> -- sh, where <receiver-pod-name> is the actual pod name and <namespace> is the namespace of the pod.

  2. Execute the metrics commands inside the shell to view the array of values. You can use these commands:

    1. mdb --list | grep merged

    2. mdb --segment <name_of_table_segment_from_list>

Grafana

Grafana can be connected to Prometheus to display dashboards based on the metrics sent to Prometheus. Here is an example of a Grafana dashboard for Virtual Server metrics.


Viewing Observer metrics

You can use the Grafana dashboard to view observer metrics.

To view the dashboard:

  • Download the JSON file.

  • Open the Grafana webpage.

  • Import the file into Grafana using the Import dashboard screen.

The observer dashboard has three main sections:

GRPC metrics - Enables you to monitor gRPC communication performance between observer containers, including call latency and request counts.

Go Runtime metrics - Enables you to monitor resource consumption of observer pods, including goroutine counts, heap memory, and object allocation rates.

Storage/aggregation metrics - Enables you to monitor storage efficiency and merge operations that preserve cumulative metrics from terminated TMM pods. Merge operations consolidate dead pod metrics into a single storage location to prevent data loss and optimize disk usage.
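
The merge idea can be sketched in a few lines. This toy example is ours (the function and store layout are not the observer's actual storage format); it consolidates counters from terminated pods into one slot so cumulative totals survive while storage stays bounded:

```python
# Illustrative sketch: merge dead-pod counters into a single "merged" slot,
# preserving cumulative totals while capping per-pod storage growth.
def merge_dead(store, dead_pods):
    merged = store.setdefault("merged", 0)
    for pod in dead_pods:
        merged += store.pop(pod, 0)
    store["merged"] = merged
    return store

store = {"f5-tmm-0": 100, "f5-tmm-1": 50, "f5-tmm-2": 25}
print(merge_dead(store, ["f5-tmm-1", "f5-tmm-2"]))
# → {'f5-tmm-0': 100, 'merged': 75}
```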

Sample panels of Grafana Dashboard

GRPC metrics


Go Runtime metrics


Storage/aggregation metrics
