Distributed Toda for Stats Aggregation

Overview

BIG-IP Next for Kubernetes generates a large amount of data, at rates of terabytes per second. To manage this high volume of statistics efficiently, the Distributed Toda for Stats Aggregation system has been enhanced with three primary pods: Receiver, Observer, and Coordinator (Operator).

Distributed Toda Pod Roles

This section describes the responsibilities of each Distributed Toda pod.

  • Receiver: The Receiver runs in a StatefulSet. It collects metrics from the files and sends them to the Observer over gRPC with mutual TLS (mTLS) for aggregation.

  • Observer: The Observer also runs in a StatefulSet. It aggregates the metrics it receives from the Receivers across multiple TMMs, and sends the metrics to the OpenTelemetry (OTEL) collector over gRPC with mTLS for aggregation and standardization.

  • Coordinator (Operator): The Operator oversees the entire process. It coordinates the collection and aggregation of metrics by issuing the corresponding requests over gRPC with mTLS, ensuring a fast and safe metrics flow.

  • TMM Scraper: TMM Scraper is an observer container that runs inside each TMM pod, replacing the tmstatsd tool. It directly serves metrics from tmctl over a gRPC response stream upon receiving requests from the Receiver. This approach eliminates the need for TMM to mount persistent storage or write metrics directly to CSV files.
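Once deployed, the pods above can be located with a standard kubectl query. The namespace placeholder and the pod-name pattern below are assumptions, so adjust them to match your deployment:

    # List the Distributed Toda pods (the pod-name pattern is illustrative).
    kubectl get pods -n <namespace> | grep -E 'receiver|observer|coordinator'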

Metrics Flow Architecture

V2 Metrics

The following image depicts the V2 metrics flow architecture:

Example:

Following is an example of the V2 virtual server clientside.received.bytes metric, as collected from OTEL.

V1 Metrics

In the V1 metrics flow architecture, the metrics are streamed directly from TMM by tmstatsd to OTEL, without aggregation.

The following image depicts the V1 metrics flow architecture:

CSV Files

The tmstatsd process creates CSV files to ensure that metrics from the various tmctl tables are saved in a consistent format. Each TMM periodically saves its metrics to CSV files, creating a separate file for each table. These files are stored in dedicated folders, one per TMM.
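As an illustration, the resulting on-disk layout might resemble the following; the directory and file names here are hypothetical, not the actual paths used by tmstatsd:

    stats/
    ├── tmm0/
    │   ├── virtual_server_stat.csv
    │   └── pool_member_stat.csv
    └── tmm1/
        ├── virtual_server_stat.csv
        └── pool_member_stat.csv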

Distributed Toda Pods Installation

For the installation of the Receiver, Observer, and Coordinator (Operator), refer to the FLO section.

Metrics Export Intervals

Note: By default, the metrics export interval from the Receiver and Observer to OTEL is fixed at 2 minutes in the Coordinator (Operator) ConfigMap and cannot be changed.

OTEL Statistics

The full list of OTEL statistics can be reviewed here.

Prometheus and Grafana

Note: Prometheus and Grafana are not yet integrated with the FLO installation. However, they can be installed manually to view metrics.

The examples below show how the metrics can be used in Prometheus and Grafana.

Prometheus

For the metric f5.virtual_server.clientside.received.bytes, you can view the metrics in Prometheus using the following query:

f5_tmm_f5_virtual_server_clientside_received_bytes_total 

Note: f5_tmm is a prefix applied to all metrics by OTEL.
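Because this metric is a counter, it is usually graphed as a per-second rate rather than a raw total, for example over a five-minute window:

    rate(f5_tmm_f5_virtual_server_clientside_received_bytes_total[5m])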

Example

The following image shows the query above in the Prometheus UI.

Prometheus Integration for BNK Metrics

This section describes the procedure to expose BNK Metrics to Prometheus. The commands and Custom Resources (CRs) provided in this section are for the BNK OpenTelemetry (OTEL) running in the default namespace. If OTEL is running in a different namespace, modify the commands and CRs accordingly.

Note: Steps 1 to 3 are required even if OpenTelemetry is already running. Skip them only if you have completed them earlier.

  1. Create a valid certificate using cert-manager. Copy the following data into an otel-certs.yaml file, and replace arm-ca-cluster-issuer with the name of your CA issuer. You may adjust other fields as needed, but keep name and secretName unchanged.

    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: external-otelsvr
    spec:
      subject:
        countries:
          - US
        provinces:
          - Washington
        localities:
          - Seattle
        organizations:
          - F5 Networks
        organizationalUnits:
          - PD
      emailAddresses:
        - clientcert@f5net.com
      commonName: f5net.com
      dnsNames:
        - otel-collector-svc.default.svc.cluster.local
      secretName: external-otelsvr-secret
      issuerRef:
        name: arm-ca-cluster-issuer
        group: cert-manager.io
        kind: ClusterIssuer
      duration: 2160h
      privateKey:
        rotationPolicy: Always
        encoding: PKCS1
        algorithm: RSA
        size: 4096
    
  2. Apply the certificate manifest using the kubectl command.

    kubectl apply -f otel-certs.yaml
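    Optionally, confirm that cert-manager has issued the certificate and created its secret before moving on; the resource names match the manifest in step 1.

    # Wait for the certificate to become Ready, then check that the secret exists.
    kubectl wait --for=condition=Ready certificate/external-otelsvr --timeout=120s
    kubectl get secret external-otelsvr-secret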
    
  3. Delete the existing OpenTelemetry collector pod(s) so they restart and pick up the new certificates.

    kubectl get pods --no-headers -o custom-columns=":metadata.name" | grep otel-collector | xargs -r kubectl delete pod
    
  4. Create a dedicated namespace for Prometheus.

    kubectl create namespace prometheus
    
  5. Create a valid certificate for Prometheus using cert-manager.

    a. Copy the following data into a prom-certs.yaml file and replace arm-ca-cluster-issuer with the name of your CA issuer. You can modify other fields as needed.

    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: prometheus-client
      namespace: prometheus
    spec:
      subject:
        countries:
          - US
        provinces:
          - Washington
        localities:
          - Seattle
        organizations:
          - F5 Networks
        organizationalUnits:
          - PD
      emailAddresses:
        - clientcert@f5net.com
      commonName: f5net.com
      secretName: prometheus-client-secret
      issuerRef:
        name: arm-ca-cluster-issuer
        group: cert-manager.io
        kind: ClusterIssuer
      duration: 2160h
      privateKey:
        rotationPolicy: Always
        encoding: PKCS1
        algorithm: RSA
        size: 4096
    

    b. Apply the certificate manifest using the kubectl command.

    kubectl apply -f prom-certs.yaml
    
  6. Create a custom values.yaml file to configure and customize the Prometheus installation with Helm, and copy the following data into it.

    prometheus-pushgateway:
      enabled: false
    
    prometheus-node-exporter:
      enabled: false 
    
    kube-state-metrics:
      enabled: false 
    
    alertmanager:
      enabled: false 
    
    configmapReload:
      prometheus: 
        enabled: false 
    serverFiles:
      prometheus.yml:
        scrape_configs:
          - job_name: bnk-otel
            scheme: https
            static_configs:
              - targets:
                  - otel-collector-svc.default.svc.cluster.local:9090
            tls_config:
              cert_file: /etc/prometheus/certs/tls.crt
              key_file: /etc/prometheus/certs/tls.key
              ca_file: /etc/prometheus/certs/ca.crt
              insecure_skip_verify: false
    server:
      extraVolumes:
        - name: prometheus-tls
          secret:
            secretName: prometheus-client-secret
      extraVolumeMounts:
        - name: prometheus-tls
          mountPath: /etc/prometheus/certs
          readOnly: true
      global:
        scrape_interval: 10s
      service:
        type: "NodePort"
        nodePort: 31929
      configmapReload:
        enabled: false
      persistentVolume:
        enabled: false
    
  7. Deploy Prometheus with Helm.

    helm install prometheus oci://ghcr.io/prometheus-community/charts/prometheus -n prometheus --atomic -f values.yaml
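    After the install completes, the release can be checked. The service name prometheus-server is the chart's default, so adjust it if your values override it.

    # Confirm the Prometheus server pod is running and the NodePort service exists.
    kubectl get pods -n prometheus
    kubectl get svc prometheus-server -n prometheus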
    
  8. Ensure the BNK scrape configuration is active, status is healthy, and there are no errors.

    curl http://172.18.0.4:31929/api/v1/targets | jq
    

    Note: Replace 172.18.0.4 with the IP address of the node where Prometheus is running.
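The targets response can be verbose; a jq filter reduces it to just the job name and health of each target. The heredoc JSON below is an illustrative sample, not real Prometheus output; in practice, pipe the curl output from step 8 instead.

```shell
# Reduce a /api/v1/targets response to "<job> <health>" per target.
# The heredoc JSON is a hypothetical sample, not real output.
cat <<'EOF' | jq -r '.data.activeTargets[] | "\(.labels.job) \(.health)"'
{"status":"success","data":{"activeTargets":[{"labels":{"job":"bnk-otel"},"health":"up"}]}}
EOF
```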

  9. Run the following command to list all BNK metrics currently ingested by Prometheus.

    curl http://172.18.0.4:31929/api/v1/label/__name__/values | jq
    

    Note: Replace 172.18.0.4 with the IP address of the node where Prometheus is running.

  10. Get total server-side connections for all pool members.

    curl "http://172.18.0.4:31929/api/v1/query?query=f5_tmm_f5_pool_member_serverside_connections_count_total" | jq
    

    Note: Replace 172.18.0.4 with the IP address of the node where Prometheus is running.
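An instant-query response wraps each sample in a result array, where the value field holds a [timestamp, value] pair. The jq filter below pulls out just the numeric value; the heredoc JSON is an illustrative sample, so pipe the real curl output instead.

```shell
# Extract just the sample value from an instant-query response.
# The heredoc JSON is a hypothetical sample, not real output.
cat <<'EOF' | jq -r '.data.result[].value[1]'
{"status":"success","data":{"result":[{"metric":{},"value":[1700000000,"42"]}]}}
EOF
```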

Grafana

Grafana can be connected to Prometheus to display dashboards based on the metrics sent to Prometheus. Here is an example of a Grafana dashboard for Virtual Server metrics.

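A minimal sketch of a Grafana data-source provisioning file that points Grafana at the Prometheus instance deployed above; the in-cluster URL assumes the chart's default prometheus-server service in the prometheus namespace.

    # Hypothetical provisioning file, e.g. /etc/grafana/provisioning/datasources/bnk.yaml
    apiVersion: 1
    datasources:
      - name: BNK-Prometheus
        type: prometheus
        access: proxy
        url: http://prometheus-server.prometheus.svc.cluster.local
        isDefault: true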