Distributed Toda for Stats Aggregation

Overview

BIG-IP Next for Kubernetes generates statistics data at a rate of terabytes per second. To manage this high volume of statistics efficiently, the Distributed Toda for Stats Aggregation system has been enhanced with three primary pods: Receiver, Observer, and Coordinator (Operator).

Distributed Toda Pods Roles

This section describes the responsibilities of each Distributed Toda pod.

  • Receiver: The Receiver runs in a StatefulSet. It collects metrics from the CSV files and sends them to the Observer over gRPC with mutual TLS (mTLS) for aggregation.

  • Observer: The Observer also runs in a StatefulSet. It aggregates the metrics it receives from the Receivers across multiple TMMs and sends them to the OpenTelemetry (OTEL) collector over gRPC with mTLS for further aggregation and standardization.

  • Coordinator (Operator): The Operator oversees the entire process. It coordinates the collection and aggregation of metrics by issuing the corresponding requests over gRPC with mTLS, ensuring a fast and secure metrics flow.

Metrics Flow Architecture

[Figure: Metrics Flow Architecture diagram]

Prerequisites

Ensure you have:

  • Storage Class (SC) with ReadWriteMany (RWX) access: Because TMMs run on DPU nodes, TMMs and Observers always run on separate nodes. Local storage is therefore not viable, and an SC with RWX access, such as Network File System (NFS), is required. If the default storage class does not support RWX, the Observer pods may remain in a Pending state. For more information, see Create Storage Class.
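As an illustration, an RWX-capable storage class backed by NFS might look like the following. This is a minimal sketch: the provisioner name, NFS server address, share path, and claim sizing are assumptions and must be adapted to your environment.

```yaml
# Hypothetical StorageClass backed by an NFS CSI driver (provisioner name,
# server, and share are placeholders -- adjust to your environment).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-rwx
provisioner: nfs.csi.k8s.io
parameters:
  server: nfs-server.example.com
  share: /exports/toda
reclaimPolicy: Delete
volumeBindingMode: Immediate
---
# A PersistentVolumeClaim requesting ReadWriteMany access from that class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: toda-observer-data
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-rwx
  resources:
    requests:
      storage: 10Gi
```

Note that RWX support is determined by the backing provisioner, not by the StorageClass object itself; the claim above will only bind if the provisioner can serve ReadWriteMany volumes.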

CSV Files

The tmstatsd process creates CSV files to ensure that metrics from various tmctl tables are saved in a consistent format. The TMMs save the metrics to CSV files periodically, with each TMM creating its own file per table. These files are then stored in dedicated folders, one for each TMM.

Distributed Toda Pods Installation

For the installation of the Receiver, Observer, and Coordinator (Operator), refer to the FLO section.

Metrics Export Intervals

Note: By default, the metrics export interval from the Receiver and Observer to OTEL is fixed at 2 minutes in the Coordinator (Operator) ConfigMap and cannot be changed.

OTEL Statistics

The full list of OTEL statistics can be reviewed here.

Prometheus and Grafana

Note: Prometheus and Grafana are not yet integrated with the FLO installation. However, they can be installed manually to view metrics.

The examples below show how the metrics can be used in Prometheus and Grafana.

Prometheus

For the metric f5.virtual_server.clientside.received.bytes, you can view the metric in Prometheus using the following query:

f5_tmm_f5_virtual_server_clientside_received_bytes_total 

Note: f5_tmm is a prefix applied to all metrics by OTEL.
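Since this metric is a counter (note the _total suffix), dashboards typically query its per-second rate rather than the raw cumulative value. A sketch, assuming a 5-minute lookback window:

```promql
# Per-second receive rate over the last 5 minutes, per time series
rate(f5_tmm_f5_virtual_server_clientside_received_bytes_total[5m])

# The same rate summed across all time series
sum(rate(f5_tmm_f5_virtual_server_clientside_received_bytes_total[5m]))
```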

OTEL Config Example

exporters:
  prometheus:
    endpoint: "0.0.0.0:9090"
    namespace: "f5-tmm"
    tls:
      client_ca_file: /external/otelsvr/ca.crt
      cert_file: /external/otelsvr/tls.crt
      key_file: /external/otelsvr/tls.key
      reload_interval: 3.6e+12
 
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [memory_limiter/with-settings]
      exporters: [prometheus, debug]
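Because the exporter above serves metrics over TLS with a client CA configured, Prometheus must scrape it over HTTPS and present a client certificate. A minimal sketch of a matching scrape job; the job name, file paths, and target address are assumptions:

```yaml
scrape_configs:
  - job_name: otel-f5-tmm                      # job name is an assumption
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/otel/ca.crt     # CA that signed the exporter certificate
      cert_file: /etc/prometheus/otel/tls.crt  # client cert trusted by client_ca_file
      key_file: /etc/prometheus/otel/tls.key
    static_configs:
      - targets: ["<otel-collector-service>:9090"]  # placeholder target
```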

[Figure: example Prometheus query result]

Grafana

Grafana can be connected to Prometheus to display dashboards based on the metrics sent to Prometheus. Here is an example of a Grafana dashboard for Virtual Server metrics.

[Figure: Grafana dashboard showing Virtual Server metrics]
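One way to make the connection is Grafana's datasource provisioning. A minimal sketch, assuming Prometheus is reachable in-cluster at the URL shown (the service name and namespace are placeholders):

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus-server.monitoring.svc:9090  # placeholder service URL
    isDefault: true
```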