Distributed TODA for Stats Aggregation¶
Overview¶
Cloud-Native Network Functions (CNFs) generate massive amounts of data, at rates of terabytes per second. To manage this high volume of statistics efficiently, the Distributed Telemetry over Data Aggregation (TODA) for Stats Aggregation system has been enhanced with four primary pods: Receiver, Observer, Coordinator (Operator), and TMM Scraper.
Distributed TODA Pod Roles¶
Following are the key responsibilities of each Distributed TODA pod:
Receiver: The receiver runs as a StatefulSet. It collects metrics from TMM, stores them persistently, and sends them to the Observer over gRPC (Remote Procedure Call) with mutual TLS (mTLS) for secure aggregation.
Observer (Aggregator): The Observer runs as a StatefulSet. It aggregates the metrics received from the Receivers across multiple TMMs and securely forwards them to the OTEL collector over gRPC with mTLS for further aggregation and standardization.
Coordinator (Operator): The operator oversees the entire metric collection and aggregation process. It coordinates the collection and aggregation of resources with corresponding requests over gRPC with mTLS, ensuring efficient and secure metrics flow.
TMM Scraper: TMM Scraper is an observer container that runs inside each TMM pod, replacing the tmstatsd tool. It directly serves metrics from tmctl over a gRPC response stream upon receiving requests from the Receiver.
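Every hop described above (Receiver to Observer, Observer to OTEL collector) runs over gRPC secured with mutual TLS. As an illustrative sketch only (the pods manage their own certificates internally; the function name and certificate paths here are hypothetical), the kind of mutual-TLS client context these channels rely on can be built with Python's standard ssl module:

```python
import ssl

def make_mtls_client_context(ca_file, cert_file, key_file):
    """Build a mutual-TLS client context: verify the server against a CA
    and present our own client certificate. Paths are hypothetical."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.load_verify_locations(cafile=ca_file)                  # trust the cluster CA
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # present client cert
    ctx.verify_mode = ssl.CERT_REQUIRED                        # server must authenticate
    return ctx
```

In mutual TLS both sides authenticate: the client verifies the server's certificate against the CA, and the server likewise demands and verifies the client's certificate, which is why the context both loads a CA bundle and presents its own certificate chain.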
Metrics Flow Architecture¶
This section describes the metrics flow architecture for V2 and V1 metrics.
V2 Metrics¶
The following image depicts the V2 metrics flow architecture:

Example:
Following is an example of the virtual server clientside.received.bytes V2 metric collected from OTEL.
{
  "resourceMetrics": [
    {
      "scopeMetrics": [
        {
          "scope": {
            "name": "io.f5.toda.observer",
            "version": "5.5.30"
          },
          "metrics": [
            {
              "name": "f5.virtual_server.clientside.received.bytes",
              "unit": "By",
              "sum": {
                "dataPoints": [
                  {
                    "attributes": [
                      {
                        "key": "f5.virtual_server.destination",
                        "value": {
                          "stringValue": "00:00:00:00:00:00:00:00:00:00:FF:FF:37:37:37:01:00:00:00:00"
                        }
                      },
                      {
                        "key": "f5.virtual_server.name",
                        "value": {
                          "stringValue": "spk-app-1-spk-app-tcp-8050-f5ing-testapp-virtual-server"
                        }
                      },
                      {
                        "key": "f5.virtual_server.source",
                        "value": {
                          "stringValue": "00:00:00:00:00:00:00:00:00:00:FF:FF:00:00:00:00:00:00:00:00"
                        }
                      }
                    ],
                    "startTimeUnixNano": "1751183898216943595",
                    "timeUnixNano": "1751889465853307727",
                    "asInt": "0"
                  }
                ],
                "aggregationTemporality": 2,
                "isMonotonic": true
              }
            }
          ]
        }
      ]
    }
  ]
}
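The nested payload above can also be consumed programmatically. The following is a minimal Python sketch (not part of the product; the trimmed payload is an assumption based on the example above) that walks an OTEL metrics document and extracts each sum metric's name, attributes, and value:

```python
import json

# Trimmed stand-in for the V2 example payload above.
payload = json.loads("""
{
  "resourceMetrics": [{
    "scopeMetrics": [{
      "scope": {"name": "io.f5.toda.observer", "version": "5.5.30"},
      "metrics": [{
        "name": "f5.virtual_server.clientside.received.bytes",
        "unit": "By",
        "sum": {
          "dataPoints": [{
            "attributes": [
              {"key": "f5.virtual_server.name",
               "value": {"stringValue": "spk-app-1-spk-app-tcp-8050-f5ing-testapp-virtual-server"}}
            ],
            "timeUnixNano": "1751889465853307727",
            "asInt": "0"
          }],
          "aggregationTemporality": 2,
          "isMonotonic": true
        }
      }]
    }]
  }]
}
""")

def extract_sums(doc):
    """Yield (metric_name, attributes_dict, int_value) for each sum data point."""
    for rm in doc.get("resourceMetrics", []):
        for sm in rm.get("scopeMetrics", []):
            for metric in sm.get("metrics", []):
                for dp in metric.get("sum", {}).get("dataPoints", []):
                    attrs = {a["key"]: a["value"].get("stringValue")
                             for a in dp.get("attributes", [])}
                    yield metric["name"], attrs, int(dp["asInt"])

rows = list(extract_sums(payload))
print(rows)
```

Because V2 carries the virtual server identity in OTEL attributes rather than in the metric name, the extraction is a straightforward dictionary walk with no name parsing.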
V1 Metrics¶
In the V1 metrics flow architecture, metrics are streamed directly from TMM by tmstatsd to OTEL, without aggregation.
The following image depicts the V1 metrics flow architecture:

Example:
Following is an example of the virtual server clientside.received.bytes V1 metric collected from OTEL.
{
  "resourceMetrics": [
    {
      "resource": {
        "attributes": [
          {
            "key": "host.name",
            "value": {
              "stringValue": "f5-tmm-fb54985cc-6nbl2"
            }
          }
        ]
      },
      "scopeMetrics": [
        {
          "scope": {
            "name": "demo-client-meter"
          },
          "metrics": [
            {
              "name": "virtual_server_stat/spk-app-1-spk-app-tcp-8050-f5ing-testapp-virtual-server/clientside.bytes_out",
              "description": "TMM tmstatsd: table[virtual_server_stat] row[spk-app-1-spk-app-tcp-8050-f5ing-testapp-virtual-server] column[clientside.bytes_out] type:[Gauge] metric[virtual_server_stat/spk-app-1-spk-app-tcp-8050-f5ing-testapp-virtual-server/clientside.bytes_out]",
              "gauge": {
                "dataPoints": [
                  {
                    "attributes": [
                      {
                        "key": "column",
                        "value": {
                          "stringValue": "clientside.bytes_out"
                        }
                      },
                      {
                        "key": "name",
                        "value": {
                          "stringValue": "spk-app-1-spk-app-tcp-8050-f5ing-testapp-virtual-server"
                        }
                      },
                      {
                        "key": "tableName",
                        "value": {
                          "stringValue": "virtual_server_stat"
                        }
                      },
                      {
                        "key": "tmmID",
                        "value": {
                          "stringValue": "f5-tmm-fb54985cc-6nbl2"
                        }
                      }
                    ],
                    "startTimeUnixNano": "11651379494838206464",
                    "timeUnixNano": "1751891173853319814",
                    "asInt": "0"
                  }
                ]
              }
            }
          ]
        }
      ],
      "schemaUrl": "https://opentelemetry.io/schemas/1.17.0"
    }
  ]
}
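Note how the V1 metric name packs the table, row, and column into a single slash-separated string, so consumers must parse the metric name itself rather than read attributes. A minimal Python sketch (illustrative only, using the metric name from the example above):

```python
# V1 metric names follow the pattern "<table>/<row>/<column>".
v1_name = ("virtual_server_stat/"
           "spk-app-1-spk-app-tcp-8050-f5ing-testapp-virtual-server/"
           "clientside.bytes_out")

# maxsplit=2 keeps any further slashes inside the column segment intact.
table, row, column = v1_name.split("/", 2)
print(table, row, column)
```

This per-object naming is why V1 metric names vary by deployment, whereas the V2 scheme keeps a fixed metric name and moves the identity into attributes.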
Advantages of V2 Metrics over V1 Metrics¶
The V2 metrics system introduces several enhancements over the V1 metrics system, improving standardization, aggregation, and descriptive capability across the telemetry data. A few of these advantages are listed here:
V2 metrics are aggregated across all TMMs, providing a unified view of performance and resource usage across the entire system.
Each metric in V2 is standardized with fixed names, ensuring consistency and simplifying interpretation across different platforms and tools.
V2 metrics include additional labels (also known as OpenTelemetry (OTEL) attributes) to provide detailed descriptions for each metric, enabling better context and improved observability.
Distributed TODA Pods¶
This section outlines the procedures to install the required stats infrastructure for V2 and V1 metrics.
Installation on V2¶
Following is the procedure to install the stats infrastructure for V2 metrics:
Note: Install the Observer in the same namespace as the F5Ingress.
Enable the V2 Metrics on TMM during the installation of the f5ingress pod. For information on how to enable V2 Metrics, see step 11 in the TMM Values of BIG-IP Controller.
Change into the directory containing the latest CNFs Software, and obtain the f5-toda-observer Helm chart version. In this example, the CNF files are in the cnfinstall directory:
cd cnfinstall
ls -1 tar | grep observer
The example output should appear similar to the following:
f5-toda-observer-v4.56.4-0.0.15.tgz
Create a Helm values file named observer_values.yaml and set the image.repository and fluentbit_sidecar.image.repository parameters.
image:
  repository: registry.com
persistence:
  storageClassName: ""
  accessMode: ReadWriteOnce
  # size: 3Gi
platformType: "robin"
fluentbit_sidecar:
  image:
    repository: registry.com
  fluentbit:
    tls:
      enabled: true
  fluentd:
    host: f5-toda-fluentd.cnf-gateway.svc.cluster.local.
Note: If the persistence profile is not defined (default) or explicitly set to null, the storageClassName specification will not be set, and the default provisioner will be used.
Install the Observer using Helm.
helm install observer f5-toda-observer-<VERSION>.tgz -f observer_values.yaml
Note: Run
helm show values <observer-chart>command to view advanced options.
Important: The Operator and Receivers share the same volume. When both are deployed on the same node, any storage class is supported. However, if the Receivers are distributed across multiple nodes, a ReadWriteMany-compatible storage class, such as NFS, is required.
Installation on V1¶
Enable the V1 Metrics on TMM during the installation of the f5ingress pod. For information on how to enable V1 Metrics, see step 11 in the TMM Values of BIG-IP Controller.
Prometheus and Grafana¶
Following are a few examples of how to view metrics using Prometheus and Grafana:
Prometheus Integration for Metrics¶
This section describes the procedure to expose Metrics to Prometheus. The commands and Custom Resources (CRs) provided in this section are for the OpenTelemetry (OTEL) running in the default namespace. If OTEL is running in a different namespace, modify the commands and CRs accordingly.
Note: To create the required certificates for OTEL to communicate with third-party applications such as Prometheus, see the OTEL Collectors section.
Create a dedicated namespace for Prometheus.
oc create namespace prometheus
Create a valid certificate for Prometheus using cert-manager.
a. Copy the following data into the prom-certs.yaml file and replace arm-ca-cluster-issuer with the name of your CA issuer. You can modify other fields as needed.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: prometheus-client
  namespace: prometheus
spec:
  subject:
    countries:
      - US
    provinces:
      - Washington
    localities:
      - Seattle
    organizations:
      - F5 Networks
    organizationalUnits:
      - PD
  emailAddresses:
    - clientcert@f5net.com
  commonName: f5net.com
  secretName: prometheus-client-secret
  issuerRef:
    name: arm-ca-cluster-issuer
    group: cert-manager.io
    kind: ClusterIssuer
  # Lifetime of the Certificate is 1 hour, not configurable
  duration: 2160h
  privateKey:
    rotationPolicy: Always
    encoding: PKCS1
    algorithm: RSA
    size: 4096
b. Apply the certificate manifest using oc command.
oc apply -f prom-certs.yaml
Create a custom values.yaml file to configure and customize the Prometheus installation using Helm. Copy the following data into the file created.
prometheus-pushgateway:
  enabled: false
prometheus-node-exporter:
  enabled: false
kube-state-metrics:
  enabled: false
alertmanager:
  enabled: false
configmapReload:
  prometheus:
    enabled: false
serverFiles:
  prometheus.yml:
    scrape_configs:
      - job_name: bnk-otel
        scheme: https
        static_configs:
          - targets:
              - otel-collector-svc.default.svc.cluster.local:9090
        tls_config:
          cert_file: /etc/prometheus/certs/tls.crt
          key_file: /etc/prometheus/certs/tls.key
          ca_file: /etc/prometheus/certs/ca.crt
          insecure_skip_verify: false
server:
  extraVolumes:
    - name: prometheus-tls
      secret:
        secretName: prometheus-client-secret
  extraVolumeMounts:
    - name: prometheus-tls
      mountPath: /etc/prometheus/certs
      readOnly: true
  global:
    scrape_interval: 10s
  service:
    type: "NodePort"
    nodePort: 31929
  configmapReload:
    enabled: false
  persistentVolume:
    enabled: false
Deploy Prometheus with Helm.
helm install prometheus oci://ghcr.io/prometheus-community/charts/prometheus -n prometheus --atomic -f values.yaml
Ensure the scrape configuration is active, status is healthy, and there are no errors.
curl http://172.18.0.4:31929/api/v1/targets | jq
Note: Replace 172.18.0.4 with the IP address of the node where Prometheus is running.
Run the following command to list all metrics currently ingested by Prometheus.
curl http://172.18.0.4:31929/api/v1/label/__name__/values | jq
Note: Replace 172.18.0.4 with the IP address of the node where Prometheus is running.
Get the total server-side connections for all pool members.
curl "http://172.18.0.4:31929/api/v1/query?query=f5_tmm_f5_pool_member_serverside_connections_count_total" | jq
Note: Replace 172.18.0.4 with the IP address of the node where Prometheus is running.
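The query above can also be issued programmatically. The following is a minimal Python sketch (illustrative only; 172.18.0.4 and 31929 are the example node IP and NodePort from this guide, so substitute your own values) that builds the same Prometheus HTTP API query URL using only the standard library:

```python
from urllib.parse import urlencode

node_ip = "172.18.0.4"   # replace with your Prometheus node IP
node_port = 31929        # NodePort configured in values.yaml
metric = "f5_tmm_f5_pool_member_serverside_connections_count_total"

# /api/v1/query evaluates an instant PromQL expression; urlencode handles
# escaping for more complex expressions with label matchers.
url = f"http://{node_ip}:{node_port}/api/v1/query?" + urlencode({"query": metric})
print(url)
```

Fetching this URL (for example with curl, as shown above) returns a JSON body whose result entries carry the metric labels and values.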
Grafana¶
Grafana can be connected to Prometheus to display dashboards based on those metrics. Here is an example of a Grafana dashboard for Virtual Server metrics.
