Envoy statistics

By default, Aspen Mesh configures the Istio proxy (Envoy) to record a minimal set of statistics to reduce the overall performance footprint of the installed sidecars. The default collection keys are:

  • cluster_manager

  • listener_manager

  • server

  • cluster.xds-grpc

  • wasm

These defaults are sufficient for most applications, but some use cases need additional statistics to fully understand what is happening within your service mesh. Aspen Mesh lets you capture additional Envoy statistics, such as the number of request retries, by updating your mesh proxy configuration or by adding the appropriate annotations to specific workloads or gateways.
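
As a rough illustration of the mesh-wide option, the fragment below sets the same kind of stats matcher as a proxy default for every sidecar. It is a sketch that assumes your installation exposes the standard Istio MeshConfig structure under a `meshConfig` key; the exact file or values path used by a given Aspen Mesh release is an assumption and may differ.

```yaml
# Sketch only: mesh-wide proxy defaults in Istio MeshConfig form.
# The top-level "meshConfig" key is an assumption about where your
# installation accepts mesh configuration overrides.
meshConfig:
  defaultConfig:
    proxyStatsMatcher:
      # Record statistics matching this prefix on every proxy in the mesh.
      inclusionPrefixes:
      - "upstream_rq_retry"
```

Because this applies to every proxy, it adds time series for all workloads; a per-workload annotation keeps the extra statistics limited to the pods you are investigating.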

The list of Envoy statistics is available here (note that not all of these may be available in a given release).

Warning

Including additional Envoy statistics can significantly increase the number of time series collected by Prometheus. You may need to configure Prometheus carefully to limit cardinality.
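
One possible way to limit cardinality is to drop series you do not need at scrape time with Prometheus `metric_relabel_configs`. The fragment below is a sketch, not a complete scrape configuration: the job name is hypothetical, and the metric names assume that Envoy cluster statistics are exported with an `envoy_cluster_` prefix, which you should confirm against your own Prometheus before relying on it.

```yaml
scrape_configs:
- job_name: "envoy-stats"   # hypothetical job name; use your existing sidecar scrape job
  # Service discovery and other job settings are omitted from this sketch.
  metric_relabel_configs:
  # Drop the retry sub-counters (names with a further suffix) while keeping
  # the plain envoy_cluster_upstream_rq_retry counter.
  # Metric names here are assumptions; verify them in your environment.
  - source_labels: [__name__]
    regex: "envoy_cluster_upstream_rq_retry_.+"
    action: drop
```

Prometheus anchors relabeling regular expressions at both ends, so the pattern above matches only names that continue past `envoy_cluster_upstream_rq_retry_`.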

Example: Count how many times requests are automatically retried for a particular workload

This can be accomplished by adding the Envoy upstream_rq_retry statistic as part of the proxy.istio.io/config annotation on the workload under observation.

Below is an example of the annotation:

    metadata:
      annotations:
        proxy.istio.io/config: |-
          proxyStatsMatcher:
            inclusionPrefixes:
            - "upstream_rq_retry"
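
Besides `inclusionPrefixes`, the Istio `proxyStatsMatcher` setting also accepts `inclusionSuffixes` and `inclusionRegexps`. The variant below is a sketch of how the same annotation might look with those matchers; whether a suffix or regular expression is needed for a particular statistic in your release is an assumption you should verify against the proxy's actual output.

```yaml
# Sketch only: alternative matchers for the same annotation.
metadata:
  annotations:
    proxy.istio.io/config: |-
      proxyStatsMatcher:
        inclusionSuffixes:
        - "upstream_rq_retry"
        inclusionRegexps:
        - ".*upstream_rq_retry.*"
```

Regular expressions are the most flexible option but can match far more statistics than intended, so prefer the narrowest matcher that captures what you need.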

Here is an example of the sleep Deployment modified to use the annotation, which is set on the pod template:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sleep
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sleep
  template:
    metadata:
      labels:
        app: sleep
      annotations:
        proxy.istio.io/config: |-
          proxyStatsMatcher:
            inclusionPrefixes:
            - "upstream_rq_retry"
    spec:
      terminationGracePeriodSeconds: 0
      containers:
      - name: sleep
        image: curlimages/curl
        command: ["/bin/sh", "-c", "while true; do curl -XGET https://httpbin.org/status/500 && sleep 5000; done"]
        imagePullPolicy: IfNotPresent

Warning

Once applied, you must restart your pod so that the Istio proxy (Envoy) picks up the stats matcher configuration.

After the restart, the proxy's statistics include the retry counters, for example:

cluster.outbound|80||httpbin.example.svc.cluster.local.retry.upstream_rq_503: 3
cluster.outbound|80||httpbin.example.svc.cluster.local.retry.upstream_rq_5xx: 3
cluster.outbound|80||httpbin.example.svc.cluster.local.retry.upstream_rq_completed: 3
cluster.outbound|80||httpbin.example.svc.cluster.local.upstream_rq_retry: 3