CNFs Coremond

Overview

The Coremond pod runs as a DaemonSet on Cloud-Native Network Functions (CNFs) and collects core files. A core file is a snapshot of the memory and register state of a process at the moment it terminates unexpectedly, typically because it received a signal whose default action is to dump core (for example, SIGSEGV). Core files can be used for root-cause analysis. They are generated either by the kernel itself or by a third-party handler such as systemd-coredump.

Because the Coremond pod does not have direct access to the host operating system, Coremond monitors the /var/crash folder, which is mapped to a volume, to detect new core files. When Coremond starts, it reads /proc/sys/kernel/core_pattern to determine whether the configured core pattern is supported.
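
As a rough illustration of that startup check, the shell function below classifies a core_pattern string against the supported patterns listed under Prerequisites. It is a hypothetical sketch, not Coremond's implementation: the name classify_core_pattern and the exact matching rules are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a core_pattern support check. The accepted forms
# mirror the Prerequisites section; Coremond's real rules may differ.
classify_core_pattern() {
  local pattern="$1"
  case "$pattern" in
    # Piped to systemd-coredump (default on OpenShift and Tanzu).
    '|'*systemd-coredump*)
      echo "systemd-coredump" ;;
    # Native kernel pattern required on Robin.io.
    '/var/crash/core.%e.%p.%h.%t')
      echo "robin-native" ;;
    *)
      echo "unsupported"
      return 1 ;;
  esac
}

# Usage on a node: classify the pattern the kernel is currently using.
# classify_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
```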

On the Robin.io platform, all generated core files are stored in a single directory on the host, /home/crash/f5. This directory is not created by default; you can create it manually or enable it during installation. F5 recommends enabling the directory during installation.
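
If you create the directory manually, a minimal sketch is shown below. It assumes you run it as root on each Robin.io host. The helper ensure_crash_dir is hypothetical, and because the ownership and permissions Coremond expects are not specified in this guide, it leaves the system defaults in place.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the host directory that collects core files
# on Robin.io (default /home/crash/f5) and confirm it is a writable directory.
ensure_crash_dir() {
  local dir="${1:-/home/crash/f5}"
  mkdir -p -- "$dir" || return 1
  [ -d "$dir" ] && [ -w "$dir" ]
}

# Usage on a node, as root:
# ensure_crash_dir
```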

To prevent core files from filling the available storage, Coremond rotates the core files for each process and keeps only the three most recent files. Coremond also detects repeated crashes of a process within a short time frame and skips writing core dumps in such cases.
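
The rotation and crash-storm behavior can be approximated in shell as follows. This is a hypothetical sketch rather than Coremond's own logic: the function names, the reliance on the %t field of the Robin.io file name for ordering, and the 60-second window are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of per-process core rotation and crash-storm detection.
# Assumes the Robin.io naming core.%e.%p.%h.%t, so the last dot-separated
# field of each file name is the UNIX time of the dump.

# Delete all but the newest $3 (default 3) core files of executable $2 in $1.
rotate_cores() {
  local dir="$1" exe="$2" keep="${3:-3}" f
  for f in "$dir"/core."$exe".*; do
    [ -e "$f" ] || continue              # glob matched nothing
    printf '%s %s\n' "${f##*.}" "$f"     # "<timestamp> <path>"
  done | sort -rn | tail -n +"$((keep + 1))" | cut -d' ' -f2- |
    while IFS= read -r f; do
      rm -f -- "$f"                      # older than the newest $keep files
    done
}

# Succeed (exit 0) if a crash at time $2 follows the previous crash at $1
# by less than $3 seconds. The 60-second default is an illustrative value.
is_crash_storm() {
  local last="$1" now="$2" window="${3:-60}"
  [ $((now - last)) -lt "$window" ]
}
```

Ordering by the timestamp embedded in the file name, rather than by file modification time, keeps the result deterministic even if files are copied or touched later.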

Prerequisites

Ensure you have the following:

  1. A working cluster on the Robin or Tanzu platform.

  2. A Linux-based workstation.

  3. A supported core pattern configured in /proc/sys/kernel/core_pattern. The supported core patterns include:

    • By default, on the OpenShift and Tanzu platforms, core dumps are handled by systemd-coredump and stored compressed with an xz, lz4, or zst extension. For example: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %h

    • On Robin.io, the native kernel core pattern must be /var/crash/core.%e.%p.%h.%t; otherwise, an error is returned. The specifiers in this pattern are:

      | Specifier | Description         |
      |-----------|---------------------|
      | %h        | Hostname            |
      | %e        | Executable filename |
      | %p        | PID of the process  |
      | %t        | UNIX time of dump   |
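
To see how these specifiers map onto an actual file name, the hypothetical function below splits a name that follows /var/crash/core.%e.%p.%h.%t back into its fields. It assumes the executable name (%e) contains no dots; the hostname (%h) may contain dots.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a core file name produced by the Robin.io
# pattern core.%e.%p.%h.%t into its fields. Assumes %e contains no dots.
parse_core_name() {
  local name rest exe pid host ts
  name="${1##*/}"                        # drop any leading directory
  rest="${name#core.}"                   # strip the literal "core." prefix
  exe="${rest%%.*}";  rest="${rest#*.}"  # %e: executable filename
  pid="${rest%%.*}";  rest="${rest#*.}"  # %p: PID of the process
  ts="${rest##*.}"                       # %t: last field, UNIX time of dump
  host="${rest%.*}"                      # %h: everything in between
  printf 'exe=%s pid=%s host=%s time=%s\n' "$exe" "$pid" "$host" "$ts"
}
```

For example, parse_core_name /var/crash/core.observer.29.f5-toda-observer-788ddcd596-6qjpg.1726480067 prints exe=observer pid=29 host=f5-toda-observer-788ddcd596-6qjpg time=1726480067, which corresponds to the core file shown in the validation steps of this guide.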

Note: F5 recommends installing Coremond before any other F5 components, because components installed earlier may generate core files before Coremond is running to collect them.

Procedures

Installation

  1. Obtain the [TAG/Version] from the CNFs 1.4.0 tarball.

  2. To install Coremond on the Robin platform, use the following syntax:

    helm install coremond tar/<helm-chart>.tgz \
      -f <values>.yaml -n <project>
    

    For example:

    helm install coremond tar/coremond-0.5.12-0.1.5.tgz -n coremond
    
  3. To set the platform type to Robin in the values.yaml file and install through Helm, run the following commands:

    echo 'platformType: "robin"' >> values.yaml
    helm install coremond f5-coremond-0.5.12-0.0.2.tgz -f values.yaml
    
  4. You can edit the values.yaml file to suit your use case and requirements. The following are some of the mandatory and optional settings you can configure in the values.yaml file:

    a. Mandatory settings:

    • Override the image settings by specifying the custom image values:

      image:
        repository: repo.f5.com/images/f5-toda-docker
        name: f5-coremond
        tag: v
        pullPolicy: IfNotPresent
      
    • Coremond supports node selectors and node affinity for controlling which nodes in the Kubernetes cluster the Coremond pods are scheduled on. By default, Coremond runs on all worker nodes.
      To run the pod only on the node named worker-node, configure both nodeSelector and affinity as shown in the following example:

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - worker-node
      nodeSelector:
        kubernetes.io/hostname: worker-node
      

    b. Optional settings:

    • To adjust the log level, set the COREMON_LOG_LEVEL environment variable in the values.yaml file:

      env:
      - name: COREMON_LOG_LEVEL
        value: "debug"
      
    • Coremond requires a persistent volume (PV) with ReadWriteMany (RWX) access. If the default storage class does not support RWX, the Coremond pod may remain in the Pending state. To avoid this, override the storageClass parameter in the values.yaml file with a storage class that supports RWX.

      For example:

      persistence:
        accessMode: ReadWriteMany
        storageClass: your-rwx
      
    • To override the resources settings, specify the custom resources values in the values.yaml file as shown in the following example:

      resources:
        limits:
          cpu: 100m
          memory: 128Mi
        requests:
          cpu: 100m
          memory: 128Mi
      
    • To disable the qkview process, add the following to the values.yaml file:

      f5_csm_qkview:
        enabled: false
      
    • To override the fluentbit_sidecar image settings, specify the custom image values as shown in the following example:

      fluentbit_sidecar:
        image:
          repository: repo.f5.com/images/f5-toda-docker
          name: f5-fluentbit
          tag: v
          pullPolicy: IfNotPresent
      
    • To override the fluentbit_sidecar resources settings, specify the custom resources values as shown in the following example:

      fluentbit_sidecar:  
        resources:  
          limits:
            cpu: "0.5"
            memory: "512Mi"
          requests:
            cpu: "0.25"
            memory: "256Mi"
      
    • To override the fluentbit_sidecar security context settings, specify the custom securityContext values as shown in the following example:

      fluentbit_sidecar:
        securityContext:
          allowPrivilegeEscalation: false
          # runAsUser: 10000
      
    • To override the fluentbit_sidecar additional settings, specify the custom fluentbit values as shown in the following example:

      fluentbit_sidecar:
        fluentbit:
          # Interval to flush output (seconds)
          flush_interval: 1
          # Error/warning/info/debug/trace
          logLevel: debug
          # Pipe reading parameters
          input:
            pipes:
              bufSize: 8096
              intervalSec: 1
              intervalNsec: 0
          tls:
            enabled: false
            # TLS debug verbosity level, values: 0 (No debug), 1 (Error), 2 (State change), 3 (Informational) and 4 (Verbose)
            debug: 1
            # Force certificate validation
            verify: Off
            # key string known by the remote Fluentd used for authorization.
            shared_key: f5-toda-shared-key
        fluentd:
          host: '127.0.0.1'
          port: 54321
      
    • To disable the fluentbit_sidecar container, set fluentbit_sidecar.enabled to false in the values.yaml file:

      fluentbit_sidecar:
        enabled: false
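
The Helm steps above can be wrapped in a small script. This is a sketch under stated assumptions: install_coremond is a hypothetical function, the chart path, values file, and namespace are placeholders you supply from your CNFs tarball and environment, and the helm template dry render is an extra safety step that is not part of the documented procedure.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the documented Helm steps.
# $1 = chart archive, $2 = values file, $3 = namespace (all placeholders).
install_coremond() {
  local chart="$1" values="$2" namespace="$3"
  # Append the platform type only if the values file does not define it yet.
  grep -q '^platformType:' "$values" 2>/dev/null ||
    echo 'platformType: "robin"' >> "$values"
  # Render the chart locally first so YAML mistakes surface before install.
  helm template coremond "$chart" -f "$values" -n "$namespace" >/dev/null ||
    return 1
  helm install coremond "$chart" -f "$values" -n "$namespace"
}

# Example:
# install_coremond tar/coremond-0.5.12-0.1.5.tgz values.yaml coremond
```

Unlike the echo command in step 3, the sketch appends platformType only when values.yaml does not already define it, so rerunning the script does not duplicate the key.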
      

How to generate a core file

Following are the steps to generate a core file:

  1. Run the following command to get the list of pods.

    kubectl get pods 
    

    Sample output with the list of pods:

    NAME                                          READY   STATUS    RESTARTS   AGE
    client                                        1/1     Running   0          2m28s
    dssm-f5-dssm-db-0                             2/2     Running   0          2m26s
    dssm-f5-dssm-db-1                             2/2     Running   0          96s
    dssm-f5-dssm-sentinel-0                       2/2     Running   0          2m26s
    dssm-f5-dssm-sentinel-1                       2/2     Running   0          90s
    f5-cert-manager-84f857f786-gk6xq              1/1     Running   0          4m10s
    f5-cert-manager-cainjector-695866d7ff-m2h2g   1/1     Running   0          4m10s
    f5-cert-manager-webhook-8554fd5b58-xc89x      1/1     Running   0          4m10s
    f5-coremond-7gqfp                             2/2     Running   0          2m54s
    f5-crdconversion-7df678d8fc-2vplv             1/1     Running   0          2m51s
    f5-rabbit-f9c58487c-vhtw2                     1/1     Running   0          2m53s
    f5-spk-cwc-669f8c9dc-ptjb2                    2/2     Running   0          2m52s
    f5-tmm-7b685cd57c-lp7cl                       0/4     Pending   0          2m9s
    f5-tmm-7b685cd57c-rq92s                       4/4     Running   0          2m9s
    f5-toda-fluentd-6bc5cb8bfb-wqsvx              1/1     Running   0          2m11s
    f5-toda-observer-788ddcd596-6qjpg             2/2     Running   0          2m12s
    f5-toda-stats-77cb79c44d-4cn4x                2/2     Running   0          2m25s
    otel-collector-5f48b7ccf7-s6wx7               1/1     Running   0          2m9s
    router                                        2/2     Running   0          2m27s
    server                                        1/1     Running   0          2m28s
    spk-f5ingress-797bdbb59-zssd6                 4/4     Running   0          2m9s
    
  2. To get the process list, run the following command:

    kubectl exec <pod-name> -- ps aux
    

    Sample output:

    Defaulted container "f5-toda-observer" out of: f5-toda-observer, fluentbit
    USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    f5docker       1  0.0  0.0 711880  3024 ?        Ssl  09:45   0:00 /init
    f5docker      25  0.0  0.0   3024  1200 ?        S    09:45   0:00 s6-svscan -c30 -t0 /var/run/s6/services
    f5docker      27  0.0  0.0   3036  1264 ?        S    09:45   0:00 s6-supervise observer
    f5docker      28  0.0  0.0   3036  1268 ?        S    09:45   0:00 s6-supervise qkview-collect-daemon
    f5docker      29  1.2  0.3 1270624 49540 ?       Ssl  09:45   0:01 observer
    f5docker      30  0.0  0.0 1235736 9892 ?        Ssl  09:45   0:00 /usr/bin/qkview-collect-daemon
    f5docker     212  0.0  0.0   7072  1592 ?        Rs   09:47   0:00 ps aux
    
  3. To terminate a process with signal 11 (SIGSEGV) and generate a core dump, run the following command:

    kubectl exec <pod-name> -- kill -11 <process-id> 
    

    Sample output:

    Defaulted container "f5-toda-observer" out of: f5-toda-observer, fluentbit
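
Steps 2 and 3 can be scripted as shown below. This is a hypothetical sketch intended for test environments only, because sending signal 11 (SIGSEGV) terminates the target process. The function names are invented, and find_pid assumes the column layout of the ps aux output shown in step 2 (PID in column 2, COMMAND starting in column 11).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for generating a test core file.

# Print the PID of the first process in pod $1 whose command is exactly $2.
find_pid() {
  local pod="$1" cmd="$2"
  kubectl exec "$pod" -- ps aux | awk -v c="$cmd" '$11 == c { print $2; exit }'
}

# Send signal 11 (SIGSEGV) to PID $2 in pod $1, optionally in container $3.
trigger_core() {
  local pod="$1" pid="$2" container="$3"
  if [ -n "$container" ]; then
    kubectl exec "$pod" -c "$container" -- kill -11 "$pid"
  else
    kubectl exec "$pod" -- kill -11 "$pid"
  fi
}

# Example, using the observer pod from the sample output:
# pid=$(find_pid f5-toda-observer-788ddcd596-6qjpg observer)
# trigger_core f5-toda-observer-788ddcd596-6qjpg "$pid"
```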
    

How to validate the core file

To verify that the core file is created, do the following:

  1. Run the kubectl get pods command to get the Coremond pod name.

    Sample output:

      NAME                                          READY   STATUS    RESTARTS   AGE
      client                                        1/1     Running   0          2m28s
      dssm-f5-dssm-db-0                             2/2     Running   0          2m26s
      dssm-f5-dssm-db-1                             2/2     Running   0          96s
      dssm-f5-dssm-sentinel-0                       2/2     Running   0          2m26s
      dssm-f5-dssm-sentinel-1                       2/2     Running   0          90s
      f5-cert-manager-84f857f786-gk6xq              1/1     Running   0          4m10s
      f5-cert-manager-cainjector-695866d7ff-m2h2g   1/1     Running   0          4m10s
      f5-cert-manager-webhook-8554fd5b58-xc89x      1/1     Running   0          4m10s
      f5-coremond-7gqfp                             2/2     Running   0          2m54s
      f5-crdconversion-7df678d8fc-2vplv             1/1     Running   0          2m51s
      f5-rabbit-f9c58487c-vhtw2                     1/1     Running   0          2m53s
      f5-spk-cwc-669f8c9dc-ptjb2                    2/2     Running   0          2m52s
      f5-tmm-7b685cd57c-lp7cl                       0/4     Pending   0          2m9s
      f5-tmm-7b685cd57c-rq92s                       4/4     Running   0          2m9s
      f5-toda-fluentd-6bc5cb8bfb-wqsvx              1/1     Running   0          2m11s
      f5-toda-observer-788ddcd596-6qjpg             2/2     Running   0          2m12s
      f5-toda-stats-77cb79c44d-4cn4x                2/2     Running   0          2m25s
      otel-collector-5f48b7ccf7-s6wx7               1/1     Running   0          2m9s
      router                                        2/2     Running   0          2m27s
      server                                        1/1     Running   0          2m28s
      spk-f5ingress-797bdbb59-zssd6                 4/4     Running   0          2m9s
    
  2. To find the created core file, run the kubectl logs <coremond-pod> command.

    Sample output:

     Defaulted container "f5-coremond" out of: f5-coremond, fluentbit, init-coremond-dir (init)
     2024-09-16 09:44:37,954 CRIT Supervisor is running as root.  Privileges were not dropped because no user is specified in the config file.  If you intend to run as root, you can set user=root in the config file to avoid this message.
     2024-09-16 09:44:37,957 INFO supervisord started with pid 1
     2024-09-16 09:44:38,960 INFO spawned: 'coremond' with pid 13
     2024-09-16 09:44:38,962 INFO spawned: 'qkview-collect' with pid 14
     2024/09/16 09:44:38 INFO qkview-collect f5-log-ID=15216-000000 lt=A msg="Details: Client config details {Base:/etc/qkview-collect Overlay:/etc/qkview-collect/qkview-collect.config.yml GlobalTimeout:-1s LocalTimeout:-1s Outfile:/tmp/qkview.tar.gz PkgType:container MaxFileSize:25 RemovePrivateKeyFromFiles:true} base config file..."
     2024/09/16 09:44:38 INFO qkview-collect f5-log-ID=15216-000000 lt=A msg="Details: Environment details &{IsDevVersion:false HostMode:false TLSCABundle:/etc/ssl/certs/ca-root-cert.pem TLSCertificateFile:/etc/ssl/certs/server-cert.pem TLSKeyFile:/etc/ssl/certs/server-key.pem TLSCertRetryWait:5s SecureOnly:true UsingCertOrchestrator:true ContainerName:f5-coremond GrpcPort:19891 MaxFileSize:25 BaseCfgPath:/etc/qkview-collect ContainerOverlayPath:/etc/qkview-collect/qkview-collect.config.yml TotalCollectionTimeout:-1s IndividualCmdTimeout:-1s Outfile:/tmp/qkview.tar.gz PkgType:container RemovePrivateKeyFromFiles:true} base config file..."
     2024/09/16 09:44:38 INFO qkview-collect f5-log-ID=15216-000000 lt=A msg="Info: Starting GRPC server in secured mode"
     2024/09/16 09:44:38 INFO qkview-collect f5-log-ID=15216-000000 lt=A msg="Info: starting secure server"
     "ts"="2024-09-16 09:44:39.000"|"l"="error"|"m"="failed to read levels file /logs/.minlevel.yaml: open /logs/.minlevel.yaml: no such file or directory"|"lt"="A"|"pod"="f5-coremond-7gqfp"|"ct"="f5-coremond"|"cv"="v0.5.12"|"ns"="default"|"v"="1.0"
     "ts"="2024-09-16 09:44:39.000"|"l"="info"|"m"="coremond started"|"lt"="A"|"version"="0.5.12"|"commitHash"="22bb5c8"|"buildDate"="2024-08-27T20:57:25Z"|"pod"="f5-coremond-7gqfp"|"ct"="f5-coremond"|"cv"="v0.5.12"|"ns"="default"|"v"="1.0"
     "ts"="2024-09-16 09:44:39.207"|"l"="error"|"m"="no such file or directory"|"lt"="A"|"path"="/logs/.minlevel.yaml"|"pod"="f5-coremond-7gqfp"|"ct"="f5-coremond"|"cv"="v0.5.12"|"ns"="default"|"v"="1.0"
     2024-09-16 09:44:40,209 INFO success: coremond entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
     2024-09-16 09:44:40,209 INFO success: qkview-collect entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
     "ts"="2024-09-16 09:47:47.997"|"l"="info"|"m"="new core file detected"|"lt"="A"|"file"="/var/crash/core.observer.29.f5-toda-observer-788ddcd596-6qjpg.1726480067"|"pod"="f5-coremond-7gqfp"|"ct"="f5-coremond"|"cv"="v0.5.12"|"ns"="default"|"v"="1.0"
     "ts"="2024-09-16 09:47:48.014"|"l"="info"|"m"="creating coredump"|"lt"="A"|"src"="/var/crash/core.observer.29.f5-toda-observer-788ddcd596-6qjpg.1726480067"|"dst"="/var/cores/core.f5-toda-observer.f5-toda-observer.observer.29.1726480067"|"pod"="f5-coremond-7gqfp"|"ct"="f5-coremond"|"cv"="v0.5.12"|"ns"="default"|"v"="1.0"
    
  3. To validate the core file created by the operating system, run the kubectl exec <coremond-pod> -- ls /var/crash command.

    Sample output:

    dev@datkube-devbox:~/ws/datkube$ kubectl exec f5-coremond-7gqfp -- ls /var/crash
    Defaulted container "f5-coremond" out of: f5-coremond, fluentbit, init-coremond-dir (init)
    core.observer.29.f5-toda-observer-788ddcd596-6qjpg.1726480067
    
  4. To validate the core file created by F5, run the kubectl exec <coremond-pod> -- ls /var/cores command.

    Sample output:

    dev@datkube-devbox:~/ws/datkube$ kubectl exec f5-coremond-7gqfp -- ls /var/cores
    Defaulted container "f5-coremond" out of: f5-coremond, fluentbit, init-coremond-dir (init)
    core.f5-toda-observer.f5-toda-observer.observer.29.1726480067.gz
    core.f5-toda-observer.f5-toda-observer.observer.29.1726480067.gz.crc
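
The validation steps can also be scripted. The sketch below is hypothetical: the function names are invented, the container name f5-coremond and the file-name patterns are taken from the sample output above, and the optional node filter uses the standard spec.nodeName field selector for pods. Because Coremond runs as a DaemonSet with one pod per node, pass the node of the crashed pod so that you query the Coremond pod on that node.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for validating collected core files.

# Print the first f5-coremond-* pod in namespace $1, optionally on node $2.
find_coremond_pod() {
  local ns="$1" node="$2"
  if [ -n "$node" ]; then
    kubectl get pods -n "$ns" --field-selector "spec.nodeName=$node" -o name
  else
    kubectl get pods -n "$ns" -o name
  fi | sed -n 's|^pod/\(f5-coremond-.*\)$|\1|p' | head -n 1
}

# Print Coremond log lines reporting a detected core file or a written coredump.
core_events() {
  kubectl logs "$1" -n "$2" -c f5-coremond |
    grep -E '"m"="(new core file detected|creating coredump)"'
}

# List compressed core files in /var/cores whose name contains .<exe>.<pid>.
list_f5_cores() {
  local pod="$1" ns="$2" exe="$3" pid="$4"
  kubectl exec "$pod" -n "$ns" -c f5-coremond -- ls /var/cores |
    grep -F ".${exe}.${pid}." | grep '\.gz$'
}

# Example:
# node=$(kubectl get pod f5-toda-observer-788ddcd596-6qjpg -o jsonpath='{.spec.nodeName}')
# pod=$(find_coremond_pod default "$node")
# core_events "$pod" default
# list_f5_cores "$pod" default observer 29
```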
    

Feedback

Provide feedback to improve this document by emailing cnfdocs@f5.com.