BIG-IP Next for Kubernetes Coremond

Overview

The F5 Coremond component runs as a DaemonSet on BIG-IP Next for Kubernetes. It is designed to monitor and collect kernel core files (core dumps) from processes that terminate unexpectedly. It then converts these core files into F5-specific core files for further analysis. A core file is a snapshot of a process or program’s memory and register state at the moment it crashes. These core files are crucial for performing root cause analysis as they provide detailed insights into the state of the system and the conditions leading up to the crash.

When a process stops unexpectedly, the OS generates a core file in the pod volume /var/crash, which is mapped to /home/crash/f5 on the host machine. Coremond then uses this core file to create an F5 core file. This automated crash data collection helps engineers diagnose issues quickly, improving system stability and reliability.

Prerequisites

Ensure you have the following:

  1. A working Kubernetes cluster running on an Ubuntu OS platform.

  2. The Apport service is running to handle crash reports.

  3. core_pattern is set correctly in the host file /proc/sys/kernel/core_pattern.

  4. platformType: other is set in values.yaml.

  5. If Coremond is not compatible with the host core_pattern, enable the Coremond init container to override the core_pattern. Refer to the Coremond Installation section.

Platform-Specific Core Patterns

  • Generic: Ubuntu-based platforms use Apport for crash reporting, and this pattern ensures that core dumps are handled correctly by Apport.

    |/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E                                                  
    
  • OCP: OCP uses systemd-coredump to capture and process core dumps. The pattern correctly passes the process ID (%P), user ID (%u), group ID (%g), signal (%s), timestamp (%t), and other relevant metadata.

    |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e
    
  • Robin: Robin.io uses a traditional file-based core dump storage format, where…

    /var/crash/core.%e.%p.%h.%t
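
The three patterns above differ mainly in whether the kernel pipes the dump to a helper program (a leading |) or writes it to a file path. As a minimal sketch, the hypothetical helper below (classify_core_pattern is not part of Coremond) tells these cases apart, treating the patterns listed above as the only known ones:

```shell
# classify_core_pattern PATTERN
# Prints the handler family of a core_pattern string. Only the three
# patterns listed in this section are treated as known cases.
classify_core_pattern() {
  case "$1" in
    "|"*apport*)           echo "apport" ;;
    "|"*systemd-coredump*) echo "systemd-coredump" ;;
    "|"*)                  echo "other-pipe" ;;
    /*)                    echo "file" ;;
    *)                     echo "unknown" ;;
  esac
}

# Read-only check on a host:
# classify_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
```

Reading /proc/sys/kernel/core_pattern does not require root, so this check can be run before deciding whether the init container override is needed.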
    

Configure Rotation and Retention

This section outlines the environment variables used to configure the core file retention, rotation, and cleanup of Coremond. These variables allow you to manage retention durations, set file limits per process, and define rotation policies.

Environment Variable | Default Value | Description
COREMON_RETENTION_INTERVAL | 5m | Specifies the time frame during which additional core dumps from the same process are ignored once the COREMON_CORES_MAX_FILES limit is reached.
COREMON_CORES_MAX_FILES | 3 | Specifies the maximum number of core files allowed for the same process. This limit keeps a continuously crashing process from causing endless core file creation and rotation.
COREMON_RETENTION | 0 | Specifies how long core files are kept before deletion. This also applies to the final core file copied to the volume. To disable retention, set this parameter to 0.
COREMON_CORES_INTERVAL | 5m | Specifies the interval at which Coremond scans for and deletes core files that exceed the COREMON_RETENTION period.
COREMON_DELETE_SRC | true | Specifies whether to delete the source core files generated by the kernel from the host path /home/crash/f5.
COREMON_ROTATE | false | Allows old core files to be replaced with new ones when the COREMON_CORES_MAX_FILES limit is reached. Rotation occurs only if the COREMON_RETENTION_INTERVAL has elapsed and Coremond continues processing core files for that process.
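
The interaction between COREMON_CORES_MAX_FILES, COREMON_RETENTION_INTERVAL, and COREMON_ROTATE can be easier to follow as a decision function. The sketch below is one reading of the descriptions above, not Coremond's actual implementation. The function name, the argument order, and the use of plain seconds are assumptions made for illustration. The case where the interval has elapsed but rotation is disabled is not described in this section, so the sketch reports it as unspecified rather than guessing.

```shell
# core_action COUNT MAX_FILES ELAPSED_SECONDS INTERVAL_SECONDS ROTATE
#   COUNT            core files already stored for the process
#   MAX_FILES        value of COREMON_CORES_MAX_FILES
#   ELAPSED_SECONDS  time since the limit was reached (illustrative input)
#   INTERVAL_SECONDS COREMON_RETENTION_INTERVAL converted to seconds
#   ROTATE           value of COREMON_ROTATE (true or false)
core_action() {
  if [ "$1" -lt "$2" ]; then
    echo "store"        # still below the per-process limit
  elif [ "$3" -lt "$4" ]; then
    echo "ignore"       # limit reached, still inside the ignore window
  elif [ "$5" = "true" ]; then
    echo "rotate"       # window elapsed and rotation is enabled
  else
    echo "unspecified"  # not covered by the descriptions above
  fi
}

core_action 3 3 60 300 true    # prints: ignore
core_action 3 3 600 300 true   # prints: rotate
```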

Coremond Installation

For the installation of Coremond, refer to the FLO section.

Follow the steps below to allow the Coremond init container to override the core_pattern and enable core file generation in containers.

  • Configure core dump handling to override the kernel core_pattern with the desired pattern.

  • Apply SELinux labelling on the crash directory using chcon.

  • Adjust container security contexts via FLO:

    • If using FLO, set the Coremond container SecurityContext to privileged and allow privilege escalation.
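
For orientation, the first two steps correspond to ordinary host-level operations. The sketch below shows what they might look like when done by hand. It is not the init container's actual script. The helper name write_core_pattern is hypothetical, and using /home/crash/f5 as the crash directory for chcon is an assumption based on the host path named in the Overview.

```shell
# Pattern taken from the Robin entry under Platform-Specific Core Patterns.
PATTERN='/var/crash/core.%e.%p.%h.%t'

# write_core_pattern PATTERN [FILE]
# Writes PATTERN to FILE, defaulting to the kernel core_pattern file.
# The optional FILE argument exists so the helper can be tried on a scratch file.
write_core_pattern() {
  printf '%s\n' "$1" > "${2:-/proc/sys/kernel/core_pattern}"
}

# On the host, as root (both lines change system state, so they are left commented):
# write_core_pattern "$PATTERN"
# chcon -Rt container_file_t /home/crash/f5   # crash directory path is an assumption
```

Writing to /proc/sys/kernel/core_pattern affects every process on the node, which is why the init container needs a privileged security context to do it.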

Store core files on Host instead of PVCs

Coremond supports storing core files directly in a host directory instead of using Persistent Volume Claims (PVCs), which eliminates the need for ReadWriteMany volumes and shared storage when multiple Coremond pods are deployed. By default, this option is disabled and PVs are used.

To change the default and store core files on the host machine instead of PVs, enable the following in the Coremond advanced section of the CNEInstance CR:

# FLO: cneinstance CR
apiVersion: k8s.f5.com/v1
kind: CNEInstance
metadata:
  name: cneinstance-sample
# Enabling COREMOND_OVERRIDE_CORE_PATTERN applies:
# - core_pattern=/var/crash/core.%e.%p.%h.%t (written to /proc/sys/kernel/core_pattern)
# - chcon=-Rt container_file_t (executed on the crash directory)
# - initContainer securityContext: Privileged=true
# - coremond securityContext: ReadOnlyRootFilesystem=true, AllowPrivilegeEscalation=true, Privileged=true
spec:
  advanced:
    coremon:
      hostPath: true
      env:
      - name: "COREMOND_OVERRIDE_CORE_PATTERN"
        value: "true"
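
With the override enabled, the core_pattern shown in the comments above produces file names of the form core.%e.%p.%h.%t. As a rough sketch, the hypothetical helper below (split_core_name is not an F5 tool) splits such a name back into its fields. It assumes that neither the executable name nor the host name contains a dot.

```shell
# split_core_name FILE
# Splits a name of the form core.<exe>.<pid>.<host>.<time> into labeled fields.
# The body runs in a subshell so the IFS and globbing changes do not leak out.
split_core_name() (
  IFS=.
  set -f              # disable globbing while the name is split
  set -- ${1##*/}     # drop the directory part, then split on dots
  echo "exe=$2 pid=$3 host=$4 time=$5"
)

split_core_name /var/crash/core.observer.9.f5-observer-0.1742738615173108860
# prints: exe=observer pid=9 host=f5-observer-0 time=1742738615173108860
```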

Generate a Core File

To generate a core file, follow these steps:

  1. Run the command to get the list of pods.

    kubectl get pods -A
    

    Sample Output

    NAMESPACE     NAME                                               READY   STATUS      RESTARTS      AGE
    default       cluster-cert-manager-57cc54f85-fst6r               1/1     Running     0             7m45s
    default       cluster-cert-manager-cainjector-575b969df5-x6rjz   1/1     Running     0             7m45s
    default       cluster-cert-manager-webhook-854c759f74-vhk9c      1/1     Running     0             7m45s
    default       f5-afm-648c784cb-c2zzv                             2/2     Running     0             6m57s
    default       f5-cne-controller-568c7cf87c-4tkns                 4/4     Running     0             6m57s
    default       f5-ipam-operator-7bc8dccd9-ktz2f                   1/1     Running     0             7m16s
    default       f5-node-labeler-bdz8v                              0/1     Init:0/1    0             6m58s
    default       f5-observer-0                                      2/2     Running     0             6m48s
    default       f5-observer-operator-59d46f69b7-fm6gw              2/2     Running     0             6m49s
    default       f5-observer-receiver-0                             2/2     Running     0             6m49s
    default       f5-tmm-pn4tq                                       7/7     Running     0             6m42s
    default       flo-f5-lifecycle-operator-58958c4f77-lmbs7         2/2     Running     0             7m16s
    default       otel-collector-5f8c9fbd8-dqxpt                     1/1     Running     0             6m42s
    f5-utils      f5-coremond-bnhf7                                  2/2     Running     0             4m51s
    f5-utils      f5-crdconversion-576f7b7579-4d5n2                  2/2     Running     0             6m47s
    f5-utils      f5-dssm-db-0                                       3/3     Running     0             6m45s
    f5-utils      f5-dssm-db-1                                       3/3     Running     0             5m26s
    f5-utils      f5-dssm-db-2                                       3/3     Running     0             4m50s
    f5-utils      f5-dssm-sentinel-0                                 3/3     Running     0             6m47s
    f5-utils      f5-dssm-sentinel-1                                 3/3     Running     0             5m16s
    f5-utils      f5-dssm-sentinel-2                                 0/3     Pending     0             4m40s
    f5-utils      f5-ipam-ctlr-595c467d8d-mfs58                      2/2     Running     0             6m45s
    f5-utils      f5-rabbit-6c7d56ddfb-87jnf                         2/2     Running     0             6m50s
    f5-utils      f5-spk-cwc-6f89988c86-5m56n                        3/3     Running     0             6m45s
    f5-utils      f5-toda-fluentd-bf845d465-sfm62                    1/1     Running     0             6m50s
    kube-system   coredns-ff8999cc5-4w2rc                            1/1     Running     0             7m46s
    kube-system   csi-nfs-controller-69dc5b4c8c-c56g8                5/5     Running     0             7m23s
    kube-system   csi-nfs-node-rdvsn                                 3/3     Running     0             7m23s
    kube-system   helm-install-multus-99k8d                          0/1     Completed   0             7m46s
    kube-system   local-path-provisioner-698b58967b-j22xd            1/1     Running     0             7m46s
    kube-system   metrics-server-8584b5786c-b4qkq                    1/1     Running     0             7m46s
    kube-system   multus-lkxdn                                       1/1     Running     0             7m32s
    
  2. Run the command to get the process list.

    kubectl exec f5-observer-0 -- ps aux
    

    Sample Output

    Defaulting container name to f5-observer.
    USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    f5docker       1  0.0  0.0 711880  3024 ?        Ssl  09:45   0:00 /init
    f5docker      25  0.0  0.0   3024  1200 ?        S    09:45   0:00 s6-svscan -c30 -t0 /var/run/s6/services
    f5docker      27  0.0  0.0   3036  1264 ?        S    09:45   0:00 s6-supervise observer
    f5docker      28  0.0  0.0   3036  1268 ?        S    09:45   0:00 s6-supervise qkview-collect-daemon
    f5docker      29  1.2  0.3 1270624 49540 ?       Ssl  09:45   0:01 observer
    f5docker      30  0.0  0.0 1235736 9892 ?        Ssl  09:45   0:00 /usr/bin/qkview-collect-daemon
    f5docker     212  0.0  0.0   7072  1592 ?        Rs   09:47   0:00 ps aux
    
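  3. Run the command to send signal 11 (SIGSEGV) to a process and generate a core dump. Replace 9 with the PID of the target process, as reported by the process list in the previous step.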

    kubectl exec f5-observer-0 -- kill -11 9
    

    Sample Output

    Defaulting container name to f5-observer.
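
Steps 2 and 3 can be joined by looking up the PID from the process list instead of reading it by eye. The following is a small sketch. The helper name pids_by_command is hypothetical, and it assumes the ps aux column layout shown in the sample output, where the PID is column 2 and the command is column 11.

```shell
# pids_by_command NAME
# Reads `ps aux` output on stdin and prints the PID of every process
# whose COMMAND column is exactly NAME.
pids_by_command() {
  awk -v name="$1" '$11 == name { print $2 }'
}

# Usage against the pod from the steps above:
# pid=$(kubectl exec f5-observer-0 -- ps aux | pids_by_command observer)
# kubectl exec f5-observer-0 -- kill -11 "$pid"
```

Matching on the exact command avoids picking up supervisor processes such as s6-supervise observer, which only carry the name as an argument.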
    

Validate the Core File

To verify the generated core file, follow the instructions below:

  1. Run the command to get the Coremond pod name.

    kubectl get pods -A
    

    Sample Output

    NAMESPACE     NAME                                               READY   STATUS      RESTARTS      AGE
    default       cluster-cert-manager-57cc54f85-fst6r               1/1     Running     0             7m45s
    default       cluster-cert-manager-cainjector-575b969df5-x6rjz   1/1     Running     0             7m45s
    default       cluster-cert-manager-webhook-854c759f74-vhk9c      1/1     Running     0             7m45s
    default       f5-afm-648c784cb-c2zzv                             2/2     Running     0             6m57s
    default       f5-cne-controller-568c7cf87c-4tkns                 4/4     Running     0             6m57s
    default       f5-ipam-operator-7bc8dccd9-ktz2f                   1/1     Running     0             7m16s
    default       f5-node-labeler-bdz8v                              0/1     Init:0/1    0             6m58s
    default       f5-observer-0                                      2/2     Running     0             6m48s
    default       f5-observer-operator-59d46f69b7-fm6gw              2/2     Running     0             6m49s
    default       f5-observer-receiver-0                             2/2     Running     0             6m49s
    default       f5-tmm-pn4tq                                       7/7     Running     0             6m42s
    default       flo-f5-lifecycle-operator-58958c4f77-lmbs7         2/2     Running     0             7m16s
    default       otel-collector-5f8c9fbd8-dqxpt                     1/1     Running     0             6m42s
    f5-utils      f5-coremond-bnhf7                                  2/2     Running     0             4m51s
    f5-utils      f5-crdconversion-576f7b7579-4d5n2                  2/2     Running     0             6m47s
    f5-utils      f5-dssm-db-0                                       3/3     Running     0             6m45s
    f5-utils      f5-dssm-db-1                                       3/3     Running     0             5m26s
    f5-utils      f5-dssm-db-2                                       3/3     Running     0             4m50s
    f5-utils      f5-dssm-sentinel-0                                 3/3     Running     0             6m47s
    f5-utils      f5-dssm-sentinel-1                                 3/3     Running     0             5m16s
    f5-utils      f5-dssm-sentinel-2                                 0/3     Pending     0             4m40s
    f5-utils      f5-ipam-ctlr-595c467d8d-mfs58                      2/2     Running     0             6m45s
    f5-utils      f5-rabbit-6c7d56ddfb-87jnf                         2/2     Running     0             6m50s
    f5-utils      f5-spk-cwc-6f89988c86-5m56n                        3/3     Running     0             6m45s
    f5-utils      f5-toda-fluentd-bf845d465-sfm62                    1/1     Running     0             6m50s
    kube-system   coredns-ff8999cc5-4w2rc                            1/1     Running     0             7m46s
    kube-system   csi-nfs-controller-69dc5b4c8c-c56g8                5/5     Running     0             7m23s
    kube-system   csi-nfs-node-rdvsn                                 3/3     Running     0             7m23s
    kube-system   helm-install-multus-99k8d                          0/1     Completed   0             7m46s
    kube-system   local-path-provisioner-698b58967b-j22xd            1/1     Running     0             7m46s
    kube-system   metrics-server-8584b5786c-b4qkq                    1/1     Running     0             7m46s
    kube-system   multus-lkxdn                                       1/1     Running     0             7m32s
    
  2. Run the command to view the Coremond logs and find the core file that was created.

    kubectl -n f5-utils logs f5-coremond-bnhf7 -c f5-coremond
    

    Sample Output

    2025-03-23 14:01:35,840 CRIT Supervisor is running as root.  Privileges were not dropped because no user is specified in the config file.  If you intend to run as root, you can set user=root in the config file to avoid this message.
    2025-03-23 14:01:35,843 INFO supervisord started with pid 1
    2025-03-23 14:01:36,848 INFO spawned: 'coremond' with pid 7
    2025-03-23 14:01:36,852 INFO spawned: 'crashagent' with pid 8
    2025-03-23 14:01:36,860 INFO spawned: 'qkview-collect' with pid 9
    "ts"="2025-03-23 14:01:36.867"|"l"="info"|"m"="POD_NAME is not set; defaulting to hostname"|"lt"="A"|"proc"="crashagent"|"ct"="f5-coremond"|"v"="1.0"
    "ts"="2025-03-23 14:01:36.867"|"l"="info"|"m"="listen and serve"|"lt"="A"|"proc"="crashagent"|"addr"="/run/apport.socket"|"ct"="f5-coremond"|"v"="1.0"
    "ts"="2025-03-23 14:01:36.903"|"l"="error"|"m"="failed to read levels file /logs/.minlevel.yaml: open /logs/.minlevel.yaml: no such file or directory"|"lt"="A"|"ct"="f5-coremond"|"v"="1.0"
    "ts"="2025-03-23 14:01:36.906"|"l"="info"|"m"="coremond started"|"lt"="A"|"version"="0.7.27+0.0.6"|"commitHash"="359bc2a"|"buildDate"="2025-03-13T19:53:15Z"|"ct"="f5-coremond"|"v"="1.0"
    "ts"="2025-03-23 14:01:36.906"|"l"="info"|"m"="coremon dest"|"lt"="A"|"dest"="/var/cores/k3d-minibip-server-0"|"ct"="f5-coremond"|"v"="1.0"
    "ts"="2025-03-23 14:01:36.948"|"l"="error"|"m"="failed to read levels file /logs/.minlevel.yaml: open /logs/.minlevel.yaml: no such file or directory"|"lt"="A"|"ct"="f5-coremond"|"v"="1.0"
    "ts"="2025-03-23 14:01:36.953"|"l"="info"|"m"="grpc server is starting up"|"lt"="A"|"proc"="qkd"|"address"="0.0.0.0:19891"|"ct"="f5-coremond"|"v"="1.0"
    "ts"="2025-03-23 14:01:37.107"|"l"="error"|"m"="no such file or directory"|"lt"="A"|"path"="/logs/.minlevel.yaml"|"ct"="f5-coremond"|"v"="1.0"
    "ts"="2025-03-23 14:01:37.150"|"l"="error"|"m"="no such file or directory"|"lt"="A"|"proc"="qkd"|"path"="/logs/.minlevel.yaml"|"ct"="f5-coremond"|"v"="1.0"
    2025-03-23 14:01:38,152 INFO success: coremond entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
    2025-03-23 14:01:38,152 INFO success: crashagent entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
    2025-03-23 14:01:38,152 INFO success: qkview-collect entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
    "ts"="2025-03-23 14:03:35.923"|"l"="info"|"m"="new core file detected"|"lt"="A"|"file"="/var/crash/core.observer.9.f5-observer-0.1742738615173108860"|"ct"="f5-coremond"|"v"="1.0"
    "ts"="2025-03-23 14:03:35.950"|"l"="error"|"m"="failed to list pods"|"lt"="A"|"err"="pods is forbidden: User "system:serviceaccount:f5-utils:default" cannot list resource "pods" in API group "" at the cluster scope"|"ct"="f5-coremond"|"v"="1.0"
    "ts"="2025-03-23 14:03:35.950"|"l"="info"|"m"="creating coredump"|"lt"="A"|"src"="/var/crash/core.observer.9.f5-observer-0.1742738615173108860"|"dst"="/var/cores/k3d-minibip-server-0/core.f5-observer-0.f5-observer.observer.9.1742738615173108860"|"ct"="f5-coremond"|"v"="1.0"
    "ts"="2025-03-23 14:03:36.996"|"l"="info"|"m"="deleting src core file"|"lt"="A"|"src"="/var/crash/core.observer.9.f5-observer-0.1742738615173108860"|"dst"="/var/cores/k3d-minibip-server-0/core.f5-observer-0.f5-observer.observer.9.1742738615173108860"|"ct"="f5-coremond"|"v"="1.0"
    
  3. Run the command to list the destination directory and confirm that the F5 core file was created.

    kubectl -n f5-utils exec f5-coremond-bnhf7 -- ls /var/cores/k3d-minibip-server-0/
    

    Sample Output

    Defaulting container name to f5-coremond.
    Use 'kubectl describe pod/f5-coremond-bnhf7 -n f5-utils' to see all of the containers in this pod.
    core.f5-observer-0.f5-observer.observer.9.1742738615173108860.gz
    core.f5-observer-0.f5-observer.observer.9.1742738615173108860.gz.crc
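
The directory listed in step 3 comes from the dst field of the "creating coredump" log line in step 2. As a minimal sketch, the hypothetical filter below (core_dst_from_logs is not part of Coremond) extracts that field from the log stream. It assumes the pipe-delimited "key"="value" log format shown in the sample output.

```shell
# core_dst_from_logs
# Reads Coremond log lines on stdin and prints the "dst" value of every
# "creating coredump" entry.
core_dst_from_logs() {
  sed -n '/"m"="creating coredump"/s/.*"dst"="\([^"]*\)".*/\1/p'
}

# Usage with the pod name from the sample output:
# kubectl -n f5-utils logs f5-coremond-bnhf7 -c f5-coremond | core_dst_from_logs
```

In the sample output, the stored file carries an additional .gz suffix and is accompanied by a .crc file, so the printed path is a prefix of the names listed in step 3.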