BIG-IP Next for Kubernetes Coremond

Overview

The F5 Coremond component runs as a DaemonSet on BIG-IP Next for Kubernetes. It monitors and collects kernel core files (core dumps) from processes that terminate unexpectedly, then converts them into F5-specific core files for further analysis. A core file is a snapshot of a process's memory and register state at the moment it crashes. Core files are crucial for root cause analysis because they capture the state of the process and the conditions leading up to the crash.

When a process stops unexpectedly, the OS generates a core file in the pod volume /var/crash, which is mapped to /home/crash/f5 on the host machine. Coremond then uses this core file to create an F5 core file. This automated crash data collection helps engineers quickly diagnose issues, improving system stability and reliability.

Prerequisites

Ensure you have the following:

  1. A working Kubernetes cluster running on an Ubuntu OS platform.

  2. The Apport service running to handle crash reports.

  3. core_pattern set correctly in the host file /proc/sys/kernel/core_pattern.

  4. platformType: other set in values.yaml.
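
The last prerequisite can be checked with a small shell helper before deploying. This is only a minimal sketch: the function name has_platform_type_other is hypothetical, and because the nesting of platformType inside values.yaml is not shown here, the check simply looks for the key at any indentation level.

```shell
# Hypothetical helper (not part of any F5 tooling): succeeds if the given
# values file contains a "platformType: other" line at any indentation level.
has_platform_type_other() {
    values_file="${1:-values.yaml}"
    grep -Eq '^[[:space:]]*platformType:[[:space:]]*"?other"?[[:space:]]*$' "$values_file"
}

# Example usage (assumes values.yaml is in the current directory):
# has_platform_type_other values.yaml && echo "platformType is set to other"
```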

Platform-Specific Core Patterns

  • Generic: Ubuntu-based platforms use Apport for crash reporting, and this pattern ensures that core dumps are handled correctly by Apport.

    |/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E
    
  • OCP: OCP uses systemd-coredump to capture and process core dumps. The pattern correctly passes the process ID (%P), user ID (%u), group ID (%g), signal (%s), timestamp (%t), and other relevant metadata.

    |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e
    
  • Robin: Robin.io uses a traditional file-based core dump storage format, where…

    /var/crash/core.%e.%p.%h.%t
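
To see which of the patterns above a host is using, the core_pattern value can be classified with a short shell function. A leading | means the kernel pipes the core dump to a program; otherwise the value is a file name template. This is a sketch only: the function name classify_core_pattern and its output labels are hypothetical and chosen for illustration.

```shell
# Hypothetical helper: report which crash handler a core_pattern value uses.
classify_core_pattern() {
    case "$1" in
        '|'*apport*)           echo "apport" ;;            # piped to Apport
        '|'*systemd-coredump*) echo "systemd-coredump" ;;  # piped to systemd-coredump
        '|'*)                  echo "other-pipe" ;;        # piped to another program
        *)                     echo "file" ;;              # written directly to a file
    esac
}

# Example usage on a host (reads the live kernel setting):
# classify_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
```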
    

Coremond Installation

For the installation of Coremond, refer to the FLO section.

Generate a Core File

To generate a core file, follow these steps:

  1. Run the following command to list the pods in all namespaces.

kubectl get pods -A

Sample Output

NAMESPACE     NAME                                               READY   STATUS      RESTARTS      AGE
default       cluster-cert-manager-57cc54f85-fst6r               1/1     Running     0             7m45s
default       cluster-cert-manager-cainjector-575b969df5-x6rjz   1/1     Running     0             7m45s
default       cluster-cert-manager-webhook-854c759f74-vhk9c      1/1     Running     0             7m45s
default       f5-afm-648c784cb-c2zzv                             2/2     Running     0             6m57s
default       f5-cne-controller-568c7cf87c-4tkns                 4/4     Running     0             6m57s
default       f5-ipam-operator-7bc8dccd9-ktz2f                   1/1     Running     0             7m16s
default       f5-node-labeler-bdz8v                              0/1     Init:0/1    0             6m58s
default       f5-observer-0                                      2/2     Running     0             6m48s
default       f5-observer-operator-59d46f69b7-fm6gw              2/2     Running     0             6m49s
default       f5-observer-receiver-0                             2/2     Running     0             6m49s
default       f5-tmm-pn4tq                                       7/7     Running     0             6m42s
default       flo-f5-lifecycle-operator-58958c4f77-lmbs7         2/2     Running     0             7m16s
default       otel-collector-5f8c9fbd8-dqxpt                     1/1     Running     0             6m42s
f5-utils      f5-coremond-bnhf7                                  2/2     Running     0             4m51s
f5-utils      f5-crdconversion-576f7b7579-4d5n2                  2/2     Running     0             6m47s
f5-utils      f5-dssm-db-0                                       3/3     Running     0             6m45s
f5-utils      f5-dssm-db-1                                       3/3     Running     0             5m26s
f5-utils      f5-dssm-db-2                                       3/3     Running     0             4m50s
f5-utils      f5-dssm-sentinel-0                                 3/3     Running     0             6m47s
f5-utils      f5-dssm-sentinel-1                                 3/3     Running     0             5m16s
f5-utils      f5-dssm-sentinel-2                                 0/3     Pending     0             4m40s
f5-utils      f5-ipam-ctlr-595c467d8d-mfs58                      2/2     Running     0             6m45s
f5-utils      f5-rabbit-6c7d56ddfb-87jnf                         2/2     Running     0             6m50s
f5-utils      f5-spk-cwc-6f89988c86-5m56n                        3/3     Running     0             6m45s
f5-utils      f5-toda-fluentd-bf845d465-sfm62                    1/1     Running     0             6m50s
kube-system   coredns-ff8999cc5-4w2rc                            1/1     Running     0             7m46s
kube-system   csi-nfs-controller-69dc5b4c8c-c56g8                5/5     Running     0             7m23s
kube-system   csi-nfs-node-rdvsn                                 3/3     Running     0             7m23s
kube-system   helm-install-multus-99k8d                          0/1     Completed   0             7m46s
kube-system   local-path-provisioner-698b58967b-j22xd            1/1     Running     0             7m46s
kube-system   metrics-server-8584b5786c-b4qkq                    1/1     Running     0             7m46s
kube-system   multus-lkxdn                                       1/1     Running     0             7m32s

  2. Run the following command to list the processes running in the pod.

kubectl exec f5-observer-0 -- ps aux

Sample Output

Defaulting container name to f5-observer.
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
f5docker       1  0.0  0.0 711880  3024 ?        Ssl  09:45   0:00 /init
f5docker      25  0.0  0.0   3024  1200 ?        S    09:45   0:00 s6-svscan -c30 -t0 /var/run/s6/services
f5docker      27  0.0  0.0   3036  1264 ?        S    09:45   0:00 s6-supervise observer
f5docker      28  0.0  0.0   3036  1268 ?        S    09:45   0:00 s6-supervise qkview-collect-daemon
f5docker      29  1.2  0.3 1270624 49540 ?       Ssl  09:45   0:01 observer
f5docker      30  0.0  0.0 1235736 9892 ?        Ssl  09:45   0:00 /usr/bin/qkview-collect-daemon
f5docker     212  0.0  0.0   7072  1592 ?        Rs   09:47   0:00 ps aux
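
  3. Run the following command to send signal 11 (SIGSEGV) to a process, which terminates it and generates a core dump. Replace the PID (9 in this example) with the PID of the target process from the previous step.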

kubectl exec f5-observer-0 -- kill -11 9

Sample Output

Defaulting container name to f5-observer.
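
Rather than reading the PID off the process list by hand, steps 2 and 3 can be chained with a small filter. The following is a minimal sketch, assuming the container's ps aux output has the column layout shown in the sample; the function name pid_of_command is hypothetical.

```shell
# Hypothetical helper: read `ps aux` output on stdin and print the PID
# (column 2) of every process whose command (column 11) exactly matches
# the given name. Supervisors such as "s6-supervise observer" are not
# matched, because their column 11 is "s6-supervise".
pid_of_command() {
    awk -v name="$1" '$11 == name { print $2 }'
}

# Example usage (pod and process names taken from the sample output):
# pid="$(kubectl exec f5-observer-0 -- ps aux | pid_of_command observer)"
# kubectl exec f5-observer-0 -- kill -11 "$pid"
```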

Validate the Core File

To verify the generated core file, follow these steps:

  1. Run the following command to get the Coremond pod name.

kubectl get pods -A

Sample Output

NAMESPACE     NAME                                               READY   STATUS      RESTARTS      AGE
default       cluster-cert-manager-57cc54f85-fst6r               1/1     Running     0             7m45s
default       cluster-cert-manager-cainjector-575b969df5-x6rjz   1/1     Running     0             7m45s
default       cluster-cert-manager-webhook-854c759f74-vhk9c      1/1     Running     0             7m45s
default       f5-afm-648c784cb-c2zzv                             2/2     Running     0             6m57s
default       f5-cne-controller-568c7cf87c-4tkns                 4/4     Running     0             6m57s
default       f5-ipam-operator-7bc8dccd9-ktz2f                   1/1     Running     0             7m16s
default       f5-node-labeler-bdz8v                              0/1     Init:0/1    0             6m58s
default       f5-observer-0                                      2/2     Running     0             6m48s
default       f5-observer-operator-59d46f69b7-fm6gw              2/2     Running     0             6m49s
default       f5-observer-receiver-0                             2/2     Running     0             6m49s
default       f5-tmm-pn4tq                                       7/7     Running     0             6m42s
default       flo-f5-lifecycle-operator-58958c4f77-lmbs7         2/2     Running     0             7m16s
default       otel-collector-5f8c9fbd8-dqxpt                     1/1     Running     0             6m42s
f5-utils      f5-coremond-bnhf7                                  2/2     Running     0             4m51s
f5-utils      f5-crdconversion-576f7b7579-4d5n2                  2/2     Running     0             6m47s
f5-utils      f5-dssm-db-0                                       3/3     Running     0             6m45s
f5-utils      f5-dssm-db-1                                       3/3     Running     0             5m26s
f5-utils      f5-dssm-db-2                                       3/3     Running     0             4m50s
f5-utils      f5-dssm-sentinel-0                                 3/3     Running     0             6m47s
f5-utils      f5-dssm-sentinel-1                                 3/3     Running     0             5m16s
f5-utils      f5-dssm-sentinel-2                                 0/3     Pending     0             4m40s
f5-utils      f5-ipam-ctlr-595c467d8d-mfs58                      2/2     Running     0             6m45s
f5-utils      f5-rabbit-6c7d56ddfb-87jnf                         2/2     Running     0             6m50s
f5-utils      f5-spk-cwc-6f89988c86-5m56n                        3/3     Running     0             6m45s
f5-utils      f5-toda-fluentd-bf845d465-sfm62                    1/1     Running     0             6m50s
kube-system   coredns-ff8999cc5-4w2rc                            1/1     Running     0             7m46s
kube-system   csi-nfs-controller-69dc5b4c8c-c56g8                5/5     Running     0             7m23s
kube-system   csi-nfs-node-rdvsn                                 3/3     Running     0             7m23s
kube-system   helm-install-multus-99k8d                          0/1     Completed   0             7m46s
kube-system   local-path-provisioner-698b58967b-j22xd            1/1     Running     0             7m46s
kube-system   metrics-server-8584b5786c-b4qkq                    1/1     Running     0             7m46s
kube-system   multus-lkxdn                                       1/1     Running     0             7m32s

  2. Run the following command to view the Coremond logs and find the core file that was created.

kubectl -n f5-utils logs f5-coremond-bnhf7 -c f5-coremond

Sample Output

2025-03-23 14:01:35,840 CRIT Supervisor is running as root.  Privileges were not dropped because no user is specified in the config file.  If you intend to run as root, you can set user=root in the config file to avoid this message.
2025-03-23 14:01:35,843 INFO supervisord started with pid 1
2025-03-23 14:01:36,848 INFO spawned: 'coremond' with pid 7
2025-03-23 14:01:36,852 INFO spawned: 'crashagent' with pid 8
2025-03-23 14:01:36,860 INFO spawned: 'qkview-collect' with pid 9
"ts"="2025-03-23 14:01:36.867"|"l"="info"|"m"="POD_NAME is not set; defaulting to hostname"|"lt"="A"|"proc"="crashagent"|"ct"="f5-coremond"|"v"="1.0"
"ts"="2025-03-23 14:01:36.867"|"l"="info"|"m"="listen and serve"|"lt"="A"|"proc"="crashagent"|"addr"="/run/apport.socket"|"ct"="f5-coremond"|"v"="1.0"
"ts"="2025-03-23 14:01:36.903"|"l"="error"|"m"="failed to read levels file /logs/.minlevel.yaml: open /logs/.minlevel.yaml: no such file or directory"|"lt"="A"|"ct"="f5-coremond"|"v"="1.0"
"ts"="2025-03-23 14:01:36.906"|"l"="info"|"m"="coremond started"|"lt"="A"|"version"="0.7.27+0.0.6"|"commitHash"="359bc2a"|"buildDate"="2025-03-13T19:53:15Z"|"ct"="f5-coremond"|"v"="1.0"
"ts"="2025-03-23 14:01:36.906"|"l"="info"|"m"="coremon dest"|"lt"="A"|"dest"="/var/cores/k3d-minibip-server-0"|"ct"="f5-coremond"|"v"="1.0"
"ts"="2025-03-23 14:01:36.948"|"l"="error"|"m"="failed to read levels file /logs/.minlevel.yaml: open /logs/.minlevel.yaml: no such file or directory"|"lt"="A"|"ct"="f5-coremond"|"v"="1.0"
"ts"="2025-03-23 14:01:36.953"|"l"="info"|"m"="grpc server is starting up"|"lt"="A"|"proc"="qkd"|"address"="0.0.0.0:19891"|"ct"="f5-coremond"|"v"="1.0"
"ts"="2025-03-23 14:01:37.107"|"l"="error"|"m"="no such file or directory"|"lt"="A"|"path"="/logs/.minlevel.yaml"|"ct"="f5-coremond"|"v"="1.0"
"ts"="2025-03-23 14:01:37.150"|"l"="error"|"m"="no such file or directory"|"lt"="A"|"proc"="qkd"|"path"="/logs/.minlevel.yaml"|"ct"="f5-coremond"|"v"="1.0"
2025-03-23 14:01:38,152 INFO success: coremond entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-03-23 14:01:38,152 INFO success: crashagent entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-03-23 14:01:38,152 INFO success: qkview-collect entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
"ts"="2025-03-23 14:03:35.923"|"l"="info"|"m"="new core file detected"|"lt"="A"|"file"="/var/crash/core.observer.9.f5-observer-0.1742738615173108860"|"ct"="f5-coremond"|"v"="1.0"
"ts"="2025-03-23 14:03:35.950"|"l"="error"|"m"="failed to list pods"|"lt"="A"|"err"="pods is forbidden: User "system:serviceaccount:f5-utils:default" cannot list resource "pods" in API group "" at the cluster scope"|"ct"="f5-coremond"|"v"="1.0"
"ts"="2025-03-23 14:03:35.950"|"l"="info"|"m"="creating coredump"|"lt"="A"|"src"="/var/crash/core.observer.9.f5-observer-0.1742738615173108860"|"dst"="/var/cores/k3d-minibip-server-0/core.f5-observer-0.f5-observer.observer.9.1742738615173108860"|"ct"="f5-coremond"|"v"="1.0"
"ts"="2025-03-23 14:03:36.996"|"l"="info"|"m"="deleting src core file"|"lt"="A"|"src"="/var/crash/core.observer.9.f5-observer-0.1742738615173108860"|"dst"="/var/cores/k3d-minibip-server-0/core.f5-observer-0.f5-observer.observer.9.1742738615173108860"|"ct"="f5-coremond"|"v"="1.0"

  3. Run the following command to confirm that the F5 core file created by Coremond exists.

kubectl -n f5-utils exec f5-coremond-bnhf7 -- ls /var/cores/k3d-minibip-server-0/

Sample Output

Defaulting container name to f5-coremond.
Use 'kubectl describe pod/f5-coremond-bnhf7 -n f5-utils' to see all of the containers in this pod.
core.f5-observer-0.f5-observer.observer.9.1742738615173108860.gz
core.f5-observer-0.f5-observer.observer.9.1742738615173108860.gz.crc
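
The validation steps above can be scripted by pulling the destination path out of the Coremond log and checking it against the directory listing. This is a sketch under assumptions: it relies on the log line format and the .gz and .gz.crc suffixes seen in the sample output, and the function names core_dst_from_log and listing_has_core are hypothetical.

```shell
# Hypothetical helper: read Coremond log lines on stdin and print the "dst"
# path from every "creating coredump" message.
core_dst_from_log() {
    sed -n 's/.*"m"="creating coredump".*"dst"="\([^"]*\)".*/\1/p'
}

# Hypothetical helper: read an `ls` listing on stdin and succeed only if both
# <name>.gz and <name>.gz.crc are listed for the given core file path.
listing_has_core() {
    name="$(basename "$1")"
    listing="$(cat)"
    printf '%s\n' "$listing" | grep -Fxq -- "$name.gz" &&
        printf '%s\n' "$listing" | grep -Fxq -- "$name.gz.crc"
}

# Example usage (pod name taken from the sample output; if several cores
# were logged, only the most recent one is checked):
# dst="$(kubectl -n f5-utils logs f5-coremond-bnhf7 -c f5-coremond | core_dst_from_log | tail -n 1)"
# kubectl -n f5-utils exec f5-coremond-bnhf7 -- ls "$(dirname "$dst")" | listing_has_core "$dst" \
#     && echo "F5 core file present"
```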