BIG-IP Next for Kubernetes Coremond¶
Overview¶
The F5 Coremond component runs as a DaemonSet on BIG-IP Next for Kubernetes. It monitors and collects kernel core files (core dumps) from processes that terminate unexpectedly, and converts them into F5-specific core files for further analysis. A core file is a snapshot of a process's memory and register state at the moment it crashes. Core files are crucial for root cause analysis because they capture the state of the process and the conditions leading up to the crash.
When a process stops unexpectedly, the OS generates a core file in the pod volume /var/crash, which is mapped to /home/crash/f5 on the host machine. Coremond then uses this core file to create an F5 core file. This automated crash data collection helps engineers diagnose issues quickly, improving system stability and reliability.
Prerequisites¶
Ensure you have the following:
A working Kubernetes cluster running on an Ubuntu OS platform.
The Apport service running to handle crash reports.
core_pattern is set correctly in the host file /proc/sys/kernel/core_pattern.
Set platformType: other in values.yaml. When Coremond is not compatible with the core_pattern, enable the Coremond init container to override the core_pattern. Refer to the Coremond Installation section.
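To check the core_pattern prerequisite, you can read the kernel's current pattern on each node and compare it with the patterns listed under Platform-Specific Core Patterns. The following is a minimal POSIX shell sketch, not an F5-provided tool: the helper names classify_core_pattern and check_core_pattern_file, and the labels they print, are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map a core_pattern string to the handler it starts with
# (patterns taken from the Platform-Specific Core Patterns section).
classify_core_pattern() {
    case "$1" in
        '|/usr/share/apport/apport'*)          echo apport ;;           # Generic (Ubuntu)
        '|/usr/lib/systemd/systemd-coredump'*) echo systemd-coredump ;; # OCP
        '/var/crash/core.'*)                   echo file ;;             # file-based pattern
        *)                                     echo unknown ;;
    esac
}

# Hypothetical helper: read the pattern from a file (defaults to the live
# kernel setting) and classify it.
check_core_pattern_file() {
    pattern_file=${1:-/proc/sys/kernel/core_pattern}
    classify_core_pattern "$(cat "$pattern_file")"
}

# Usage on a cluster node:
#   check_core_pattern_file
#   systemctl is-active apport    # Apport prerequisite on Ubuntu
```
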
Platform-Specific Core Patterns¶
Generic: Ubuntu-based platforms use Apport for crash reporting, and this pattern ensures that core dumps are handled correctly by Apport.
|/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E
OCP: OpenShift Container Platform (OCP) uses systemd-coredump to capture and process core dumps. The pattern passes the process ID (%P), user ID (%u), group ID (%g), signal number (%s), timestamp (%t), core file size limit (%c), hostname (%h), and executable name (%e).
|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e
Robin: Robin.io uses a traditional file-based core dump storage format, where…
/var/crash/core.%e.%p.%h.%t
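With the file-based pattern /var/crash/core.%e.%p.%h.%t, the kernel encodes the executable name (%e), PID (%p), hostname (%h), and dump time (%t) in the file name, as in /var/crash/core.observer.9.f5-observer-0.1742738615173108860 from the Coremond log sample later on this page. The POSIX shell sketch below splits such a name back into its fields. The helper parse_core_name is hypothetical, and it assumes the executable name contains no dots; a hostname with dots is handled because it is taken from the middle of the name.

```shell
#!/bin/sh
# Hypothetical helper: split a core file name produced by the pattern
# core.%e.%p.%h.%t into its fields. Assumes %e (executable name) has no dots.
parse_core_name() {
    name=${1##*/}                       # drop any directory prefix
    rest=${name#core.}                  # drop the literal "core." prefix
    exe=${rest%%.*}; rest=${rest#*.}    # first field: executable name
    pid=${rest%%.*}; rest=${rest#*.}    # second field: PID
    ts=${rest##*.}                      # last field: dump time
    host=${rest%.*}                     # everything in between: hostname
    printf 'exe=%s pid=%s host=%s time=%s\n' "$exe" "$pid" "$host" "$ts"
}

# Usage:
#   parse_core_name /var/crash/core.observer.9.f5-observer-0.1742738615173108860
```
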
Configure Rotation and Retention¶
This section outlines the environment variables used to configure the core file retention, rotation, and cleanup of Coremond. These variables allow you to manage retention durations, set file limits per process, and define rotation policies.
| Environment Variable | Default Value | Description |
|---|---|---|
| COREMON_RETENTION_INTERVAL | 5m | Specifies the time window during which additional core dumps from the same process are ignored once the COREMON_CORES_MAX_FILES limit is reached. |
| COREMON_CORES_MAX_FILES | 3 | Specifies the maximum number of core files kept for the same process. This limit prevents a process that crashes repeatedly from producing core files and rotations continuously. |
| COREMON_RETENTION | 0 | Specifies how long core files are kept before deletion. This also applies to the final core file copied to the volume. To disable retention, set this parameter to 0. |
| COREMON_CORES_INTERVAL | 5m | Specifies how often Coremond scans for and deletes core files older than the COREMON_RETENTION period. |
| COREMON_DELETE_SRC | true | Specifies whether to delete the kernel-generated source core files from the host path /home/crash/f5. |
| COREMON_ROTATE | false | Allows old core files to be replaced with new ones when the COREMON_CORES_MAX_FILES limit is reached. Replacement occurs only after the COREMON_RETENTION_INTERVAL has elapsed and Coremond continues to receive core files for that process. |
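The interaction between COREMON_CORES_MAX_FILES, COREMON_RETENTION_INTERVAL, and COREMON_ROTATE, and the periodic COREMON_RETENTION sweep, can be modeled as follows. This is a minimal POSIX shell sketch of one reading of the table above, not Coremond's implementation. The helper names, the use of seconds and minutes as units, and the choice to ignore new cores past the limit when rotation is disabled are all assumptions.

```shell
#!/bin/sh
# Hypothetical model of the per-process limit described in the table.
#   $1 existing core files for this process
#   $2 COREMON_CORES_MAX_FILES
#   $3 seconds since the limit was reached
#   $4 COREMON_RETENTION_INTERVAL, in seconds (assumed unit)
#   $5 COREMON_ROTATE (true/false)
# Prints: store, ignore, or rotate.
decide_core_action() {
    count=$1 max=$2 since_limit=$3 interval=$4 rotate=$5
    if [ "$count" -lt "$max" ]; then
        echo store      # below the limit: keep the new core
    elif [ "$rotate" = true ] && [ "$since_limit" -ge "$interval" ]; then
        echo rotate     # limit reached and interval elapsed: replace an old core
    else
        echo ignore     # limit reached: drop the new core
    fi
}

# Hypothetical model of the COREMON_RETENTION sweep that runs every
# COREMON_CORES_INTERVAL: list files in DIR older than RETENTION_MINUTES.
# A retention of 0 disables the sweep, as in the table.
list_expired_cores() {
    dir=$1 retention_min=$2
    [ "$retention_min" -eq 0 ] && return 0
    find "$dir" -type f -mmin +"$retention_min"
}
```

Checking the count first makes the precedence explicit: COREMON_ROTATE only matters once the COREMON_CORES_MAX_FILES limit is reached.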
Coremond Installation¶
For the installation of Coremond, refer to the F5 Lifecycle Operator (FLO) section.
Follow the steps below to allow the Coremond init container to override the core_pattern and enable core file generation in containers.
Configure core dump handling to override the kernel core_pattern with the desired pattern.
Apply SELinux labelling on the crash directory using chcon.
Adjust container security contexts via FLO:
If using FLO, set the Coremond container SecurityContext to privileged and allow privilege escalation.
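The CNEInstance example in the next section lists what the init container applies when COREMOND_OVERRIDE_CORE_PATTERN is enabled: the core_pattern /var/crash/core.%e.%p.%h.%t and a chcon -Rt container_file_t on the crash directory. The POSIX shell sketch below shows roughly equivalent manual steps on a node, for illustration or troubleshooting only. It assumes it runs as root on an SELinux-enabled host and that the crash directory is the host path /home/crash/f5; the helper names and the DRY_RUN switch are hypothetical. With DRY_RUN=1 it prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical sketch of the steps the Coremond init container performs.
# Assumptions: run as root on the node, SELinux enabled, crash directory is
# the host path /home/crash/f5. Set DRY_RUN=1 to print instead of run.
CORE_PATTERN='/var/crash/core.%e.%p.%h.%t'
CRASH_DIR=${CRASH_DIR:-/home/crash/f5}

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

override_core_pattern() {
    # 1. Write the file-based pattern to /proc/sys/kernel/core_pattern
    #    (not persistent across reboots).
    run sysctl -w "kernel.core_pattern=$CORE_PATTERN"
    # 2. Apply the SELinux label so containers can write to the crash directory.
    run chcon -Rt container_file_t "$CRASH_DIR"
}

# Usage:
#   DRY_RUN=1 override_core_pattern   # preview
#   override_core_pattern             # apply (root only)
```
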
Store core files on Host instead of PVCs¶
Coremond can store core files directly in a host directory instead of on Persistent Volume Claims (PVCs). This removes the need for ReadWriteMany volumes and shared storage when multiple Coremond pods are deployed. By default, this option is disabled and PVCs are used.
To store core files on the host machine instead, set the following in the Coremond advanced section (spec.advanced.coremon) of the CNEInstance CR:
# FLO: cneinstance CR
apiVersion: k8s.f5.com/v1
kind: CNEInstance
metadata:
  name: cneinstance-sample
# Enabling COREMOND_OVERRIDE_CORE_PATTERN applies:
# - core_pattern=/var/crash/core.%e.%p.%h.%t (written to /proc/sys/kernel/core_pattern)
# - chcon=-Rt container_file_t (executed on the crash directory)
# - initContainer securityContext: Privileged=true
# - coremond securityContext: ReadOnlyRootFilesystem=true, AllowPrivilegeEscalation=true, Privileged=true
spec:
  advanced:
    coremon:
      hostPath: true
      env:
        - name: "COREMOND_OVERRIDE_CORE_PATTERN"
          value: "true"
Generate a Core File¶
To generate a core file, follow these steps:
Run the command to get the list of pods.
kubectl get pods -A
Sample Output
NAMESPACE     NAME                                               READY   STATUS      RESTARTS   AGE
default       cluster-cert-manager-57cc54f85-fst6r               1/1     Running     0          7m45s
default       cluster-cert-manager-cainjector-575b969df5-x6rjz   1/1     Running     0          7m45s
default       cluster-cert-manager-webhook-854c759f74-vhk9c      1/1     Running     0          7m45s
default       f5-afm-648c784cb-c2zzv                             2/2     Running     0          6m57s
default       f5-cne-controller-568c7cf87c-4tkns                 4/4     Running     0          6m57s
default       f5-ipam-operator-7bc8dccd9-ktz2f                   1/1     Running     0          7m16s
default       f5-node-labeler-bdz8v                              0/1     Init:0/1    0          6m58s
default       f5-observer-0                                      2/2     Running     0          6m48s
default       f5-observer-operator-59d46f69b7-fm6gw              2/2     Running     0          6m49s
default       f5-observer-receiver-0                             2/2     Running     0          6m49s
default       f5-tmm-pn4tq                                       7/7     Running     0          6m42s
default       flo-f5-lifecycle-operator-58958c4f77-lmbs7         2/2     Running     0          7m16s
default       otel-collector-5f8c9fbd8-dqxpt                     1/1     Running     0          6m42s
f5-utils      f5-coremond-bnhf7                                  2/2     Running     0          4m51s
f5-utils      f5-crdconversion-576f7b7579-4d5n2                  2/2     Running     0          6m47s
f5-utils      f5-dssm-db-0                                       3/3     Running     0          6m45s
f5-utils      f5-dssm-db-1                                       3/3     Running     0          5m26s
f5-utils      f5-dssm-db-2                                       3/3     Running     0          4m50s
f5-utils      f5-dssm-sentinel-0                                 3/3     Running     0          6m47s
f5-utils      f5-dssm-sentinel-1                                 3/3     Running     0          5m16s
f5-utils      f5-dssm-sentinel-2                                 0/3     Pending     0          4m40s
f5-utils      f5-ipam-ctlr-595c467d8d-mfs58                      2/2     Running     0          6m45s
f5-utils      f5-rabbit-6c7d56ddfb-87jnf                         2/2     Running     0          6m50s
f5-utils      f5-spk-cwc-6f89988c86-5m56n                        3/3     Running     0          6m45s
f5-utils      f5-toda-fluentd-bf845d465-sfm62                    1/1     Running     0          6m50s
kube-system   coredns-ff8999cc5-4w2rc                            1/1     Running     0          7m46s
kube-system   csi-nfs-controller-69dc5b4c8c-c56g8                5/5     Running     0          7m23s
kube-system   csi-nfs-node-rdvsn                                 3/3     Running     0          7m23s
kube-system   helm-install-multus-99k8d                          0/1     Completed   0          7m46s
kube-system   local-path-provisioner-698b58967b-j22xd            1/1     Running     0          7m46s
kube-system   metrics-server-8584b5786c-b4qkq                    1/1     Running     0          7m46s
kube-system   multus-lkxdn                                       1/1     Running     0          7m32s
Run the command to get the process list.
kubectl exec f5-observer-0 -- ps aux
Sample Output
Defaulting container name to f5-observer.
USER       PID %CPU %MEM     VSZ   RSS TTY      STAT START   TIME COMMAND
f5docker     1  0.0  0.0  711880  3024 ?        Ssl  09:45   0:00 /init
f5docker    25  0.0  0.0    3024  1200 ?        S    09:45   0:00 s6-svscan -c30 -t0 /var/run/s6/services
f5docker    27  0.0  0.0    3036  1264 ?        S    09:45   0:00 s6-supervise observer
f5docker    28  0.0  0.0    3036  1268 ?        S    09:45   0:00 s6-supervise qkview-collect-daemon
f5docker    29  1.2  0.3 1270624 49540 ?        Ssl  09:45   0:01 observer
f5docker    30  0.0  0.0 1235736  9892 ?        Ssl  09:45   0:00 /usr/bin/qkview-collect-daemon
f5docker   212  0.0  0.0    7072  1592 ?        Rs   09:47   0:00 ps aux
Run the command to send signal 11 (SIGSEGV) to a process and generate a core dump. Replace 9 with the PID of the target process from the ps output.
kubectl exec f5-observer-0 -- kill -11 9
Sample Output
Defaulting container name to f5-observer.
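PIDs differ between runs, so rather than reusing the PID from the sample command, you can look up the PID of the target process in the ps aux output. The POSIX shell sketch below does this with awk by matching the COMMAND column exactly. The helper pid_of is hypothetical, and it assumes the target command was started without arguments, so that its name is the eleventh whitespace-separated field, as for observer in the sample output.

```shell
#!/bin/sh
# Hypothetical helper: read `ps aux` output on stdin (header included) and
# print the PID (column 2) of every process whose COMMAND (column 11)
# equals $1 exactly. "s6-supervise observer" does not match "observer".
pid_of() {
    awk -v cmd="$1" 'NR > 1 && $11 == cmd { print $2 }'
}

# Usage (sends SIGSEGV, signal 11, to the observer process):
#   pid=$(kubectl exec f5-observer-0 -c f5-observer -- ps aux | pid_of observer)
#   kubectl exec f5-observer-0 -c f5-observer -- kill -11 "$pid"
```
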
Validate the Core File¶
To verify the generated core file, follow the instructions below:
Run the command to get the Coremond pod name.
kubectl get pods -A
Sample Output
NAMESPACE     NAME                                               READY   STATUS      RESTARTS   AGE
default       cluster-cert-manager-57cc54f85-fst6r               1/1     Running     0          7m45s
default       cluster-cert-manager-cainjector-575b969df5-x6rjz   1/1     Running     0          7m45s
default       cluster-cert-manager-webhook-854c759f74-vhk9c      1/1     Running     0          7m45s
default       f5-afm-648c784cb-c2zzv                             2/2     Running     0          6m57s
default       f5-cne-controller-568c7cf87c-4tkns                 4/4     Running     0          6m57s
default       f5-ipam-operator-7bc8dccd9-ktz2f                   1/1     Running     0          7m16s
default       f5-node-labeler-bdz8v                              0/1     Init:0/1    0          6m58s
default       f5-observer-0                                      2/2     Running     0          6m48s
default       f5-observer-operator-59d46f69b7-fm6gw              2/2     Running     0          6m49s
default       f5-observer-receiver-0                             2/2     Running     0          6m49s
default       f5-tmm-pn4tq                                       7/7     Running     0          6m42s
default       flo-f5-lifecycle-operator-58958c4f77-lmbs7         2/2     Running     0          7m16s
default       otel-collector-5f8c9fbd8-dqxpt                     1/1     Running     0          6m42s
f5-utils      f5-coremond-bnhf7                                  2/2     Running     0          4m51s
f5-utils      f5-crdconversion-576f7b7579-4d5n2                  2/2     Running     0          6m47s
f5-utils      f5-dssm-db-0                                       3/3     Running     0          6m45s
f5-utils      f5-dssm-db-1                                       3/3     Running     0          5m26s
f5-utils      f5-dssm-db-2                                       3/3     Running     0          4m50s
f5-utils      f5-dssm-sentinel-0                                 3/3     Running     0          6m47s
f5-utils      f5-dssm-sentinel-1                                 3/3     Running     0          5m16s
f5-utils      f5-dssm-sentinel-2                                 0/3     Pending     0          4m40s
f5-utils      f5-ipam-ctlr-595c467d8d-mfs58                      2/2     Running     0          6m45s
f5-utils      f5-rabbit-6c7d56ddfb-87jnf                         2/2     Running     0          6m50s
f5-utils      f5-spk-cwc-6f89988c86-5m56n                        3/3     Running     0          6m45s
f5-utils      f5-toda-fluentd-bf845d465-sfm62                    1/1     Running     0          6m50s
kube-system   coredns-ff8999cc5-4w2rc                            1/1     Running     0          7m46s
kube-system   csi-nfs-controller-69dc5b4c8c-c56g8                5/5     Running     0          7m23s
kube-system   csi-nfs-node-rdvsn                                 3/3     Running     0          7m23s
kube-system   helm-install-multus-99k8d                          0/1     Completed   0          7m46s
kube-system   local-path-provisioner-698b58967b-j22xd            1/1     Running     0          7m46s
kube-system   metrics-server-8584b5786c-b4qkq                    1/1     Running     0          7m46s
kube-system   multus-lkxdn                                       1/1     Running     0          7m32s
Run the command to view the Coremond logs and find the core file that was created.
kubectl -n f5-utils logs f5-coremond-bnhf7 -c f5-coremond
Sample Output
2025-03-23 14:01:35,840 CRIT Supervisor is running as root. Privileges were not dropped because no user is specified in the config file. If you intend to run as root, you can set user=root in the config file to avoid this message.
2025-03-23 14:01:35,843 INFO supervisord started with pid 1
2025-03-23 14:01:36,848 INFO spawned: 'coremond' with pid 7
2025-03-23 14:01:36,852 INFO spawned: 'crashagent' with pid 8
2025-03-23 14:01:36,860 INFO spawned: 'qkview-collect' with pid 9
"ts"="2025-03-23 14:01:36.867"|"l"="info"|"m"="POD_NAME is not set; defaulting to hostname"|"lt"="A"|"proc"="crashagent"|"ct"="f5-coremond"|"v"="1.0"
"ts"="2025-03-23 14:01:36.867"|"l"="info"|"m"="listen and serve"|"lt"="A"|"proc"="crashagent"|"addr"="/run/apport.socket"|"ct"="f5-coremond"|"v"="1.0"
"ts"="2025-03-23 14:01:36.903"|"l"="error"|"m"="failed to read levels file /logs/.minlevel.yaml: open /logs/.minlevel.yaml: no such file or directory"|"lt"="A"|"ct"="f5-coremond"|"v"="1.0"
"ts"="2025-03-23 14:01:36.906"|"l"="info"|"m"="coremond started"|"lt"="A"|"version"="0.7.27+0.0.6"|"commitHash"="359bc2a"|"buildDate"="2025-03-13T19:53:15Z"|"ct"="f5-coremond"|"v"="1.0"
"ts"="2025-03-23 14:01:36.906"|"l"="info"|"m"="coremon dest"|"lt"="A"|"dest"="/var/cores/k3d-minibip-server-0"|"ct"="f5-coremond"|"v"="1.0"
"ts"="2025-03-23 14:01:36.948"|"l"="error"|"m"="failed to read levels file /logs/.minlevel.yaml: open /logs/.minlevel.yaml: no such file or directory"|"lt"="A"|"ct"="f5-coremond"|"v"="1.0"
"ts"="2025-03-23 14:01:36.953"|"l"="info"|"m"="grpc server is starting up"|"lt"="A"|"proc"="qkd"|"address"="0.0.0.0:19891"|"ct"="f5-coremond"|"v"="1.0"
"ts"="2025-03-23 14:01:37.107"|"l"="error"|"m"="no such file or directory"|"lt"="A"|"path"="/logs/.minlevel.yaml"|"ct"="f5-coremond"|"v"="1.0"
"ts"="2025-03-23 14:01:37.150"|"l"="error"|"m"="no such file or directory"|"lt"="A"|"proc"="qkd"|"path"="/logs/.minlevel.yaml"|"ct"="f5-coremond"|"v"="1.0"
2025-03-23 14:01:38,152 INFO success: coremond entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-03-23 14:01:38,152 INFO success: crashagent entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-03-23 14:01:38,152 INFO success: qkview-collect entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
"ts"="2025-03-23 14:03:35.923"|"l"="info"|"m"="new core file detected"|"lt"="A"|"file"="/var/crash/core.observer.9.f5-observer-0.1742738615173108860"|"ct"="f5-coremond"|"v"="1.0"
"ts"="2025-03-23 14:03:35.950"|"l"="error"|"m"="failed to list pods"|"lt"="A"|"err"="pods is forbidden: User "system:serviceaccount:f5-utils:default" cannot list resource "pods" in API group "" at the cluster scope"|"ct"="f5-coremond"|"v"="1.0"
"ts"="2025-03-23 14:03:35.950"|"l"="info"|"m"="creating coredump"|"lt"="A"|"src"="/var/crash/core.observer.9.f5-observer-0.1742738615173108860"|"dst"="/var/cores/k3d-minibip-server-0/core.f5-observer-0.f5-observer.observer.9.1742738615173108860"|"ct"="f5-coremond"|"v"="1.0"
"ts"="2025-03-23 14:03:36.996"|"l"="info"|"m"="deleting src core file"|"lt"="A"|"src"="/var/crash/core.observer.9.f5-observer-0.1742738615173108860"|"dst"="/var/cores/k3d-minibip-server-0/core.f5-observer-0.f5-observer.observer.9.1742738615173108860"|"ct"="f5-coremond"|"v"="1.0"
Run the command to list the F5 core files in the Coremond destination directory (the dest value of the coremon dest log entry).
kubectl -n f5-utils exec f5-coremond-bnhf7 -- ls /var/cores/k3d-minibip-server-0/
Sample Output
Defaulting container name to f5-coremond.
Use 'kubectl describe pod/f5-coremond-bnhf7 -n f5-utils' to see all of the containers in this pod.
core.f5-observer-0.f5-observer.observer.9.1742738615173108860.gz
core.f5-observer-0.f5-observer.observer.9.1742738615173108860.gz.crc
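The validation steps above can be partly scripted. Coremond logs a "creating coredump" entry whose dst field names the F5 core file; in the sample, the file on disk carries an extra .gz suffix compared with the logged dst path. The POSIX shell sketch below extracts those dst paths from the log and checks a copied-out archive with gzip -t. The helper names core_destinations and check_core_archive are hypothetical, the log parsing relies only on the sample format shown above, and the accompanying .crc file is not checked because its format is not described here. Note that kubectl cp requires tar in the container.

```shell
#!/bin/sh
# Hypothetical helper: read Coremond log lines on stdin and print the "dst"
# path of each "creating coredump" entry (format taken from the sample log).
core_destinations() {
    grep '"m"="creating coredump"' |
        sed -n 's/.*"dst"="\([^"]*\)".*/\1/p'
}

# Hypothetical helper: report whether a gzip-compressed core file is intact.
check_core_archive() {
    if gzip -t "$1" 2>/dev/null; then
        echo "ok: $1"
    else
        echo "corrupt or not gzip: $1"
        return 1
    fi
}

# Usage:
#   kubectl -n f5-utils logs f5-coremond-bnhf7 -c f5-coremond | core_destinations
#   kubectl cp -c f5-coremond \
#     f5-utils/f5-coremond-bnhf7:/var/cores/k3d-minibip-server-0/core.f5-observer-0.f5-observer.observer.9.1742738615173108860.gz \
#     ./core.gz
#   check_core_archive ./core.gz
```
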