SPKs Coremond¶
Overview¶
The F5 Coremond component runs as a DaemonSet on Service Proxy for Kubernetes (SPK). A core file is a snapshot of the memory and register state of a process at the moment it terminates unexpectedly because of an event that triggers default signal handling. Core files can be used for root-cause analysis. They are generated either by a third party or by the kernel itself.
Because the Coremond pod does not have access to the host operating system, Coremond monitors the /var/crash folder, which is mapped to a volume, to detect updates to core files. When Coremond starts, it reads the core_pattern from /proc/sys/kernel/ to determine whether the configured core_pattern is supported.
On the OpenShift platform, SPK stores all generated core files in a single directory on the host, at the /var/lib/systemd/coredump file path. This directory is not created by default. You can create it yourself or enable it during installation. F5 recommends enabling the directory during installation.
Prerequisites¶
Ensure you have the following:
A working cluster with the OpenShift platform.
A Linux-based workstation.
A core_pattern file located at /proc/sys/kernel/core_pattern. Some of the supported core patterns are:
By default, on the OpenShift platform, the system handles core dumps with systemd-coredump, which uses an xz, lz4, or zst extension, for example: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %h
In Robin.io, the native kernel core pattern must be /var/crash/core.%e.%p.%h.%t; otherwise, an error is returned. The specifiers in this pattern are:
| Specifier | Description |
|---|---|
| %h | Hostname |
| %e | Executable filename |
| %p | PID of the process |
| %t | UNIX time of dump |
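As an illustration of how the specifiers map onto a file name, the following minimal Python sketch splits a core file name written with the /var/crash/core.%e.%p.%h.%t pattern back into its fields. The helper name `parse_core_name` and the regular expression are hypothetical and are not part of Coremond; the sample file name is taken from the log output later on this page.

```python
import re
from datetime import datetime, timezone

# Hypothetical regex for names produced by /var/crash/core.%e.%p.%h.%t
# (%e executable, %p PID, %h hostname, %t UNIX time of dump).
CORE_NAME = re.compile(
    r"^core\.(?P<exe>.+?)\.(?P<pid>\d+)\.(?P<host>.+)\.(?P<ts>\d+)$"
)


def parse_core_name(filename):
    """Return the fields encoded in a core file name, or None if it does not match."""
    match = CORE_NAME.match(filename)
    if match is None:
        return None
    return {
        "executable": match["exe"],
        "pid": int(match["pid"]),
        "hostname": match["host"],
        "dumped_at": datetime.fromtimestamp(int(match["ts"]), tz=timezone.utc),
    }


info = parse_core_name(
    "core.observer.29.f5-toda-observer-788ddcd596-6qjpg.1726480067"
)
print(info)
```

The sketch assumes the executable name contains no `.<digits>.` sequence; a name that does would need a stricter rule.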
Note: F5 recommends installing Coremond before any other F5 components. Components installed before Coremond may generate core files before Coremond is running.
Platform-Specific Core Patterns¶
Generic: Ubuntu-based platforms use Apport for crash reporting, and this pattern ensures that core dumps are handled correctly by Apport.
|/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E
OCP: OCP uses systemd-coredump to capture and process core dumps. The pattern correctly passes the process ID (%P), user ID (%u), group ID (%g), signal (%s), timestamp (%t), and other relevant metadata.
|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e
Robin: Robin.io uses a traditional file-based core dump storage format, where…
/var/crash/core.%e.%p.%h.%t
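The three patterns above fall into two kinds: on Linux, a core_pattern that starts with `|` pipes the dump to a program, while any other value is a file name template. The following minimal Python sketch tells the two apart. The function name `classify_core_pattern` and the returned dictionary layout are hypothetical; how Coremond itself decides whether a pattern is supported is not shown on this page.

```python
import os


def classify_core_pattern(pattern):
    """Classify a core_pattern value as a piped handler or a file template."""
    pattern = pattern.strip()
    if pattern.startswith("|"):
        # Everything after "|" is the handler command line.
        parts = pattern[1:].split()
        program = os.path.basename(parts[0]) if parts else ""
        return {"kind": "pipe", "program": program, "args": parts[1:]}
    return {"kind": "file", "template": pattern}


generic = classify_core_pattern(
    "|/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E"
)
ocp = classify_core_pattern(
    "|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e"
)
robin = classify_core_pattern("/var/crash/core.%e.%p.%h.%t")

for result in (generic, ocp, robin):
    print(result)
```

On a node, the input string would typically come from reading /proc/sys/kernel/core_pattern.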
Configure Rotation and Retention¶
This section outlines the environment variables used to configure the core file retention, rotation, and cleanup of Coremond. These variables allow you to manage retention durations, set file limits per process, and define rotation policies.
| Environment Variable | Default Value | Description |
|---|---|---|
| COREMON_RETENTION_INTERVAL | 5m | Specifies the time frame during which additional core dumps from the same process are ignored once the COREMON_CORES_MAX_FILES limit is reached. |
| COREMON_CORES_MAX_FILES | 3 | Specifies the maximum number of core files allowed for the same process. This limit is used to prevent continuous crashes and rotations. |
| COREMON_RETENTION | 0 | Specifies how long core files are kept before deletion. This also applies to the final core file copied to the volume. To disable retention, set this parameter to 0. |
| COREMON_CORES_INTERVAL | 5m | Specifies the interval at which Coremond scans for and deletes core files that exceed the COREMON_RETENTION period. |
| COREMON_DELETE_SRC | true | Specifies whether to delete the source core files generated by the kernel from the host path /home/crash/f5. |
| COREMON_ROTATE | false | Allows old core files to be replaced with new ones when the COREMON_CORES_MAX_FILES limit is reached. This occurs only if the COREMON_RETENTION_INTERVAL has elapsed and Coremond continues processing core files for that process. |
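To make the retention setting concrete, the following minimal Python sketch selects the core files a cleanup pass would delete for a given COREMON_RETENTION value. It is an interpretation of the table above, not Coremond's implementation: the helper names, the use of file modification times, and the accepted duration suffixes (`s`, `m`, `h`) are assumptions. Only the values `5m` and `0` appear on this page.

```python
import re

# Assumed duration units; the page only shows "5m" and "0".
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text):
    """Convert a value such as "5m", "30s", "2h", or "0" to seconds."""
    match = re.fullmatch(r"(\d+)([smh]?)", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    unit = match.group(2) or "s"
    return int(match.group(1)) * _UNIT_SECONDS[unit]


def expired_cores(mtimes, now, retention):
    """Return the names of core files older than the retention period.

    mtimes maps file name to modification time in seconds.
    A retention of 0 disables deletion, as described in the table.
    """
    limit = parse_duration(retention)
    if limit == 0:
        return []
    return sorted(name for name, mtime in mtimes.items() if now - mtime > limit)


cores = {"core.a.gz": 0, "core.b.gz": 900}
print(expired_cores(cores, now=1000, retention="5m"))
print(expired_cores(cores, now=1000, retention="0"))
```

In this model, COREMON_CORES_INTERVAL would control how often such a pass runs.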
Procedures¶
Installation¶
Obtain the [TAG/Version] from the CNE 2.1.0 tarball.
Install Coremond by using the following syntax on the OpenShift and Tanzu platforms:
helm install coremond tar/<helm-chart>.tgz \
  -f <values>.yaml -n <project>
For example:
helm install coremond tar/coremond-0.7.27-10.0.14.tgz -n coremond
You can edit the values.yaml file to suit your use case and requirements. The following are some of the mandatory and optional settings that can be configured by editing the values.yaml file:
a. Mandatory settings:
Override the image settings by specifying the custom image values:
image:
  repository: repo.f5.com/images/f5-toda-docker
  name: f5-coremond
  tag: v
  pullPolicy: IfNotPresent
Coremond supports node selectors and node affinity to control which nodes in the Kubernetes cluster the Coremond pods are scheduled on. By default, Coremond runs on all worker nodes.
To run the pod on the worker-node node, configure both the nodeSelector and affinity as shown in the following example.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - worker-node
nodeSelector:
  kubernetes.io/hostname: worker-node
b. Optional settings:
Coremond supports storing core files directly in a host directory instead of using Persistent Volumes (PVs), which eliminates the need for ReadWriteMany volumes and shared storage when multiple Coremond pods are deployed. By default, this option is disabled and PVs are used.
To change the default and store cores on the host machine instead of PVs, set the following value to true in the values.yaml file:
useHostPath: true
To adjust the log level through the COREMON_LOG_LEVEL value, add the following to the values.yaml file:
env:
  - name: COREMON_LOG_LEVEL
    value: "debug"
Coremond requires a PV with ReadWriteMany (RWX) access. If the default storage class does not support RWX, the Coremond pod may remain pending. To avoid this, override the storageClass parameter with an RWX storage class in the values.yaml file, as shown in the following example:
persistence:
  accessMode: ReadWriteMany
  storageClass: your-rwx
To override the resources settings, specify the custom resources values in the values.yaml file as shown in the following example:
resources:
  limits:
    cpu: 100m
    memory: 128Mi
  requests:
    cpu: 100m
    memory: 128Mi
To disable the qkview process, set the following in the values.yaml file:
f5_csm_qkview:
  enabled: false
To override the fluentbit_sidecar image settings, specify the custom image values as shown in the following example:
fluentbit_sidecar:
  image:
    repository: repo.f5.com/images/f5-toda-docker
    name: f5-fluentbit
    tag: v
    pullPolicy: IfNotPresent
To override the fluentbit_sidecar resources settings, specify the custom resources values as shown in the following example:
fluentbit_sidecar:
  resources:
    limits:
      cpu: "0.5"
      memory: "512Mi"
    requests:
      cpu: "0.25"
      memory: "256Mi"
To override the fluentbit_sidecar security context settings, specify the custom securityContext values as shown in the following example:
fluentbit_sidecar:
  securityContext:
    allowPrivilegeEscalation: false
    # runAsUser: 10000
To override the fluentbit_sidecar additional settings, specify the custom fluentbit values as shown in the following example:
fluentbit_sidecar:
  fluentbit:
    # Interval to flush output (seconds)
    flush_interval: 1
    # Error/warning/info/debug/trace
    logLevel: debug
    # Pipe reading parameters
    input:
      pipes:
        bufSize: 8096
        intervalSec: 1
        intervalNsec: 0
    tls:
      enabled: false
      # TLS debug verbosity level, values: 0 (No debug), 1 (Error), 2 (State change), 3 (Informational) and 4 (Verbose)
      debug: 1
      # Force certificate validation
      verify: Off
      # key string known by the remote Fluentd used for authorization.
      shared_key: f5-toda-shared-key
  fluentd:
    host: '127.0.0.1'
    port: 54321
To disable the fluentbit_sidecar container, set the fluentbit_sidecar enabled value to false in the values.yaml file:
fluentbit_sidecar:
  enabled: false
Generate a Core File¶
To generate a core file, follow these steps:
Run the following command to get the list of pods.
kubectl get pods -A
Sample output with the list of pods:
NAME                                          READY   STATUS    RESTARTS   AGE
client                                        1/1     Running   0          2m28s
dssm-f5-dssm-db-0                             2/2     Running   0          2m26s
dssm-f5-dssm-db-1                             2/2     Running   0          96s
dssm-f5-dssm-sentinel-0                       2/2     Running   0          2m26s
dssm-f5-dssm-sentinel-1                       2/2     Running   0          90s
f5-cert-manager-84f857f786-gk6xq              1/1     Running   0          4m10s
f5-cert-manager-cainjector-695866d7ff-m2h2g   1/1     Running   0          4m10s
f5-cert-manager-webhook-8554fd5b58-xc89x      1/1     Running   0          4m10s
f5-coremond-7gqfp                             2/2     Running   0          2m54s
f5-crdconversion-7df678d8fc-2vplv             1/1     Running   0          2m51s
f5-rabbit-f9c58487c-vhtw2                     1/1     Running   0          2m53s
f5-spk-cwc-669f8c9dc-ptjb2                    2/2     Running   0          2m52s
f5-tmm-7b685cd57c-lp7cl                       0/4     Pending   0          2m9s
f5-tmm-7b685cd57c-rq92s                       4/4     Running   0          2m9s
f5-toda-fluentd-6bc5cb8bfb-wqsvx              1/1     Running   0          2m11s
f5-toda-observer-788ddcd596-6qjpg             2/2     Running   0          2m12s
f5-toda-stats-77cb79c44d-4cn4x                2/2     Running   0          2m25s
otel-collector-5f48b7ccf7-s6wx7               1/1     Running   0          2m9s
router                                        2/2     Running   0          2m27s
server                                        1/1     Running   0          2m28s
spk-f5ingress-797bdbb59-zssd6                 4/4     Running   0          2m9s
Run the following command to get the process list.
kubectl exec <pod-name> -- ps aux
Sample Output
Defaulting container name to f5-observer.
USER       PID %CPU %MEM     VSZ   RSS TTY STAT START TIME COMMAND
f5docker     1  0.0  0.0  711880  3024 ?   Ssl  09:45 0:00 /init
f5docker    25  0.0  0.0    3024  1200 ?   S    09:45 0:00 s6-svscan -c30 -t0 /var/run/s6/services
f5docker    27  0.0  0.0    3036  1264 ?   S    09:45 0:00 s6-supervise observer
f5docker    28  0.0  0.0    3036  1268 ?   S    09:45 0:00 s6-supervise qkview-collect-daemon
f5docker    29  1.2  0.3 1270624 49540 ?   Ssl  09:45 0:01 observer
f5docker    30  0.0  0.0 1235736  9892 ?   Ssl  09:45 0:00 /usr/bin/qkview-collect-daemon
f5docker   212  0.0  0.0    7072  1592 ?   Rs   09:47 0:00 ps aux
Run the following command to send signal 11 (SIGSEGV) to a process, which kills it and generates a core dump.
kubectl exec <pod-name> -- kill -11 <process-id>
Sample Output
Defaulting container name to f5-observer out of: f5-toda-observer, fluentbit
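The `kill -11` step works because signal 11 on Linux is SIGSEGV, whose default action is to terminate the process and dump core. The following minimal Python sketch shows the same effect locally, without a cluster: a child process sends SIGSEGV to itself and the parent inspects how it exited. This is only an illustration and is not part of the Coremond procedure. Whether a core file is actually written on your machine depends on the local core size limit and core_pattern.

```python
import signal
import subprocess
import sys

# Child program: send SIGSEGV (signal 11 on Linux) to itself.
child_code = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"

result = subprocess.run([sys.executable, "-c", child_code])

# On POSIX, a negative return code means the child was killed by that signal.
killed_by = signal.Signals(-result.returncode)
print(f"child terminated by {killed_by.name} ({killed_by.value})")
```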
Validate the Core File¶
To verify the generated core file, follow the instructions below:
Run the following command to get the Coremond pod name.
kubectl get pods -A
Sample Output
NAME                                          READY   STATUS    RESTARTS   AGE
client                                        1/1     Running   0          2m28s
dssm-f5-dssm-db-0                             2/2     Running   0          2m26s
dssm-f5-dssm-db-1                             2/2     Running   0          96s
dssm-f5-dssm-sentinel-0                       2/2     Running   0          2m26s
dssm-f5-dssm-sentinel-1                       2/2     Running   0          90s
f5-cert-manager-84f857f786-gk6xq              1/1     Running   0          4m10s
f5-cert-manager-cainjector-695866d7ff-m2h2g   1/1     Running   0          4m10s
f5-cert-manager-webhook-8554fd5b58-xc89x      1/1     Running   0          4m10s
f5-coremond-7gqfp                             2/2     Running   0          2m54s
f5-crdconversion-7df678d8fc-2vplv             1/1     Running   0          2m51s
f5-rabbit-f9c58487c-vhtw2                     1/1     Running   0          2m53s
f5-spk-cwc-669f8c9dc-ptjb2                    2/2     Running   0          2m52s
f5-tmm-7b685cd57c-lp7cl                       0/4     Pending   0          2m9s
f5-tmm-7b685cd57c-rq92s                       4/4     Running   0          2m9s
f5-toda-fluentd-6bc5cb8bfb-wqsvx              1/1     Running   0          2m11s
f5-toda-observer-788ddcd596-6qjpg             2/2     Running   0          2m12s
f5-toda-stats-77cb79c44d-4cn4x                2/2     Running   0          2m25s
otel-collector-5f48b7ccf7-s6wx7               1/1     Running   0          2m9s
router                                        2/2     Running   0          2m27s
server                                        1/1     Running   0          2m28s
spk-f5ingress-797bdbb59-zssd6                 4/4     Running   0          2m9s
Run the following command to view the Coremond logs and find the core file that was created.
kubectl -n f5-utils logs <coremon-pod> -c f5-coremond
Sample Output
Defaulted container "f5-coremond" out of: f5-coremond, fluentbit, init-coremond-dir (init)
2024-09-16 09:44:37,954 CRIT Supervisor is running as root. Privileges were not dropped because no user is specified in the config file. If you intend to run as root, you can set user=root in the config file to avoid this message.
2024-09-16 09:44:37,957 INFO supervisord started with pid 1
2024-09-16 09:44:38,960 INFO spawned: 'coremond' with pid 13
2024-09-16 09:44:38,962 INFO spawned: 'qkview-collect' with pid 14
2024/09/16 09:44:38 INFO qkview-collect f5-log-ID=15216-000000 lt=A msg="Details: Client config details {Base:/etc/qkview-collect Overlay:/etc/qkview-collect/qkview-collect.config.yml GlobalTimeout:-1s LocalTimeout:-1s Outfile:/tmp/qkview.tar.gz PkgType:container MaxFileSize:25 RemovePrivateKeyFromFiles:true} base config file..."
2024/09/16 09:44:38 INFO qkview-collect f5-log-ID=15216-000000 lt=A msg="Details: Environment details &{IsDevVersion:false HostMode:false TLSCABundle:/etc/ssl/certs/ca-root-cert.pem TLSCertificateFile:/etc/ssl/certs/server-cert.pem TLSKeyFile:/etc/ssl/certs/server-key.pem TLSCertRetryWait:5s SecureOnly:true UsingCertOrchestrator:true ContainerName:f5-coremond GrpcPort:19891 MaxFileSize:25 BaseCfgPath:/etc/qkview-collect ContainerOverlayPath:/etc/qkview-collect/qkview-collect.config.yml TotalCollectionTimeout:-1s IndividualCmdTimeout:-1s Outfile:/tmp/qkview.tar.gz PkgType:container RemovePrivateKeyFromFiles:true} base config file..."
2024/09/16 09:44:38 INFO qkview-collect f5-log-ID=15216-000000 lt=A msg="Info: Starting GRPC server in secured mode"
2024/09/16 09:44:38 INFO qkview-collect f5-log-ID=15216-000000 lt=A msg="Info: starting secure server"
"ts"="2024-09-16 09:44:39.000"|"l"="error"|"m"="failed to read levels file /logs/.minlevel.yaml: open /logs/.minlevel.yaml: no such file or directory"|"lt"="A"|"pod"="f5-coremond-7gqfp"|"ct"="f5-coremond"|"cv"="v0.5.12"|"ns"="default"|"v"="1.0"
"ts"="2024-09-16 09:44:39.000"|"l"="info"|"m"="coremond started"|"lt"="A"|"version"="0.5.12"|"commitHash"="22bb5c8"|"buildDate"="2024-08-27T20:57:25Z"|"pod"="f5-coremond-7gqfp"|"ct"="f5-coremond"|"cv"="v0.5.12"|"ns"="default"|"v"="1.0"
"ts"="2024-09-16 09:44:39.207"|"l"="error"|"m"="no such file or directory"|"lt"="A"|"path"="/logs/.minlevel.yaml"|"pod"="f5-coremond-7gqfp"|"ct"="f5-coremond"|"cv"="v0.5.12"|"ns"="default"|"v"="1.0"
2024-09-16 09:44:40,209 INFO success: coremond entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-09-16 09:44:40,209 INFO success: qkview-collect entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
"ts"="2024-09-16 09:47:47.997"|"l"="info"|"m"="new core file detected"|"lt"="A"|"file"="/var/crash/core.observer.29.f5-toda-observer-788ddcd596-6qjpg.1726480067"|"pod"="f5-coremond-7gqfp"|"ct"="f5-coremond"|"cv"="v0.5.12"|"ns"="default"|"v"="1.0"
"ts"="2024-09-16 09:47:48.014"|"l"="info"|"m"="creating coredump"|"lt"="A"|"src"="/var/crash/core.observer.29.f5-toda-observer-788ddcd596-6qjpg.1726480067"|"dst"="/var/cores/core.f5-toda-observer.f5-toda-observer.observer.29.1726480067"|"pod"="f5-coremond-7gqfp"|"ct"="f5-coremond"|"cv"="v0.5.12"|"ns"="default"|"v"="1.0"
Run the following command to list the source core file in /var/crash.
kubectl -n f5-utils exec <coremon-pod> -- ls /var/crash
Sample Output
dev@datkube-devbox:~/ws/datkube$ oc exec f5-coremond-7gqfp -- ls /var/crash
Defaulted container "f5-coremond" out of: f5-coremond, fluentbit, init-coremond-dir (init)
core.observer.29.f5-toda-observer-788ddcd596-6qjpg.1726480067
To validate the core file created by F5, run the following command.
oc exec <coremon-pod> -- ls /var/cores
Sample output:
dev@datkube-devbox:~/ws/datkube$ oc exec f5-coremond-7gqfp -- ls /var/cores
Defaulted container "f5-coremond" out of: f5-coremond, fluentbit, init-coremond-dir (init)
core.f5-toda-observer.f5-toda-observer.observer.29.1726480067.gz
core.f5-toda-observer.f5-toda-observer.observer.29.1726480067.gz.crc
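The Coremond entries in the log sample above use a `"key"="value"` format with fields separated by `|`. When scanning long logs, a small script can pull out the detected core files instead of reading each line. The following is a minimal Python sketch under that assumption. The helper names are hypothetical, and the field names (`m`, `file`) are taken only from the sample output on this page.

```python
import re

# Matches one "key"="value" field; assumes values contain no double quotes.
FIELD = re.compile(r'"([^"]+)"="([^"]*)"')


def parse_log_line(line):
    """Return the fields of one Coremond log line as a dictionary."""
    return dict(FIELD.findall(line))


def detected_core(line):
    """Return the core file path if the line reports a new core file, else None."""
    fields = parse_log_line(line)
    if fields.get("m") == "new core file detected":
        return fields.get("file")
    return None


sample = (
    '"ts"="2024-09-16 09:47:47.997"|"l"="info"|"m"="new core file detected"'
    '|"lt"="A"|"file"="/var/crash/core.observer.29.'
    'f5-toda-observer-788ddcd596-6qjpg.1726480067"'
    '|"pod"="f5-coremond-7gqfp"|"ct"="f5-coremond"|"cv"="v0.5.12"'
    '|"ns"="default"|"v"="1.0"'
)

print(detected_core(sample))
```

Lines in other formats, such as the supervisord and qkview-collect entries, produce no matching fields and are skipped.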