CNFs Coremond¶
Overview¶
The Coremond pod runs as a DaemonSet on Cloud-Native Network Functions (CNFs) and collects core files. A core file is a snapshot of the memory and register state of a process at the moment it terminates unexpectedly because of an event that triggers default signal handling. You can use the core file for root-cause analysis. Core files are generated either by a third party or by the kernel itself.
Coremond monitors the `/var/crash` folder, which is mapped to a volume, to detect updates to core files, because the Coremond pod does not have access to the operating system. When Coremond starts, it reads the `core_pattern` from `/proc/sys/kernel/` to decide whether the configured `core_pattern` is supported.
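The startup check described above can be sketched in Python as follows. This is only an illustration: the exact rules Coremond applies are not documented here, so the function names, the return labels, and the matching logic (a piped `systemd-coredump` handler, or the exact `/var/crash/core.%e.%p.%h.%t` file pattern) are assumptions based on the supported patterns listed under Prerequisites.

```python
from pathlib import Path

# Assumption: the file-based pattern named in this document for Robin.io.
VAR_CRASH_PATTERN = "/var/crash/core.%e.%p.%h.%t"


def classify_core_pattern(pattern: str) -> str:
    """Return a hypothetical label describing the given core_pattern string."""
    pattern = pattern.strip()
    if pattern.startswith("|"):
        # A leading "|" means the kernel pipes the dump to a helper program.
        parts = pattern[1:].split()
        handler = parts[0] if parts else ""
        if Path(handler).name.startswith("systemd-coredump"):
            return "systemd-coredump"
        return "unsupported"
    if pattern == VAR_CRASH_PATTERN:
        return "var-crash-file"
    return "unsupported"


def read_core_pattern(path: str = "/proc/sys/kernel/core_pattern") -> str:
    """Read the kernel's current core_pattern (Linux only)."""
    return Path(path).read_text().strip()
```

On a Linux host, `classify_core_pattern(read_core_pattern())` reports which of the two documented families the current setting falls into.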
On the OpenShift platform, CNFs store all generated core files in a single directory on the host, `/var/lib/systemd/coredump`. This directory is not created by default. You can create it manually or enable it during installation to store the core files. F5 recommends enabling the directory during installation.
To prevent core files from accumulating, Coremond rotates the core files for each process and keeps only the three latest files. Coremond also detects continuous crashes of a process within the same time frame and skips writing core dumps in such cases.
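A minimal Python sketch of the keep-the-three-latest rotation idea follows. It is not Coremond's implementation: the function name, the `core.<process>.*` file naming, and the use of modification time to decide which files are newest are all assumptions made for illustration.

```python
from pathlib import Path

KEEP = 3  # the text above states that only the three latest files are kept


def rotate_core_files(directory, process, keep=KEEP):
    """Delete all but the `keep` newest files named core.<process>.* (hypothetical naming).

    Returns the names of the files that were removed.
    """
    files = sorted(
        Path(directory).glob(f"core.{process}.*"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    removed = files[keep:]
    for old in removed:
        old.unlink()
    return [p.name for p in removed]
```

Sorting by modification time keeps the sketch independent of how the timestamp is encoded in the file name.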
Prerequisites¶
Ensure you have the following:
- A working cluster on the OpenShift, Robin, or Tanzu platform.
- A Linux-based workstation.
- A core_pattern file located at `/proc/sys/kernel/core_pattern`. Some of the supported core patterns are:
  - By default, for OpenShift and Tanzu platforms, the core dump handler used by the system is `systemd-coredump`, with an xz, lz4, or zst extension, such as `|/usr/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %h`.
  - In Robin.io, the native kernel core pattern must be `/var/crash/core.%e.%p.%h.%t`; otherwise, an error is returned.
| Specifier | Description |
|--------|------------|
| `%h` | Hostname |
| `%e` | Executable filename |
| `%p` | pid of the process |
| `%t` | UNIX time of dump |
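To show how the specifiers in the table map onto a real file name, the following Python sketch splits a name that follows `core.%e.%p.%h.%t` into its fields. The function name and the returned keys are hypothetical. The regular expression assumes that the executable name does not itself contain a `.<digits>.` sequence, which would make the split ambiguous.

```python
import re
from datetime import datetime, timezone

# core.<executable>.<pid>.<hostname>.<unix time>
CORE_NAME = re.compile(
    r"^core\.(?P<exe>.+)\.(?P<pid>\d+)\.(?P<host>.+)\.(?P<ts>\d+)$"
)


def parse_core_name(name):
    """Return the %e, %p, %h and %t fields of a core file name, or None."""
    m = CORE_NAME.match(name)
    if m is None:
        return None
    return {
        "executable": m["exe"],
        "pid": int(m["pid"]),
        "hostname": m["host"],
        # %t is seconds since the UNIX epoch; convert to an aware UTC datetime.
        "dumped_at": datetime.fromtimestamp(int(m["ts"]), tz=timezone.utc),
    }
```

For example, `parse_core_name("core.observer.29.f5-toda-observer-788ddcd596-6qjpg.1726480067")` yields the executable `observer`, pid `29`, the pod name as hostname, and a dump time of 2024-09-16 09:47:47 UTC.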
Note: F5 recommends installing Coremond before any other F5 components, because components installed before Coremond may generate core files before Coremond is available to collect them.
Procedures¶
Installation¶
Obtain the [TAG/Version] from the CNFs 1.4.0 tarball.
Install Coremond on OpenShift and Tanzu platforms by using the following syntax:

    helm install coremond tar/<helm-chart>.tgz \
      -f <values>.yaml -n <project>

For example:

    helm install coremond tar/coremond-0.5.12-0.1.7.tgz -n coremond
You can edit the `values.yaml` file to suit your use case and requirements. The following are some of the mandatory and optional settings that you can configure by editing the `values.yaml` file:

a. Mandatory settings:

Override the image settings by specifying the custom image values:

    image:
      repository: repo.f5.com/images/f5-toda-docker
      name: f5-coremond
      tag: v
      pullPolicy: IfNotPresent
Coremond supports the use of `nodeSelector` and `nodeAffinity` to specify the nodes on which the Coremond pod is scheduled in a Kubernetes cluster. By default, Coremond runs on all worker nodes.

To run the pod on the `worker-node` node, configure both `nodeSelector` and `affinity` as shown in the following example.

    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              - worker-node
    nodeSelector:
      kubernetes.io/hostname: worker-node
b. Optional settings:
To adjust the log level, set the `COREMON_LOG_LEVEL` value by adding the following to the `values.yaml` file:

    env:
      - name: COREMON_LOG_LEVEL
        value: "debug"
Coremond requires a persistent volume (PV) with ReadWriteMany (RWX) access. If the default storage class does not support RWX, the Coremond pod may remain in the Pending state. To avoid this, override the `storageClass` parameter with an RWX-capable storage class through the `values.yaml` file. The following is an example override:

    persistence:
      accessMode: ReadWriteMany
      storageClass: your-rwx
To override the resources settings, specify the custom resources values in the `values.yaml` file as shown in the following example:

    resources:
      limits:
        cpu: 100m
        memory: 128Mi
      requests:
        cpu: 100m
        memory: 128Mi
To disable the `qkview` process, add the following setting to the `values.yaml` file:

    f5_csm_qkview:
      enabled: false
To override the `fluentbit_sidecar` image settings, specify the custom image values as shown in the following example:

    fluentbit_sidecar:
      image:
        repository: repo.f5.com/images/f5-toda-docker
        name: f5-fluentbit
        tag: v
        pullPolicy: IfNotPresent
To override the `fluentbit_sidecar` resources settings, specify the custom resources values as shown in the following example:

    fluentbit_sidecar:
      resources:
        limits:
          cpu: "0.5"
          memory: "512Mi"
        requests:
          cpu: "0.25"
          memory: "256Mi"
To override the `fluentbit_sidecar` security context settings, specify the custom `securityContext` values as shown in the following example:

    fluentbit_sidecar:
      securityContext:
        allowPrivilegeEscalation: false
        # runAsUser: 10000
To override the `fluentbit_sidecar` additional settings, specify the custom fluentbit values as shown in the following example:

    fluentbit_sidecar:
      fluentbit:
        # Interval to flush output (seconds)
        flush_interval: 1
        # Error/warning/info/debug/trace
        logLevel: debug
        # Pipe reading parameters
        input:
          pipes:
            bufSize: 8096
            intervalSec: 1
            intervalNsec: 0
        tls:
          enabled: false
          # TLS debug verbosity level, values: 0 (No debug), 1 (Error), 2 (State change), 3 (Informational) and 4 (Verbose)
          debug: 1
          # Force certificate validation
          verify: Off
          # key string known by the remote Fluentd used for authorization.
          shared_key: f5-toda-shared-key
      fluentd:
        host: '127.0.0.1'
        port: 54321
To disable the `fluentbit_sidecar` container, set the `fluentbit_sidecar` `enabled` value to `false` in the `values.yaml` file:

    fluentbit_sidecar:
      enabled: false
How to generate a core file¶
Use the following steps to generate a core file:

Run the following command to get the list of pods:

    oc get pods

Sample output with the list of pods:

    NAME                                          READY   STATUS    RESTARTS   AGE
    client                                        1/1     Running   0          2m28s
    dssm-f5-dssm-db-0                             2/2     Running   0          2m26s
    dssm-f5-dssm-db-1                             2/2     Running   0          96s
    dssm-f5-dssm-sentinel-0                       2/2     Running   0          2m26s
    dssm-f5-dssm-sentinel-1                       2/2     Running   0          90s
    f5-cert-manager-84f857f786-gk6xq              1/1     Running   0          4m10s
    f5-cert-manager-cainjector-695866d7ff-m2h2g   1/1     Running   0          4m10s
    f5-cert-manager-webhook-8554fd5b58-xc89x      1/1     Running   0          4m10s
    f5-coremond-7gqfp                             2/2     Running   0          2m54s
    f5-crdconversion-7df678d8fc-2vplv             1/1     Running   0          2m51s
    f5-rabbit-f9c58487c-vhtw2                     1/1     Running   0          2m53s
    f5-spk-cwc-669f8c9dc-ptjb2                    2/2     Running   0          2m52s
    f5-tmm-7b685cd57c-lp7cl                       0/4     Pending   0          2m9s
    f5-tmm-7b685cd57c-rq92s                       4/4     Running   0          2m9s
    f5-toda-fluentd-6bc5cb8bfb-wqsvx              1/1     Running   0          2m11s
    f5-toda-observer-788ddcd596-6qjpg             2/2     Running   0          2m12s
    f5-toda-stats-77cb79c44d-4cn4x                2/2     Running   0          2m25s
    otel-collector-5f48b7ccf7-s6wx7               1/1     Running   0          2m9s
    router                                        2/2     Running   0          2m27s
    server                                        1/1     Running   0          2m28s
    spk-f5ingress-797bdbb59-zssd6                 4/4     Running   0          2m9s
To get the process list, run the following command:

    oc exec <pod-name> -- ps aux

Sample output:

    Defaulted container "f5-toda-observer" out of: f5-toda-observer, fluentbit
    USER       PID %CPU %MEM     VSZ   RSS TTY  STAT START TIME COMMAND
    f5docker     1  0.0  0.0  711880  3024 ?    Ssl  09:45 0:00 /init
    f5docker    25  0.0  0.0    3024  1200 ?    S    09:45 0:00 s6-svscan -c30 -t0 /var/run/s6/services
    f5docker    27  0.0  0.0    3036  1264 ?    S    09:45 0:00 s6-supervise observer
    f5docker    28  0.0  0.0    3036  1268 ?    S    09:45 0:00 s6-supervise qkview-collect-daemon
    f5docker    29  1.2  0.3 1270624 49540 ?    Ssl  09:45 0:01 observer
    f5docker    30  0.0  0.0 1235736  9892 ?    Ssl  09:45 0:00 /usr/bin/qkview-collect-daemon
    f5docker   212  0.0  0.0    7072  1592 ?    Rs   09:47 0:00 ps aux
To kill a process and generate a core dump, run the following command:

    oc exec <pod-name> -- kill -11 <process-id>

Sample output:

    Defaulted container "f5-toda-observer" out of: f5-toda-observer, fluentbit
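The `-11` in the `kill` command above is a signal number. As a small illustration, the following Python sketch resolves a signal number to its name and checks whether its default action produces a core dump. The helper names are hypothetical, the set of core-dumping signals is taken from the Linux `signal(7)` manual page, and signal numbers can differ on other platforms.

```python
import signal

# Signals whose default action is "terminate and dump core" per Linux signal(7).
CORE_DUMPING = {
    "SIGABRT", "SIGBUS", "SIGFPE", "SIGILL", "SIGQUIT",
    "SIGSEGV", "SIGSYS", "SIGTRAP", "SIGXCPU", "SIGXFSZ",
}


def signal_name(number: int) -> str:
    """Return the symbolic name of a signal number on this platform."""
    return signal.Signals(number).name


def dumps_core_by_default(number: int) -> bool:
    """True if the signal's default action is to dump core (Linux semantics)."""
    return signal_name(number) in CORE_DUMPING
```

On Linux, `signal_name(11)` returns `SIGSEGV`, which is why `kill -11` triggers a core dump when the target process has not installed its own handler for that signal.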
How to validate the core file¶
To verify that the core file is created, do the following:

Run the `oc get pods` command to get the Coremond pod name.

Sample output:

    NAME                                          READY   STATUS    RESTARTS   AGE
    client                                        1/1     Running   0          2m28s
    dssm-f5-dssm-db-0                             2/2     Running   0          2m26s
    dssm-f5-dssm-db-1                             2/2     Running   0          96s
    dssm-f5-dssm-sentinel-0                       2/2     Running   0          2m26s
    dssm-f5-dssm-sentinel-1                       2/2     Running   0          90s
    f5-cert-manager-84f857f786-gk6xq              1/1     Running   0          4m10s
    f5-cert-manager-cainjector-695866d7ff-m2h2g   1/1     Running   0          4m10s
    f5-cert-manager-webhook-8554fd5b58-xc89x      1/1     Running   0          4m10s
    f5-coremond-7gqfp                             2/2     Running   0          2m54s
    f5-crdconversion-7df678d8fc-2vplv             1/1     Running   0          2m51s
    f5-rabbit-f9c58487c-vhtw2                     1/1     Running   0          2m53s
    f5-spk-cwc-669f8c9dc-ptjb2                    2/2     Running   0          2m52s
    f5-tmm-7b685cd57c-lp7cl                       0/4     Pending   0          2m9s
    f5-tmm-7b685cd57c-rq92s                       4/4     Running   0          2m9s
    f5-toda-fluentd-6bc5cb8bfb-wqsvx              1/1     Running   0          2m11s
    f5-toda-observer-788ddcd596-6qjpg             2/2     Running   0          2m12s
    f5-toda-stats-77cb79c44d-4cn4x                2/2     Running   0          2m25s
    otel-collector-5f48b7ccf7-s6wx7               1/1     Running   0          2m9s
    router                                        2/2     Running   0          2m27s
    server                                        1/1     Running   0          2m28s
    spk-f5ingress-797bdbb59-zssd6                 4/4     Running   0          2m9s
To find the core file that was created, run the `oc logs <coremon-pod>` command.

Sample output:

    Defaulted container "f5-coremond" out of: f5-coremond, fluentbit, init-coremond-dir (init)
    2024-09-16 09:44:37,954 CRIT Supervisor is running as root. Privileges were not dropped because no user is specified in the config file. If you intend to run as root, you can set user=root in the config file to avoid this message.
    2024-09-16 09:44:37,957 INFO supervisord started with pid 1
    2024-09-16 09:44:38,960 INFO spawned: 'coremond' with pid 13
    2024-09-16 09:44:38,962 INFO spawned: 'qkview-collect' with pid 14
    2024/09/16 09:44:38 INFO qkview-collect f5-log-ID=15216-000000 lt=A msg="Details: Client config details {Base:/etc/qkview-collect Overlay:/etc/qkview-collect/qkview-collect.config.yml GlobalTimeout:-1s LocalTimeout:-1s Outfile:/tmp/qkview.tar.gz PkgType:container MaxFileSize:25 RemovePrivateKeyFromFiles:true} base config file..."
    2024/09/16 09:44:38 INFO qkview-collect f5-log-ID=15216-000000 lt=A msg="Details: Environment details &{IsDevVersion:false HostMode:false TLSCABundle:/etc/ssl/certs/ca-root-cert.pem TLSCertificateFile:/etc/ssl/certs/server-cert.pem TLSKeyFile:/etc/ssl/certs/server-key.pem TLSCertRetryWait:5s SecureOnly:true UsingCertOrchestrator:true ContainerName:f5-coremond GrpcPort:19891 MaxFileSize:25 BaseCfgPath:/etc/qkview-collect ContainerOverlayPath:/etc/qkview-collect/qkview-collect.config.yml TotalCollectionTimeout:-1s IndividualCmdTimeout:-1s Outfile:/tmp/qkview.tar.gz PkgType:container RemovePrivateKeyFromFiles:true} base config file..."
    2024/09/16 09:44:38 INFO qkview-collect f5-log-ID=15216-000000 lt=A msg="Info: Starting GRPC server in secured mode"
    2024/09/16 09:44:38 INFO qkview-collect f5-log-ID=15216-000000 lt=A msg="Info: starting secure server"
    "ts"="2024-09-16 09:44:39.000"|"l"="error"|"m"="failed to read levels file /logs/.minlevel.yaml: open /logs/.minlevel.yaml: no such file or directory"|"lt"="A"|"pod"="f5-coremond-7gqfp"|"ct"="f5-coremond"|"cv"="v0.5.12"|"ns"="default"|"v"="1.0"
    "ts"="2024-09-16 09:44:39.000"|"l"="info"|"m"="coremond started"|"lt"="A"|"version"="0.5.12"|"commitHash"="22bb5c8"|"buildDate"="2024-08-27T20:57:25Z"|"pod"="f5-coremond-7gqfp"|"ct"="f5-coremond"|"cv"="v0.5.12"|"ns"="default"|"v"="1.0"
    "ts"="2024-09-16 09:44:39.207"|"l"="error"|"m"="no such file or directory"|"lt"="A"|"path"="/logs/.minlevel.yaml"|"pod"="f5-coremond-7gqfp"|"ct"="f5-coremond"|"cv"="v0.5.12"|"ns"="default"|"v"="1.0"
    2024-09-16 09:44:40,209 INFO success: coremond entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
    2024-09-16 09:44:40,209 INFO success: qkview-collect entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
    "ts"="2024-09-16 09:47:47.997"|"l"="info"|"m"="new core file detected"|"lt"="A"|"file"="/var/crash/core.observer.29.f5-toda-observer-788ddcd596-6qjpg.1726480067"|"pod"="f5-coremond-7gqfp"|"ct"="f5-coremond"|"cv"="v0.5.12"|"ns"="default"|"v"="1.0"
    "ts"="2024-09-16 09:47:48.014"|"l"="info"|"m"="creating coredump"|"lt"="A"|"src"="/var/crash/core.observer.29.f5-toda-observer-788ddcd596-6qjpg.1726480067"|"dst"="/var/cores/core.f5-toda-observer.f5-toda-observer.observer.29.1726480067"|"pod"="f5-coremond-7gqfp"|"ct"="f5-coremond"|"cv"="v0.5.12"|"ns"="default"|"v"="1.0"
To validate the core file created by the operating system, run the `oc exec <coremon-pod> -- ls /var/crash` command.

Sample output:

    dev@datkube-devbox:~/ws/datkube$ oc exec f5-coremond-7gqfp -- ls /var/crash
    Defaulted container "f5-coremond" out of: f5-coremond, fluentbit, init-coremond-dir (init)
    core.observer.29.f5-toda-observer-788ddcd596-6qjpg.1726480067
To validate the core file created by F5, run the `oc exec <coremon-pod> -- ls /var/cores` command.

Sample output:

    dev@datkube-devbox:~/ws/datkube$ oc exec f5-coremond-7gqfp -- ls /var/cores
    Defaulted container "f5-coremond" out of: f5-coremond, fluentbit, init-coremond-dir (init)
    core.f5-toda-observer.f5-toda-observer.observer.29.1726480067.gz
    core.f5-toda-observer.f5-toda-observer.observer.29.1726480067.gz.crc
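When the Coremond log is long, the structured lines of the form `"key"="value"|"key"="value"` shown in the `oc logs` sample output can be filtered with a small script. The following Python sketch is one way to do that. The function names are hypothetical, and the parser assumes that values never contain an embedded double quote, which holds for the sample lines in this document but is not guaranteed in general.

```python
import re

# Matches one "key"="value" pair; assumes no double quotes inside the value.
FIELD = re.compile(r'"([^"]+)"="([^"]*)"')


def parse_log_line(line):
    """Return the key/value pairs of one structured Coremond log line."""
    return dict(FIELD.findall(line))


def detected_core_files(lines):
    """Return the paths reported by "new core file detected" messages."""
    found = []
    for line in lines:
        fields = parse_log_line(line)
        if fields.get("m") == "new core file detected" and "file" in fields:
            found.append(fields["file"])
    return found
```

Lines written by supervisord or qkview-collect do not use this format, so they parse to an empty dictionary and are skipped. Saving the output of `oc logs <coremon-pod>` to a file and passing its lines to `detected_core_files` lists each core file that Coremond picked up from `/var/crash`.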
Feedback
Provide feedback to improve this document by emailing cnfdocs@f5.com.