CNFs Coremond
Overview
The Coremond pod runs as a DaemonSet on Cloud-Native Network Functions (CNFs) and collects core files. A core file is a snapshot of the memory and register state of a process at the moment it terminates unexpectedly, when an unexpected event triggers the default signal handling. You can perform root-cause analysis on the core file. Core files are generated either by a third party or by the kernel itself.
Because the Coremond pod does not have access to the host operating system, it monitors the /var/crash folder, which is mapped to a volume, to detect new or updated core files. When Coremond starts, it reads the core_pattern from /proc/sys/kernel/ to determine whether the configured core_pattern is supported.
The CNFs OpenShift platform stores all generated core files in a single directory on the host, at the /var/lib/systemd/coredump file path. This directory is not created by default. You can create it manually or enable it during installation. F5 recommends enabling the directory during installation.
Prerequisites
Ensure you have the following:
A working cluster on the OpenShift, Robin, or Tanzu platform.
A Linux-based workstation.
A core_pattern file located at /proc/sys/kernel/core_pattern. Some of the supported core patterns are:
By default, on the OpenShift and Tanzu platforms, the core dump handler used by the system is systemd-coredump, with an xz, lz4, or zst extension, such as:

    |/usr/lib/systemd/systemd-coredumps %P %u %g %s %t 9223372036854775808 %h

On Robin.io, the native kernel pattern must be /var/crash/core.%e.%p.%h.%t; otherwise, an error is returned.

| Specifier | Description |
|---|---|
| %h | Hostname |
| %e | Executable filename |
| %p | PID of the process |
| %t | UNIX time of dump |
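
To see in advance which of the patterns above a node uses, a small shell helper along the following lines may be useful. This is only a sketch: the function name classify_core_pattern and its output labels are illustrative assumptions, and the matching rules simply mirror the two patterns listed above. They do not reproduce Coremond's own validation logic.

```shell
# Hypothetical helper (not part of Coremond): classify a core_pattern string
# against the two patterns listed in the prerequisites.
classify_core_pattern() {
  case "$1" in
    "|"*systemd-coredump*)       echo "systemd-coredump" ;;
    /var/crash/core.%e.%p.%h.%t) echo "var-crash" ;;
    *)                           echo "unsupported" ;;
  esac
}

# Usage on a node:
#   classify_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
```

Any pattern that the helper reports as unsupported is worth checking against the platform requirements before installing Coremond.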
Note: F5 recommends installing Coremond before any other F5 components, because components installed earlier may generate core files before Coremond is running.
Configure Rotation and Retention
This section outlines the environment variables used to configure the core file retention, rotation, and cleanup of Coremond. These variables allow you to manage retention durations, set file limits per process, and define rotation policies.
| Environment Variable | Default Value | Description |
|---|---|---|
| COREMON_RETENTION_INTERVAL | 5m | Specifies the time frame during which additional core dumps from the same process are ignored once the COREMON_CORES_MAX_FILES limit is reached. |
| COREMON_CORES_MAX_FILES | 3 | Specifies the maximum number of core files allowed for the same process. This parameter is used to prevent continuous crashes and rotations. |
| COREMON_RETENTION | 0 | Specifies how long core files are kept before deletion. This also applies to the final core file copied to the volume. To disable retention, set this parameter to 0. |
| COREMON_CORES_INTERVAL | 5m | Specifies the interval at which Coremond scans for and deletes core files older than the COREMON_RETENTION period. |
| COREMON_ROTATE | false | Allows old core files to be replaced with new ones when the COREMON_CORES_MAX_FILES limit is reached. This occurs only after the COREMON_RETENTION_INTERVAL has elapsed and Coremond continues processing core files for that process. |
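
As a rough illustration of how these variables might be set, the sketch below writes a values override file from the shell. It assumes that the chart passes entries under env to the Coremond container, in the same way as the COREMON_LOG_LEVEL example in the Installation section. The file name coremond-retention.yaml and all values are illustrative, and the durations reuse the minute format of the defaults.

```shell
# Sketch: write an override file that sets the retention variables.
# Assumption: entries under "env" reach the Coremond container, as in the
# COREMON_LOG_LEVEL example; the file name and values are illustrative.
cat > coremond-retention.yaml <<'EOF'
env:
  - name: COREMON_CORES_MAX_FILES
    value: "5"
  - name: COREMON_RETENTION_INTERVAL
    value: "10m"
  - name: COREMON_RETENTION
    value: "60m"
  - name: COREMON_CORES_INTERVAL
    value: "5m"
  - name: COREMON_ROTATE
    value: "true"
EOF
```

The file could then be passed to helm install with an additional -f coremond-retention.yaml option.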
Procedures
Installation
Obtain the [TAG/Version] from the CNFs 2.1.0 tarball.
Install Coremond by using the following syntax on the OpenShift and Tanzu platforms:

    helm install coremond tar/<helm-chart>.tgz \
      -f <values>.yaml -n <project>

For example:

    helm install coremond tar/coremond-0.7.56-0.0.5.tgz -n coremond
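
After the install command returns, it can help to confirm that the release exists and that the DaemonSet has started its pods. The following is a minimal sketch. The function name check_coremond_release is hypothetical, and the release name and namespace are passed as arguments rather than taken from the chart.

```shell
# Sketch: confirm the Helm release and the DaemonSet pods after installation.
# Assumptions: release name and namespace are supplied by the caller; the
# function name is illustrative.
check_coremond_release() {
  release="$1"
  namespace="$2"
  helm status "$release" -n "$namespace" || return 1
  # Lists the DaemonSet and its pods, including the node each pod runs on.
  oc get daemonset,pods -n "$namespace" -o wide
}

# Usage: check_coremond_release coremond coremond
```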
You can edit the values.yaml file to suit your use case and requirements. The following are some of the mandatory and optional settings that can be configured by editing the values.yaml file:
a. Mandatory settings:
Override the image settings by specifying the custom image values:

    image:
      repository: repo.f5.com/images/f5-toda-docker
      name: f5-coremond
      tag: v
      pullPolicy: IfNotPresent

Coremond supports node selectors and node affinity to specify the nodes on which the Coremond pod is scheduled in a Kubernetes cluster. By default, Coremond runs on all worker nodes.
To run the pod on the worker-node node, configure both nodeSelector and affinity as shown in the following example.

    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              - worker-node
    nodeSelector:
      kubernetes.io/hostname: worker-node

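
The hostname used in nodeSelector and affinity has to match the kubernetes.io/hostname label of the node exactly. The sketch below shows one way to look up that label and, after deployment, to see where the pods were placed. Both function names are hypothetical.

```shell
# Sketch: list each node with its kubernetes.io/hostname label so the value
# used in nodeSelector and affinity can be copied exactly.
list_node_hostnames() {
  oc get nodes -L kubernetes.io/hostname
}

# Sketch: show which node each pod in the given namespace was scheduled on.
show_pod_placement() {
  oc get pods -n "$1" -o wide
}

# Usage:
#   list_node_hostnames
#   show_pod_placement coremond
```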
b. Optional settings:
Coremond supports storing core files directly in a host directory instead of using Persistent Volumes (PVs), which eliminates the need for ReadWriteMany volumes and shared storage when multiple Coremond pods are deployed. By default, this option is disabled and PVs are used.
To change the default and store cores on the host machine instead of PVs, set the following value to true in the values.yaml file:

    useHostPath: true

To adjust the log level through the COREMON_LOG_LEVEL value, add the following to the values.yaml file:

    env:
      - name: COREMON_LOG_LEVEL
        value: "debug"

Coremond requires a PV with ReadWriteMany (RWX) access. If the default storage class does not support RWX, the Coremond pod may remain pending. To avoid this, override the storageClass parameter with an RWX-capable storage class in the values.yaml file. The following example shows the override:

    persistence:
      accessMode: ReadWriteMany
      storageClass: your-rwx

To override the resources settings, specify the custom resources values in the values.yaml file as shown in the following example:

    resources:
      limits:
        cpu: 100m
        memory: 128Mi
      requests:
        cpu: 100m
        memory: 128Mi

To disable the qkview process, set the following in the values.yaml file:

    f5_csm_qkview:
      enabled: false

To override the fluentbit_sidecar image settings, specify the custom image values as shown in the following example:

    fluentbit_sidecar:
      image:
        repository: repo.f5.com/images/f5-toda-docker
        name: f5-fluentbit
        tag: v
        pullPolicy: IfNotPresent

To override the fluentbit_sidecar resources settings, specify the custom resources values as shown in the following example:

    fluentbit_sidecar:
      resources:
        limits:
          cpu: "0.5"
          memory: "512Mi"
        requests:
          cpu: "0.25"
          memory: "256Mi"

To override the fluentbit_sidecar security context settings, specify the custom securityContext values as shown in the following example:

    fluentbit_sidecar:
      securityContext:
        allowPrivilegeEscalation: false
        # runAsUser: 10000

To override the additional fluentbit_sidecar settings, specify the custom fluentbit values as shown in the following example:

    fluentbit_sidecar:
      fluentbit:
        # Interval to flush output (seconds)
        flush_interval: 1
        # Error/warning/info/debug/trace
        logLevel: debug
        # Pipe reading parameters
        input:
          pipes:
            bufSize: 8096
            intervalSec: 1
            intervalNsec: 0
        tls:
          enabled: false
          # TLS debug verbosity level, values: 0 (No debug), 1 (Error), 2 (State change), 3 (Informational) and 4 (Verbose)
          debug: 1
          # Force certificate validation
          verify: Off
          # key string known by the remote Fluentd used for authorization.
          shared_key: f5-toda-shared-key
      fluentd:
        host: '127.0.0.1'
        port: 54321

To disable the fluentbit_sidecar container, set the fluentbit_sidecar enabled value to false in the values.yaml file:

    fluentbit_sidecar:
      enabled: false

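
If the Coremond pod stays pending after installation with PVs, the storage class and the state of the volume claim are reasonable first things to inspect. The following sketch lists both. The function name check_coremond_storage is hypothetical, and the namespace is whatever project Coremond was installed into.

```shell
# Sketch: list the available storage classes and the claims in the
# Coremond namespace. The function name is illustrative.
check_coremond_storage() {
  namespace="$1"
  oc get storageclass
  oc get pvc -n "$namespace"
}

# Usage: check_coremond_storage coremond
```

A claim that remains in the Pending state often indicates that the selected storage class cannot provide the requested ReadWriteMany access mode.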
How to generate a core file
Use the following steps to generate a core file:
Run the following command to get the list of pods.

    oc get pods

Sample output with the list of pods:

    NAME                                          READY   STATUS    RESTARTS   AGE
    client                                        1/1     Running   0          2m28s
    dssm-f5-dssm-db-0                             2/2     Running   0          2m26s
    dssm-f5-dssm-db-1                             2/2     Running   0          96s
    dssm-f5-dssm-sentinel-0                       2/2     Running   0          2m26s
    dssm-f5-dssm-sentinel-1                       2/2     Running   0          90s
    f5-cert-manager-84f857f786-gk6xq              1/1     Running   0          4m10s
    f5-cert-manager-cainjector-695866d7ff-m2h2g   1/1     Running   0          4m10s
    f5-cert-manager-webhook-8554fd5b58-xc89x      1/1     Running   0          4m10s
    f5-coremond-7gqfp                             2/2     Running   0          2m54s
    f5-crdconversion-7df678d8fc-2vplv             1/1     Running   0          2m51s
    f5-rabbit-f9c58487c-vhtw2                     1/1     Running   0          2m53s
    f5-spk-cwc-669f8c9dc-ptjb2                    2/2     Running   0          2m52s
    f5-tmm-7b685cd57c-lp7cl                       0/4     Pending   0          2m9s
    f5-tmm-7b685cd57c-rq92s                       4/4     Running   0          2m9s
    f5-toda-fluentd-6bc5cb8bfb-wqsvx              1/1     Running   0          2m11s
    f5-toda-observer-788ddcd596-6qjpg             2/2     Running   0          2m12s
    f5-toda-stats-77cb79c44d-4cn4x                2/2     Running   0          2m25s
    otel-collector-5f48b7ccf7-s6wx7               1/1     Running   0          2m9s
    router                                        2/2     Running   0          2m27s
    server                                        1/1     Running   0          2m28s
    spk-f5ingress-797bdbb59-zssd6                 4/4     Running   0          2m9s

To get the process list, run the following command:

    oc exec <pod-name> -- ps aux

Example:

    Defaulted container "f5-toda-observer" out of: f5-toda-observer, fluentbit
    USER         PID %CPU %MEM     VSZ   RSS TTY      STAT START   TIME COMMAND
    f5docker       1  0.0  0.0  711880  3024 ?        Ssl  09:45   0:00 /init
    f5docker      25  0.0  0.0    3024  1200 ?        S    09:45   0:00 s6-svscan -c30 -t0 /var/run/s6/services
    f5docker      27  0.0  0.0    3036  1264 ?        S    09:45   0:00 s6-supervise observer
    f5docker      28  0.0  0.0    3036  1268 ?        S    09:45   0:00 s6-supervise qkview-collect-daemon
    f5docker      29  1.2  0.3 1270624 49540 ?        Ssl  09:45   0:01 observer
    f5docker      30  0.0  0.0 1235736  9892 ?        Ssl  09:45   0:00 /usr/bin/qkview-collect-daemon
    f5docker     212  0.0  0.0    7072  1592 ?        Rs   09:47   0:00 ps aux

To kill a process and generate a core dump, send it signal 11 (SIGSEGV) by running the following command:

    oc exec <pod-name> -- kill -11 <process-id>

Sample output:

    Defaulted container "f5-toda-observer" out of: f5-toda-observer, fluentbit

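
The process ID can also be picked out of the ps aux output by name instead of by eye. The sketch below is one way to do that with awk. The function name pid_of_command is hypothetical, and it assumes the standard ps aux layout shown above, where the command is the eleventh column.

```shell
# Sketch: print the PID of every process whose COMMAND column (field 11 of
# "ps aux") equals the given name. Reads "ps aux" output on stdin.
pid_of_command() {
  awk -v name="$1" '$11 == name { print $2 }'
}

# Usage (pod and process names are placeholders):
#   pid="$(oc exec <pod-name> -- ps aux | pid_of_command observer)"
#   oc exec <pod-name> -- kill -11 "$pid"
```

Matching on the eleventh field alone means that supervisor entries such as "s6-supervise observer" are not selected, because their first command word is s6-supervise.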
How to validate the core file
To verify that the core file is created, do the following:
Run the oc get pods command to get the Coremond pod name.
Sample output:

    NAME                                          READY   STATUS    RESTARTS   AGE
    client                                        1/1     Running   0          2m28s
    dssm-f5-dssm-db-0                             2/2     Running   0          2m26s
    dssm-f5-dssm-db-1                             2/2     Running   0          96s
    dssm-f5-dssm-sentinel-0                       2/2     Running   0          2m26s
    dssm-f5-dssm-sentinel-1                       2/2     Running   0          90s
    f5-cert-manager-84f857f786-gk6xq              1/1     Running   0          4m10s
    f5-cert-manager-cainjector-695866d7ff-m2h2g   1/1     Running   0          4m10s
    f5-cert-manager-webhook-8554fd5b58-xc89x      1/1     Running   0          4m10s
    f5-coremond-7gqfp                             2/2     Running   0          2m54s
    f5-crdconversion-7df678d8fc-2vplv             1/1     Running   0          2m51s
    f5-rabbit-f9c58487c-vhtw2                     1/1     Running   0          2m53s
    f5-spk-cwc-669f8c9dc-ptjb2                    2/2     Running   0          2m52s
    f5-tmm-7b685cd57c-lp7cl                       0/4     Pending   0          2m9s
    f5-tmm-7b685cd57c-rq92s                       4/4     Running   0          2m9s
    f5-toda-fluentd-6bc5cb8bfb-wqsvx              1/1     Running   0          2m11s
    f5-toda-observer-788ddcd596-6qjpg             2/2     Running   0          2m12s
    f5-toda-stats-77cb79c44d-4cn4x                2/2     Running   0          2m25s
    otel-collector-5f48b7ccf7-s6wx7               1/1     Running   0          2m9s
    router                                        2/2     Running   0          2m27s
    server                                        1/1     Running   0          2m28s
    spk-f5ingress-797bdbb59-zssd6                 4/4     Running   0          2m9s

To find the core file created, run the oc logs <coremon-pod> command.
Sample output:

    Defaulted container "f5-coremond" out of: f5-coremond, fluentbit, init-coremond-dir (init)
    2024-09-16 09:44:37,954 CRIT Supervisor is running as root. Privileges were not dropped because no user is specified in the config file. If you intend to run as root, you can set user=root in the config file to avoid this message.
    2024-09-16 09:44:37,957 INFO supervisord started with pid 1
    2024-09-16 09:44:38,960 INFO spawned: 'coremond' with pid 13
    2024-09-16 09:44:38,962 INFO spawned: 'qkview-collect' with pid 14
    2024/09/16 09:44:38 INFO qkview-collect f5-log-ID=15216-000000 lt=A msg="Details: Client config details {Base:/etc/qkview-collect Overlay:/etc/qkview-collect/qkview-collect.config.yml GlobalTimeout:-1s LocalTimeout:-1s Outfile:/tmp/qkview.tar.gz PkgType:container MaxFileSize:25 RemovePrivateKeyFromFiles:true} base config file..."
    2024/09/16 09:44:38 INFO qkview-collect f5-log-ID=15216-000000 lt=A msg="Details: Environment details &{IsDevVersion:false HostMode:false TLSCABundle:/etc/ssl/certs/ca-root-cert.pem TLSCertificateFile:/etc/ssl/certs/server-cert.pem TLSKeyFile:/etc/ssl/certs/server-key.pem TLSCertRetryWait:5s SecureOnly:true UsingCertOrchestrator:true ContainerName:f5-coremond GrpcPort:19891 MaxFileSize:25 BaseCfgPath:/etc/qkview-collect ContainerOverlayPath:/etc/qkview-collect/qkview-collect.config.yml TotalCollectionTimeout:-1s IndividualCmdTimeout:-1s Outfile:/tmp/qkview.tar.gz PkgType:container RemovePrivateKeyFromFiles:true} base config file..."
    2024/09/16 09:44:38 INFO qkview-collect f5-log-ID=15216-000000 lt=A msg="Info: Starting GRPC server in secured mode"
    2024/09/16 09:44:38 INFO qkview-collect f5-log-ID=15216-000000 lt=A msg="Info: starting secure server"
    "ts"="2024-09-16 09:44:39.000"|"l"="error"|"m"="failed to read levels file /logs/.minlevel.yaml: open /logs/.minlevel.yaml: no such file or directory"|"lt"="A"|"pod"="f5-coremond-7gqfp"|"ct"="f5-coremond"|"cv"="v0.5.12"|"ns"="default"|"v"="1.0"
    "ts"="2024-09-16 09:44:39.000"|"l"="info"|"m"="coremond started"|"lt"="A"|"version"="0.5.12"|"commitHash"="22bb5c8"|"buildDate"="2024-08-27T20:57:25Z"|"pod"="f5-coremond-7gqfp"|"ct"="f5-coremond"|"cv"="v0.5.12"|"ns"="default"|"v"="1.0"
    "ts"="2024-09-16 09:44:39.207"|"l"="error"|"m"="no such file or directory"|"lt"="A"|"path"="/logs/.minlevel.yaml"|"pod"="f5-coremond-7gqfp"|"ct"="f5-coremond"|"cv"="v0.5.12"|"ns"="default"|"v"="1.0"
    2024-09-16 09:44:40,209 INFO success: coremond entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
    2024-09-16 09:44:40,209 INFO success: qkview-collect entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
    "ts"="2024-09-16 09:47:47.997"|"l"="info"|"m"="new core file detected"|"lt"="A"|"file"="/var/crash/core.observer.29.f5-toda-observer-788ddcd596-6qjpg.1726480067"|"pod"="f5-coremond-7gqfp"|"ct"="f5-coremond"|"cv"="v0.5.12"|"ns"="default"|"v"="1.0"
    "ts"="2024-09-16 09:47:48.014"|"l"="info"|"m"="creating coredump"|"lt"="A"|"src"="/var/crash/core.observer.29.f5-toda-observer-788ddcd596-6qjpg.1726480067"|"dst"="/var/cores/core.f5-toda-observer.f5-toda-observer.observer.29.1726480067"|"pod"="f5-coremond-7gqfp"|"ct"="f5-coremond"|"cv"="v0.5.12"|"ns"="default"|"v"="1.0"

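
Because the Coremond entries are pipe-separated key and value pairs, the relevant lines can be filtered out of a long log with standard tools. The sketch below prints only the paths from "new core file detected" entries. The function name detected_core_files is hypothetical, and the field names are taken from the sample output above.

```shell
# Sketch: read Coremond log lines on stdin and print the "file" value of
# each "new core file detected" entry.
detected_core_files() {
  grep '"m"="new core file detected"' |
    sed -n 's/.*"file"="\([^"]*\)".*/\1/p'
}

# Usage: oc logs <coremon-pod> | detected_core_files
```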
To validate the core file created by the operating system, run the oc exec <coremon-pod> -- ls /var/crash command.
Sample output:

    dev@datkube-devbox:~/ws/datkube$ oc exec f5-coremond-7gqfp -- ls /var/crash
    Defaulted container "f5-coremond" out of: f5-coremond, fluentbit, init-coremond-dir (init)
    core.observer.29.f5-toda-observer-788ddcd596-6qjpg.1726480067

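
The file name in /var/crash follows the core.%e.%p.%h.%t pattern from the prerequisites, so it can be split back into its fields. The following is a minimal sketch using shell parameter expansion. The function name parse_crash_name is hypothetical, and the sketch assumes that the executable name contains no dots.

```shell
# Sketch: split a file name that follows core.%e.%p.%h.%t into its fields.
# Assumption: the executable name contains no dots (the hostname may).
parse_crash_name() {
  name="${1##*/}"        # drop any leading directory
  rest="${name#core.}"   # strip the fixed "core." prefix
  stamp="${rest##*.}"    # %t: last dot-separated field
  rest="${rest%.*}"
  exe="${rest%%.*}"      # %e: first field
  rest="${rest#*.}"
  pid="${rest%%.*}"      # %p: second field
  host="${rest#*.}"      # %h: everything that remains
  echo "exe=$exe pid=$pid host=$host time=$stamp"
}

parse_crash_name core.observer.29.f5-toda-observer-788ddcd596-6qjpg.1726480067
# prints: exe=observer pid=29 host=f5-toda-observer-788ddcd596-6qjpg time=1726480067
```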
To validate the core file created by F5, run the oc exec <coremon-pod> -- ls /var/cores command.
Sample output:

    dev@datkube-devbox:~/ws/datkube$ oc exec f5-coremond-7gqfp -- ls /var/cores
    Defaulted container "f5-coremond" out of: f5-coremond, fluentbit, init-coremond-dir (init)
    core.f5-toda-observer.f5-toda-observer.observer.29.1726480067.gz
    core.f5-toda-observer.f5-toda-observer.observer.29.1726480067.gz.crc

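
Once a compressed core file has been copied to the workstation, its gzip integrity can be checked before it is used for analysis. The sketch below is one possible check. The function name check_core_archive is hypothetical, and the copy method in the usage comment is an assumption rather than a documented Coremond procedure.

```shell
# Sketch: verify that a copied core archive is an intact gzip file.
# "gzip -t" only tests the compressed data; it does not interpret the
# accompanying .crc file, whose format is not described here.
check_core_archive() {
  if gzip -t "$1" 2>/dev/null; then
    echo "ok: $1"
  else
    echo "corrupt or not gzip: $1"
    return 1
  fi
}

# Usage (placeholders; the copy method is an assumption):
#   oc exec <coremon-pod> -- cat /var/cores/<core-file>.gz > <core-file>.gz
#   check_core_archive <core-file>.gz
```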
Feedback
Provide feedback to improve this document by emailing cnfdocs@f5.com.