CNF Fixes and Known Issues
This list highlights fixes and known issues for this CNF release.
Version: 1.4.0
Build: 1.4.0
Note: This content is current as of the software release date.
Bug information is updated periodically. For the most up-to-date bug data, see Bug Tracker.
Fixes in CNF v1.4.0
CNF Fixes
ID Number | Severity | Description
1632313-1 | 1-Blocking | Virtual Servers Not Configured When FTP/Secure Context Config Includes Zero-Rating iRule for Scaled TMMs
1696993-1 | 1-Blocking | F5ingress Fails to Redeploy on a New Node After Current Node Failure
1644337-1 | 2-Critical | Redis connection must be included in the tmm liveness probe
1644333-1 | 2-Critical | TMM fails to reconnect to Redis after current connection fails following five retries
1784829-1 | 2-Critical | Randomize port selection in CGNAT for SP-DAG cases (For NAPT mode)
1786321-1 | 2-Critical | When TMM is scaled down before being scaled up with a larger number of pods, the controller will not set up any new TMM pods
1596037-2 | 3-Major | QoS Class is showing 'Burstable' for tmm pods after enabling 'blobd'
Fix details for CNF v1.4.0
1632313-1 : Virtual Servers Not Configured When FTP/Secure Context Config Includes Zero-Rating iRule for Scaled TMMs
Component: CNF
Symptoms:
When an FTP CR/Secure Context CR is configured with a zero-rating iRule, the existing TMMs receive the full configuration and virtual servers are created. However, when TMMs are scaled up, f5ingress sends a cached version of the configuration. Currently, f5ingress is not properly storing the FTP CR/Secure Context CR with the iRule configuration in its cache, leading to partial updates. As a result, when TMMs are scaled up, they are configured with the incomplete configuration.
Conditions:
This issue occurs when the following two conditions are met:
1. An FTP CR or Secure Context CR is configured with an iRule.
2. TMMs are scaled up.
Impact:
Due to the partial configuration sent to the scaled-up TMMs, the new TMMs are not properly configured and do not include the required virtual servers.
Workaround:
None.
Fix:
The issue with how f5ingress stores the existing configuration in its cache has been resolved, ensuring the correct configuration is sent to scaled TMMs.
1696993-1 : F5ingress Fails to Redeploy on a New Node After Current Node Failure
Component: CNF
Symptoms:
When the node hosting the f5ingress pod goes down or fails, the f5ingress pod encounters an error and fails to redeploy on another node.
Conditions:
The node hosting the f5ingress pod goes down. In the Kubernetes manager logs, the following message is observed for the f5ingress pod:
".env: duplicate entries for key [name=\"GRPC_SERVICE_NAME\"]"
Impact:
New configurations will not be applied to existing pods. Additionally, existing configurations will not be applied to newly deployed pods or pods that are restarted or scaled up.
Workaround:
None.
Fix:
The GRPC_SERVICE_NAME field has been removed from extraEnvs in the TMM values file, eliminating the duplicate entry.
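If you maintain a local copy of the TMM values file, a minimal sketch of the change is below; the surrounding structure is illustrative, and only the GRPC_SERVICE_NAME key comes from the error message above:

    tmm:
      extraEnvs:
        # Delete this entry; it duplicates a key the chart already sets:
        # - name: GRPC_SERVICE_NAME
        #   value: "<service-name>"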
1786321-1 : When TMM is scaled down before being scaled up with a larger number of pods, the controller will not set up any new TMM pods.
Component: CNF
Symptoms:
When performing TMM scaling in a CNF/SPK deployment, if the number of TMM pod replicas is reduced to 0 and then increased to a larger count than before, the controller is unable to set up the new TMM pods.
Conditions:
TMM is scaled down to 0 and then scaled up with a greater number of replicas than previously configured.
Impact:
Traffic is impacted because the TMM pods are not configured.
Workaround:
None.
Fix:
The tmm-pod-manager has been hardened to send TMM pod information to the controller in a consistent way during scale events, and the controller logic that configures new TMMs has been refactored and fixed accordingly.
1644337-1 : Redis connection must be included in the tmm liveness probe.
Component: CNF
Symptoms:
When the connection between Redis and TMM is lost, the failure is visible only as session sync errors in the TMM logs; the liveness probe does not detect it.
Conditions:
To reproduce the issue, shut down Redis (scale its replicas to 0), causing TMM to continuously try to reconnect.
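A minimal repro sketch, assuming Redis runs in the dSSM StatefulSet (the resource name and namespace are placeholders):

    kubectl scale statefulset f5-dssm-db --replicas=0 -n <namespace>   # stop Redis
    # TMM now retries the Redis connection continuously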
Impact:
TMM continues to accept and process requests even though there is no session sync between Redis and TMM. This could lead to inconsistent or incorrect processing of session data.
Workaround:
None.
Fix:
The Redis connection status has been included as part of the TMM liveness probe. If the connection fails and cannot be reestablished, the TMM is marked as unhealthy, and the TMM pod is deleted.
1596037-2 : QoS Class is showing 'Burstable' for tmm pods after enabling 'blobd'
Component: CNF
Symptoms:
After 'blobd' is enabled, Kubernetes assigns the Burstable QoS class to the TMM pod because a container in the pod has a resource limit greater than its request value.
Conditions:
A container in the TMM pod has a resource limit greater than its request value after 'blobd' is enabled; Kubernetes assigns the Burstable QoS class to any pod in that state.
Impact:
The TMM pod's QoS class is not Guaranteed.
Workaround:
None.
Fix:
Resource limits and requests are now set identically for all containers so that the pod receives the Guaranteed QoS class.
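For reference, a pod receives the Guaranteed QoS class only when every container's requests and limits match exactly; a minimal sketch with illustrative values:

    resources:
      requests:
        cpu: "2"
        memory: 4Gi
      limits:
        cpu: "2"      # identical to the request
        memory: 4Gi   # identical to the request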
1644333-1 : TMM fails to reconnect to Redis after current connection fails following five retries
Component: CNF
Symptoms:
When a connection to Redis is lost, TMM attempts to reconnect with up to five retries. If all retries fail, TMM gives up and will only retry upon receiving session requests.
Conditions:
1. Bring down the DSSM.
2. TMM attempts to connect to Redis with up to five retries.
3. If the connection fails, TMM gives up and will only retry when there is a session request at TMM.
Impact:
In the current implementation, if TMM's connection to Redis fails, it only attempts to reconnect when a new Session DB API call is made. Ideally, TMM should reconnect to Redis immediately, so Session DB callbacks (such as when Redis deletes an entry due to a timeout) can be handled correctly by TMM.
Workaround:
None.
Fix:
The maximum retry limit has been removed. With this fix, TMM continuously retries the connection to Redis.
1784829-1 : Randomize port selection in CGNAT for SP-DAG cases (For NAPT mode)
Component: CNF
Symptoms:
The selected NAT translated endpoint prefers to use the same translated port as the client traffic's source port when cmp-hash is configured as SP-DAG.
Conditions:
-- NAT policy configured in NAPT mode
-- VLAN cmp-hash configured as SP-DAG (disaggregation based on source/destination IP address)
-- NAT policy attached to a secure context
Impact:
With most clients, the existing approach of preferring the client source port during NAT endpoint selection causes no issues.
For certain client applications that use incrementing source ports, however, TMM's endpoint selection approach results in more frequent reuse of endpoints even when no other connection in TMM is using them.
This frequent reuse relies on other network devices along the path closing the connection that previously used the endpoint; if they do not, any new connections that reuse the NAT endpoint in TMM will fail.
Workaround:
None.
Fix:
Port selection during NAT endpoint selection is now randomized, reducing the chance of frequent endpoint reuse.
Known Issues in CNF v1.4.0
CNF Issues
ID Number | Severity | Description
1642441-1 | 1-Blocking | NAT IPs become stale after restarting dssm-related containers
1691617-1 | 1-Blocking | IPS sends large gRPC messages to TMM causing failures in broadcasting the message to other threads
1696997-1 | 2-Critical | After an upgrade, the last line in ZebOS config is truncated
1783477-1 | 2-Critical | TMM pod restarts when the DNS custom resource is modified
1701349-1 | 2-Critical | Duplicate Environment Variable Settings Cause TMM Pod Rescheduling Failure
1782117-1 | 2-Critical | Configmap changes are not persisted following helm upgrade
1780305-1 | 2-Critical | Delay in configuring self IPs when multiple TMM pods fail at the same time
1780329-1 | 2-Critical | CR status for SecureContext and ALG remains in False state when AFM pod is restarted during config application
1780505-1 | 2-Critical | Creating or updating NAT policy may occasionally restart the TMM container
1616577-1 | 3-Major | NAT IPs become stale when all TMM pods go down
1753689-2 | 3-Major | iHealth dashboard displays incorrect platform and version information
1115593-1 | 3-Major | The TMM Proxy Pod is unable to process large files (~1GB) using the F5BigAlgPptp CR
1596181-1 | 3-Major | DNS cache traffic now egresses correctly via wildcard listeners configured with a SNAT pool when the protocol is set to "Any" to "Any"
1697309-1 | 3-Major | Default route incorrectly modified when F5BigNetStaticroute CR is applied with 'prefixlen' set to 0
1711193-1 | 3-Major | Downgrade from CNF-1.4.0 to CNF-1.3.1 Not Supported
1574561-2 | 3-Major | The tmm-init ConfigMap Overwritten During Rolling Upgrade
1230749-2 | 3-Major | Warning Message Displayed When Applying a Modified Global-Context CR
1492301-2 | 3-Major | Increase in pool member statistics for some time after HSL pool member is down
1591881-1 | 3-Major | Fluentd Flush Rate Remains Slow Despite Using Immediate Flush Mode
1578457-1 | 3-Major | Inconsistent 'imagePullSecret' Parameter in CNF Helm Charts
1567421-2 | 3-Major | Untagged VLANs are not currently supported
1556969-1 | 3-Major | BGP Dynamic Routing: Prefix list remains in TMM even after being removed from the ConfigMap
1783181-1 | 3-Major | CRs with multiple versions may not get applied after upgrade from CNF 1.3.2 Cert Manager to CNF 1.4.0 Open source Cert Manager
1780241-1 | 3-Major | NAT IPs are becoming stale on rebooting all dssm pods
1785333-1 | 3-Major | When a virtual server is in EDENY state, the deletion of CR F5BigDnsApp may not be reflected in the debug container
1750317-1 | 3-Major | F5-tmm-pod-manager logs are not present in qkview output
1644033 | 3-Major | Initial CNF setup can be slower than expected★
Known Issue details for CNF v1.4.0
1642441-1 : NAT IPs become stale after restarting dssm-related containers
Component: CNF
Symptoms:
NAT IPs may appear as "stale" in the mrfdb output for NAT.
Conditions:
The dSSM containers are manually stopped, crash, or are stopped due to infrastructural issues, such as a worker node shutdown.
Impact:
These "stale" IPs can no longer be used for translation.
Workaround:
The only available workaround is to delete the NAT CR and recreate it. This will cause a temporary traffic disruption across all TMMs.
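A sketch of the workaround, assuming the NAT CR was applied from a manifest file (the file name and namespace are placeholders):

    kubectl delete -f nat-cr.yaml -n <namespace>   # expect a brief traffic disruption
    kubectl apply -f nat-cr.yaml -n <namespace>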
1644033 : Initial CNF setup can be slower than expected★
Component: CNF
Symptoms:
Initial CNF setup can be slower than expected.
Conditions:
When installing CNF, Kubernetes checks for secrets containing required certificates, but cert-manager may still be processing the certificate creation. In addition, large container image sizes increase image retrieval time, leading to a slower install.
Impact:
Initial CNF setup can be slower than expected, but functionality is not impacted once setup is complete.
Workaround:
None.
1701349-1 : Duplicate Environment Variable Settings Cause TMM Pod Rescheduling Failure
Component: CNF
Symptoms:
Duplicate environment variable settings with the same key exist in the TMM Helm chart. If the node running the TMM pod restarts or if the TMM pod needs to be rescheduled in the Robin cluster, the rescheduling fails due to the duplicate environment variable settings.
Conditions:
The TMM pod needs to be rescheduled in the Robin cluster.
Impact:
When the TMM pod needs to be rescheduled in the Robin cluster, the rescheduling fails due to duplicate environment variable settings with the same key.
Workaround:
None.
1782117-1 : Configmap changes are not persisted following helm upgrade
Component: CNF
Symptoms:
Helm upgrade will override manual changes made to config-maps, such as f5-otel-collector-conf or f5-toda-fluentd.
Conditions:
The upgrade's Helm charts include a ConfigMap to which manual changes had been applied.
Impact:
Manual changes are overridden and discarded.
Workaround:
Inspect and modify the upgrade helm-charts/templates to include the manual changes/customizations before applying the upgrade.
1783477-1 : TMM pod restarts when the DNS custom resource is modified
Component: CNF
Symptoms:
TMM pod may restart when a DNS CR is modified.
Conditions:
* The ipProtocol value of the DNS CR is set to UDP or TCP, and a protocolInspectionProfile is attached.
* The profile values are then modified, which may cause a TMM crash.
Impact:
Traffic is disrupted when the TMM pod restarts.
Workaround:
Delete the DNS CR before applying any profile changes.
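A sketch of the workaround sequence, assuming the DNS CR was applied from a manifest file (names are placeholders):

    kubectl delete -f dns-cr.yaml -n <namespace>   # remove the DNS CR first
    # edit the protocolInspectionProfile values, then re-apply:
    kubectl apply -f dns-cr.yaml -n <namespace>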
1780505-1 : Creating or updating NAT policy may occasionally restart the TMM container
Component: CNF
Symptoms:
Creating or updating a NAT policy configuration may lead to a TMM container restart due to a software exception.
Conditions:
Creating or updating NAT policy configuration.
Impact:
Whenever a TMM container restarts, all connections managed by the failed TMM are temporarily interrupted for a few seconds. Note that this problem occurs only rarely.
Workaround:
None.
1753689-2 : iHealth dashboard displays incorrect platform and version information
Component: CNF
Symptoms:
In the iHealth dashboard (ihealth.f5.com/qkview-analyzer), the Platform field displays "Could not determine," and the Version field shows the default version (1.0.0) instead of the actual Platform and Version data.
Conditions:
iHealth is unable to discover the hostname and Platform information from the generated Qkview when using the CWC DebugAPI.
Impact:
You are unable to view the Platform and Version information in the iHealth Dashboard. To mitigate this, the required information will be shared via alternative communication methods, such as release announcements.
Workaround:
None
1115593-1 : The TMM Proxy Pod is unable to process large files (~1GB) using the F5BigAlgPptp CR
Component: CNF
Symptoms:
When clients try to download a large file over PPTP, the download may fail due to IP fragmentation errors.
Conditions:
The TMM Proxy Pod is configured to use an F5BigAlgPptp CR.
A client tries to download a large file (~1GB) over PPTP.
The PPTP interface MTU is greater than 1450 bytes.
Impact:
PPTP file transfer fails.
Workaround:
Set the MTU of the PPTP interface to a value of 1450 or less.
ip link set dev ppp0 mtu 1450
1696997-1 : After an upgrade, the last line in ZebOS config is truncated
Component: CNF
Symptoms:
Last configuration line is invalid due to truncation.
Conditions:
Performing an upgrade that uses a values file for the ZebOS configuration.
Impact:
The last line in the configuration is invalid.
Workaround:
Use a ConfigMap for the ZebOS configuration instead of a values file; the configuration is then applied correctly.
1596181-1 : DNS cache traffic now egresses correctly via wildcard listeners configured with a SNAT pool when the protocol is set to "Any" to "Any"
Component: CNF
Symptoms:
DNS query traffic generated by the DNS cache leaves the system using the self-IP address as the source, instead of an expected address from the SNAT pool.
Conditions:
1. The wildcard egress feature is enabled with the global option matchWildcardVip set to true, which directs the DNS cache implementation to attempt using a wildcard listener for outgoing traffic.
2. An eligible wildcard listener is configured, but its ipProtocol property is set to any instead of explicitly being set to tcp or udp.
Impact:
DNS query traffic from the system egresses with an unintended source IP address, potentially causing routing issues or misalignment with expected network configurations.
Workaround:
Set up two separate wildcard listeners, one configured for UDP and the other for TCP, to ensure proper egress traffic handling for both protocols.
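A sketch of the listener split; the CR kind and surrounding fields depend on your deployment, and only the explicit ipProtocol values are the point:

    # First wildcard listener
    spec:
      ipProtocol: udp
    ---
    # Second wildcard listener
    spec:
      ipProtocol: tcp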
1697309-1 : Default route incorrectly modified when F5BigNetStaticroute CR is applied with 'prefixlen' set to 0
Component: CNF
Symptoms:
When a F5BigNetStaticroute CR is configured with a 'prefixLen' set to '0', the next hop of the default route in TMM is incorrectly modified.
Conditions:
An F5BigNetStaticroute CR is configured with 'prefixLen' set to '0'.
Impact:
Traffic may be impacted for routes that are incorrectly updated.
Workaround:
Avoid configuring the 'prefixLen' as '0' when using the F5BigNetStaticroute CR.
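For illustration, the triggering configuration to avoid; field names other than prefixLen are hypothetical:

    kind: F5BigNetStaticroute
    spec:
      destination: 10.0.0.0   # illustrative
      prefixLen: 0            # triggers the issue; use a value greater than 0
      gateway: 192.168.1.1    # illustrative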
1750317-1 : F5-tmm-pod-manager logs are not present in qkview output
Component: CNF
Symptoms:
F5-tmm-pod-manager logs are not present in qkview output.
Conditions:
A generated qkview report does not include log output from the f5-tmm-pod-manager container in the f5ingress pod.
Impact:
F5-tmm-pod-manager logs are not present in qkview output.
Workaround:
F5-tmm-pod-manager logs can still be viewed using the kubectl logs command.
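For example (the pod name and namespace are placeholders):

    kubectl logs <f5ingress-pod> -c f5-tmm-pod-manager -n <namespace>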
1711193-1 : Downgrade from CNF-1.4.0 to CNF-1.3.1 Not Supported
Component: CNF
Symptoms:
Downgrading CNF could lead to invalid entries of selfIP and TMM mappings being stored in persistMap, which may result in duplicate or missing selfIPs on TMM pods. This improper allocation of selfIPs among TMM pods can cause traffic disruption.
Conditions:
CNF is downgraded from version 1.4.0 to version 1.3.1.
Impact:
Traffic disruption caused by selfIP misconfigurations.
Workaround:
After completing the downgrade, scale down both the controller and TMM pods to 0 replicas. Then, scale them back up to their desired replica count to resolve selfIP misconfigurations.
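A sketch of the workaround, assuming kubectl scale is used; the workload names and kinds are placeholders for your deployment:

    kubectl scale deployment f5ingress --replicas=0 -n <namespace>
    kubectl scale statefulset f5-tmm --replicas=0 -n <namespace>
    # then restore the desired counts:
    kubectl scale deployment f5ingress --replicas=1 -n <namespace>
    kubectl scale statefulset f5-tmm --replicas=<desired> -n <namespace>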
1616577-1 : NAT IPs become stale when all TMM pods go down
Component: CNF
Symptoms:
NAT IPs may appear as "stale" in the mrfdb output for NAT.
Conditions:
1. A TMM is scaled up while all existing TMM pods are in a terminating state.
2. All TMM pods go down.
Impact:
These "stale" IPs can no longer be used for translation.
Workaround:
The only available workaround is to delete the NAT CR and recreate it. This will cause a temporary traffic disruption across all TMMs.
1574561-2 : The tmm-init ConfigMap Overwritten During Rolling Upgrade
Component: CNF
Symptoms:
During an f5ingress upgrade, custom TMM user data stored in the ConfigMap is overwritten, resulting in the loss of custom configurations.
Conditions:
The f5ingress Helm chart is upgraded to a newer version.
Impact:
Overwriting custom configuration can lead to interruptions in services provided by CNF/SPK.
Workaround:
Save the tmm-init configuration before the upgrade. After updating the f5ingress Helm chart, transfer the custom configuration from the saved tmm-init file to the user_conf.tcl section of the new tmm-init configuration.
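A sketch of the backup step; the namespace is a placeholder, and the ConfigMap name comes from the issue title:

    kubectl get configmap tmm-init -n <namespace> -o yaml > tmm-init-backup.yaml
    # after the Helm upgrade, merge the saved custom content back into user_conf.tcl:
    kubectl edit configmap tmm-init -n <namespace>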
1230749-2 : Warning Message Displayed When Applying a Modified Global-Context CR
Component: CNF
Symptoms:
When updating the global-context CR, the following warning message may appear:
Warning: resource f5-big-context-globals/global-context is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
Conditions:
Executing the kubectl apply -f global-context.yaml command to update the global-context CR.
Impact:
There is no impact on the functionality. The global-context CR is successfully updated, and the update is handled correctly by the CNF. The warning message does not affect the operation.
Workaround:
None
1492301-2 : Increase in pool member statistics for some time after HSL pool member is down.
Component: CNF
Symptoms:
Pool member statistics continue to increment for the duration of the configured monitor timeout even after the pool member goes down.
Conditions:
While traffic runs continuously, an HSL pool member with a monitor attached to its pool goes down.
Impact:
Statistics continue to increment for the monitor timeout period after the pool member has gone down.
Workaround:
After a pool member goes down, wait for the duration of the configured monitor timeout before starting traffic.
1591881-1 : Fluentd Flush Rate Remains Slow Despite Using Immediate Flush Mode
Component: CNF
Symptoms:
Fluentd uses the concept of a "chunk," which is essentially a portion of log data, like a file, temporarily stored in a buffer (such as a directory). As new data arrives, it is added to the chunk. In Fluentd, "flushing" refers to the process of moving the chunk from the buffer and writing it to the specified output destination, which could be a remote server, a directory, or another location.
The "flush mode" in Fluentd determines when the buffered log data should be flushed and sent to its destination.
* flush_mode immediate: Triggers Fluentd to immediately flush the buffered data as soon as it's received.
However, Fluentd is not flushing the data immediately, even though the flush_mode is set to immediate.
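For reference, the relevant stanza in a standard Fluentd configuration looks like this; the match pattern and output type are placeholders:

    <match **>
      @type forward
      <buffer>
        flush_mode immediate   # should flush each chunk as soon as data arrives
      </buffer>
    </match>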
Conditions:
The issue arises when the 'flush_mode' parameter in the Fluentd configuration file (.conf) is set to immediate.
Impact:
Logs are not flushed immediately as expected. Instead, it may take approximately one minute to clear the logs from the buffer, causing delays in log processing and delivery to the configured output destination.
Workaround:
Restart the Fluentd pod. However, if the Fluentd pod doesn’t have storage that saves logs permanently, some logs might be lost during the restart. To prevent losing logs, make sure Fluentd is set up with persistent storage.
1578457-1 : Inconsistent 'imagePullSecret' Parameter in CNF Helm Charts
Component: CNF
Symptoms:
Inconsistencies in the "imagePullSecret" parameter in the CNF Helm chart can cause confusion and potentially lead to failed deployments of some pods.
Conditions:
When deploying pods using the latest CNF tarball in OCP.
Impact:
Unable to deploy CNF properly or troubleshoot pod deployment failures efficiently due to incorrect "imagePullSecret" configuration.
Workaround:
None
1567421-2 : Untagged VLANs are not currently supported
Component: CNF
Symptoms:
When using untagged VLANs, there is an intermittent issue where only one TMM processes traffic.
Conditions:
The intermittent issue occurs when using untagged VLANs.
Impact:
When the issue occurs, traffic is dropped by all TMMs except one.
Workaround:
None
1556969-1 : BGP Dynamic Routing: Prefix list remains in TMM even after being removed from the ConfigMap
Component: CNF
Symptoms:
Modifications or deletions in ZebOS configurations are not automatically tracked or managed. When making changes, such as updating an IP or altering a neighbor configuration, you must manually issue a "no" command to remove obsolete settings.
Conditions:
This issue occurs when modifying or removing existing configurations in ZebOS, particularly settings like neighbor IPs, without properly deleting the previous configurations first.
Impact:
You must manually track changes and issue “no” commands to prevent lingering obsolete configurations. This increases the potential for configuration errors and adds extra operational overhead.
Workaround:
To modify a configuration, first remove the existing setting using a "no" command (e.g., "no neighbor <IP>") before applying the new configuration.
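A sketch of the pattern, using illustrative addresses and AS numbers:

    router bgp 65000
     no neighbor 10.1.1.1                 ! remove the obsolete setting first
     neighbor 10.1.1.2 remote-as 65001    ! then apply the new configuration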
1780305-1 : Delay in configuring self IPs when multiple TMM pods fail at the same time.
Component: CNF
Symptoms:
Delay in self IP configuration is observed after the failure of all TMM pods.
Conditions:
When all TMM pods fail and new TMMs come up, the controller takes additional time to determine the right set of running TMMs. Hence, there could be a delay in configuring the new TMMs.
Impact:
Delay may be observed while configuring self IPs on new TMMs.
Workaround:
None. With a slight delay, all the TMMs are eventually configured.
1780329-1 : CR status for SecureContext and ALG remains in False state when AFM pod is restarted during config application.
Component: CNF
Symptoms:
Status remains False after a successful configuration. The root cause is that there are no CR status updates when CRs are sent to PCCD as part of pccdGrpcConfig. When a CR fails to send the first time (because of the AFM pod restart), the status of each CR sent to PCCD goes into a False ready state despite being configured correctly.
Conditions:
The AFM pod is restarted while configuration for SecureContext and ALG CRs is being applied.
Impact:
CR status remains in a False state. However, the configuration is still applied and works as expected.
Workaround:
The CR status is updated when there is an add/update/delete event on the CR or on a dependent CR; trigger such an event to refresh the status.
1783181-1 : CRs with multiple versions may not get applied after upgrade from CNF 1.3.2 Cert Manager to CNF 1.4.0 Open source Cert Manager.
Component: CNF
Symptoms:
CRD conversion is unable to convert CRs for newly created namespaces after upgrading from the CNF 1.3.2 Cert Manager to the CNF 1.4.0 open source (OSS) Cert Manager.
Conditions:
* Upgrade the F5 Cert Manager to the OSS Cert Manager.
* Upgrade crd-conversion.
Impact:
CRs with multiple versions may not get applied.
Workaround:
Restart the crd-conversion container. Once the new crd-conversion container is up, the CRs can be applied.
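A sketch of the restart; the label selector and namespace are placeholders, and deleting the pod lets its controller recreate it:

    kubectl delete pod -l app=crd-conversion -n <namespace>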
1780241-1 : NAT IPs are becoming stale on rebooting all dssm pods
Component: CNF
Symptoms:
Some NAT IPs become stale. This can be observed in mrfdb output.
Conditions:
All DSSM pods are deleted manually or crash at the same time.
Impact:
Stale NAT IPs will not be used for translations.
Workaround:
Delete the NAT CR and create it again.
1691617-1 : IPS sends large gRPC messages to TMM causing failures in broadcasting the message to other threads
Component: CNF
Symptoms:
* When IPSD is deployed, it sends large gRPC messages to TMM, causing failures in broadcasting the message to other threads.
* After a TMM Pod is deleted, VLANs are not restored on one of the TMMs.
Conditions:
1. Apply a VLAN CR with multiple VLANs.
2. Delete the TMM Pods.
3. Check for VLAN restoration on the newly deployed TMM Pods.
Impact:
Newly deployed TMM Pods will not be able to serve traffic.
Workaround:
None.
1785333-1 : When a virtual server is in EDENY state, the deletion of CR F5BigDnsApp may not be reflected in the debug container
Component: CNF
Symptoms:
If you delete the F5BigDnsApp CR, the tmctl pool_stat command may still display the pool information while the configview output in the debug container may not.
Conditions:
The F5BigDnsApp CR is deleted while the virtual server is in an EDENY state.
Impact:
The tmctl pool_stat command may erroneously display the pool information.
Workaround:
None.
★ This issue may cause the configuration to fail to load or may significantly impact system performance after an upgrade.
For additional support resources and technical documentation, see:
- The F5 Technical Support website: http://www.f5.com/support/
- The MyF5 website: https://my.f5.com/manage/s/
- The F5 DevCentral website: http://community.f5.com/