SPK Fixes and Known Issues¶
This list highlights fixes and known issues for this SPK release.
Version: 1.7.9
Build: 1.7.9
Note: This content is current as of the software release date.
Updates to bug information occur periodically. For the most up-to-date bug data, see Bug Tracker.
Cumulative fixes from SPK v1.7.9 that are included in this release
Known Issues in SPK v1.7.9
Cumulative fixes from SPK v1.7.9 that are included in this release
ID Number | Severity | Links to More Info | Description |
1604729 | 1-Blocking | | Readiness Probe and Liveness Probe for the CWC Pod |
1612037 | 2-Critical | | IPv6 routes are missing after the Snatpool CR update |
1603469 | 2-Critical | | Erroneous publishing of the Redis master DB in SPK DSSM deployment |
1602625 | 2-Critical | | The controller is not able to get license information when RabbitMQ connection is broken after the OCP upgrade |
1599825 | 2-Critical | | The RabbitMQ connection was broken after the OCP upgrade, and the reconnection is only partial |
1575297-2 | 2-Critical | | Snatpool config forgotten after TMM pod deletion |
1518865-2 | 2-Critical | | The f5-toda-tmstats service fails to reconnect to the otel-collector service after certificate rotation |
1294425-2 | 2-Critical | | F5SPKIngressHTTP2 CR configuration fails with TLS or mTLS |
1603461 | 3-Major | | Remove repetitive and redundant logging |
1578581-1 | 3-Major | | Cannot update configured IPv4 SNAT pool with IPv6 SNAT pool members |
1327321-4 | 3-Major | | License deactivation occurs after the f5ingress container restarts as a result of CR object deletion |
1235861-1 | 3-Major | | Static routes created by the F5SPKIngressHTTP2 CR remain after the CR is deleted |
1146241-6 | 3-Major | BT1146241 | FastL4 virtual server may egress packets with unexpected and erratic TTL values |
Cumulative fix details for SPK v1.7.9 that are included in this release
1612037 : IPv6 routes are missing after the Snatpool CR update
Component: SPK
Symptoms:
If a configured IPv4 SNAT pool is updated to a combined IPv4 and IPv6 SNAT pool by editing the CR, kernel routes for the IPv6 SNAT pool are not created.
Conditions:
Updating the SNAT pool from IPv4-only to IPv4 and IPv6 in the F5SPKSnatpool CR.
Impact:
IPv6 Snatpool kernel routes are not created.
Workaround:
Delete the SNAT CR with the IPv4 pool and reapply the CR with both the IPv4 and IPv6 pools.
Fix:
To update an already configured IPv4 SNAT pool to an IPv4 and IPv6 SNAT pool, delete the existing CR with the IPv4 pool and reapply the CR with both the IPv4 and IPv6 pools.
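The delete-and-reapply sequence can be sketched as shell commands. The manifest filenames and namespace below are assumptions for illustration; the commands are echoed as a preview so the sketch is safe to run anywhere (drop the `echo` indirection to execute for real):

```shell
# Workaround sketch: replace an IPv4-only Snatpool CR with a dual-stack one.
# 'snatpool-ipv4.yaml', 'snatpool-dual.yaml', and the namespace are assumed names.
NS="spk-ingress"
DELETE_CMD="oc delete -f snatpool-ipv4.yaml -n $NS"
APPLY_CMD="oc apply -f snatpool-dual.yaml -n $NS"
echo "$DELETE_CMD"   # preview only; run the commands directly on a live cluster
echo "$APPLY_CMD"
```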
1604729 : Readiness Probe and Liveness Probe for CWC Pod
Component: SPK
Symptoms:
Currently, in SPK, if the RabbitMQ pod is down, the CWC pod is in a "Ready" state, even though the connection between CWC and RabbitMQ is broken.
Conditions:
1. Random startup of the RabbitMQ pod or multiple RabbitMQ restarts.
2. The RabbitMQ pod is stopped.
Impact:
The license status is not updated in RabbitMQ.
Workaround:
Restart the CWC pod.
Fix:
The Readiness Probe/Liveness Probe will check the server running in the CWC pod. The server will be up only when the connection between CWC and RabbitMQ is established and the queue is generated. If any of these conditions fail, the server will stop running, resulting in a readiness/liveness probe failure. Restarting the CWC pod will allow the server to start once the connection is re-established, and the CWC pod will transition to the "Ready" state.
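As a sketch of how to observe and recover the pod state described above, the commands below check CWC readiness and delete the pod so its controller recreates it once the RabbitMQ connection can be re-established. The namespace and label selector are assumptions; commands are echoed as a preview:

```shell
# Inspect CWC readiness, then bounce the pod to force a clean reconnect.
NS="spk-utilities"            # assumed namespace
SELECTOR="app=f5-spk-cwc"     # assumed pod label
GET_CMD="oc get pods -n $NS -l $SELECTOR"
RESTART_CMD="oc delete pod -n $NS -l $SELECTOR"
echo "$GET_CMD"       # preview only; run directly against a live cluster
echo "$RESTART_CMD"
```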
1603469 : Erroneous publishing of the Redis master DB in SPK DSSM deployment
Component: SPK
Symptoms:
Under a certain sequence of events, the DB and Sentinel pods can land in an erroneous state where the Sentinels point to a non-master DB as the master DB.
Conditions:
The following sequence of events would cause the above scenario to surface:
1. A failover is performed to make DB-2 the master and DB-0/DB-1 the replicas.
2. Scale down the Sentinels to 0, then delete pod DB-2, followed by scaling up the Sentinels to 3, or delete all 3 Sentinel pods and DB-2 together.
3. After step 2, during bootup, the init script in each pod must finish querying the master DB status from Sentinel within 5 seconds.
Impact:
TMM is not able to establish communication with the Redis DB, which causes a disruption to traffic flow.
Workaround:
The mitigation is to scale down both DB and Sentinel pods to 0 and then scale them up to 3 using the steps below:
1. Scale down DB pods to 0:
oc scale statefulset/f5-dssm-db --replicas=0 -n <namespace>
2. Scale down Sentinel to 0:
oc scale statefulset/f5-dssm-sentinel --replicas=0 -n <namespace>
3. Scale up DB to 3:
oc scale statefulset/f5-dssm-db --replicas=3 -n <namespace>
4. Scale up Sentinel to 3:
oc scale statefulset/f5-dssm-sentinel --replicas=3 -n <namespace>
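The four steps above can be wrapped in a small loop that preserves the documented order (DB first, then Sentinel, for both the scale-down and scale-up phases). The namespace is an assumption, and the commands are echoed as a preview:

```shell
# Scale both DSSM StatefulSets to 0, then back to 3, in the documented order.
NS="spk-dssm"   # assumed namespace
CMDS=""
for replicas in 0 3; do
  for sts in f5-dssm-db f5-dssm-sentinel; do
    CMDS="$CMDS oc scale statefulset/$sts --replicas=$replicas -n $NS;"
  done
done
echo "$CMDS"    # preview only; 'eval "$CMDS"' would execute on a live cluster
```

Between phases, `oc rollout status statefulset/<name> -n $NS` can be used to wait until the pods are fully terminated or ready.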
Fix:
The fix is performed in two parts:
1. The TMM code is enhanced to disconnect and reconnect to the Redis DB if it connects to a READONLY DB.
2. The init bootup script is enhanced to handle this scenario gracefully.
1603461 : Remove repetitive and redundant logging
Component: SPK
Symptoms:
F5-ingress logs contain entries that are repetitive and redundant.
Conditions:
Always
Impact:
Logging is more verbose than it needs to be.
Fix:
Some of the repetitive and redundant logs have been changed to make logging more effective.
1602625 : The controller is not able to get license information when RabbitMQ connection is broken after the OCP upgrade
Component: SPK
Symptoms:
If there is a RabbitMQ communication failure, the controller cannot gather license information and therefore does not configure TMMs, rendering the instance nonfunctional.
Conditions:
CWC loses connectivity with RabbitMQ, and the license status is not sent to F5ingress.
Impact:
F5Ingress is not able to push the configuration to TMM because it cannot communicate with CWC via RabbitMQ. As a result, TMM is unable to process traffic.
Workaround:
Restart the CWC pod.
Fix:
F5ingress reads license status from a secret to fetch license information when RabbitMQ is down.
1599825 : The RabbitMQ connection was broken after the OCP upgrade, and the reconnection is only partial
Component: SPK
Symptoms:
During boot up, as part of initialization, CWC creates a connection with RabbitMQ. It also creates a channel and queue to receive requests from F5 controllers. It then starts a routine to send heartbeat messages periodically to all subscribers. According to the logs, the initial connection was successful.
It appears that the worker node was upgraded, causing RabbitMQ to move to a different worker node, resulting in CWC losing its connection with RabbitMQ. CWC was notified about the disconnection and started retrying the connection with RabbitMQ. During the retry, if the connection is not successful, CWC waits (sleeps) for 3 seconds before attempting a new connection.
Conditions:
Random startup of the RabbitMQ pod or multiple RabbitMQ restarts can cause the issue.
Impact:
The license status is not updated in RabbitMQ.
Workaround:
Restart the CWC pod.
Fix:
The connection retry is now performed only by the main thread.
When CWC receives a disconnect notification, it:
1. Stops the heartbeat routine/timer.
2. Retries and establishes the connection, then creates the channel and queue.
3. Restarts the heartbeat timer.
This way, only one thread retries the connection, and the race condition described above does not occur.
1578581-1 : Cannot update configured IPv4 SNAT pool with IPv6 SNAT pool members.
Component: SPK
Symptoms:
If an IPv4 SNAT pool is configured and its members are changed to IPv6 by editing the CR, kernel routes for the IPv6 SNAT pool are not created.
Conditions:
Updating SNAT pool members from IPv4 to IPv6 in the F5BigCneSnatpool CR.
Impact:
IPv6 SNAT pool kernel routes are not created.
Workaround:
Delete the SNAT CR with IPv4 pool members and reapply the CR with IPv6 pool members.
Fix:
To update an already configured IPv4 SNAT pool to an IPv6 SNAT pool, delete the existing CR with the IPv4 pool and reapply the CR with IPv6 pool members.
1575297-2 : Snatpool config forgotten after TMM pod deletion
Component: SPK
Symptoms:
When a TMM pod is deleted, there is a chance that some Snatpool addresses assigned to a pod will be forgotten by F5ingress while converting from pod UUID to IP. This can occur if the TMM pod has already been deleted, so the conversion mapping for that pod is not accessible. As a result, F5ingress believes it has fewer Snatpool addresses than the Snatpool CR specifies.
Conditions:
1. Snatpool CR is applied
2. TMM pod is deleted via any of the following: scale-down, Helm upgrade, or direct pod deletion.
Impact:
Some TMMs won't get Snatpool config even though there is sufficient Snatpool config to assign to them. This can negatively affect traffic passage.
Workaround:
Restart F5ingress.
This causes F5ingress to reload the Snatpool CR into memory and get back the Snatpool config it lost.
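A minimal sketch of the workaround, assuming the controller runs as a Deployment named f5ingress (the Deployment name and namespace are assumptions; `oc rollout restart` is the standard oc/kubectl way to bounce a Deployment):

```shell
# Restart f5ingress so it reloads the Snatpool CR into memory.
NS="spk-ingress"   # assumed namespace
RESTART_CMD="oc rollout restart deployment/f5ingress -n $NS"
echo "$RESTART_CMD"   # preview only; run directly on a live cluster
```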
Fix:
F5ingress will compare assigned addresses to the Snatpool CR's addresses when computing the available Snatpool addresses. This way, if any addresses are lost in the uuid to ip conversion due to pod deletion, they will be reacquired.
1518865-2 : The f5-toda-tmstats service fails to reconnect to the otel-collector service after certificate rotation.
Component: SPK
Symptoms:
The f5-toda-tmstats service fails to reconnect to the otel-collector service after the otel-collector certificates are rotated.
Conditions:
The otel-collector certificates are rotated at least once.
Impact:
SPK metrics are not exported.
Workaround:
Restart the f5-toda-tmstats service.
Fix:
The f5-toda-tmstats service was fixed to load new otel-collector certificates after they are rotated.
1327321-4 : License deactivation occurs after the f5ingress container restarts as a result of CR object deletion
Component: SPK
Symptoms:
When f5ingress crashes, the license is deactivated.
Conditions:
When multiple CRs are applied and then deleted immediately, f5ingress crashes while handling a deletion event for a CR object that is not present in its cache.
Impact:
The f5ingress crash results in license deactivation because the license helper never communicates the license status back to f5ingress after the restart. As a result, the config is never re-sent to TMM and traffic is disrupted.
Workaround:
None
Fix:
F5ingress no longer crashes after applying and deleting CRs continuously. Licensing details are resent by f5-lic-helper after the f5ingress container restarts, followed by all configs being sent to TMM.
1294425-2 : F5SPKIngressHTTP2 CR configuration fails with TLS or mTLS
Component: SPK
Symptoms:
HTTP2 virtual servers are not created when configured for TLS or mTLS in an F5SPKIngressHTTP2 Custom Resource.
Conditions:
An F5SPKIngressHTTP2 Custom Resource is configured for one of the following:
- mTLS on the clientside
- TLS on the serverside
- mTLS on the serverside
Impact:
HTTP2 traffic does not pass through the SPK.
Workaround:
Configure the F5SPKIngressHTTP2 CR for TLS offload, with no encryption on the serverside and TLS on the clientside.
Fix:
F5SPKIngressHTTP2 can be configured for TLS and mTLS on both clientside and serverside.
1235861-1 : Static routes created by the F5SPKIngressHTTP2 CR remain after the CR is deleted
Component: SPK
Symptoms:
Static routes created by the F5SPKIngressHTTP2 CR remain in the TMM configuration after the CR is deleted, or the service object endpoints are scaled down.
Conditions:
TMM is configured with routes to pool members for an F5SPKIngressHTTP2 CR. After the CR is deleted, the routes are not removed.
Impact:
Any new static routes added to the TMM configuration may conflict with the erroneous static routes.
Workaround:
After deleting the CR, scale the TMM pod to 0, then scale it back to 1.
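A sketch of the workaround, assuming TMM runs as a Deployment named f5-tmm (the CR name, Deployment name, and namespace are all assumptions; the commands are printed as a preview):

```shell
# Delete the F5SPKIngressHTTP2 CR, then bounce TMM to clear stale static routes.
NS="spk-ingress"       # assumed namespace
CR="my-http2-cr"       # assumed F5SPKIngressHTTP2 CR name
STEP1="oc delete f5spkingresshttp2 $CR -n $NS"
STEP2="oc scale deployment/f5-tmm --replicas=0 -n $NS"
STEP3="oc scale deployment/f5-tmm --replicas=1 -n $NS"
printf '%s\n' "$STEP1" "$STEP2" "$STEP3"   # preview only
```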
Fix:
The HTTP2 configuration is now deleted when the F5SPKIngressHTTP2 CR is deleted.
1146241-6 : FastL4 virtual server may egress packets with unexpected and erratic TTL values
Links to More Info: BT1146241
Component: SPK
Symptoms:
A FastL4 virtual server may egress (either towards the client or the server) IP packets with unexpected and erratic TTL values. The same also applies to IPv6, where the TTL field is known as Hop Limit.
Conditions:
- The BIG-IP system is a Virtual Edition (VE).
- Large Receive Offload (LRO) is enabled on the system (the default) and is operating in software mode. To determine whether LRO is enabled, inspect the tm.tcplargereceiveoffload DB key. To determine whether LRO is operating in software mode, query the tcp_lro tmstat table (tmctl -d blade tcp_lro); if the table exists, LRO is operating in software mode.
- The FastL4 profile is configured to decrement the TTL (this is the default mode).
- The virtual server uses mismatched IP versions on each side of the proxy (for example, an IPv6 client and an IPv4 server).
Impact:
Depending on the actual TTL values sent on the wire (which can be random, anywhere within the allowed range for the field), traffic can be dropped by routers on the way to the packet's destination.
This happens when there are more routers (hops) on the path to the packet's destination than the value specified in the TTL field.
Ultimately, this leads to retransmissions and possibly application failures.
Workaround:
You can work around this issue in either of the following ways:
- Disable LRO on the BIG-IP system by setting the tm.tcplargereceiveoffload DB key to disable.
- Use a TTL mode for the FastL4 profile other than decrement (for example, proxy or set).
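To the best of our understanding, both workarounds can be expressed as tmsh commands run on the BIG-IP VE itself (not in the cluster); the profile name is a hypothetical example, and `ip-ttl-mode` is the FastL4 profile's TTL setting. The commands are echoed as a preview:

```shell
# Option 1: disable LRO via the DB key. Option 2: change the FastL4 TTL mode
# away from 'decrement'. 'my_fastl4' is a hypothetical profile name.
LRO_CMD="tmsh modify sys db tm.tcplargereceiveoffload value disable"
TTL_CMD="tmsh modify ltm profile fastl4 my_fastl4 ip-ttl-mode proxy"
echo "$LRO_CMD"   # preview only; run on the BIG-IP VE
echo "$TTL_CMD"
```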
Fix:
The TTL decrement mode now works as expected under the conditions specified above.
Known Issues in SPK v1.7.9
SPK Issues
ID Number | Severity | Links to More Info | Description |
1566529-2 | 2-Critical | | IngressEgressUDP Configuration does not scale correctly with f5-tmm pods |
1612869 | 2-Critical | | There is a disruption to TMM traffic flow when all sentinels start up and the configured master DB is down |
1614833 | 3-Major | | The expected deployment is failing even though imagePullSecrets are provided in the Helm charts |
1612401 | 4-Minor | | Egress Custom Resource IP Values are not validated |
Known Issues details for SPK v1.7.9
1566529-2 : IngressEgressUDP Configuration does not scale correctly with f5-tmm pods
Component: SPK
Symptoms:
The IngressEgressUDP Custom Resource does not send configuration to new f5-tmm pods when the pods are scaled up.
Conditions:
An IngressEgressUDP Custom Resource is created and f5-tmm pods are scaled up after CR creation.
Impact:
IngressEgressUDP traffic only flows through f5-tmm pods that existed at the time of the CR creation. Egress traffic may fail when directed to an f5-tmm pod that has not been configured.
Workaround:
Delete and re-create the IngressEgressUDP Custom Resource after scaling up f5-tmm pods.
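The workaround can be sketched as a delete-and-reapply of the CR manifest after the scale-up; the manifest filename and namespace are assumptions, and the commands are echoed as a preview:

```shell
# Re-create the IngressEgressUDP CR so newly scaled f5-tmm pods get the config.
NS="spk-ingress"                   # assumed namespace
MANIFEST="ingress-egress-udp.yaml" # assumed CR manifest filename
DELETE_CMD="oc delete -f $MANIFEST -n $NS"
APPLY_CMD="oc apply -f $MANIFEST -n $NS"
echo "$DELETE_CMD"   # preview only; run after scaling up f5-tmm pods
echo "$APPLY_CMD"
```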
1612869 : There is a disruption to TMM traffic flow when all sentinels start up and the configured master DB is down
Component: SPK
Symptoms:
In a scenario where all the sentinels start up fresh (for example, the sentinels are scaled down to 0 and then scaled back up), the sentinels require the configured master DB to be up and functional. This is necessary for the sentinels to gather complete information about all the configured databases. If the configured master DB is down during the sentinels' startup, they fail to retrieve complete database information and therefore fail to create/expose the master/replica DB framework for SPK until the master DB is up and running.
Conditions:
The configured master DB is consistently not accessible or down during the startup of all sentinels.
Impact:
TMM is not able to establish communication with the Redis DB, which causes a disruption to traffic flow.
Workaround:
The mitigation is to scale down both DB and Sentinel pods to 0 and then scale them up to 3 using the steps below:
1. Scale down DB pods to 0:
oc scale statefulset/f5-dssm-db --replicas=0 -n <namespace>
2. Scale down Sentinel to 0:
oc scale statefulset/f5-dssm-sentinel --replicas=0 -n <namespace>
3. Scale up DB to 3:
oc scale statefulset/f5-dssm-db --replicas=3 -n <namespace>
4. Scale up Sentinel to 3:
oc scale statefulset/f5-dssm-sentinel --replicas=3 -n <namespace>
1614833 : The expected deployment is failing even though imagePullSecrets are provided in the Helm charts
Component: SPK
Symptoms:
Configuring the F5 Ingress Controller using image secrets in the Helm values file results in image pull errors.
Conditions:
The image pull secret is created in the same namespace and is referenced in the Helm values file, yet image pulls still fail.
Impact:
Cannot download the Docker images from the registry, and the pods will go into ImagePullBackOff state.
Workaround:
Set up a proxy that handles authentication for Docker image pulls across your Kubernetes cluster.
Example:
oc get secret/pull-secret -n openshift-config --template='{{index .data ".dockerconfigjson" | base64decode}}' > pullsecret
oc registry login --registry=docker-registry.com --auth-basic="admin:admin" --to=pullsecret
oc set data secret/pull-secret -n openshift-config --from-file=.dockerconfigjson=pullsecret
1612401 : Egress Custom Resource IP Values are not validated
Component: SPK
Symptoms:
IP addresses entered into the fields of an Egress Custom Resource are not validated; invalid IPs do not cause CR creation to fail.
Conditions:
An Egress Custom Resource is applied in a namespace with invalid IP values in any of the following fields: dnsNat46PoolIps, dnsNat46Ipv4Subnet, dnsNat46SorryIp, nat64Ipv6Subnet
Impact:
Egress traffic and/or DNS traffic will not pass, depending on which IP values are invalid.
Workaround:
Delete the Egress Custom Resource with invalid IP Addresses, fix the invalid IP Address in the Egress CR, and reapply the CR.
★ This issue may cause the configuration to fail to load or may significantly impact system performance after upgrade
For additional support resources and technical documentation, see:
- The F5 Technical Support website: http://www.f5.com/support/
- The MyF5 website: https://my.f5.com/manage/s/
- The F5 DevCentral website: http://devcentral.f5.com/