SPK Fixes and Known Issues
This list highlights fixes and known issues for this SPK release.
Version: 1.9.2
Build: 1.9.2
Note: This content is current as of the software release date. Updates to bug information occur periodically. For the most up-to-date bug data, see Bug Tracker.
Cumulative fixes from SPK v1.9.2 that are included in this release
Known Issues in SPK v1.9.2
Cumulative fixes from SPK v1.9.2 that are included in this release
ID Number | Severity | Description
1591513 | 1-Blocking | TMM cores with qosClass set to 'Burstable' in OpenShift deployment
1576229-2 | 2-Critical | TMM crashes when it receives an invalid packet fragment from driver
1550609-1 | 2-Critical | F5SPKEgress CR status stuck at DEPLOYING when the controller restarts
1581289 | 3-Major | Validation webhook is preventing reapply of an existing static route CR
1575605-2 | 3-Major | Same self IP address is assigned to multiple TMMs on scaling, restarting, or upgrading of TMM pods
1572977-1 | 3-Major | Controller pod restart and TMM container restart could result in a condition where the restarted TMM pod would not have VLAN and self IP configuration
1551085-2 | 3-Major | F5-spk-ingresstcp that uses a VLAN list is unavailable
1329849-2 | 3-Major | TMM crashes on startup due to DPDK mlx5 driver
1327321-5 | 3-Major | License deactivation occurs after the f5ingress container restarts as a result of CR object deletion
1326169-3 | 3-Major | CWC goes into CrashLoopBackOff after upgrade★
1282605-2 | 3-Major | Cannot configure TMM when an application Pod scales up if the ingress TCP CR is deployed without a service port
1329369-1 | 4-Minor | Telemetry report includes controller nodes in vCPU count
Cumulative fix details for SPK v1.9.2 that are included in this release
1591513 : TMM cores with qosClass set to 'Burstable' in OpenShift deployment.
Component: SPK
Symptoms:
The F5-TMM pod goes into a restart loop, causing traffic disruption.
Conditions:
The SPK dataplane is deployed in a cluster with SMT enabled, which sets the qosClass to 'Burstable' for TMM pods.
Impact:
The F5-TMM pod goes into a restart loop, causing traffic disruption.
Workaround:
In clusters where TMM is deployed, keep hyperthreading (SMT) disabled to ensure the 'Guaranteed' qosClass for TMM.
Fix:
The deployment now ensures the 'Guaranteed' qosClass for TMM pods.
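Kubernetes assigns a pod the 'Guaranteed' qosClass only when every container's CPU and memory requests equal its limits; with SMT enabled the TMM pods ended up 'Burstable' instead. A quick way to confirm which class the TMM pods actually received (the namespace and label selector are illustrative; adjust them to your install):

    kubectl get pods -n spk-ingress -l app=f5-tmm \
      -o custom-columns='NAME:.metadata.name,QOS:.status.qosClass'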
1581289 : Validation webhook is preventing reapply of an existing static route CR
Component: SPK
Symptoms:
Validation webhook is preventing reapply of an existing static route Custom Resources (CR).
Conditions:
- Run the command 'kubectl apply -f <static-route-cr>'.
- Run the command again with the same static route YAML file.
Impact:
The 'kubectl apply' command returns an error if the CR has already been applied.
Workaround:
None
Fix:
The validation webhook has been modified to allow updates to an existing static route CR; it no longer blocks 'kubectl apply' when the same CR is applied again.
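For reference, the sequence that previously failed is simply a repeated apply of the same manifest; with this fix the second apply succeeds as an unchanged update (the file name is a placeholder):

    # First apply creates the static route CR.
    kubectl apply -f static-route-cr.yaml
    # Re-running the same command previously failed webhook validation;
    # it now succeeds as a no-op update.
    kubectl apply -f static-route-cr.yaml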
1576229-2 : TMM crashes when it receives an invalid packet fragment from driver.
Component: SPK
Symptoms:
TMM crashes due to a failed assertion in the Rx packet path:
Assertion "Rx cd is valid" failed.
Conditions:
TMM is deployed with the DPDK driver.
Impact:
TMM container is in a restart loop.
Workaround:
Use the sock driver.
Fix:
TMM gracefully handles such invalid packet fragments by dropping them and updating the driver stats to reflect the drop.
1575605-2 : Same self IP address is assigned to multiple TMMs on scaling, restarting, or upgrading of TMM pods
Component: SPK
Symptoms:
Multiple TMMs are configured with the same self IP address. TMM picks up only the self IP that was sent first and ignores subsequent updates received from the F5Ingress controller.
Conditions:
A TMM pod takes too long to become ready; this condition is more likely when a large number of TMM pods go through a rolling upgrade.
Impact:
F5Ingress fails to persist the TMM-to-self-IP assignment in its persistent ConfigMap. Because of this, it keeps attempting to reconfigure the TMM with a new self IP, assuming the TMM was not previously configured. Eventually it assigns distinct self IPs to all TMM pods and persists the accurate information in the ConfigMap, but each TMM picks up only the self IP that was sent first and ignores subsequent updates. TMMs with duplicate self IPs do not process traffic.
Workaround:
Scale the TMM pods down to 0 and bring them back up; all TMMs will then have unique self IP addresses assigned.
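A minimal sketch of the scale-to-zero workaround, assuming TMM runs as a Deployment named 'f5-tmm' in namespace 'spk-ingress' (adjust names and the replica count to your install):

    kubectl scale deployment f5-tmm -n spk-ingress --replicas=0
    # Wait for the old pods to terminate, then scale back up.
    kubectl wait --for=delete pod -l app=f5-tmm -n spk-ingress --timeout=120s
    kubectl scale deployment f5-tmm -n spk-ingress --replicas=4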
Fix:
The F5Ingress controller now uses a consistent method to identify ready TMMs, and self IP management across ready TMMs has been refactored and fixed.
1572977-1 : Controller pod restart and TMM container restart could result in a condition where restarted TMM pod would not have VLAN and self IP configuration
Component: SPK
Symptoms:
If the controller pod restarts first and the TMM container restarts later, or when both restart simultaneously, the restarted TMM is left without its VLAN and self IP configuration.
Conditions:
A sequential restart of the controller pod and the TMM container, or a simultaneous restart of both.
Impact:
Restarted TMM pod would not be able to process traffic.
Workaround:
Delete the impacted TMM pod so that new pod receives the correct VLAN and self IP configuration.
Fix:
After a restart, the controller now pushes the complete configuration to all TMMs. Previously it re-sent only the common configuration; it now sends the TMM-specific configuration as well.
1551085-2 : F5-spk-ingresstcp that uses a VLAN list is unavailable
Component: SPK
Symptoms:
Connections to an f5-spk-ingresstcp with a VLAN list fail. This may occur at startup or when f5ingress or TMM is scaled up.
The virtual server, pool, and pool member TMM stats tables show the rows related to the f5-spk-ingresstcp CR.
The TMM logs show messages like "Config info: Vlan <vlan name> was not found to be currently configured."
Conditions:
At startup or scale-up, when the f5-spk-ingresstcp CR already exists, has a VLAN list, and the necessary f5-spk-vlan CRs already exist.
Impact:
Traffic to the f5-spk-ingresstcp is dropped by TMM.
Workaround:
None
Fix:
The f5-spk-ingresstcp is now available.
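For context, the triggering configuration is an F5SPKIngressTCP CR that carries a VLAN list. The sketch below is illustrative only: the apiVersion and field names follow the general pattern of SPK CRs and must be verified against the CRDs shipped with your release.

    cat <<'EOF' | kubectl apply -f -
    # Illustrative only: verify apiVersion and field names against the
    # F5SPKIngressTCP CRD installed in your cluster.
    apiVersion: ingresstcp.k8s.f5net.com/v1
    kind: F5SPKIngressTCP
    metadata:
      name: tcp-app
      namespace: spk-ingress
    spec:
      service:
        name: tcp-app-svc
        port: 8080
      destinationAddress: "192.0.2.10"
      destinationPort: 80
      vlans:
        vlanList:
        - vlan-external
    EOF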
1550609-1 : F5SPKEgress CR status stuck at DEPLOYING when the controller restarts
Component: SPK
Symptoms:
F5SPKEgress CR is stuck with "DEPLOYING" status.
Conditions:
When SPK is deployed with an F5SPKEgress CR, the TMMs are successfully configured and the F5SPKEgress CR reaches a SUCCESS status.
However, once the F5Ingress controller restarts, it assumes those TMMs have already been configured (which they have), but the F5SPKEgress CR status was not updated from DEPLOYING to SUCCESS.
Impact:
Misleading status. Customers may assume that the CR was never successfully installed and that something is wrong with their setup or configuration.
Workaround:
Uninstall and reinstall the F5SPKEgress CR.
Fix:
The F5Ingress controller now properly updates the F5SPKEgress CR status to SUCCESS when the controller restarts and there are previously configured TMMs.
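To check whether a CR is affected, inspect its status subresource. The resource name, CR name, and namespace below are placeholders:

    # The registered resource name may differ;
    # 'kubectl api-resources | grep -i egress' shows what your cluster installed.
    kubectl get f5-spkegress egress-cr -n spk-ingress -o jsonpath='{.status}'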
1329849-2 : TMM crashes on startup due to DPDK mlx5 driver
Component: SPK
Symptoms:
TMM container is in a restart loop.
Conditions:
SPK 1.8 is deployed with a ConnectX-6 Lx NIC, and the NIC is on firmware version 26.35.1012.
Impact:
Traffic is disrupted while TMM restarts.
Workaround:
None
Fix:
TMM no longer crashes on startup.
1329369-1 : Telemetry report includes controller nodes in vCPU count
Component: SPK
Symptoms:
Telemetry report includes controller nodes in vCPU count.
Conditions:
During licensing, the initial configuration report and the telemetry report sent after licensing include controller nodes in the vCPU count.
Impact:
Telemetry report is inaccurate.
Workaround:
None
Fix:
The telemetry report no longer includes controller nodes in the vCPU count.
Note: A node is counted as a controller node if it has the label 'node-role.kubernetes.io/control-plane=' or 'kubernetes.io/role=control-plane'.
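To see which nodes the report now excludes, list the nodes carrying either label; a missing label can be added manually (the node name is a placeholder):

    kubectl get nodes -l 'node-role.kubernetes.io/control-plane='
    kubectl get nodes -l 'kubernetes.io/role=control-plane'
    # Add the label to a node that should be treated as a controller node:
    kubectl label node <node-name> node-role.kubernetes.io/control-plane=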
1327321-5 : License deactivation occurs after the f5ingress container restarts as a result of CR object deletion.
Component: SPK
Symptoms:
After f5ingress crashes, the license is deactivated because f5-lic-helper never communicates the license status back to f5ingress after the restart.
Conditions:
When multiple CRs are applied and then deleted immediately, f5ingress crashes while handling the deletion of a CR object that is not present in its cache.
Impact:
The f5ingress crash results in license deactivation because the license helper never communicates the license status back to f5ingress after the restart. As a result, the config is never re-sent to TMM.
Workaround:
None
Fix:
F5ingress no longer crashes when CRs are applied and deleted continuously. After the f5ingress container restarts, f5-lic-helper resends the licensing details, and all configs are then sent to TMM.
1326169-3 : CWC goes into CrashLoopBackOff after upgrade★
Component: SPK
Symptoms:
CWC goes into CrashLoopBackOff at the end of the month, after an upgrade.
Conditions:
This is observed in disconnected mode. If telemetry reports have not been sent for over a month and the CWC is then upgraded, CWC goes into CrashLoopBackOff at the end of the month.
Impact:
CWC becomes inoperable. Any new f5controller that comes up will not receive the license status, which blocks the f5controller from pushing configuration to TMMs. This may impact traffic processing.
Workaround:
Submit the pending telemetry report, then perform the upgrade.
Fix:
Upgrading CWC no longer causes it to crash during periodic telemetry report generation.
1282605-2 : Cannot configure TMM when an application Pod scales up if the ingress TCP CR is deployed without a service port
Component: SPK
Symptoms:
F5ingress controller fails to configure TMM with virtual server and pool members after an F5SPKIngressTCP CR that does not have a service port is deployed.
Conditions:
When a single F5SPKIngressTCP CR that does not have a service port is deployed along with other CRs that do have service ports, TMM fails to configure for any subsequent application Pod scale events.
Impact:
When an F5SPKIngressTCP CR that does not have a service port is present and an application Pod is scaled up from zero, TMM does not get configured with virtual servers for already deployed valid CRs. If an application Pod is scaled up from non-zero replicas, pool members for the existing virtual servers of valid CRs are not updated.
Workaround:
Delete all F5SPKIngressTCP CRs that refer to non-existent service ports, then restart the f5ingress controller Pod, as sketched below.
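A sketch of the workaround; the resource name, deployment name, and namespace are placeholders:

    # Verify names with 'kubectl api-resources' and 'kubectl get deploy -n <namespace>'.
    kubectl delete f5-spkingresstcp <cr-without-service-port> -n spk-ingress
    kubectl rollout restart deployment f5ingress -n spk-ingress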
Fix:
TMM now continues to be configured for CRs that have service ports, even if CRs without service ports are present.
Known Issues in SPK v1.9.2
SPK Issues
ID Number | Severity | Description
1332585-2 | 1-Blocking | CPS Performance has degraded by 20%
1578477-2 | 2-Critical | TMMs receive duplicate internal IP address after multiple f5ingress restarts
1134241-3 | 2-Critical | Network packets are no longer dropped or routed incorrectly after the TMM Pod is restarted or scaled up
Known Issue details for SPK v1.9.2
1578477-2 : TMMs receive duplicate internal IP address after multiple f5ingress restarts
Component: SPK
Symptoms:
TMMs receive duplicate internal IP addresses after multiple back-to-back f5ingress restarts.
Conditions:
F5ingress is restarted before it completes reconciling the ConfigMap where it persists TMM-specific configuration, such as self IP assignment and DNS subnet assignment. This causes the ConfigMap to be out of sync with the configuration on TMM.
Any subsequent scaling or restart of TMM can result in TMMs receiving duplicate self IPs.
Impact:
Traffic is impacted on the TMMs with duplicate self IPs.
Workaround:
Scale the TMM replicas down to zero and then bring them back up. Note that this workaround causes a brief traffic disruption.
1332585-2 : CPS Performance has degraded by 20%
Component: SPK
Symptoms:
CPS performance in SPK 1.9.2 has degraded by 20% compared to SPK 1.8.
Conditions:
Always
Impact:
Connections per second are lower.
1134241-3 : Network packets are no longer dropped or routed incorrectly after the TMM Pod is restarted or scaled up.
Component: SPK
Symptoms:
The TMM Pod configuration may experience a race condition after the Pod is restarted or scaled up, causing networking issues such as incorrect routing or incorrect address translations. This occurs within the first few seconds of the TMM Pod accepting traffic.
Conditions:
- The following networking Custom Resources (CRs) are installed: F5SPKVlan, F5SPKStaticRoute, F5SPKSnatPool.
- One or more application traffic CRs are installed that allow the TMM to process traffic, for example F5SPKIngressTCP or F5SPKEgress.
- The TMM Pod restarts or is scaled up.
- Network traffic routes through the TMM immediately when the Pod reaches the ready state.
Impact:
Client connectivity may fail because traffic is routed incorrectly or because packets arrive before the configuration has finalized.
Workaround:
Delete all instances of the Custom Resources that configure the TMM to process traffic before scaling the TMM Pod, and re-apply them after the newly scaled Pods are "Running".
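A hedged sketch of that sequence, with placeholder file names, labels, and replica counts:

    # Remove the traffic-processing CRs before scaling TMM.
    kubectl delete -f traffic-crs.yaml
    kubectl scale deployment f5-tmm -n spk-ingress --replicas=6
    # Wait for the newly scaled pods to be Running/Ready, then re-apply.
    kubectl wait --for=condition=Ready pod -l app=f5-tmm -n spk-ingress --timeout=300s
    kubectl apply -f traffic-crs.yaml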
Fix:
Networking annotations for the TMM pod have been delayed by 10 seconds to allow the configuration to finalize before internal cluster traffic is routed to the TMM.
★ This issue may cause the configuration to fail to load or may significantly impact system performance after upgrade
For additional support resources and technical documentation, see:
- The F5 Technical Support website: http://www.f5.com/support/
- The MyF5 website: https://my.f5.com/manage/s/
- The F5 DevCentral website: http://devcentral.f5.com/