5.3. Troubleshooting SSL Orchestrator High Availability Issues

5.3.1. What it is

F5 SSL Orchestrator relies on a separate REST-based communication process between the HA peers to convey synchronization information. As with any distributed mechanism, problems can arise, usually because of configuration issues.

5.3.2. How to troubleshoot it

SSL Orchestrator provides a built-in high availability (HA) status utility to help diagnose HA communication issues. Select SSL Orchestrator -> Configuration, then click the HA Status link in the upper-right corner. Open this page on both devices. It displays the status of each HA communication check.

Figure 85/86: SSL Orchestrator HA Status Dashboard

If any of these checks shows a failed (red) status, or an SSL Orchestrator deployment has failed because of HA issues, use the troubleshooting matrix below to work through the HA configuration. First, note the limitations that SSL Orchestrator imposes on HA:

HA mode: HA is restricted to two (2) devices in active/standby mode.
Sync mode: HA requires manual sync mode, with either full or incremental updates.
Device groups: HA supports one (1) device group.

Before configuring SSL Orchestrator, confirm the following prerequisites:

BIG-IP version and provisioning: Both devices must run the same BIG-IP version, with the same licensing and the same modules provisioned.
Sync channel port lockdown: Under Network -> Self IPs, ensure that the self IP used for peer synchronization has Port Lockdown set to either Allow All or Allow Default. SSL Orchestrator synchronization uses REST communication over port 443.
Time synchronization: Under System -> Configuration -> Device -> NTP, ensure that both devices are configured to use NTP and that their clocks are correct and synchronized.
Initial config sync: Ensure that both devices are in sync before deploying the SSL Orchestrator configuration.
Non-SSL Orchestrator objects: Create any objects that SSL Orchestrator does not create itself (for example, ingress/egress VLANs and self IPs, and SSL/TLS certificates) on both devices.
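The time-synchronization prerequisite can be spot-checked from the command line by running date +%s (seconds since the epoch) on each device at roughly the same moment and comparing the two values. The following is a minimal sketch rather than an F5 tool: check_clock_skew is a hypothetical helper, and its default tolerance of 5 seconds is an arbitrary illustrative value, not an SSL Orchestrator requirement.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an F5 tool): report the clock offset between two devices.
# $1, $2 = epoch seconds captured on each device, e.g. with `date +%s`
# $3     = maximum acceptable offset in seconds (default 5, an illustrative value)
check_clock_skew() {
  local a=$1 b=$2 max=${3:-5} skew
  # Absolute difference between the two timestamps.
  skew=$(( a > b ? a - b : b - a ))
  if [ "$skew" -le "$max" ]; then
    echo "ok (offset ${skew}s)"
  else
    echo "skewed (offset ${skew}s)"
    return 1
  fi
}
```

Because the two timestamps are captured by hand, the measured offset also includes the delay between the captures, so choose a tolerance that allows for it. For example, check_clock_skew 1700000000 1700000002 prints ok (offset 2s). This only shows that the clocks agree; to confirm that NTP itself is working, ntpq -np on each device lists the configured NTP peers and their status.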

Assuming the BIG-IP system is in a correct Active/Standby HA state and the devices have been synchronized, use the following troubleshooting matrix to work through SSL Orchestrator HA issues.

HA Troubleshooting Matrix
Virtual Edition device-id

If you are running on a Virtual Edition (VE) platform, ensure that the peer devices do not have the same device-id. From the BIG-IP command line, enter the following:

cat /etc/f5-rest-device-id

If the values are the same on both devices, delete the file on both devices and restart the restjavad service to regenerate new (unique) values:

rm -f /etc/f5-rest-device-id
bigstart restart restjavad

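Because this check is a plain string comparison, it can be scripted once you have the file contents from both peers. The following is a minimal sketch, not an F5-provided tool: compare_device_ids is a hypothetical helper, and it assumes you pass it the output of cat /etc/f5-rest-device-id from each device.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the REST device IDs collected from the two HA peers.
# $1 = contents of /etc/f5-rest-device-id on device A
# $2 = contents of /etc/f5-rest-device-id on device B
# Prints "duplicate" (IDs match, regeneration needed), "unique", or "missing".
compare_device_ids() {
  local a b
  # Remove all whitespace so trailing newlines do not cause false mismatches.
  a=$(printf '%s' "$1" | tr -d '[:space:]')
  b=$(printf '%s' "$2" | tr -d '[:space:]')
  if [ -z "$a" ] || [ -z "$b" ]; then
    echo "missing"
    return 2
  fi
  if [ "$a" = "$b" ]; then
    echo "duplicate"
    return 1
  fi
  echo "unique"
}
```

For example, after copying the peer's value: compare_device_ids "$(cat /etc/f5-rest-device-id)" "<peer value>". A result of duplicate means the delete-and-restart steps above apply; missing means one of the values was empty.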
Gossip worker active state

The Gossip worker is responsible for notifying the peer of configuration changes. Ensure that the Gossip worker is in an Active state and that its peer group is set to “tm-shared-all-big-ips”. From the BIG-IP command line, enter the following:

restcurl shared/gossip

Observe the output and verify:

"status": "Active"

"gossip": "tm-shared-all-big-ips"

Perform this check on both devices. If either value differs from the above, tear down and rebuild the BIG-IP HA configuration.
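The two checks above can be automated as a text match on the restcurl output. This is a hedged sketch: json_has_pair and check_gossip are hypothetical names, the match is a simple regular expression rather than a JSON parser, and the key names are taken from the values listed above. The match is case-insensitive in case your version reports the status in a different case; confirm the field names against the actual output on your devices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the JSON text on stdin contains "key": "value".
# Case-insensitive regex match, not a JSON parser; assumes the pair is on one line.
json_has_pair() {
  local key=$1 value=$2
  grep -Eiq "\"${key}\"[[:space:]]*:[[:space:]]*\"${value}\""
}

# Hypothetical wrapper: check both Gossip conditions on restcurl output read from stdin.
check_gossip() {
  local out
  out=$(cat)   # read the output once so it can be tested twice
  if printf '%s\n' "$out" | json_has_pair status Active &&
     printf '%s\n' "$out" | json_has_pair gossip tm-shared-all-big-ips; then
    echo "ok"
  else
    echo "check failed"
    return 1
  fi
}
```

On each device, restcurl shared/gossip | check_gossip prints ok when both values are present.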

Review SSL Orchestrator HA logs

Look for Gossip-related warnings in the restjavad logs. From the BIG-IP command line, enter the following:

grep WARNING /var/log/restjavad.*.log | grep Gossip
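When several rotated restjavad logs exist, a per-file count can show where the Gossip warnings are concentrated. A minimal sketch: gossip_warning_counts is a hypothetical helper that applies the same WARNING and Gossip filters as the command above, one file at a time.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<count> <file>" for lines containing both
# WARNING and Gossip in each log file given as an argument.
gossip_warning_counts() {
  local f n
  for f in "$@"; do
    [ -r "$f" ] || continue   # skip unreadable paths or an unmatched glob
    n=$(grep WARNING "$f" | grep -c Gossip)
    printf '%s %s\n' "$n" "$f"
  done
}
```

For example, gossip_warning_counts /var/log/restjavad.*.log | sort -rn lists the log files with the most Gossip warnings first.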
Observe Gossip conflicts

When the Gossip worker cannot apply an update to the local worker, it adds a description of the problem in /shared/gossip-conflicts. To view this from the BIG-IP command line, enter the following:

restcurl shared/gossip-conflicts
Device group values

Ensure that SSL Orchestrator is using the correct management address for synchronization. From the BIG-IP command line, enter the following:

restcurl shared/resolver/device-groups/tm-shared-all-big-ips/devices

Verify that the output is the same on both devices. For each device entry, check the following fields (the values shown are examples):

"address": "10.1.10.100"

"managementAddress": "10.1.1.4"

The “address” value corresponds to the ConfigSync IP address, and the “managementAddress” value corresponds to the management IP address of each device in the device group. If these values do not match the devices' actual ConfigSync and management IP addresses, correct the ConfigSync and management IP configurations in the BIG-IP HA settings.
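To compare the device-group output from both devices without reading it line by line, you can save it on each device, copy one file to the other, and compare only the address fields. A sketch under these assumptions: device_group_addresses and compare_device_groups are hypothetical helpers, the file paths in the usage note are illustrative, and values are extracted with grep and sed rather than a JSON parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract "address" and "managementAddress" pairs from saved
# device-group output on stdin, printed as sorted key=value lines.
device_group_addresses() {
  grep -Eo '"(address|managementAddress)"[[:space:]]*:[[:space:]]*"[^"]*"' |
    sed -E 's/^"([^"]*)"[[:space:]]*:[[:space:]]*"([^"]*)"$/\1=\2/' |
    sort -u
}

# Hypothetical helper: compare the address fields of two saved outputs.
# $1, $2 = files saved on device A and device B.
compare_device_groups() {
  local a b
  a=$(device_group_addresses < "$1")
  b=$(device_group_addresses < "$2")
  # Guard against comparing two empty results (e.g. an error response).
  if [ -z "$a" ] || [ -z "$b" ]; then
    echo "no address fields found"
    return 2
  fi
  if [ "$a" = "$b" ]; then
    echo "match"
  else
    echo "differ"
    return 1
  fi
}
```

For example, save the restcurl output on each device to /var/tmp/devgroup.json, copy the peer's file over as /var/tmp/devgroup-peer.json, and run compare_device_groups /var/tmp/devgroup.json /var/tmp/devgroup-peer.json. Running device_group_addresses < /var/tmp/devgroup.json prints the extracted pairs so you can also check them against the real ConfigSync and management IPs.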

If the Gossip output shows “UNPAIRED”, you may need to perform the following steps on both devices:

Delete the existing device information:

restcurl -X DELETE shared/resolver/device-groups/tm-shared-all-big-ips/devices

Force an update of the failover state:

restcurl -X POST -d '{}' tm/shared/bigip-failover-state

Application ID

Check the value of the application ID on each device. From the BIG-IP command line, enter the following:

curl -sku 'admin:admin' https://localhost/mgmt/shared/iapp/global-installed-packages | jq

Replace “admin:admin” with the correct administrative username and password. In the output, verify that the ID value is the same on both devices. If it is not, delete all SSL Orchestrator configurations, uninstall the SSL Orchestrator RPM, and reinstall it.
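If you save the curl output from each device to a file (for example, by appending > /var/tmp/pkgs.json to the command above), a short helper can list any IDs that appear on only one device. A hedged sketch: package_ids and package_id_diff are hypothetical helpers, and the extraction matches every "id" field with a regular expression, which assumes no "id": "..." pair is split across lines (true of jq's default pretty-printed output).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the sorted, unique values of every "id" field in
# saved global-installed-packages output read from stdin.
package_ids() {
  grep -Eo '"id"[[:space:]]*:[[:space:]]*"[^"]*"' |
    sed -E 's/^"id"[[:space:]]*:[[:space:]]*"([^"]*)"$/\1/' |
    sort -u
}

# Hypothetical helper: show IDs found on only one device.
# $1, $2 = files saved on device A and device B.
# Column 1 = only on device A; tab-indented column 2 = only on device B.
# Empty output means both devices report the same set of IDs.
package_id_diff() {
  local ta tb
  ta=$(mktemp) || return 2
  tb=$(mktemp) || { rm -f "$ta"; return 2; }
  package_ids < "$1" > "$ta"
  package_ids < "$2" > "$tb"
  comm -3 "$ta" "$tb"   # -3 suppresses lines common to both files
  rm -f "$ta" "$tb"
}
```

Running package_id_diff /var/tmp/pkgs.json /var/tmp/pkgs-peer.json after copying the peer's file produces no output when the IDs match.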

Failover state

The BIG-IP Failover worker continually polls the BIG-IP for device group and failover settings, and uses the REST framework’s Gossip mechanism to replicate this configuration. Verify the failover state by entering the following on the BIG-IP command line:

restcurl tm/shared/bigip-failover-state

Run this on both devices and verify that the values match, except for “failoverState”, which should be “active” on the active peer and “standby” on the standby peer. If the other values do not match, trigger the Failover worker with the following command:

restcurl -X POST -d '{}' tm/shared/bigip-failover-state

If the values still do not match, delete all SSL Orchestrator configurations, uninstall the SSL Orchestrator RPM, and reinstall it.
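Comparing the two failover-state outputs by eye is error-prone, so the following sketch masks the field that is expected to differ and compares the rest. It is illustrative only: mask_failover_fields and compare_failover_state are hypothetical helpers, the masking uses sed on the text rather than parsing JSON, and it handles scalar (string or number) values only. If your output contains other per-device fields that legitimately differ, pass their names as extra arguments; which fields those are is an assumption to verify on your own devices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read failover-state JSON on stdin and replace the values of
# fields expected to differ between peers with "<ignored>".
# Always masks "failoverState"; extra key names can be passed as arguments.
# Text-based (sed), so only scalar string or numeric values are handled.
mask_failover_fields() {
  local key script=""
  for key in failoverState "$@"; do
    script="${script}s/(\"${key}\"[[:space:]]*:[[:space:]]*)(\"[^\"]*\"|[^,}]*)/\\1\"<ignored>\"/g;"
  done
  sed -E "$script"
}

# Hypothetical helper: compare two saved outputs after masking.
# $1, $2 = files saved on device A and device B; further args = extra keys to ignore.
compare_failover_state() {
  local a b
  a=$1
  b=$2
  shift 2
  if [ "$(mask_failover_fields "$@" < "$a")" = "$(mask_failover_fields "$@" < "$b")" ]; then
    echo "match"
  else
    echo "differ"
    return 1
  fi
}
```

For example, save the restcurl output on each device, copy one file to the peer, and run compare_failover_state /var/tmp/failover.json /var/tmp/failover-peer.json. If it reports differ, diff <(mask_failover_fields < /var/tmp/failover.json) <(mask_failover_fields < /var/tmp/failover-peer.json) in bash shows which lines differ.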