HA Failover State
Overview
You can use the HA Failover State API to get the current failover state of a BIG-IQ high-availability (HA) configuration. Use the HA Add Peer API to add a BIG-IQ to the BIG-IQ HA configuration, the HA Remove Standby API to remove standby mode from all BIG-IQs in the HA configuration, the HA Promote API to promote the secondary BIG-IQ to become the primary BIG-IQ, and the HA Reset API to reset a primary or secondary BIG-IQ in an HA configuration to standalone mode.
A BIG-IQ high availability (HA) configuration ensures continuous management of your BIG-IP devices by a standby BIG-IQ if your primary BIG-IQ goes down. BIG-IQ offers either manual failover mode or auto failover mode. In manual failover mode, manual intervention is required to promote the secondary BIG-IQ to become the primary BIG-IQ managing your BIG-IP devices. Manual failover mode requires two BIG-IQs, acting as primary and secondary, that are in the same network segment and have the same configuration. In auto failover mode, if the primary BIG-IQ goes down, a quorum data collection device (DCD) automatically decides to promote the secondary BIG-IQ to become the primary BIG-IQ. Auto failover mode requires two BIG-IQs, acting as primary and secondary, plus a DCD to act as the quorum device.
Requests
haMonitorStatus
The haMonitorStatus object reflects the status of the daemons used for auto failover, corosync and pacemaker. All of the following fields apply only to auto failover.
Name | Type | Description |
---|---|---|
canFailover | boolean | If true, failover can occur, but the failover can be overridden. |
clusterHealth | string | The health of the auto failover HA cluster, determined by the health of the nodes and the status of the resources they manage. It can have one of the following values: “healthy” - The HA cluster is healthy. “warning” - There are some warnings in the cluster state. “degraded” - The quorum device is offline. “failed” - The primary or the secondary node is down. “down” - The current node is down. “unconfigured” - Auto failover HA is not configured. |
failureCauses | array | Reasons for failure. |
fenceAction | string | Fence action that was taken on the node. It can have one of the following values: “none” - No fencing. “world” - The node has been fenced from the world. “primary” - A fence action is recommended but can be overridden. |
hasFailedOver | boolean | If true, the secondary node has become the current primary after failing over. |
isConfigured | boolean | If true, the corosync and pacemaker daemons are configured. |
isQuorate | boolean | If true, a quorum device has been added to the HA configuration. |
isRunning | boolean | If true, the corosync and pacemaker daemons are running. |
nodeHealth | string | Node health in the auto failover HA configuration. It can have one of the following values: “healthy”, “warning”, “degraded”, “failed”, “down”, “unconfigured” |
offlineNodes | array | An array of nodes that are offline in the auto failover HA configuration. |
onlineNodes | array | An array of nodes that are online in the auto failover HA configuration. |
primaryResources | array | Resources managed by the primary node in the HA configuration. |
role | string | The node’s role in the HA configuration. |
secondaryResources | array | Resources managed by the secondary node in the HA configuration. |
shouldFailover | boolean | If true, the failover criteria have been met, which triggers the auto failover mechanism. |
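As an illustration, the haMonitorStatus fields above can be combined into a simple programmatic health check. This is only a sketch: the function name and the particular criteria chosen here are assumptions, not part of the API.

```python
def ha_cluster_ok(ha_monitor_status):
    """Return True when an haMonitorStatus dict looks healthy.

    Expects a dict shaped like the haMonitorStatus object documented
    above. The criteria combined here are illustrative, not normative.
    """
    s = ha_monitor_status
    return (
        s.get("isConfigured", False)      # corosync/pacemaker configured
        and s.get("isRunning", False)     # daemons running
        and s.get("isQuorate", False)     # quorum device present
        and s.get("clusterHealth") == "healthy"
        and not s.get("offlineNodes")     # no offline nodes
        and not s.get("failureCauses")    # no recorded failure causes
    )


# Values taken from the auto failover example response in this document:
healthy_status = {
    "isConfigured": True,
    "isRunning": True,
    "isQuorate": True,
    "clusterHealth": "healthy",
    "offlineNodes": [],
    "failureCauses": [],
}
```

A monitoring script could call this on the `pcsStatus.haMonitorStatus` portion of the GET response shown below.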
Permissions
Role | Allow |
---|---|
admin | Yes |
Examples
GET to retrieve status of the HA configuration
The following example shows a GET request sent to the BIG-IQ to retrieve the status of the HA configuration.
GET https://192.0.2.0/mgmt/shared/failover-state
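This request can be issued from a script, for example with Python's standard library. This is a minimal sketch: it assumes authentication via an X-F5-Auth-Token obtained beforehand, and the unverified SSL context is only appropriate for lab systems with self-signed certificates.

```python
import json
import ssl
import urllib.request


def failover_state_url(host):
    """Build the HA failover state endpoint URL for a BIG-IQ host."""
    return f"https://{host}/mgmt/shared/failover-state"


def get_failover_state(host, token):
    """GET the failover state and return the parsed JSON response.

    `token` is assumed to be a valid X-F5-Auth-Token; the unverified
    SSL context is only for systems using self-signed certificates.
    """
    req = urllib.request.Request(
        failover_state_url(host),
        headers={"X-F5-Auth-Token": token},
    )
    ctx = ssl._create_unverified_context()
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.loads(resp.read())
```

For example, `get_failover_state("192.0.2.0", token)` would return a dict whose `systemMode`, `nodeRole`, and `haStatus` keys match the responses shown below.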
Response if HA is in auto failover mode
The JSON in the response can look similar to the following example.
HTTP/1.1 200 OK
{
"systemMode": "HA",
"nodeRole": "PRIMARY",
"primaryMachineID": "245acab3-0e8d-43a5-9c12-e3554ee64c85",
"secondaryMachineID": "4af03ba9-16ae-40a0-a3e0-e695c04d5868",
"useQuorumHA": true,
"haStatus": {
"currentState": "OPERATIONAL",
"currentStatusTimestamp": 1583961637987247,
"peerAvailable": true,
"primaryStatus": {
"nodeState": "OPERATIONAL",
"timestamp": 1583961636902324,
"replicationStatus": {
"replicationStat": {
"pid": "1077",
"usesysid": "18666",
"usename": "postgres_replication",
"application_name": "walreceiver",
"client_addr": "10.6.0.42",
"client_hostname": "null",
"client_port": "54351",
"backend_start": "2020-03-11 12:53:58.237866",
"backend_xmin": "null",
"state": "streaming",
"sent_lsn": "1/38467860",
"write_lsn": "1/38467860",
"flush_lsn": "1/38467860",
"replay_lsn": "1/38467860",
"write_lag": "0 years 0 mons 0 days 0 hours 0 mins 0.000747 secs",
"flush_lag": "0 years 0 mons 0 days 0 hours 0 mins 0.000755 secs",
"replay_lag": "0 years 0 mons 0 days 0 hours 0 mins 0.000824 secs",
"sync_priority": "0",
"sync_state": "async"
},
"replicationSlotStat": {
"slot_name": "ha_replication",
"plugin": "null",
"slot_type": "physical",
"datoid": "null",
"database": "null",
"temporary": "false",
"active": "true",
"active_pid": "1077",
"xmin": "null",
"catalog_xmin": "null",
"restart_lsn": "1/38467860",
"confirmed_flush_lsn": "null"
},
"laggingBytes": 0,
"walKeepSegments": 16,
"isInRecovery": false,
"walSegments": 21,
"myState": 1
}
},
"secondaryStatus": {
"nodeState": "OPERATIONAL",
"timestamp": 1583961628153103,
"replicationStatus": {
"replicationReceiverStat": {
"pid": "46115",
"status": "streaming",
"receive_start_lsn": "1/2E000000",
"receive_start_tli": "11",
"received_lsn": "1/3840D7E4",
"received_tli": "11",
"last_msg_send_time": "2020-03-11 14:20:28.120833",
"last_msg_receipt_time": "2020-03-11 14:20:28.122192",
"latest_end_lsn": "1/3840D7E4",
"latest_end_time": "2020-03-11 14:20:28.120833",
"slot_name": "ha_replication",
"sender_host": "10.6.0.43",
"sender_port": "5432",
"conninfo": "user=postgres_replication password=******** dbname=replication host=10.6.0.43 port=5432 fallback_application_name=walreceiver sslmode=verify-full sslcompression=0 target_session_attrs=any"
},
"walKeepSegments": 16,
"isInRecovery": true,
"walSegments": 13,
"myState": 2
}
},
"localNodeStatus": {
"nodeState": "OPERATIONAL",
"timestamp": 1583961636902324,
"replicationStatus": {
"replicationStat": {
"pid": "1077",
"usesysid": "18666",
"usename": "postgres_replication",
"application_name": "walreceiver",
"client_addr": "10.6.0.42",
"client_hostname": "null",
"client_port": "54351",
"backend_start": "2020-03-11 12:53:58.237866",
"backend_xmin": "null",
"state": "streaming",
"sent_lsn": "1/38467860",
"write_lsn": "1/38467860",
"flush_lsn": "1/38467860",
"replay_lsn": "1/38467860",
"write_lag": "0 years 0 mons 0 days 0 hours 0 mins 0.000747 secs",
"flush_lag": "0 years 0 mons 0 days 0 hours 0 mins 0.000755 secs",
"replay_lag": "0 years 0 mons 0 days 0 hours 0 mins 0.000824 secs",
"sync_priority": "0",
"sync_state": "async"
},
"replicationSlotStat": {
"slot_name": "ha_replication",
"plugin": "null",
"slot_type": "physical",
"datoid": "null",
"database": "null",
"temporary": "false",
"active": "true",
"active_pid": "1077",
"xmin": "null",
"catalog_xmin": "null",
"restart_lsn": "1/38467860",
"confirmed_flush_lsn": "null"
},
"laggingBytes": 0,
"walKeepSegments": 16,
"isInRecovery": false,
"walSegments": 21,
"myState": 1
}
},
"primary": {
"nodeState": "OPERATIONAL",
"nodeRole": "PRIMARY",
"machineID": "245acab3-0e8d-43a5-9c12-e3554ee64c85",
"hostname": "azure-ha1.com",
"address": "10.6.0.43",
"version": "7.1.0"
},
"secondary": {
"nodeState": "OPERATIONAL",
"nodeRole": "SECONDARY",
"machineID": "4af03ba9-16ae-40a0-a3e0-e695c04d5868",
"hostname": "azure-ha2.com",
"address": "10.6.0.42",
"version": "7.1.0"
},
"quorum": {
"nodeState": "OPERATIONAL",
"statusMessage": "Quorum device is online",
"nodeRole": "QUORUM",
"machineID": "7359d12a-31bc-4776-9304-d9ce2f10afc0",
"hostname": "azure-dcd.com",
"address": "10.6.0.40",
"version": "7.1.0"
},
"pcsStatus": {
"enabled": true,
"config": {
"primaryIPAddress": "10.6.0.43",
"secondaryIPAddress": "10.6.0.42",
"quorumIPAddress": "10.6.0.40"
},
"haMonitorStatus": {
"role": "primary",
"isConfigured": true,
"isRunning": true,
"isQuorate": true,
"clusterHealth": "healthy",
"nodeHealth": "healthy",
"fenceAction": "none",
"shouldFailOver": false,
"canFailOver": false,
"onlineNodes": [
"ha-primary",
"ha-quorum",
"ha-secondary"
],
"offlineNodes": [],
"primaryResources": [
"RESTJAVAD",
"SEARCHMOND",
"WEBD"
],
"secondaryResources": [
"RESTJAVAD",
"SEARCHMOND",
"WEBD"
],
"failureCauses": [],
"hasFailedOver": false
}
}
},
"syncIntervalSeconds": 30,
"minVarSpaceAvailable": 10,
"lastSuccessfulSync": "2020-03-11T14:20:16.636-07:00",
"nextSync": "2020-03-11T14:20:26.477-07:00",
"skipCorosync": true,
"generation": 87,
"lastUpdateMicros": 1583956611131551,
"kind": "shared:failover-state:failoverstate",
"selfLink": "https://localhost/mgmt/shared/failover-state"
}
Response if HA is in manual failover mode
The JSON in the response can look similar to the following example.
HTTP/1.1 200 OK
{
"systemMode": "HA",
"nodeRole": "PRIMARY",
"primaryMachineID": "96b850e6-b9f4-4f9f-8e10-c24adb418c3f",
"secondaryMachineID": "d491813f-d5e3-4f15-948a-8cfaae3273b9",
"useQuorumHA": false,
"haStatus": {
"currentState": "OPERATIONAL",
"currentStatusTimestamp": 1583876923332207,
"peerAvailable": true,
"primaryStatus": {
"nodeState": "OPERATIONAL",
"timestamp": 1583876919675780,
"replicationStatus": {
"replicationStat": {
"pid": "6034",
"usesysid": "103515",
"usename": "postgres_replication",
"application_name": "walreceiver",
"client_addr": "10.10.10.12",
"client_hostname": "null",
"client_port": "55888",
"backend_start": "2020-03-09 20:28:29.649389",
"backend_xmin": "null",
"state": "streaming",
"sent_lsn": "2/4B7E818",
"write_lsn": "2/4B7E818",
"flush_lsn": "2/4B7E818",
"replay_lsn": "2/4B7E818",
"write_lag": "0 years 0 mons 0 days 0 hours 0 mins 0.006097 secs",
"flush_lag": "0 years 0 mons 0 days 0 hours 0 mins 0.006102 secs",
"replay_lag": "0 years 0 mons 0 days 0 hours 0 mins 0.00611 secs",
"sync_priority": "0",
"sync_state": "async"
},
"replicationSlotStat": {
"slot_name": "ha_replication",
"plugin": "null",
"slot_type": "physical",
"datoid": "null",
"database": "null",
"temporary": "false",
"active": "true",
"active_pid": "6034",
"xmin": "null",
"catalog_xmin": "null",
"restart_lsn": "2/4B7E818",
"confirmed_flush_lsn": "null"
},
"laggingBytes": 0,
"walKeepSegments": 16,
"isInRecovery": false,
"walSegments": 22,
"myState": 1
}
},
"secondaryStatus": {
"nodeState": "OPERATIONAL",
"timestamp": 1583876912585625,
"replicationStatus": {
"replicationReceiverStat": {
"pid": "11189",
"status": "streaming",
"receive_start_lsn": "1/98000000",
"receive_start_tli": "5",
"received_lsn": "2/4B6E6BC",
"received_tli": "5",
"last_msg_send_time": "2020-03-10 14:48:31.244643",
"last_msg_receipt_time": "2020-03-10 14:48:31.088299",
"latest_end_lsn": "2/4B6E6BC",
"latest_end_time": "2020-03-10 14:48:31.244643",
"slot_name": "ha_replication",
"sender_host": "10.10.10.15",
"sender_port": "5432",
"conninfo": "user=postgres_replication password=******** dbname=replication host=10.10.10.15 port=5432 fallback_application_name=walreceiver sslmode=verify-full sslcompression=0 target_session_attrs=any"
},
"walKeepSegments": 16,
"isInRecovery": true,
"walSegments": 20,
"myState": 2
}
},
"localNodeStatus": {
"nodeState": "OPERATIONAL",
"timestamp": 1583876919675780,
"replicationStatus": {
"replicationStat": {
"pid": "6034",
"usesysid": "103515",
"usename": "postgres_replication",
"application_name": "walreceiver",
"client_addr": "10.10.10.12",
"client_hostname": "null",
"client_port": "55888",
"backend_start": "2020-03-09 20:28:29.649389",
"backend_xmin": "null",
"state": "streaming",
"sent_lsn": "2/4B7E818",
"write_lsn": "2/4B7E818",
"flush_lsn": "2/4B7E818",
"replay_lsn": "2/4B7E818",
"write_lag": "0 years 0 mons 0 days 0 hours 0 mins 0.006097 secs",
"flush_lag": "0 years 0 mons 0 days 0 hours 0 mins 0.006102 secs",
"replay_lag": "0 years 0 mons 0 days 0 hours 0 mins 0.00611 secs",
"sync_priority": "0",
"sync_state": "async"
},
"replicationSlotStat": {
"slot_name": "ha_replication",
"plugin": "null",
"slot_type": "physical",
"datoid": "null",
"database": "null",
"temporary": "false",
"active": "true",
"active_pid": "6034",
"xmin": "null",
"catalog_xmin": "null",
"restart_lsn": "2/4B7E818",
"confirmed_flush_lsn": "null"
},
"laggingBytes": 0,
"walKeepSegments": 16,
"isInRecovery": false,
"walSegments": 22,
"myState": 1
}
},
"primary": {
"nodeState": "OPERATIONAL",
"nodeRole": "PRIMARY",
"machineID": "96b850e6-b9f4-4f9f-8e10-c24adb418c3f",
"hostname": "bigiq106.f5net.com",
"address": "10.10.10.15",
"version": "7.1.0"
},
"secondary": {
"nodeState": "OPERATIONAL",
"nodeRole": "SECONDARY",
"machineID": "d491813f-d5e3-4f15-948a-8cfaae3273b9",
"hostname": "bigiq165.f5.com",
"address": "10.10.10.12",
"version": "7.1.0"
},
"pcsStatus": {
"enabled": false,
"config": {},
"haMonitorStatus": {
"role": "none",
"isConfigured": false,
"isRunning": false,
"isQuorate": false,
"clusterHealth": "unconfigured",
"nodeHealth": "unconfigured",
"fenceAction": "primary",
"shouldFailOver": false,
"canFailOver": false,
"onlineNodes": [],
"offlineNodes": [],
"primaryResources": [],
"secondaryResources": [],
"failureCauses": [
"unconfigured - Cluster host names are not set",
"unconfigured - Cluster is not configured",
"unconfigured - Pacemaker daemon is not running",
"unconfigured - Primary node is offline",
"unconfigured - Secondary node is offline",
"unconfigured - Quorum node is offline",
"unconfigured - Restjavad is down on primary node",
"unconfigured - Webd is down on primary node",
"unconfigured - Searchmon is down on primary node",
"unconfigured - Restjavad is down on secondary node",
"unconfigured - Webd is down on secondary node",
"unconfigured - Searchmon is down on secondary node",
"unconfigured - This node does not have quorum"
],
"hasFailedOver": false
}
}
},
"syncIntervalSeconds": 30,
"minVarSpaceAvailable": 10,
"lastSuccessfulSync": "2020-03-10T14:48:15.459-07:00",
"nextSync": "2020-03-10T14:48:25.313-07:00",
"generation": 4,
"lastUpdateMicros": 1583800225439257,
"kind": "shared:failover-state:failoverstate",
"selfLink": "https://localhost/mgmt/shared/failover-state"
}
PATCH to change the file sync interval
The following example shows how a PATCH request can be sent to a BIG-IQ to change the file sync interval.
PATCH https://192.0.2.0/mgmt/shared/failover-state
The JSON in the body of the PATCH request can look similar to the following example.
{
"syncIntervalSeconds": 35
}
Response
The JSON in the response can look similar to the following example.
HTTP/1.1 200 OK
{
"syncIntervalSeconds": 35
}
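The PATCH request above can likewise be scripted. As with the GET sketch earlier, this assumes an X-F5-Auth-Token for authentication, and the unverified SSL context is only for lab systems with self-signed certificates.

```python
import json
import ssl
import urllib.request


def sync_interval_body(seconds):
    """Build the JSON body for a syncIntervalSeconds PATCH."""
    return json.dumps({"syncIntervalSeconds": seconds})


def patch_sync_interval(host, token, seconds):
    """PATCH /mgmt/shared/failover-state to change the file sync interval.

    `token` is assumed to be a valid X-F5-Auth-Token; the unverified
    SSL context is only for systems using self-signed certificates.
    """
    req = urllib.request.Request(
        f"https://{host}/mgmt/shared/failover-state",
        data=sync_interval_body(seconds).encode("utf-8"),
        method="PATCH",
        headers={
            "X-F5-Auth-Token": token,
            "Content-Type": "application/json",
        },
    )
    ctx = ssl._create_unverified_context()
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.loads(resp.read())
```

On success, the returned dict would contain the updated value, e.g. `{"syncIntervalSeconds": 35}`, matching the response shown above.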