Disaster Recovery Guide

This guide provides instructions for configuring and managing the F5 Insight Disaster Recovery (DR) feature.

What is F5 Insight DR?

The F5 Insight Disaster Recovery feature ensures your system keeps running even if one server fails. It works by maintaining two synchronized systems:

  1. Primary instance: Handles all active operations.
  2. Standby instance: A backup system that continuously syncs data from the primary.

If the primary fails or requires maintenance, you can “promote” the standby to take over as the primary. After fixing the first system, you can perform a “failback” to restore it to normal operation.

Important

Before promoting the standby, ensure the primary is fenced (not actively scraping BIG-IP devices). This prevents problems caused by both instances trying to process data at the same time.


How DR Works

  1. DR Status and User Interface
  • Primary Node: Shows “Active” and “Read/Write” statuses in green. You can make and apply changes here.
  • Standby Node: Shows “Standby” and “Read-only” in gray. You cannot make changes on the standby node.
  • DR Pair Status: Displays “Operational Status: Active” in green when everything is running well.

When you access the standby node, you will see a banner saying: “This node is in standby mode. Configuration changes must be made on the primary node.” This visual indicator confirms you are viewing the backup instance rather than the active system.

  2. Data Synchronization

F5 Insight automatically copies all data (for example, configuration changes, device information, performance metrics) from the primary instance to the standby instance in real time.

  3. Security with WireGuard Tunnel

All data replication is secured through a WireGuard tunnel:

  • This is an encrypted connection between the primary and standby nodes.
  • It uses UDP port 51820 and requires minimal system resources.

Before You Start

Infrastructure Requirements

  • Two F5 Insight instances with the same software version. The instances must have network connectivity to each other, and you must have SSH access with sudo privileges on both systems.
  • Ensure these network ports are open between the instances:
    • UDP Port 51820 for the WireGuard tunnel (critical).
    • TCP Port 22 for SSH (initial setup only).
    • TCP Port 443 for the HTTPS UI.
  • Administrator credentials are required for both instances for the web UI and SSH.
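
The port requirements above can be checked from one instance against the other before starting. The following is a minimal sketch, not a product tool: the function names are hypothetical, the peer address reuses the example standby IP from this guide, and it assumes a netcat (`nc`) that supports `-z` and `-w` is installed. UDP is connectionless, so the sketch only lists the WireGuard port rather than probing it.

```shell
#!/bin/sh
# Assumption: PEER_IP is the other instance; the default is this guide's example standby IP.
PEER_IP="${PEER_IP:-10.145.16.150}"

# Ports this guide requires between the instances, one "proto port purpose" per line.
required_ports() {
    cat <<'EOF'
udp 51820 WireGuard tunnel
tcp 22 SSH (initial setup only)
tcp 443 HTTPS UI
EOF
}

# Probe each TCP port with netcat. UDP cannot be confirmed with a simple
# connect test, so it is reported as skipped and must be checked in firewall rules.
check_ports() {
    required_ports | while read -r proto port purpose; do
        if [ "$proto" = "tcp" ]; then
            if nc -z -w 3 "$PEER_IP" "$port" </dev/null 2>/dev/null; then
                echo "OK      $proto/$port  $purpose"
            else
                echo "FAILED  $proto/$port  $purpose"
            fi
        else
            echo "SKIPPED $proto/$port  $purpose (verify in firewall rules)"
        fi
    done
}
```

Running `PEER_IP=10.145.18.134 check_ports` on the standby would repeat the check in the opposite direction.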

Checklist Before Setting Up DR

  • Both instances are operational.
  • Password-less SSH setup between the primary and standby instances is completed.

Setting Up DR

Prepare the Instances

Verify the system is working on both instances. Write down their IP addresses, for example:

  • Primary: 10.145.18.134
  • Standby: 10.145.16.150

Configure SSH Access

  1. Generate an SSH key pair on the primary if one does not already exist.
  2. Copy the public key to the standby instance.
  3. Repeat the process from the standby to the primary.
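
The steps above can be sketched with standard OpenSSH tools. This is an illustration under assumptions: the login user `admin`, the key path, and the helper names are hypothetical and not taken from the product. Setting `DRY_RUN=1` prints the commands instead of running them, which is useful for review before touching either system.

```shell
#!/bin/sh
# Assumptions: REMOTE_USER and KEY_FILE are illustrative values; adjust to your environment.
REMOTE_USER="${REMOTE_USER:-admin}"
KEY_FILE="${KEY_FILE:-$HOME/.ssh/id_ed25519}"

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create a key pair only if one does not already exist, install the public key
# on the peer, then confirm that key-based login works without a prompt.
setup_key_to_peer() {
    peer_ip="$1"
    if [ ! -f "$KEY_FILE" ]; then
        run ssh-keygen -t ed25519 -f "$KEY_FILE" -N ""
    fi
    run ssh-copy-id -i "$KEY_FILE.pub" "$REMOTE_USER@$peer_ip"
    # BatchMode makes ssh fail instead of asking for a password.
    run ssh -o BatchMode=yes "$REMOTE_USER@$peer_ip" true
}
```

On the primary, `setup_key_to_peer 10.145.16.150` covers steps 1 and 2; running `setup_key_to_peer 10.145.18.134` on the standby covers step 3.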

Set Up the WireGuard Tunnel

  1. Log in to the primary instance via SSH.
  2. Navigate to /opt/f5insight/scripts.
  3. Run the WireGuard setup script, which:
    • Generates keys.
    • Establishes the tunnel.
    • Configures iptables rules.

Once complete, a confirmation message will display.
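
Beyond the confirmation message, the tunnel state can be inspected with the standard `wg` utility. The sketch below is an optional check, not part of the setup script: `handshake_report` is a hypothetical helper that reads the output of `wg show all latest-handshakes` (interface, peer public key, and handshake time in epoch seconds, with 0 meaning no handshake yet). The 180-second threshold is an assumption for a tunnel that carries continuous replication traffic.

```shell
#!/bin/sh
# Read "interface  peer-public-key  epoch-seconds" lines on stdin and classify
# each peer by the age of its last handshake.
#   $1 = current time in epoch seconds
#   $2 = maximum acceptable handshake age in seconds (assumed value, tune as needed)
handshake_report() {
    now="$1"
    max_age="$2"
    awk -v now="$now" -v max="$max_age" '
        $3 == 0        { print $1, "never"; next }
        now - $3 > max { print $1, "stale"; next }
                       { print $1, "ok" }'
}

# Usage on either instance (requires root):
#   sudo wg show all latest-handshakes | handshake_report "$(date +%s)" 180
```

A result of `never` or `stale` suggests checking that UDP port 51820 is open in both directions.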


Set Up DR in the UI

  1. Access the primary instance’s web UI.

  2. Navigate to Settings → Disaster Recovery.

  3. Click Configure on the “Standby Peer Node” card.

  4. Provide details:
    • Primary (Hostname and IP): For example, Hostname: primary-node, IP: 10.145.18.134.
    • Standby (Hostname and IP): For example, Hostname: standby-node, IP: 10.145.16.150.

  5. Click Save Configuration.

Saving may take a few minutes while the initial data synchronization completes.

Verify DR Configuration

Go back to the DR settings page:

  • Primary Node: Should show “Active” and “Read/Write.”
  • Standby Node: Should show “Standby” and “Read-only.”
  • DR Pair Status: Should show “Operational Status: Active” with a green indicator.

To verify the standby configuration, open a browser and navigate to your standby instance IP address. The standby mode banner (“This node is in standby mode. Configuration changes must be made on the primary node.”) should be visible on all settings pages.


Verify Data Synchronization

To verify data replication, make a configuration change on the primary instance, such as adding a device or modifying a setting. Then access the standby UI and refresh the page to verify the change has been synchronized.

For example, after a BIG-IP device named bigip/bigip-africa-central is added on the primary, the Homepage and all dashboards on the standby should start showing data for that device.


Disaster Recovery Operations

Failover (Promoting Standby to Primary)

If the primary instance fails or needs maintenance, promote the standby to act as the primary. Follow the steps below to promote the standby to primary:

  1. Fence the primary by scaling the OTEL collector down to 0 replicas. Run the following command on the primary:

    sudo kubectl scale deployment otel-collector --replicas=0 -n f5-insight
    


  2. Access the standby instance (https://<standby-ip>).

  3. Navigate to Settings → Disaster Recovery.

  4. Click Promote Node to Primary on the Standby Node card.

    • A confirmation message will display: “Switching node roles in progress.”
  5. Log back in to the new primary instance and verify its status.
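
Before step 4, it can be worth confirming that the scale-down in step 1 took effect, since promoting while the old primary is still scraping is the situation the fencing requirement is meant to prevent. The sketch below reuses the deployment and namespace names from the command in step 1; the helper names are hypothetical, and it assumes that a desired replica count of 0 for `otel-collector` is what fences the node.

```shell
#!/bin/sh
# Query the desired replica count of the collector deployment named in step 1.
otel_replicas() {
    sudo kubectl get deployment otel-collector -n f5-insight \
        -o jsonpath='{.spec.replicas}'
}

# Succeed only when the given replica count is exactly 0.
# An empty value (for example, when kubectl failed) is treated as not fenced.
is_fenced() {
    [ "$1" = "0" ]
}

# Usage on the old primary, before promoting the standby:
#   if is_fenced "$(otel_replicas)"; then
#       echo "primary is fenced"
#   else
#       echo "primary may still be scraping; do not promote yet"
#   fi
```

If the primary is unreachable, this check cannot run, and fencing has to be confirmed by other means such as powering the instance off.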

Post-promotion Steps:

  • Update BIG-IP log configurations to redirect logs to the new primary.
  • You can now make changes in the new primary instance.

Failback (Restoring the Original Primary)

After fixing the original primary instance:

  1. Access the original primary (https://<primary-ip>).

  2. Navigate to Settings → Disaster Recovery.

  3. Click Demote Node to Standby on the Primary Node card.

    • A confirmation dialog appears.
    • A confirmation message will display: “Switching node roles in progress.”
  4. Log back in and verify that the standby mode banner is now visible, confirming the instance has returned to standby mode.


Conclusion

F5 Insight’s DR feature ensures your system stays functional even during failures. Its synchronized instances, encrypted data replication, and easy UI-based controls make it simple for teams to maintain business continuity. Regular maintenance and familiarity with the failover and failback processes will keep your system running smoothly.