Backup and restore guide¶
Snapshots provide a way to capture the state of the VNFM HA cluster. Take a VNFM snapshot daily (an off-peak time is suggested). As an alternative to an operator manually running the snapshot as shown in this guide, you can automate snapshots using the REST API.
Back up the virtual machines that the F5 VNF Managers run on at regular intervals. The schedule is dictated by your backup policy and may involve daily, weekly, monthly, and yearly backups as required. The method for backing up the F5 VNF Manager virtual machines falls outside the scope of this document.
Snapshots¶
Take snapshots of the VNFM HA cluster at regular intervals (daily is suggested). You can automate them through the REST-based Service API, or an operator can take them manually using the Snapshots or VNFM CLI.
Create snapshot¶
Create snapshot:
Code Block 1 CLI
vnfm snapshots create --include-metrics --include-credentials SNAPSHOT_ID
Code Block 2 REST
curl -X PUT --header "Tenant: <manager-tenant>" -u <manager-username>:<manager-password> "http://<manager-ip>/api/v3.1/snapshots/<snapshot-id>"
Parameters specification available in the VNFM REST API.
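The daily automation suggested earlier could be sketched as a small shell helper around the REST call above. This is a minimal sketch under stated assumptions, not a supported tool: the function names, the `daily-YYYY-MM-DD` ID format, and the `MANAGER_IP`, `MANAGER_TENANT`, `MANAGER_USER`, and `MANAGER_PASSWORD` environment variables are all hypothetical.

```shell
#!/bin/sh
# Sketch: create a date-stamped VNFM snapshot through the REST API.
# MANAGER_IP, MANAGER_TENANT, MANAGER_USER and MANAGER_PASSWORD are
# hypothetical environment variables, not names defined by VNFM.

# Build a snapshot ID such as "daily-2024-01-31" from a YYYY-MM-DD date.
# The ID format is an illustrative assumption.
snapshot_id() {
    printf 'daily-%s\n' "$1"
}

# Build the snapshot endpoint URL used in the REST example above.
snapshot_url() {
    printf 'http://%s/api/v3.1/snapshots/%s\n' "$1" "$2"
}

# Issue the PUT request; --fail makes curl exit non-zero on an HTTP error,
# so a scheduler can detect a failed snapshot.
create_snapshot() {
    id=$(snapshot_id "$(date +%Y-%m-%d)")
    curl --fail -X PUT \
        --header "Tenant: ${MANAGER_TENANT}" \
        -u "${MANAGER_USER}:${MANAGER_PASSWORD}" \
        "$(snapshot_url "${MANAGER_IP}" "${id}")"
}
```

A scheduler such as cron could source this file and call `create_snapshot` at an off-peak hour. Keeping the ID and URL construction in separate functions makes them easy to check without contacting a manager.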
Download snapshot:
Code Block 3 CLI
vnfm snapshots download [OPTIONS] SNAPSHOT_ID
Code Block 4 REST
curl -X GET --header "Tenant: <manager-tenant>" -u <manager-username>:<manager-password> "http://<manager-ip>/api/v3.1/snapshots/<snapshot-id>/archive" > <snapshot-archive-filename>.zip
Parameters specification available in the VNFM REST API.
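Because the REST download simply redirects the response into a file, a failed request can leave an empty or corrupt archive behind. A minimal sketch of a sanity check follows; `verify_archive` is a hypothetical helper name, and it assumes the `unzip` utility is installed on the host where the archive is stored.

```shell
#!/bin/sh
# Sketch: sanity-check a downloaded snapshot archive before relying on it.
# verify_archive is a hypothetical helper, not part of the VNFM CLI.

# Return 0 only if the file exists, is non-empty, and passes unzip's
# built-in integrity test (-t tests the archive, -q keeps output quiet).
verify_archive() {
    [ -s "$1" ] || return 1
    unzip -tq "$1" >/dev/null 2>&1
}
```

For example, `verify_archive snapshot.zip || echo "snapshot archive is not usable"` would flag a bad download. The check confirms only that the file is a readable zip archive, not that the snapshot will restore successfully.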
Apply snapshot¶
Upload snapshot:
Code Block 5 CLI
vnfm snapshots upload [OPTIONS] SNAPSHOT_PATH
Code Block 6 REST
curl -X PUT --header "Tenant: <manager-tenant>" -u <manager-username>:<manager-password> "http://<manager-ip>/api/v3.1/snapshots/archive?snapshot_archive_url=http://url/to/archive.zip"
Parameters specification available in the VNFM REST API.
Restore snapshot:
Code Block 7 CLI
vnfm snapshots restore [OPTIONS] --tenant-name <TEXT> SNAPSHOT_ID
Code Block 8 REST
curl -s -X POST --header "Content-Type: application/json" --header "Tenant: <manager-tenant>" -u <manager-username>:<manager-password> -d '{"tenant_name": "<manager-tenant>", "recreate_deployments_envs": true, "force": false, "restore_certificates": false, "no_reboot": false}' "http://<manager-ip>/api/v3.1/snapshots/<snapshot-id>/restore"
Parameters specification available in the VNFM REST API.
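If the restore call is scripted, the JSON body can be generated rather than typed by hand. The sketch below reuses the field values from the REST example above and substitutes only the tenant name. The function names and the `MANAGER_*` environment variables are hypothetical, and the payload builder assumes the tenant name contains no characters that need JSON escaping.

```shell
#!/bin/sh
# Sketch: build and send the restore request shown in the REST example.
# MANAGER_IP, MANAGER_TENANT, MANAGER_USER and MANAGER_PASSWORD are
# hypothetical environment variables, not names defined by VNFM.

# Print the JSON body for a restore request; the field values match the
# REST example above, with only the tenant name substituted.
restore_payload() {
    printf '{"tenant_name": "%s", "recreate_deployments_envs": true, "force": false, "restore_certificates": false, "no_reboot": false}\n' "$1"
}

# POST the restore request for the snapshot ID passed as $1.
# --fail makes curl exit non-zero if the manager returns an HTTP error.
restore_snapshot() {
    curl -s --fail -X POST \
        --header "Content-Type: application/json" \
        --header "Tenant: ${MANAGER_TENANT}" \
        -u "${MANAGER_USER}:${MANAGER_PASSWORD}" \
        -d "$(restore_payload "${MANAGER_TENANT}")" \
        "http://${MANAGER_IP}/api/v3.1/snapshots/$1/restore"
}
```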
Failure Recovery¶
You can use the following steps to recover failed nodes for specific situations.
Whole cluster down or working incorrectly¶
- Save the /home/admin/files/ssl/* and /etc/cloudify/ssl/* files.
- Tear down the managers.
- Install fresh managers, specifying the existing certificates in /home/admin/config.yaml.
- Create the cluster and join the managers to it.
- Apply the latest working snapshot on the active manager.
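The first step above, saving the certificate files, could be done with a single `tar` archive. This is a minimal sketch: `backup_ssl` is a hypothetical helper name, the destination path is up to the operator, and reading /etc/cloudify/ssl may require elevated privileges.

```shell
#!/bin/sh
# Sketch: archive the certificate directories before tearing down managers.
# backup_ssl is a hypothetical helper, not part of the VNFM CLI.

# Usage: backup_ssl DEST [DIR...]
# Writes DEST as a gzip-compressed tarball of the given directories.
backup_ssl() {
    dest=$1
    shift
    # Default to the two directories named in the guide when none are given.
    [ "$#" -gt 0 ] || set -- /home/admin/files/ssl /etc/cloudify/ssl
    tar -czf "$dest" "$@"
}
```

For example, `backup_ssl /tmp/vnfm-ssl.tar.gz` archives both default directories. Copy the resulting archive off the manager before tearing it down, since the local disk will not survive the teardown.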
One manager cluster node down¶
- Remove manager from the cluster.
- Destroy manager.
- Bootstrap fresh manager.
- Join existing cluster.
Effect: Healthy manager cluster
Active manager node down¶
- Another healthy manager in the cluster automatically becomes the active manager.
- Investigate the error, then either:
  - Fix the problem, or
  - Destroy the manager, install a new manager, and join it to the cluster.
Effect: Healthy manager cluster
Split brain¶
A split brain occurs when the managers lose connectivity with one another for a period of time. Each manager then considers the others unhealthy and promotes itself to master. When connectivity returns, only one manager remains master; it is chosen based on the newest version of the PostgreSQL database. The other managers synchronize their data with the active one and become standbys. Any data, installed deployments, or plugins held only on those other managers will be lost.