2. Start the HA server
Please keep in mind that plain HTTP should only be used for testing and debugging. Unencrypted traffic can be intercepted and modified in transit to DDoS your cluster, trigger server reboots, or lock up the whole HA system, which would take considerable work to bring back to a normal state. Always use some form of encryption and avoid plain HTTP in production.
Debug mode
To start the HA server in debug mode, execute the command below:
Debug mode gives us the ability to test HA within our current deployment:
- Is the network connection stable enough to support HA?
- Are the servers behaving in a predictable manner?
- Would there be enough disk and network bandwidth to support the replication and HA at the same time?
- Etc
In debug mode we can answer all of these questions, because any action that would have been applied in a production cluster is simply logged instead of being executed immediately (see the sketch after this list):
- HA watchdog will gracefully exit, and not reboot the system, if the hoster_rest_api service has crashed
- HA cluster manager will log the cluster failover actions, and not execute them (failing over the VMs from an offline host to a healthy host)
- Etc
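To make the difference between debug and production behaviour concrete, here is a minimal Go sketch of the log-instead-of-execute idea from the list above. The variable, function name, and reboot command are illustrative assumptions, not Hoster's actual code:

```go
package main

import (
	"log"
	"os/exec"
)

// debugMode mirrors the idea described above: when it is enabled, destructive
// actions are only logged, never executed. The variable name and the reboot
// command are illustrative assumptions, not Hoster's actual implementation.
var debugMode = true

// fenceSelf is what a watchdog-style process might do once it detects that the
// hoster_rest_api service has crashed: reboot the host in production mode, or
// only log the intent in debug mode.
func fenceSelf(reason string) {
	if debugMode {
		log.Printf("DEBUG: would reboot the system (reason: %s)", reason)
		return
	}
	log.Printf("INFO: rebooting the system (reason: %s)", reason)
	if err := exec.Command("reboot").Run(); err != nil {
		log.Printf("ERROR: reboot failed: %v", err)
	}
}

func main() {
	fenceSelf("hoster_rest_api service has crashed")
}
```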
Check the HA logs:
It will give you an overview of what is currently happening in the cluster:
Sep 26 22:13:39 IM-HYP01 HOSTER_HA_REST[76537]: INFO: node failover time is: 20 seconds
Sep 26 22:13:39 IM-HYP01 HOSTER_HA_REST[79124]: DEBUG: hoster_rest_api started in DEBUG mode
Sep 26 22:13:39 IM-HYP01 HOSTER_HA_REST[80879]: DEBUG: hoster_rest_api service start-up
Sep 26 22:13:39 IM-HYP01 HOSTER_HA_WATCHDOG[82430]: DEBUG: ha_watchdog service start-up
Sep 26 22:13:39 IM-HYP01 HOSTER_HA_REST[83634]: DEBUG: ha_watchdog started in DEBUG mode
Sep 26 22:13:46 IM-HYP01 HOSTER_HA_REST[87767]: INFO: registered a new node: IM-HYP03
Sep 26 22:13:48 IM-HYP01 HOSTER_HA_REST[89926]: INFO: registered a new node: IM-HYP02
Sep 26 22:13:49 IM-HYP01 HOSTER_HA_REST[92037]: INFO: registered a new node: IM-HYP01
Sep 26 22:13:49 IM-HYP01 HOSTER_HA_REST[92641]: SUCCESS: joined the candidate: IM-HYP01
Sep 26 22:13:49 IM-HYP01 HOSTER_HA_REST[94813]: SUCCESS: joined the candidate: IM-HYP02
Sep 26 22:13:49 IM-HYP01 HOSTER_HA_REST[97049]: SUCCESS: joined the candidate: IM-HYP03
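If you would rather get a quick summary than read the raw output, something like the helper below can be fed the HA log lines on stdin. It only relies on the "registered a new node" and "joined the candidate" message formats shown above, and it is a convenience sketch, not part of Hoster itself:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// Reads HA log lines from stdin and summarises which nodes have registered and
// joined, keying off the message formats shown in the sample output above.
// Pipe the log lines in from wherever they live on your system; no log file
// path is assumed here.
func main() {
	registered := map[string]bool{}
	joined := map[string]bool{}

	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		line := scanner.Text()
		if i := strings.Index(line, "registered a new node: "); i >= 0 {
			registered[strings.TrimSpace(line[i+len("registered a new node: "):])] = true
		}
		if i := strings.Index(line, "joined the candidate: "); i >= 0 {
			joined[strings.TrimSpace(line[i+len("joined the candidate: "):])] = true
		}
	}

	fmt.Printf("registered nodes: %d, joined candidates: %d\n", len(registered), len(joined))
	for node := range joined {
		fmt.Println(" -", node)
	}
}
```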
Check the API logs:
It will give you an overview of the latest API endpoint connections (usually /api/v1/ha/ping and /api/v1/ha/register when it comes to HA):
2023-09-27_14-12-32 || 10.5.199.81:56921 || 200 || POST || /api/v1/ha/ping || 20.323µs || bytesSent: 18
2023-09-27_14-12-32 || 10.5.199.85:43967 || 200 || POST || /api/v1/ha/ping || 13.159µs || bytesSent: 18
2023-09-27_14-12-32 || 10.5.199.33:35179 || 200 || POST || /api/v1/ha/ping || 15.632µs || bytesSent: 18
2023-09-27_14-12-34 || 10.5.199.81:63889 || 200 || POST || /api/v1/ha/ping || 20.294µs || bytesSent: 18
2023-09-27_14-12-34 || 10.5.199.85:24646 || 200 || POST || /api/v1/ha/ping || 11.942µs || bytesSent: 18
2023-09-27_14-12-34 || 10.5.199.33:25478 || 200 || POST || /api/v1/ha/ping || 8.734µs || bytesSent: 18
2023-09-27_14-12-36 || 10.5.199.81:35923 || 200 || POST || /api/v1/ha/ping || 25.108µs || bytesSent: 18
2023-09-27_14-12-36 || 10.5.199.85:13987 || 200 || POST || /api/v1/ha/ping || 11.294µs || bytesSent: 18
2023-09-27_14-12-36 || 10.5.199.33:55854 || 200 || POST || /api/v1/ha/ping || 7.71µs || bytesSent: 18
2023-09-27_14-12-38 || 10.5.199.81:31933 || 200 || POST || /api/v1/ha/ping || 22.195µs || bytesSent: 18
2023-09-27_14-12-38 || 10.5.199.85:28218 || 200 || POST || /api/v1/ha/ping || 12.353µs || bytesSent: 18
2023-09-27_14-12-38 || 10.5.199.33:38328 || 200 || POST || /api/v1/ha/ping || 14.97µs || bytesSent: 18
2023-09-27_14-12-40 || 10.5.199.81:37120 || 200 || POST || /api/v1/ha/ping || 24.954µs || bytesSent: 18
2023-09-27_14-12-40 || 10.5.199.33:35276 || 200 || POST || /api/v1/ha/ping || 8.544µs || bytesSent: 18
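If a node's pings stop showing up in this log, a quick manual probe of the same endpoint can help narrow things down. Below is a small Go sketch that POSTs to /api/v1/ha/ping; the node address, port, and empty JSON body are assumptions to adjust for your own deployment, and the real endpoint may expect authentication or a specific payload:

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"
)

// A quick reachability check against the /api/v1/ha/ping endpoint seen in the
// API log above. The node address, port, and empty JSON body are assumptions:
// adjust them to your deployment.
func main() {
	const target = "http://10.5.199.81:3000/api/v1/ha/ping" // hypothetical address and port

	client := &http.Client{Timeout: 3 * time.Second}
	resp, err := client.Post(target, "application/json", strings.NewReader("{}"))
	if err != nil {
		fmt.Println("ping failed:", err)
		return
	}
	defer resp.Body.Close()

	// A 200 here corresponds to the "200 || POST || /api/v1/ha/ping" entries above.
	fmt.Println("ping status:", resp.Status)
}
```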
Production mode
After observing the debug mode for some period of time, you can move to production (if no issues were found in the process, of course):
You can still view the same logs and observe the changes in the cluster that way, but now, if any issues are spotted, the problematic node will be fenced from the rest of the cluster (rebooted, and left waiting for your manual intervention before it can be added back to the cluster), and its VMs/Jails will be automatically migrated to other (healthy) members.
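For intuition, the following Go sketch models that fencing and failover flow under simple assumptions (a 20-second failover window, as in the sample HA log above, and naive round-robin VM placement). It is illustrative only, not Hoster's actual implementation:

```go
package main

import (
	"fmt"
	"time"
)

// A simplified model of the production-mode behaviour described above: a node
// that has not pinged within the failover window (20 seconds in the sample HA
// log) is treated as offline, fenced, and its workloads are handed over to
// healthy members. The data structures, VM names, and round-robin placement
// are illustrative assumptions, not Hoster's actual failover code.
const failoverWindow = 20 * time.Second

type node struct {
	name     string
	lastPing time.Time
	vms      []string
}

func checkCluster(nodes []*node, now time.Time) {
	var healthy, failed []*node
	for _, n := range nodes {
		if now.Sub(n.lastPing) > failoverWindow {
			failed = append(failed, n)
		} else {
			healthy = append(healthy, n)
		}
	}
	for _, f := range failed {
		fmt.Printf("fencing %s (no ping for %s); rebooting and waiting for manual re-join\n",
			f.name, now.Sub(f.lastPing).Round(time.Second))
		if len(healthy) == 0 {
			fmt.Println("  no healthy nodes left to migrate to")
			continue
		}
		for i, vm := range f.vms {
			target := healthy[i%len(healthy)]
			fmt.Printf("  migrating %s -> %s\n", vm, target.name)
		}
	}
}

func main() {
	now := time.Now()
	checkCluster([]*node{
		{name: "IM-HYP01", lastPing: now},
		{name: "IM-HYP02", lastPing: now},
		{name: "IM-HYP03", lastPing: now.Add(-45 * time.Second), vms: []string{"vm-101", "vm-102"}},
	}, now)
}
```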