Index
- What is High-Availability?
- What are the Primary and Secondary nodes?
- How can I check the High-Availability status of my servers?
- What is the Data-Store and how does it facilitate fail-over?
- What is the Data Sync Link?
- What is VHID?
- What situations will trigger a fail-over?
- How quick is the fail-over process?
- What are the system requirements for High-Availability installations?
- How do I configure the network for Statseeker?
- How do I access the Statseeker Web Interface when it is operating in High-Availability mode?
- How do I install and license a High-Availability deployment?
- Should I still run backups if I'm using High-Availability?
- What happens if the Secondary becomes unavailable?
- What happens if the Data Sync Link goes down?
- What happens if the Secondary's Main Interface goes down?
- What happens if the Primary's Main Interface goes down?
- What happens if both Primary and Secondary Main Interfaces go down?
- What happens if the Main Interfaces remain active but the Servers cannot communicate via these Interfaces?
- How do you switch Primary/Secondary roles?
- Is there any Data Loss during a fail-over?
- How do I upgrade a High-Availability Installation?
- Can I run a Primary without a Secondary?
- How do I recover a Primary if it goes down?
- How many Secondaries can I have?
- How do I recover a Secondary if it goes down?
- How is time synchronized between Primary and Secondary?
- Can you install a High-Availability image and license it as a standard Statseeker server?
- Can Statseeker High-Availability be run in a Virtual Environment?
- What type of data is sync'd between Primary and Secondary?
- What is the recommended Data Sync Link Interface connection speed?
What is High-Availability?
Statseeker's High-Availability platform aims to ensure that the service remains available, even when faults occur with the Statseeker server. It does this by eliminating a 'single point of failure' situation: a second (Secondary) server is kept ready to take over if the main (Primary) server fails due to network issues, hardware failure, or other causes.
Statseeker HA:
- Is a high-availability platform; its aim is to minimize the impact on Statseeker's services in the event of a failure
- Is Not a disaster recovery solution and should not be considered a process to return Statseeker to an operational state once the system has been rendered inoperative
What are the Primary and Secondary nodes?
The Primary is the server that is currently running Statseeker. The Secondary is the server that syncs data from the Primary node and is kept ready to take over the Statseeker service if the Primary becomes unavailable.
How can I check the High-Availability status of my servers?
The current High-Availability Status of the Statseeker servers can be checked from several locations:
Server's Main IP Address
Navigating to the main IP address of each server (not the shared virtual IP address) allows you to access the High-Availability Status page. This page contains information about the configuration and status of both servers and can be useful when troubleshooting High-Availability server connectivity.
Administration Tool
See Administration Tool > Statseeker Administration > High Availability
This displays a page containing the status and configuration details of both High-Availability servers.
Console Alerts
HA state change alerts, and other notifications, are displayed in the top-right of the screen. Each alert uses an icon indicating its level:
- Information alert
- Warning alert
- Error alert
The current High-Availability status will be a constant information alert.
If the servers get out of sync, then a warning alert will be generated.
If a fail-over occurs or the server is unreachable, then an error alert will be generated.
What is the Data-Store and how does it facilitate fail-over?
Each server has a specialized file system (Data-Store) dedicated to storing data. Data is shared between the two servers, but only one Data-Store will be active (available to query) at a time.
In the event of a fail-over, the Secondary will self-promote to Primary, acquire the virtual IP and start populating its Data-Store. When the previous Primary is brought back up, it will assume the role of the Secondary and synchronize its Data-Store to acquire the data that was populated in its absence.
What is the Data Sync Link?
The Data Sync Link is a physical link between the Primary and Secondary nodes, which should be configured as a separate network when implementing Statseeker High-Availability. The Data-Stores use this link to sync their data.
What is VHID?
VHID (Virtual Host ID) is the ID associated with the Virtual IP Address and is used to determine which node should serve the Virtual IP Address.
If there is more than a single set of High-Availability Statseeker servers operating in the same network, then they will each need a unique VHID.
Note: ensure that the Statseeker VHID is unique on the network and does not conflict with the High-Availability setup of other FreeBSD-based services, such as pfSense or FreeNAS, that may be running on the same network.
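If you keep an inventory of the HA pairs on a network segment, the uniqueness rule can be checked mechanically. The following is a minimal sketch: the inventory format and the `find_vhid_conflicts` helper are hypothetical and not part of Statseeker or ssadmin. It also assumes VHIDs are integers from 1 to 255, as with FreeBSD CARP; this FAQ does not say which protocol Statseeker uses.

```python
from collections import defaultdict

# Assumption: valid VHIDs are 1-255, as with FreeBSD CARP.
VALID_VHIDS = range(1, 256)


def find_vhid_conflicts(inventory: dict[str, int]) -> dict[int, list[str]]:
    """Return {vhid: [services...]} for every VHID used more than once.

    `inventory` is a hypothetical mapping of service name -> VHID for every
    HA pair (Statseeker or otherwise) on the same network segment.
    """
    by_vhid: dict[int, list[str]] = defaultdict(list)
    for service, vhid in inventory.items():
        if vhid not in VALID_VHIDS:
            raise ValueError(f"{service}: VHID {vhid} is outside 1-255")
        by_vhid[vhid].append(service)
    # Keep only VHIDs claimed by two or more services.
    return {vhid: sorted(names) for vhid, names in by_vhid.items() if len(names) > 1}


if __name__ == "__main__":
    network = {
        "statseeker-ha": 10,
        "pfsense-firewall": 10,  # clashes with the Statseeker pair
        "freenas-storage": 20,
    }
    print(find_vhid_conflicts(network))  # {10: ['pfsense-firewall', 'statseeker-ha']}
```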
What situations will trigger a fail-over?
A fail-over refers to the process where the Secondary node takes over from the current Primary node, becoming the new Primary node. There are two types of fail-overs:
- Soft Fail-Overs - the Primary and Secondary nodes change roles while both remain active and co-ordinate the switch. This process occurs when the Primary is shut down or restarted cleanly, or when a role switch is initiated via the ssadmin command line tool.
- Hard Fail-Overs - the Secondary forcefully becomes the Primary because the current Primary is unavailable due to a hardware or power failure.
Soft Fail-Over Process
- The Primary releases the Virtual IP Address, and the Secondary takes the Virtual IP Address
- The Primary shuts down Statseeker and fully flushes the Data-Store
- The Primary releases the Data-Store, and the Secondary takes the Data-Store
- The Secondary starts up Statseeker and is considered the new Primary
Hard Fail-Over Process
- The Secondary detects that the Primary is not present
- The Secondary takes over the Virtual IP Address
- The Secondary takes over the Data-Store
- The Secondary starts up Statseeker and is considered the new Primary
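The two sequences above can be modelled as ordered data. This sketch only restates the documented order of steps; `classify`, `steps_for` and the `primary_can_coordinate` flag are hypothetical names, and the real decision is made internally by Statseeker. The point it makes explicit is that only the soft process includes a Data-Store flush before the hand-over.

```python
from enum import Enum


class FailOver(Enum):
    SOFT = "soft"
    HARD = "hard"


# Ordered steps, copied from the two processes described above.
SOFT_STEPS = [
    "Primary releases the Virtual IP Address; Secondary takes it",
    "Primary shuts down Statseeker and fully flushes the Data-Store",
    "Primary releases the Data-Store; Secondary takes it",
    "Secondary starts Statseeker and becomes the new Primary",
]
HARD_STEPS = [
    "Secondary detects that the Primary is not present",
    "Secondary takes over the Virtual IP Address",
    "Secondary takes over the Data-Store",
    "Secondary starts Statseeker and becomes the new Primary",
]


def classify(primary_can_coordinate: bool) -> FailOver:
    """Soft if the Primary is still able to hand over cleanly, otherwise hard."""
    return FailOver.SOFT if primary_can_coordinate else FailOver.HARD


def steps_for(kind: FailOver) -> list[str]:
    """Return the ordered steps for the given fail-over type."""
    return SOFT_STEPS if kind is FailOver.SOFT else HARD_STEPS


if __name__ == "__main__":
    # e.g. a power failure on the Primary: no co-ordination is possible
    for step in steps_for(classify(primary_can_coordinate=False)):
        print(step)
```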
How quick is the fail-over process?
The Primary continually sends notifications to the Secondary, via the Main Interface, indicating that it is still present and active. If the Secondary has not received a notification from the Primary in the previous 3 seconds it considers the Primary to be unavailable and initiates the hard fail-over process.
Once triggered, the fail-over process takes about 2 minutes to complete, with some minor variance due to the size of the installation and the disk speed of the devices concerned.
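The detection rule above is a heartbeat timeout. Below is a minimal sketch of that rule on the Secondary's side, assuming only the 3-second window quoted above; `HeartbeatMonitor` and its methods are hypothetical and do not reflect Statseeker's actual implementation.

```python
import time

# The 3-second window quoted above.
HEARTBEAT_TIMEOUT_S = 3.0


class HeartbeatMonitor:
    """Hypothetical model of the Secondary tracking the Primary's notifications."""

    def __init__(self, timeout: float = HEARTBEAT_TIMEOUT_S, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen = clock()

    def notification_received(self) -> None:
        # Called each time a "still present" notification arrives from the
        # Primary on the Main Interface.
        self.last_seen = self.clock()

    def primary_unavailable(self) -> bool:
        # True once no notification has arrived within the timeout window,
        # i.e. the point at which a hard fail-over would be initiated.
        return self.clock() - self.last_seen > self.timeout


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example is deterministic
    monitor = HeartbeatMonitor(clock=lambda: now[0])
    now[0] = 2.5
    print(monitor.primary_unavailable())  # False
    now[0] = 3.5
    print(monitor.primary_unavailable())  # True
```

A monotonic clock is used so that wall-clock adjustments (for example by NTP) cannot trigger or mask a timeout. Taken together with the figures above, the total interruption after a hard failure is roughly the 3-second detection window plus the approximately 2-minute fail-over.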
What are the system requirements for High-Availability installations?
Statseeker High-Availability mode has several hardware requirements in addition to those of a standard install:
- Two servers are required, each meeting the standard Statseeker System Requirements
- Each Server requires at least 2 physical interfaces
- A minimum HDD size of 120GB
- The HDD on the Secondary system must be at least as large as that on the Primary
See Version 5.x System Requirements for more details.
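The HA-specific items in the list above are simple enough to check before installing. This is a sketch only: the `Server` type and `check_ha_requirements` helper are hypothetical, and they cover just the HA-specific items here, not the standard Statseeker System Requirements that each server must also meet.

```python
from dataclasses import dataclass

# Thresholds taken from the list above.
MIN_DISK_GB = 120
MIN_PHYSICAL_INTERFACES = 2


@dataclass
class Server:
    name: str
    disk_gb: int
    physical_interfaces: int


def check_ha_requirements(primary: Server, secondary: Server) -> list[str]:
    """Return a list of problems; an empty list means the pair passes."""
    problems = []
    for server in (primary, secondary):
        if server.disk_gb < MIN_DISK_GB:
            problems.append(f"{server.name}: disk {server.disk_gb}GB < {MIN_DISK_GB}GB")
        if server.physical_interfaces < MIN_PHYSICAL_INTERFACES:
            problems.append(f"{server.name}: needs at least {MIN_PHYSICAL_INTERFACES} physical interfaces")
    # The Secondary must be able to hold everything the Primary stores.
    if secondary.disk_gb < primary.disk_gb:
        problems.append("Secondary disk must be at least as large as the Primary disk")
    return problems


if __name__ == "__main__":
    print(check_ha_requirements(Server("primary", 500, 2), Server("secondary", 250, 2)))
```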
How do I configure the network for Statseeker?
The two servers should reside on the same physical network and adhere to the guidelines illustrated below.
How do I access the Statseeker Web Interface when it is operating in High-Availability mode?
To access the Statseeker Web Interface on a High-Availability Setup you need to enter the Virtual IP Address into your browser rather than the Main IP Address of the server.
This means that in the event of a fail-over, Statseeker will still be accessible from the same IP Address.
How do I install and license a High-Availability deployment?
The Statseeker High-Availability installation process differs from that of a standard Statseeker installation.
- Install Statseeker on the Primary server
- Install Statseeker on the Secondary server
- License the installation and perform standard Statseeker Setup by accessing the web interface on the virtual IP shared by the High-Availability servers
It is possible to run Statseeker on the Primary only; however, if a Secondary is later added, then Statseeker must be notified and a new license installed. The same procedure applies if one server fails and needs to be replaced. To arrange the download of the new license, contact keys@statseeker.com.
For detailed instructions on deploying an HA solution, see the High-Availability Installation Guide.
Should I still run backups if I'm using High-Availability?
Yes, it is always recommended to keep backups.
Backups provide a snapshot of your system that can be reverted to in the event of a critical failure or corruption. Our HA system is better thought of as a RAID 1 configuration: if your Statseeker server becomes unreachable, another will take its place.
Statseeker HA:
- Is a high-availability platform; its aim is to minimize the impact on Statseeker's services in the event of a failure
- Is Not a disaster recovery solution and should not be considered a process to return Statseeker to an operational state once the system has been rendered inoperative
What happens if the Secondary becomes unavailable?
If the Secondary goes down:
- Statseeker will continue to run as normal on the Primary, but Statseeker is no longer considered Highly Available
- An error level console alert will be displayed detailing that the Secondary is not connected
- When the Secondary becomes available again it will sync any changes that have been made on the Primary while the Secondary was unavailable
- A warning level console alert will be displayed during the sync process detailing that the Secondary is connected but is still synchronizing. Statseeker is not considered Highly Available until this synchronization is complete.
What happens if the Data Sync Link goes down?
If the two Statseeker servers can't communicate via the Data Interface, then the Data Sync Link is considered down. In this instance:
- No data will be copied to the Secondary while the link is down
- An error level console alert will be displayed detailing that the Secondary is not connected
- When the Data Sync Link is restored, the Secondary will sync any changes that have been made on the Primary while the link was down
- A warning level console alert will be displayed during the sync process detailing that the Secondary is connected but is still synchronizing. Statseeker is not considered Highly Available until this synchronization is complete.
- If the Primary becomes unavailable before the sync process completes, a fail-over will occur, but any un-sync'd data will be permanently lost
What happens if the Secondary's Main Interface goes down?
If the Main Interface on the Secondary goes down, this does not affect the running of Statseeker or the synchronization of the Data-Store. However, while it is down:
- No fail-over can occur, so if the Primary goes down, the Virtual IP Address and Statseeker Web Interface will be unreachable
- A warning level console alert will be displayed detailing the issue
- Statseeker is not considered Highly Available until the Main Interface is active again
What happens if the Primary's Main Interface goes down?
If the Main Interface on the Primary goes down:
- A fail-over will occur and the Secondary server will become the new Primary
- If the Data Sync Link is still active, this will be a Soft Fail-Over; otherwise, it will be a Hard Fail-Over
What happens if both Primary and Secondary Main Interfaces go down?
- Statseeker will be unavailable
- Statseeker will be shut down on the Primary and neither server will claim the Virtual IP or Data-Store
- The first server to recover its Main Interface will become the Primary
What happens if the Main Interfaces remain active but the Servers cannot communicate via these Interfaces?
This may occur as a result of a networking error or a mis-configuration on the servers.
In this situation:
- The Secondary, not detecting the Primary, will attempt to claim the Virtual IP Address and start up as the Primary
- Accessing Statseeker via the Virtual IP Address may be interrupted during this time
- If the Data Sync Link is still active, then the Primary will not release the Data-Store and the Secondary will not start up Statseeker
  - When communication via the Main link is re-established, it is uncertain which server will claim the Virtual IP Address:
    - If the Primary reclaims the Virtual IP Address, the Secondary will stop waiting for the Data-Store and revert to being the Secondary
    - If the Secondary claims the Virtual IP Address, a Soft Fail-Over will occur
- If the Data Sync Link is also down, then the Secondary will forcefully take the Data-Store and both servers will believe they are the Primary. This is known as a split-brain situation:
  - Once the servers can communicate again, one server will remain the Primary and the other will become the Secondary
  - Any changes that were made on the Secondary after the split-brain situation began will be lost, and the changes made on the Primary will be kept
    Note: this may not be the same Primary / Secondary as before the communication was interrupted.
  - The Secondary will need to resync the entire Data-Store from the Primary
How do you switch Primary/Secondary roles?
The roles of the two Statseeker servers can be switched from either server via the ssadmin command line tool.
From the tool select High-Availability > Force HA Role Switch to initiate a Soft Fail-Over.
Is there any Data Loss during a fail-over?
Data loss is possible during a fail-over event but the type of data lost and extent of the loss depends on the circumstances.
Performance Data
Statseeker collects performance metrics every 60 seconds; this collection also occurs whenever Statseeker starts up. A Statseeker start-up event occurs during fail-over when the Secondary becomes the new Primary. If this start-up event occurs within 60 seconds of the last Performance data collection, then no data is lost.
Note: if the performance metric is a counter and a single polling interval is missed, then it will be presented as two missed points on a graph. This is due to Statseeker requiring two valid collection points to calculate the deltas.
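To see why one missed poll costs two points, here is a minimal sketch of counter-delta calculation. It assumes consecutive samples at a fixed interval and ignores counter wrap and resets; `counter_deltas` is a hypothetical helper, not Statseeker's implementation.

```python
from typing import Optional


def counter_deltas(samples: list[Optional[int]]) -> list[Optional[int]]:
    """Per-interval deltas from raw counter samples.

    A sample of None means that poll was missed. Each delta needs two valid
    consecutive samples, so one missed poll removes two deltas.
    """
    deltas: list[Optional[int]] = []
    for prev, curr in zip(samples, samples[1:]):
        if prev is None or curr is None:
            deltas.append(None)  # shown as a gap on the graph
        else:
            deltas.append(curr - prev)
    return deltas


if __name__ == "__main__":
    # Polls every 60 s; the fourth poll is missed during a fail-over.
    octets = [1000, 1600, 2300, None, 3500, 4100]
    print(counter_deltas(octets))  # [600, 700, None, None, 600]
```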
Configuration Changes
In the event of a Soft Fail-Over, any configuration changes that were successful on the Primary will be available on the Secondary. Performing a configuration change (e.g. disabling polling on a Device via the Web Interface) while the fail-over is occurring may return an error and need to be performed again after the fail-over.
In the case of a Hard Fail-Over, there is a chance that configuration changes that were verified as being successful on the Primary may not be available on the Secondary after the fail-over. This will not result in an inconsistent state (the change is either made or not made), and it will only happen if the fail-over occurs immediately after a change is made, before the change has been sync'd to the Secondary.
A Hard Fail-Over case is similar to a normal Statseeker installation that experiences a power cycle. The Operating System ensures transactional consistency of writes, but also buffers writes for performance reasons. With High-Availability the shared Data-Store also ensures in-order writes but adds another layer of buffering. This means the user may be notified of success before the changes are truly persistent across the nodes.
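The effect of acknowledging a write before it is replicated can be shown with a toy model. This is a simplified sketch, not Statseeker code: `ReplicatedStore` is hypothetical, and it collapses the OS and Data-Store buffering layers into a single buffer that is copied to the Secondary, in order, only when `flush()` runs.

```python
class ReplicatedStore:
    """Toy model: writes are acknowledged once buffered on the Primary and
    only reach the Secondary when flush() runs."""

    def __init__(self) -> None:
        self.buffer: list[tuple[str, str]] = []
        self.primary: dict[str, str] = {}
        self.secondary: dict[str, str] = {}

    def write(self, key: str, value: str) -> str:
        self.primary[key] = value
        self.buffer.append((key, value))
        return "ok"  # the user sees success before replication

    def flush(self) -> None:
        # Apply buffered writes to the Secondary in order.
        for key, value in self.buffer:
            self.secondary[key] = value
        self.buffer.clear()

    def soft_failover(self) -> dict[str, str]:
        self.flush()  # a clean hand-over flushes first
        return dict(self.secondary)

    def hard_failover(self) -> dict[str, str]:
        # The Primary is gone: anything still buffered is lost, but each
        # change is either fully present or absent, never half-applied.
        return dict(self.secondary)


if __name__ == "__main__":
    store = ReplicatedStore()
    store.write("poll_device_a", "enabled")
    store.flush()
    store.write("poll_device_a", "disabled")  # acknowledged, not yet replicated
    print(store.hard_failover())  # {'poll_device_a': 'enabled'}
```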
Events and Alerting
Events and Alerts should continue to work across a fail-over; however, in some instances two Events or Alerts may be generated for the same incident.
Events are generated based on a difference with the previous state and this state is only periodically made persistent. In the case that an Event is generated, an Alert run, and then a fail-over occurs before the new state is made persistent, the Events and Alert may be generated again on the new server.
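The duplicate-event case can be reproduced with a small model of change detection plus periodic checkpointing. This is a hypothetical sketch (`EventGenerator` is not a Statseeker class). It assumes the new Primary starts from the last persisted state, as described above.

```python
from typing import Optional


class EventGenerator:
    """Toy model: events fire on state changes; state is persisted periodically."""

    def __init__(self, persisted: Optional[dict[str, str]] = None) -> None:
        # State restored from the Data-Store at start-up.
        self.state = dict(persisted or {})
        self.persisted = dict(self.state)

    def poll(self, device: str, status: str) -> list[str]:
        events = []
        if self.state.get(device) != status:
            events.append(f"{device} -> {status}")
        self.state[device] = status
        return events

    def persist(self) -> None:
        # Periodic checkpoint of the last-known state.
        self.persisted = dict(self.state)


if __name__ == "__main__":
    primary = EventGenerator()
    primary.poll("router1", "up")
    primary.persist()
    print(primary.poll("router1", "down"))  # ['router1 -> down'], alert sent
    # Fail-over before the next persist(): the new Primary starts from the checkpoint.
    new_primary = EventGenerator(primary.persisted)
    print(new_primary.poll("router1", "down"))  # ['router1 -> down'] again
```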
How do I upgrade a High-Availability Installation?
The upgrade process is very similar to the installation process and can be initiated via the web interface, see Administration Tool > Statseeker Administration > Software Upgrade, or via the command line interface by running ssadmin.
Either an upgrade or an installation image can be used.
Note: if upgrading from Statseeker version 4.0.x via ssadmin, the upgrade image should be uploaded to /home/statseeker/cdrom.
If a Secondary is present, it will be upgraded first, and then the Primary will be upgraded. If no Secondary is present when the Primary is upgraded, then the Secondary will need to be installed using the same Statseeker version as the upgraded Primary.
If the Operating System needs to be upgraded, Statseeker will be available and considered Highly Available during the installation portion of the upgrade. Once installation is complete, both servers will be rebooted, at which time Statseeker will not be available and will only be considered Highly Available when both servers have restarted.
If the Operating System is not being upgraded, Statseeker will remain available on the Primary while upgrading the Secondary, but is not considered Highly Available until the installation on the Secondary has finished. Once the Secondary is upgraded, Statseeker will be shut down on the Primary and will not be available until the installation on the Primary has completed.
Can I run a Primary without a Secondary?
The Primary can be run without a Secondary, but:
- The installation is not considered Highly Available, and Statseeker will not be available if the Primary becomes unavailable
- An error level console alert will be displayed while no Secondary is present
- If a Secondary is added to Statseeker at a later time, a new license will need to be generated before the Secondary can be used. Contact Statseeker at keys@statseeker.com for assistance.
How do I recover a Primary if it goes down?
Recovery of a Primary can be achieved by:
- Resolving the issues that caused the server to go down (e.g. power/hardware failure)
- Restarting the server
In the event of a corrupt file system or the server refusing to restart cleanly:
- Reinstall as a Secondary
- Use the ssadmin command line tool to switch the server's role back to Primary if desired
How many Secondaries can I have?
Currently Statseeker High-Availability supports a single Secondary installation.
How do I recover a Secondary if it goes down?
Recovery of a Secondary can be achieved by:
- Resolving the issues that caused the server to go down (e.g. power/hardware failure)
- Restarting the server
In the event of a corrupt file system or the server refusing to restart cleanly:
- Reinstall as a Secondary
How is time synchronized between Primary and Secondary?
Time is not automatically synchronized between the servers; however, time synchronization is important to ensure consistency after a fail-over. It is recommended that each server be configured to synchronize time from an external NTP server.
Can you install a High-Availability image and license it as a standard Statseeker server?
No, Statseeker installations require the appropriate license.
Can Statseeker High-Availability be run in a Virtual Environment?
Yes, Statseeker HA can be installed to a virtual environment, see Statseeker High-Availability (HA) Installation Guides for details.
What type of data is sync'd between Primary and Secondary?
The types of information sync'd include:
- The Data-Store containing Statseeker's data and configuration
- Information regarding the current high-availability state
- System configuration changes made from ssadmin
The Data-Store is a partition mounted on the Statseeker user's home directory (/home/statseeker). Once it is mounted on the Primary, any changes in the directory will automatically be synchronized with the Secondary.
Generally, what is not shared is:
- Operating system changes or manually added packages
- Server specific system configuration changes
- System configuration changes made outside of ssadmin
- Time/Date synchronization
- Statseeker Custom Script packages are currently not installed on the Secondary and will not be available on fail-over
What is the recommended Data Sync Link Interface connection speed?
The Data Sync Link speed limits the throughput when writing to the Data-Store, so Statseeker recommends you use the fastest link available.
If using a 1 Gb/s link, the effective Hard Disk throughput will be around 100MB/s. The minimum recommended specifications are:
Number of Interfaces | Minimum Link Speed
< 100k | 1 Gb/s
> 100k | 10 Gb/s
See Version 5.x System Requirements for more details.
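The link-speed guidance rests on a bits-to-bytes conversion. The sketch below shows that arithmetic and encodes the table as a lookup. Both helpers are hypothetical. The table does not say which row exactly 100,000 interfaces falls into, so this sketch assumes the conservative 10 Gb/s.

```python
def raw_link_mb_per_s(link_gbit_per_s: float) -> float:
    """Convert a link rate in gigabits/s to megabytes/s (8 bits per byte)."""
    return link_gbit_per_s * 1000 / 8


def minimum_link_gbit(interface_count: int) -> int:
    """Minimum Data Sync Link speed (Gb/s) from the table above.

    Assumption: exactly 100,000 interfaces is treated as needing 10 Gb/s.
    """
    return 1 if interface_count < 100_000 else 10


if __name__ == "__main__":
    print(raw_link_mb_per_s(1))        # 125.0
    print(minimum_link_gbit(250_000))  # 10
```

A 1 Gb/s link carries at most 125 MB/s of raw data. The gap between that and the quoted ~100MB/s effective throughput is consistent with protocol and replication overhead.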