The cluster properties file v4
Each node in a Failover Manager cluster has a properties file (by default, named efm.properties) that contains the properties of the individual node on which it resides. The Failover Manager installer creates a file template for the properties file named efm.properties.in in the /etc/edb/efm-4.<x> directory.
After completing the Failover Manager installation, make a working copy of the template in the same directory before modifying the file contents. By default, name the copy efm.properties.
After copying the template file, change the owner of the file to efm.
Note
By default, Failover Manager expects the cluster properties file to be named efm.properties. If you name the properties file something other than efm.properties, modify the service script or unit file to instruct Failover Manager to use a different name.
After creating the cluster properties file, add or modify configuration parameter values as required. For detailed information about each property, see Specifying cluster properties.
The property files are owned by root. The Failover Manager service script expects to find the files in the /etc/edb/efm-4.<x> directory. If you move the property file to another location, you must create a symbolic link that points to the new location.
Note
All user scripts referenced in the properties file are invoked as the Failover Manager user.
Specifying cluster properties
You can use the properties listed in the cluster properties file to specify connection properties and behaviors for your Failover Manager cluster. Modifications to property settings are applied when Failover Manager starts. If you modify a property value, you must restart Failover Manager to apply the changes.
Property values are case sensitive. While Postgres uses quoted strings in parameter values, Failover Manager doesn't allow quoted strings in property values. For example, while you might specify an IP address in a Postgres configuration parameter as:
listen_addresses='192.168.2.47'
With Failover Manager, don't enclose the value in quotes:
bind.address=192.168.2.54:7800
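The no-quotes rule is easy to check mechanically before starting an agent. The following is a minimal sketch, not part of Failover Manager: the function name find_quoted_values and the sample db.user value are hypothetical, and the check assumes simple key=value lines with # or ! comment lines.

```python
def find_quoted_values(properties_text):
    """Return (line_number, key) pairs whose value is wrapped in quotes."""
    problems = []
    for number, line in enumerate(properties_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and comment lines.
        if not stripped or stripped.startswith(("#", "!")):
            continue
        key, separator, value = stripped.partition("=")
        if not separator:
            continue
        value = value.strip()
        # Flag values that both start and end with the same quote character.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            problems.append((number, key.strip()))
    return problems


# Hypothetical sample: the second line repeats the Postgres quoting habit.
sample = "bind.address=192.168.2.54:7800\ndb.user='efm'\n"
print(find_quoted_values(sample))  # [(2, 'db.user')]
```

Running a check like this against a working copy of the properties file before a restart can catch values copied over from postgresql.conf with their quotes intact.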
Use the properties in the efm.properties file to specify connection, administrative, and operational details for Failover Manager.
Legend for the following table:
- A: Required on primary or standby node
- W: Required on witness node
- Y: Yes
Property name | A | W | Default value | Comments |
---|---|---|---|---|
db.user | Y | Y | | Username for the database. |
db.password.encrypted | Y | Y | | Password encrypted using 'efm encrypt'. |
db.port | Y | Y | | This value must be same for all the agents. |
db.database | Y | Y | | Database name. |
db.service.owner | Y | | | Owner of $PGDATA dir for db.database. |
db.service.name | | | | Required if running the database as a service. |
db.bin | Y | | | Directory containing the pg_controldata/pg_ctl commands such as '/usr/edb/asnn/bin'. |
db.data.dir | Y | | | Same as the output of query 'show data_directory;'. |
db.config.dir | | | | Same as the output of query 'show config_file;'. Specify if it's not the same as db.data.dir. |
jdbc.sslmode | Y | Y | disable | See the note. |
user.email | | | | This value must be same for all the agents; can be left blank if using a notification script. |
from.email | | | efm@localhost | Leave blank to use the default efm@localhost. |
notification.level | Y | Y | INFO | See the list of notifications. |
notification.text.prefix | | | | |
script.notification | | | | Required if user.email property is not used; both parameters can be used together. |
bind.address | Y | Y | | Example: <ip_address>:<port> |
external.address | | | | Example: <ip_address/hostname> |
admin.port | Y | Y | 7809 | Modify if the default port is already in use. |
is.witness | Y | Y | | See description. |
local.period | Y | | 10 | |
local.timeout | Y | | 60 | |
local.timeout.final | Y | | 10 | |
remote.timeout | Y | Y | 10 | |
node.timeout | Y | Y | 50 | This value must be same for all the agents. |
encrypt.agent.messages | Y | Y | false | This value must be same for all the agents. |
enable.stop.cluster | | | true | This value must be same for all the agents. Available in Failover Manager 4.2 and later. |
stop.isolated.primary | Y | | true | This value must be same for all the agents. |
stop.failed.primary | Y | | true | |
primary.shutdown.as.failure | Y | Y | false | |
update.physical.slots.period | Y | | 0 | |
ping.server.ip | Y | Y | 8.8.8.8 | |
ping.server.command | Y | Y | /bin/ping -q -c3 -w5 | |
auto.allow.hosts | Y | Y | false | |
stable.nodes.file | Y | Y | false | |
db.reuse.connection.count | Y | | 0 | |
auto.failover | Y | Y | true | |
auto.reconfigure | Y | | true | This value must be same for all the agents. |
promotable | Y | | true | |
use.replay.tiebreaker | Y | Y | true | This value must be same for all the agents. |
standby.restart.delay | | | 0 | |
application.name | | | | Set to replace the application_name portion of the primary_conninfo entry with this property value before starting the original primary database as a standby. |
restore.command | | | | Example: restore.command=scp <db_service_owner>@%h:<archive_path>/%f %p |
reconfigure.num.sync | Y | | false | If you are on Failover Manager 4.1, see reconfigure.num.sync.max to raise num_sync. |
reconfigure.num.sync.max | | | | Available in Failover Manager 4.1 and later. |
reconfigure.sync.primary | Y | | false | |
minimum.standbys | Y | Y | 0 | This value must be same for all the nodes. |
priority.standbys | | | | Available in Failover Manager 4.2 and later. |
recovery.check.period | Y | | 1 | |
restart.connection.timeout | | | 60 | |
auto.resume.period | Y | | 0 | |
virtual.ip | | | (see virtual.ip.single) | Leave blank if you do not specify a VIP. |
virtual.ip.interface | | | | Required if you specify a VIP. |
virtual.ip.prefix | | | | Required if you specify a VIP. |
virtual.ip.single | Y | Y | Yes | This value must be same for all the nodes. |
check.vip.before.promotion | Y | Y | Yes | |
pgpool.enable | | | false | Available in Failover Manager 4.1 and later. |
pcp.user | | | | Required if pgpool.enable is set to true. Available in Failover Manager 4.1 and later. |
pcp.host | | | | Required if pgpool.enable is set to true. This value must be same for all the agents. Available in Failover Manager 4.1 and later. |
pcp.port | | | | Required if pgpool.enable is set to true. This value must be same for all the agents. Available in Failover Manager 4.1 and later. |
pcp.pass.file | | | | Required if pgpool.enable is set to true. Available in Failover Manager 4.1 and later. |
pgpool.bin | | | | Required if pgpool.enable is set to true. Available in Failover Manager 4.1 and later. |
script.load.balancer.attach | | | | Example: script.load.balancer.attach=/<path>/<attach_script> %h %t |
script.load.balancer.detach | | | | Example: script.load.balancer.detach=/<path>/<detach_script> %h %t |
detach.on.agent.failure | | | true | Set to false if you want to keep a running primary database attached. Available in Failover Manager 4.2 and later. |
script.fence | | | | Example: script.fence=/<path>/<script_name> %p %f |
script.post.promotion | | | | Example: script.post.promotion=/<path>/<script_name> %f %p |
script.resumed | | | | Example: script.resumed=/<path>/<script_name> |
script.db.failure | | | | Example: script.db.failure=/<path>/<script_name> |
script.primary.isolated | | | | Example: script.primary.isolated=/<path>/<script_name> |
script.remote.pre.promotion | | | | Example: script.remote.pre.promotion=/<path>/<script_name> %p |
script.remote.post.promotion | | | | Example: script.remote.post.promotion=/<path>/<script_name> %p |
script.custom.monitor | | | | Example: script.custom.monitor=/<path>/<script_name> |
custom.monitor.interval | | | | Required if a custom monitoring script is specified. |
custom.monitor.timeout | | | | Required if a custom monitoring script is specified. |
custom.monitor.safe.mode | | | | Required if a custom monitoring script is specified. |
sudo.command | Y | Y | sudo | |
sudo.user.command | Y | Y | sudo -u %u | |
lock.dir | | | | If not specified, defaults to '/var/lock/efm-<version>'. |
log.dir | | | | If not specified, defaults to '/var/log/efm-<version>'. |
syslog.host | | | localhost | |
syslog.port | | | 514 | |
syslog.protocol | | | UDP | |
syslog.facility | | | | |
file.log.enabled | Y | Y | true | |
syslog.enabled | Y | Y | false | |
jgroups.loglevel | | | info | |
efm.loglevel | | | info | |
jvm.options | | | -Xmx128m | |
Cluster properties
Use the following properties to specify connection details for the Failover Manager cluster:
The db.user specified must have enough privileges to invoke selected PostgreSQL commands on behalf of Failover Manager. For more information, see Prerequisites.
For information about encrypting the password for the database user, see Encrypting your database password.
Use the db.service.owner property to specify the name of the operating system user that owns the cluster that is being managed by Failover Manager. This property isn't required on a dedicated witness node.
Specify the name of the database service in the db.service.name property if you use the service or systemctl command when starting or stopping the service.
Use the same service control mechanism (pg_ctl, service, or systemctl) each time you start or stop the database service. If you use the pg_ctl program to control the service, specify the location of the pg_ctl program in the db.bin property.
Use the db.data.dir property to specify the location where a standby.signal or recovery.conf file will be created. This property is required on primary and standby nodes. It isn't required on a dedicated witness node.
Use the db.config.dir property to specify the location of database configuration files if they aren't stored in the same directory as the recovery.conf or standby.signal file. This is the directory that contains the file named by the config_file parameter of your EDB Postgres Advanced Server or PostgreSQL installation. This value is used as the location of the EDB Postgres Advanced Server data directory when stopping, starting, or restarting the database.
For more information about database configuration files, visit the PostgreSQL website.
Use the jdbc.sslmode property to instruct Failover Manager to use SSL connections. By default, SSL is disabled.
Note
If you set the value of jdbc.sslmode to verify-ca and you want to use the Java trust store for certificate validation, you need to set the following value. This line can be added anywhere in the cluster properties file:
jdbc.properties=sslfactory=org.postgresql.ssl.DefaultJavaSSLFactory
For information about configuring and using SSL, see Secure TCP/IP Connections with SSL and Using SSL in the PostgreSQL documentation.
Use the user.email property to specify an email address (or multiple email addresses) to receive notifications sent by Failover Manager.
The from.email property specifies the value to use as the sender's address for email notifications from Failover Manager. You can:
- Leave from.email blank to use the default value (efm@localhost).
- Specify a custom value for the email address.
- Specify a custom email address, using the %h placeholder to represent the name of the node host (for example, example@%h). The placeholder is replaced with the name of the host as returned by the Linux hostname utility.
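The three options above can be illustrated with a short sketch of how a from.email value might resolve to a sender address. This is an illustration only, not Failover Manager's implementation: the function name expand_from_email is hypothetical, and socket.gethostname() stands in for the Linux hostname utility.

```python
import socket


def expand_from_email(value, hostname=None):
    """Resolve a from.email value to a sender address (illustrative only)."""
    # A blank value falls back to the documented default.
    if not value:
        return "efm@localhost"
    # Assumption: socket.gethostname() approximates the hostname utility.
    if hostname is None:
        hostname = socket.gethostname()
    # Replace the %h placeholder with the host name.
    return value.replace("%h", hostname)


print(expand_from_email(""))                     # efm@localhost
print(expand_from_email("example@%h", "node1"))  # example@node1
```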
For more information about notifications, see Notifications.
Use the notification.level property to specify the minimum severity level at which Failover Manager sends user notifications or calls a notification script. For a complete list of notifications, see Notifications.
Use the notification.text.prefix property to specify the text to add to the beginning of every notification.
Use the script.notification property to specify the path to a user-supplied script that acts as a notification service. The script is passed a message subject and a message body. The script is invoked each time Failover Manager generates a user notification.
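A notification script can be very small. The sketch below assumes that the subject and body arrive as the first and second command-line arguments; the text above says only that both are passed to the script, so confirm the calling convention for your version. The names format_notification and main, and the output format, are hypothetical.

```python
import sys
from datetime import datetime, timezone


def format_notification(subject, body):
    """Build a single log line from a notification subject and body."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{stamp} [{subject}] {body}"


def main(argv):
    """Handle one notification; returns a process exit code."""
    # Assumption: argv[1] is the subject and argv[2] is the body.
    if len(argv) < 3:
        print("usage: notify.py <subject> <body>", file=sys.stderr)
        return 2
    print(format_notification(argv[1], argv[2]))
    return 0


# Example call with a hypothetical subject and body.
main(["notify.py", "Example subject", "Example body"])
```

In a real script, call main(sys.argv) and pass its return value to sys.exit. The script is invoked as the Failover Manager user, so any file or service it writes to must be accessible to that user.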
The bind.address property specifies the IP address and port number of the agent on the current node of the Failover Manager cluster.
Use the external.address property to specify the IP address or hostname to use for communication with all other Failover Manager agents in a NAT environment.
Use the admin.port property to specify a port on which Failover Manager listens for administrative commands.
Set the is.witness property to true to indicate that the current node is a witness node. If is.witness is true, the local agent doesn't check to see if a local database is running.
The EDB Postgres Advanced Server pg_is_in_recovery() function is a Boolean function that reports the recovery state of a database. The function returns true if the database is in recovery or false if the database isn't in recovery. When an agent starts, it connects to the local database and invokes the pg_is_in_recovery() function.
- If the server responds true, the agent assumes the role of standby.
- If the server responds false, the agent assumes the role of primary.
- If there's no local database, the agent assumes an idle state.
Note
If is.witness is true, Failover Manager doesn't check the recovery state.
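The startup decision described above can be summarized as a small function. This is a sketch of the documented rules only, not agent code: the function name initial_agent_role, its parameters, and the returned role strings are hypothetical.

```python
def initial_agent_role(is_witness, has_local_database, in_recovery=None):
    """Return the role an agent assumes at startup, per the rules above."""
    # A witness never checks the local database or its recovery state.
    if is_witness:
        return "witness"
    # Without a local database, the agent goes idle.
    if not has_local_database:
        return "idle"
    # in_recovery stands for the result of pg_is_in_recovery().
    return "standby" if in_recovery else "primary"


print(initial_agent_role(False, True, in_recovery=True))   # standby
print(initial_agent_role(False, True, in_recovery=False))  # primary
print(initial_agent_role(False, False))                    # idle
print(initial_agent_role(True, False))                     # witness
```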
The following properties apply to the local server:
- The local.period property specifies the number of seconds between attempts to contact the database server.
- The local.timeout property specifies the number of seconds an agent waits for a positive response from the local database server.
- The local.timeout.final property specifies the number of seconds an agent waits after the previous checks have failed to contact the database server on the current node. If a response isn't received from the database within the number of seconds specified by the local.timeout.final property, the database is assumed to have failed.
For example, given the default values of these properties, a check of the local database happens once every 10 seconds. If an attempt to contact the local database doesn't come back positive within 60 seconds, Failover Manager makes a final attempt to contact the database. If a response isn't received within 10 seconds, Failover Manager declares database failure and notifies the administrator listed in the user.email property. These properties aren't required on a dedicated witness node.
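The arithmetic in this example can be written out as a rough estimate. The sketch below is one reading of the description above, not a guarantee of agent timing: it assumes the final attempt starts as soon as local.timeout expires and that a failure can occur just after a successful check. The function name local_failure_window is hypothetical.

```python
def local_failure_window(local_period=10, local_timeout=60, local_timeout_final=10):
    """Estimate seconds until a local database failure is declared."""
    # Time from the start of the failing check to the failure declaration.
    from_check_start = local_timeout + local_timeout_final
    # Assumption: the database may fail right after a successful check,
    # so up to one full local.period can pass before the failing check starts.
    worst_case = local_period + from_check_start
    return from_check_start, worst_case


print(local_failure_window())  # (70, 80)
```

Lowering local.timeout shortens detection time, at the cost of a higher chance of declaring failure during a brief stall.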
If necessary, modify these values to suit your business model.
Use the remote.timeout property to limit how many seconds an agent waits for a response from a remote agent or database. Agents only send messages to each other during cluster events. Examples include:
- Attempting to connect to a remote database that may have failed and asking other agents if they can connect.
- A primary agent requesting recovery settings from a standby agent as part of a switchover.
- Telling nodes to prepare to shut down when stopping the Failover Manager cluster.
Use the node.timeout property to specify the number of seconds for an agent to wait for a heartbeat from another node when determining if a node has failed.
Summary/comparison of timeout properties
- The local.* properties are for failure detection of an agent's local database.
- The node.timeout property is for failure detection of other nodes.
- The remote.timeout property limits how long agents wait for responses from other agents.
Use the encrypt.agent.messages property to specify whether to encrypt the messages sent between agents.
Use the enable.stop.cluster property to enable or disable the stop-cluster command. The command is a convenience in some environments but can cause issues when unintentionally invoked. In Eager Failover mode, the command results in stopping EDB Postgres Advanced Server without failover.
Use the stop.isolated.primary property to instruct Failover Manager to shut down the database if a primary agent detects that it's isolated. When true (the default), Failover Manager stops the database before invoking the script specified in the script.primary.isolated property.
Use the stop.failed.primary property to instruct Failover Manager to attempt to shut down a primary database if it can't reach the database. If true, Failover Manager runs the script specified in the script.db.failure property after attempting to shut down the database.
Use the primary.shutdown.as.failure property to treat any shutdown of the Failover Manager agent on the primary node as a failure. If this property is set to true and the primary agent is shut down, the rest of the cluster treats the shutdown as a failure. This includes any proper shutdown of the agent such as a shutdown of the whole node. None of the timeout properties apply in this case: when the agent exits, the rest of the cluster is notified immediately. After the agent exits, the rest of the cluster performs the checks that happen in the case of a primary agent failure. The checks include attempting to connect to the primary database, seeing if the VIP is reachable (if one is used), and so on.
- If the database is reached, a notification is sent informing you of the agent status.
- If the database isn't reached, a failover occurs.
The primary.shutdown.as.failure property is meant to catch user error, such as the accidental shutdown of a primary node, rather than failures. The proper shutdown of a node can appear to the rest of the cluster as if a user has stopped the primary Failover Manager agent, for example to perform maintenance on the primary database. If you set the primary.shutdown.as.failure property to true, take care when performing maintenance.
To perform maintenance on the primary database when primary.shutdown.as.failure is true, stop the primary agent and wait to receive a notification that the primary agent has failed but the database is still running. Then, it is safe to stop the primary database. Alternatively, you can use the stop-cluster command to stop all of the agents without performing failure checks.
Use the update.physical.slots.period property to define the slot advance frequency. When update.physical.slots.period is set to a positive integer value, the primary agent reads the current restart_lsn of the physical replication slots every update.physical.slots.period seconds. It sends this information with its pg_current_wal_lsn and primary_slot_name (if it's set in the postgresql.conf file) to the standbys. The physical slots must already exist on the primary for the agent to find them. If physical slots don't already exist on the standbys, standby agents create the slots and then update the restart_lsn parameter for these slots. A non-promotable standby doesn't create new slots but updates them if they exist.
Before updating the restart_lsn value of a slot, the agent checks to see if an xmin value has been set, which may happen if this was previously a primary node. If an xmin value has been set for the slot, the agent drops and recreates the slot before updating the restart_lsn value.
Note: All slot names, including one set on the current primary if desired, must be unique.
Use the ping.server.ip property to specify the IP address of a server that Failover Manager can use to confirm that network connectivity isn't a problem.
Use the ping.server.command property to specify the command used to test network connectivity.
Use the auto.allow.hosts property to instruct the server to set the allowed host list from the addresses specified in the .nodes file of the first node to start. Enabling this property by setting auto.allow.hosts to true can simplify cluster startup.
Use the stable.nodes.file property to instruct the server not to rewrite the nodes file when a node joins or leaves the cluster. This property is most useful in clusters with IP addresses that don't change.
The db.reuse.connection.count property allows the administrator to specify the number of times Failover Manager reuses the same database connection to check the database health. The default value is 0, indicating that Failover Manager creates a fresh connection each time. This property isn't required on a dedicated witness node.
The auto.failover property enables automatic failover. By default, auto.failover is set to true.
Use the auto.reconfigure property to instruct Failover Manager to enable or disable automatic reconfiguration of remaining standby servers after a standby is promoted to primary. Set the property to true (the default) to enable automatic reconfiguration or false to disable automatic reconfiguration. This property isn't required on a dedicated witness node.
Note
primary_conninfo is a space-delimited list of keyword=value pairs.
Use the promotable property to indicate that a node must never be promoted: set the property to false on that node. The promotable property is ignored when a primary agent starts. This simplifies switching back to the original primary after a switchover or failover. To override the setting, use the efm set-priority command at runtime. For more information about the efm set-priority command, see Using the efm utility.
If the same amount of data was written to more than one standby node and a failover occurs, the use.replay.tiebreaker value determines how Failover Manager selects a replacement primary. Set the use.replay.tiebreaker property to true to instruct Failover Manager to fail over to the node that will come out of recovery faster, as determined by the log sequence number. To ignore the log sequence number and promote a node based on user preference, set use.replay.tiebreaker to false.
Use the standby.restart.delay property to specify the time in seconds for the standby to wait before it gets reconfigured (stopped/started) to follow the new primary after a promotion.
You can use the application.name property to provide the name of an application to copy to the primary_conninfo parameter before restarting an old primary node as a standby.
Note
Set the application.name property on the primary and any promotable standby. In the event of a failover/switchover, the primary node can potentially become a standby node again.
Use the restore.command property to instruct Failover Manager to update the restore_command value when a new primary is promoted. Failover Manager replaces the %h placeholder with the address of the new primary. %f and %p are placeholders used by the server. If the property is left blank, Failover Manager doesn't update the restore_command values on the standbys after a promotion.
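The placeholder handling can be sketched in a few lines: only %h is substituted, while %f and %p pass through untouched for the database server to expand later. The function name render_restore_command, the user name postgres, the archive path, and the address are all hypothetical values chosen for illustration.

```python
def render_restore_command(template, new_primary_address):
    """Return the restore_command text for a newly promoted primary."""
    # A blank property means restore_command is left unchanged.
    if not template:
        return None
    # Only %h is replaced here; %f and %p are left for the server.
    return template.replace("%h", new_primary_address)


# Hypothetical property value and primary address.
template = "scp postgres@%h:/var/lib/archive/%f %p"
print(render_restore_command(template, "192.168.2.54"))
# scp postgres@192.168.2.54:/var/lib/archive/%f %p
```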
See the PostgreSQL documentation for more information about using a restore_command.
The database parameter synchronous_standby_names on the primary node specifies the names and count of the synchronous standby servers that confirm receipt of data to ensure that the primary node can accept write transactions. When the reconfigure.num.sync property is set to true, Failover Manager reduces the number of synchronous standby servers and reloads the configuration of the primary node to reflect the current value.
Note
If you're using the reconfigure.num.sync property, make sure that the wal_sender_timeout value in the primary database is set to at least 10 seconds less than the node.timeout value in the cluster properties file.
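The constraint in this note is simple arithmetic. The sketch below expresses it with two hypothetical helper functions; both values are in seconds, so convert wal_sender_timeout first if your configuration states it in milliseconds.

```python
def max_wal_sender_timeout(node_timeout_seconds, margin_seconds=10):
    """Largest wal_sender_timeout that keeps the required margin."""
    return node_timeout_seconds - margin_seconds


def wal_sender_timeout_ok(wal_sender_timeout_seconds, node_timeout_seconds):
    """True if wal_sender_timeout is at least 10 seconds below node.timeout."""
    return wal_sender_timeout_seconds <= max_wal_sender_timeout(node_timeout_seconds)


print(max_wal_sender_timeout(50))     # 40
print(wal_sender_timeout_ok(60, 50))  # False
print(wal_sender_timeout_ok(30, 50))  # True
```

With the default node.timeout of 50, the PostgreSQL default wal_sender_timeout of 60 seconds doesn't satisfy the rule, so one of the two values needs adjusting.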
Use the reconfigure.num.sync.max property to specify the maximum number to which num_sync can be raised when a standby is added to the cluster.
Set the reconfigure.sync.primary property to true to take the primary database out of synchronous replication mode if the number of standby nodes drops below the level required. Set reconfigure.sync.primary to false to send a notification if the standby count drops without interrupting synchronous replication.
Note
If you're using the reconfigure.sync.primary property, ensure that the wal_sender_timeout value in the primary database is set to at least 10 seconds less than the node.timeout value in the cluster properties file.
Use the minimum.standbys
property to specify the minimum number of standby nodes to retain on a cluster. If the standby count drops to the specified minimum, a replica node isn't promoted if a failure of the primary node occurs.
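The rule can be read as a simple comparison. This sketch is one interpretation of the sentence above, in which a standby count equal to the minimum blocks promotion; the function name promotion_allowed is hypothetical.

```python
def promotion_allowed(standby_count, minimum_standbys=0):
    """True if promoting a standby wouldn't drop below the minimum."""
    # Promotion is blocked once the count has dropped to the minimum.
    return standby_count > minimum_standbys


print(promotion_allowed(2, minimum_standbys=1))  # True
print(promotion_allowed(1, minimum_standbys=1))  # False
print(promotion_allowed(1))                      # True
```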
Use the priority.standbys property to specify the priority of standbys after this node is promoted.
Use the recovery.check.period property to specify the number of seconds for Failover Manager to wait before it checks to see if a database is out of recovery.
Use the restart.connection.timeout property to specify the number of seconds for Failover Manager to attempt to connect to a newly reconfigured primary or standby node while the database on that node prepares to accept connections.
Use the auto.resume.period property to specify the number of seconds for an agent to attempt to resume monitoring the database. This property applies after a monitored database fails and an agent has assumed an idle state, or when starting in IDLE mode.
Failover Manager provides support for clusters that use a virtual IP. If your cluster uses a virtual IP, provide the host name or IP address in the virtual.ip property. Specify the corresponding prefix in the virtual.ip.prefix property. Leave virtual.ip blank to disable virtual IP support.
Use the virtual.ip.interface property to provide the network interface used by the VIP.
The specified virtual IP address is assigned only to the primary node of the cluster. If you specify virtual.ip.single=true, the same VIP address is used on the new primary if a failover occurs. Specify a value of false to provide a unique IP address for each node of the cluster.
For information about using a virtual IP address, see Using Failover Manager with virtual IP addresses.
Note
If a primary agent starts and the node doesn't currently have the VIP, the Failover Manager agent acquires it. Stopping a primary agent doesn't drop the VIP from the node.
Set the check.vip.before.promotion property to false to prevent Failover Manager from checking to see if a VIP is in use before assigning it to a new primary in case of a failure. This might result in multiple nodes broadcasting on the same VIP address. Unless the primary node is isolated or can be shut down via another process, set this property to true.
Use the pgpool.enable property to specify if you want to enable the Failover Manager and Pgpool integration for high availability. If you want to enable Pgpool integration in a non-sudo mode (running as the DB owner), the PCPPASS file must be owned by the DB owner operating system user and you must set the file permissions to 600.
Use the following parameters to specify the values to use for Pgpool integration.
Use the following properties to provide paths to scripts that reconfigure your load balancer in case of a switchover or primary failure scenario. The scripts are also invoked when a standby failure occurs. If you're using these properties, provide them on every node of the cluster (primary, standby, and witness) to ensure that if a database node fails, another node will call the detach script with the failed node's address.
You don't need to set the following properties if you are using Pgpool as the load balancer solution and you have set the Pgpool integration properties.
Use the script.load.balancer.attach property to specify the name of a script to invoke when you want to attach a node to the load balancer. Use the script.load.balancer.detach property to specify the name of a script to invoke when you want to detach a node from the load balancer. Include the %h placeholder to represent the IP address of the node that's being attached to or detached from the load balancer. Include the %t placeholder to instruct Failover Manager to include a p (for a primary node) or an s (for a standby node) in the string.
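An attach or detach script configured with %h %t receives the node address and the node type as arguments. The sketch below shows how such a script might read them, assuming they arrive in that order as the first two command-line arguments. The function name parse_load_balancer_args, the ROLE_NAMES mapping, and the sample address are hypothetical, and the load balancer call itself is left out.

```python
ROLE_NAMES = {"p": "primary", "s": "standby"}


def parse_load_balancer_args(argv):
    """Return (address, role) from the %h and %t arguments."""
    # Assumption: the property value ends with "%h %t", in that order.
    if len(argv) != 3:
        raise ValueError("expected: <script> <address> <p|s>")
    address, role_flag = argv[1], argv[2]
    if role_flag not in ROLE_NAMES:
        raise ValueError(f"unexpected node type: {role_flag!r}")
    return address, ROLE_NAMES[role_flag]


print(parse_load_balancer_args(["attach.py", "192.168.2.54", "p"]))
# ('192.168.2.54', 'primary')
```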
Set the detach.on.agent.failure property to false to indicate that you don't want to detach a node from the load balancer in a scenario where the primary agent fails but the database is still reachable. The default value is true.
The script.fence property specifies the path to an optional user-supplied script to invoke during the promotion of a standby node to primary node.
Use the script.post.promotion property to specify the path to an optional user-supplied script to invoke after a standby node is promoted to primary.
Use the script.resumed property to specify the path to an optional user-supplied script to invoke when an agent resumes monitoring a database.
Use the script.db.failure property to specify the complete path to an optional user-supplied script that Failover Manager invokes if an agent detects that the database that it monitors has failed.
Use the script.primary.isolated property to specify the complete path to an optional user-supplied script that Failover Manager invokes if the agent monitoring the primary database detects that the primary is isolated from the majority of the Failover Manager cluster. This script is called immediately after the VIP is released (if a VIP is in use).
Use the script.remote.pre.promotion property to specify the path and name of a script to invoke on any agent nodes not involved in the promotion when a node is about to promote its database to primary.
Include the %p placeholder to identify the address of the new primary node.
Use the script.remote.post.promotion property to specify the path and name of a script to invoke on any nonprimary nodes after a promotion occurs.
Include the %p placeholder to identify the address of the new primary node.
Use the script.custom.monitor property to provide the name and location of an optional script to invoke at regular intervals, specified in seconds by the custom.monitor.interval property.
Use custom.monitor.timeout to specify the maximum time for the script to run. If script execution doesn't finish in the time specified, Failover Manager sends a notification.
Set custom.monitor.safe.mode to true to instruct Failover Manager to report nonzero exit codes from the script but not promote a standby as a result of an exit code.
Use the sudo.command property to specify a command for Failover Manager to invoke when performing tasks that require extended permissions. Use this option to include command options that might be specific to your system authentication.
Use the sudo.user.command property to specify a command for Failover Manager to invoke when executing commands performed by the database owner.
Use the lock.dir property to specify an alternative location for the Failover Manager lock file. The file prevents Failover Manager from starting multiple, potentially orphaned, agents for a single cluster on the node.
Use the log.dir property to specify the location to write agent log files. Failover Manager attempts to create the directory if the directory doesn't exist. The log.dir parameter defined in the efm.properties file determines the directory path where the EFM logs are stored. This parameter applies exclusively to the EFM logs and doesn't affect the logging configuration for any other components or services.
To change the startup log location for EFM, modify the runefm.sh script located in the EFM bin directory. Set the LOG parameter in this script to define the desired log file location.
After enabling the UDP or TCP protocol on a Failover Manager host, you can enable logging to syslog. Use the syslog.protocol parameter to specify the protocol type (UDP or TCP) and the syslog.port parameter to specify the listener port of the syslog host. You can use the syslog.facility value as an identifier for the process that created the entry. Use a value between LOCAL0 and LOCAL7.
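The allowed syslog values can be checked before restarting an agent. The following is a minimal sketch with a hypothetical function name, check_syslog_settings. It compares values exactly because property values are case sensitive, and the port range check is a general TCP/UDP constraint rather than a Failover Manager rule.

```python
VALID_PROTOCOLS = {"UDP", "TCP"}
VALID_FACILITIES = {f"LOCAL{n}" for n in range(8)}  # LOCAL0 through LOCAL7


def check_syslog_settings(protocol, facility, port):
    """Return a list of problems found in the syslog.* settings."""
    errors = []
    if protocol not in VALID_PROTOCOLS:
        errors.append(f"syslog.protocol must be UDP or TCP, got {protocol!r}")
    if facility not in VALID_FACILITIES:
        errors.append(f"syslog.facility must be LOCAL0 to LOCAL7, got {facility!r}")
    # General network constraint: ports range from 1 to 65535.
    if not 1 <= port <= 65535:
        errors.append(f"syslog.port is out of range: {port}")
    return errors


print(check_syslog_settings("UDP", "LOCAL1", 514))  # []
print(len(check_syslog_settings("udp", "LOCAL8", 514)))  # 2
```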