Article
Figure 1
Listing 1
Listing 2
Listing 3

aug2003.tar

Enterprise Database Backup Check

Ed Schaefer

If you administer a phalanx of enterprise, worldwide, distributed databases, how can you verify remote server sites are performing scheduled database backups or archives? How about executing a shell script on a Unix client that accesses each remote database server, checks how long it's been since the last backup completed, and emails a warning to the proper administrators? This is child's play if the backup start and stop times are stored in the database. Also, it's simple enough if the backup statistics are generated by a Unix executable -- just execute a remote shell, remsh, and return the necessary data.

Unfortunately, for security reasons, many network firewalls no longer allow executing the "r" commands -- remsh (remote shell) or rcp (remote file copy) -- severely limiting remote access. This article presents one method for avoiding the "r" problem; however, the database vendor's access utility must have remote server access, and the database stored procedure language, SPL, must allow executing a subprogram.

My database is Informix, but to enhance portability, I hide database-specific implementation details (such as the access utility) in shell variables. (See my article "db_update: Controlling UNIX Enterprise Database Releases" in Sys Admin, December 2002, http://samag.com/documents/s=7732/sam0212c/.)

General Design

On a single client server, I periodically execute the shell script, buc_client.ss (Listing 1). For each remote server defined in a configuration file, the client executes stored procedure, SP, sp_spawn_rem on the remote database. The SP executes shell script, buc_server.ss (Listing 2), on the remote server (see Figure 1).

The buc_server.ss script executes the subprogram oncheck, which determines the last backup date and time. The buc_client.ss script stores the last backup date-time and the current date-time in the remote database, and returns control to the client that completes the report processing. If the time since the last backup exceeds the allowable, it sends a report to the administrators stored in another configuration file.

Next, I will discuss selected issues of buc_client.ss and buc_client.ss:

Buc_client.ss Design Logic

For each server:

Determine the time to wait before reporting a backup required error.
Execute stored procedure sp_spawn_rem, which executes the buc_server.ss script on the remote server. Pass a PRIMARY KEY to the remote server to store the backup data in the database.
After the remote server completes execution, using the PRIMARY KEY, retrieve the data from the remote server via the database's remote_values table containing the current and backup date-times.
If a network or other connection error occurs, email the DBA list, and continue to the next server.
Using the PRIMARY KEY, delete the entry in the remote remote_values table.
Determine the elapsed time between the backup time and the current system time.
If the elapsed time is greater than the time to wait, report to the administrator's mail list.

Buc_server.ss Design Logic

On the remote server:

Execute the Informix oncheck utility, which determines the backup date and time.
Determine the current date and time.
Save, by PRIMARY key, the backup date and time and the current date and time in the remote_values table.
Terminate the server script returning to the client.

Configuration Objects

config_data.file

The config_data.file, located in the Informix user home directory, contains the number of hours to wait before reporting that a backup is required. The colon-delimited file structure is the variable name, the number of hours to wait, and a comment, respectively:

hours_from_archive:18:number of hours to wait before reporting backup required

For administrator convenience, the wait parameter is in hours. The script converts the hours to seconds since date arithmetic in the shell is most easily performed from the Unix Epoch.

buc_server_list.txt file

The names of the servers to be accessed reside, one per line, in the buc_server_list.txt file:

betty
doomsday
#babylon

Each database and server are accessed as such:

dbaccess testdb@betty

Any server commented out is ignored.

buc_mail_list.txt file

If a report is generated, it is mailx'ed to the administrators listed, one per line, in the buc_mail_list.txt:

ed@schaefer.com
jwayne
#rrogers

Any administrator commented out is ignored.

dba_mail_list.txt file

Any network or database connection problems are mailx'ed only to the administrators listed, one per line, in the dba_mail_list.txt:

ed@schaefer.com
#rmitchum

Again, any administrator commented out is ignored.

Database Objects

Executing the stored procedure sp_spawn_rem (Listing 3) allows execution of the buc_server.ss script on the remote server. For security purposes, I use a full path to the script to be executed:

EXECUTE PROCEDURE sp_spawn_rem("/home/eds/buc/buc_server.ss", "$rem_key");

remote_values table

Each remote server's database must contain the following table:

create table remote_values
  (
  remote_key char(20) not null,
  remote_values varchar(64),
  primary key(remote_key)
  );

This table is the coupling between the client and the server. After the client initiates a start message, the server gathers the backup statistics and stores it in the remote_values column.

The primary key maintains the coupling between the primary and server process and is removed when the server process terminates. The primary key not only is good database design, but also allows other client/server scripts performing other functions to use this table.

Report Output

Example data example from remote_values looks like:

2003-04-06 06:30:01  2003-04-06 16:57:21

If the time to wait is exceeded, this is the generated report structure:

==== Backup Report for betty  ==== Sun Apr  6 16:57:21 PDT 2003
Backup Date: 2003-04-06 06:30:01 Current Date: 2003-04-06 16:57:21
number of hours since archive: 10.4 Sun Apr 6 16:57:21 PDT 2003

Setting the Time Zone

In both the client and server programs, I set the time zone, TZ:

export TZ=$(grep TZ= /etc/TIMEZONE|cut -d= -f2)

This may be Solaris 7 or database specific, but if the TZ isn't set within the buc_server.ss script, the oncheck utility uses the client's TZ -- an obvious error. An improperly set TZ in a worldwide network configuration may be catastrophic.

Cron Execution

By design, the client script executes from cron. The following example runs buc_client.ss hourly starting at midnight:

0 * * * * cd /home/eds/buc; [ -x ./buc_client.ss ] && \
  ./buc_client.ss > /dev/null 2>&1

Script Security

A network design such as this is always a security risk. To minimize problems such as a Trojan horse, I implement the following:

Only effective user Informix may execute either the client or server script.
The sp_spawn_rem stored procedure may execute only the buc_client.ss.
External programs buc_client.ss and oncheck execute with a full path.

Conclusion

Originally, this client/server design was entirely for my benefit; I am able to identify which sites aren't doing timely backups. However, since the buc_client.ss script is distributed to all sites, it can run on the same database server as the buc_server script.ss script. The site DBA has only to set up his or her own local configuration files and cron.

Finally, is this client/server design the only solution for this backup problem? Certainly not. This is especially true if the server process returns large amounts of data. The Expect environment might be a better option, but requires climbing the Expect learning curve and querying the keepers and denizens of the network.

Ed Schaefer is a frequent contributor to Sys Admin. He is a software developer and DBA for Intel's Factory Integrated Information Systems, FIIS, in Aloha, Oregon. Ed also hosts the UnixReview.com monthly Shell Corner column. He can be reached at: olded@ix.netcom.com.