Oracle Database Backups via UFS Snapshots and Enterprise Backup Products

Rick Moakley

In the fall of 2002, I was assigned the task of putting together a Solaris 9 system for a new machine, including system and database backups. Typically, we would install an ATL (automated tape library) and a backup product for this task, but we wanted to move away from that model to something more enterprise-centric. We use IBM's Tivoli Storage Manager (TSM) for our enterprise backup product, so we looked into their Data Protection for Oracle add-on. In the meantime, I put together an interim solution, described in this article, which worked so well that we kept it.

I faced the task of backing up this 250+ GB Oracle database over a 100Base-T NIC, and the problem was how long it would take. Rough estimates indicated 8 to 12 hours on a good day. Then my research uncovered UFS snapshots. Using UFS snapshots (and a little multitasking), we were able to complete the backup in about 4 hours (during off-shift time). With this setup, there is only a minute or two of downtime for the database during a cold backup, and the database is in backup mode only briefly during a hot backup.

The advantages of the resulting scripts, which use snapshots and multithread the process, were significant. We saw substantial cost savings by not purchasing an ATL, tapes, and software. The downtime for a backup is brief, and the wall clock time is a fraction of a regular backup. Also, we don't have to maintain lists of files to be backed up -- a big plus for any admin. Once the space is set aside for doing snapshots, the scripts are installed and cron'ed, and the Oracle userid's profile is verified, the process runs automatically and is virtually maintenance free.

UFS Snapshots and Wrapper Scripts

UFS snapshots are a little-used facility incorporated into Solaris (since release 8) that allows a privileged user (i.e., root) to create a point-in-time copy of a file system after quiescing it. Copy-on-write operations place pre-change copies of the filesystem blocks into a "backing store file", and when the associated fssnap device is mounted, the blocks stored in the backing store file overlay the changes to the original. Thus, reading from the snapshot device presents a point-in-time copy of the original file system to the backup product of choice.

Doing a database backup is typically more complicated than a file system backup, and because I was adding snapshots and multitasking to the mix, things got even more complicated. My solution was to put a couple of wrapper scripts together to manage the backup process and do the actual backups. A backup.parent script (Listing 1) does the management. It queries the database for its critical file information (i.e., tablespace and control filenames, archive logs, etc.), readies the database (i.e., puts it in backup mode), snapshots the required filesystems, and then starts a set of backup.child scripts (Listing 2) that do the actual backups. Each invocation of a child script is passed a filesystem and the set of files within it, and proceeds to do its share of the work, backing up the tablespace files and deleting the snapshot when complete.
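
To make the division of labor concrete, here is a minimal sketch of a child's steps; the argument layout, variable names, and the exact dsmc invocation are illustrative assumptions rather than the code in Listing 2:

#!/bin/ksh
# Sketch of one backup.child run: mount the snapshot read-only, archive the
# files it was handed, then release the snapshot and its backing store.
# (Argument order and names are assumptions for illustration.)
filesystem=$1      # original filesystem, e.g. /filesys1
filelist=$2        # temp file listing the database files to back up
snapdev=$3         # fssnap block device created by the parent
desc=$4            # description string tying this run's archives together

snapmnt=/snapshot${filesystem}
mkdir -p $snapmnt
mount -F ufs -o ro $snapdev $snapmnt || exit 1

# Archive from the snapshot mount point, not from the live filesystem
for file in $(cat $filelist); do
    dsmc archive -desc="$desc" "${snapmnt}$(echo $file | sed "s|^${filesystem}||")"
done

umount $snapmnt
fssnap -d $filesystem      # delete the snapshot; its space becomes available again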

For cold backups, the datafiles, controlfiles, and logfiles are backed up. For hot backups, the datafiles and the archive logs in use between the start and end of the backup (signified by a "switch log") are backed up, and a backup copy of the control file is created and backed up as well.

How the Scripts Work

The scripts are pretty straightforward. I avoided using functions (although they might have made some things easier) so that non-programmers can read the scripts more easily. There are very few variables to adjust, and those are detailed in this article. When started, the scripts are passed a database instance name (the ORACLE_SID), and optionally the type of backup (cold or hot), the number of concurrent streams, and a retention period (used by TSM):

./backup.parent database01 HOT 4 ARC5WEEKS

The defaults are a cold backup, one stream per CPU, and ARC5WEEKS (which must be defined by the TSM administrator). The parent script, after profiling its environment, ensures that the snapshot filesystem (i.e., snapshot_fs=/fssnap) is actually mounted; if it isn't, the script alerts you and points out where to change this value. At one point in my testing, I discovered that the snapshot filesystem wasn't mounted, and the process continued merrily along its way filling up root, so I added this check to the script.
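
For reference, a check along these lines does the job (a minimal sketch; snapshot_fs is the variable from Listing 1, but the mnttab test shown here is my own, not necessarily the script's exact code):

# Abort early if the backing-store filesystem isn't really mounted, rather
# than letting the snapshots silently fill up root.
snapshot_fs=/fssnap
nawk -v fs="$snapshot_fs" '$2 == fs { found = 1 } END { exit !found }' /etc/mnttab
if [ $? -ne 0 ]; then
    echo "$0: $snapshot_fs is not mounted; aborting backup" >&2
    logger -ip local1.notice $0": $snapshot_fs is not mounted; backup aborted"
    exit 1
fi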

Provided the snapshot_fs is okay, the next step is to determine which files are actually going to be backed up. I chose to avoid manually keeping track of files or directories, which is error-prone at best; at worst, you don't discover what was forgotten until you need to restore it. Instead, the script goes right to the source -- the database itself -- on each run, so when the DBA adds a file, it doesn't get missed. Files are identified by doing a "switch user" to the Oracle id and executing a sqlplus command with a few selects, while spooling the output to a file. If your Oracle userid is not "oracle", change it in the script (i.e., oracle="oracle"):

CONTROLFILE='v\$controlfile'
LOGFILE='v\$logfile'
DATAFILE='v\$datafile'
TEMPFILE='v\$tempfile'
...
# Switch to the Oracle userid: the outer here document feeds the su shell,
# the inner one feeds sqlplus.  The backslashes above keep the v$ views from
# being expanded by the shell before sqlplus sees them.
${su} - ${oracle} << EOF >> $log1 2>&1
${prep_script}
sqlplus -SILENT "/ as sysdba" << EON >> $log2
whenever sqlerror exit 1
set pagesize 0
set heading off
set timing off
spool $tmpfile1
select 'CONTROLFILE::'||name from $CONTROLFILE;
select 'TABLESPACE:'||tablespace_name||':'||file_name from sys.dba_data_files;
select 'LOGFILE::'||member from $LOGFILE;
select 'TEMPFILE::'||name from $TEMPFILE;
spool off
set echo off
spool $tmpfile2
SHOW parameter db_block_size
spool off
exit 0
EON
EOF

The resulting file can be grep'ed for the needed files. When doing a cold backup, the control files, tablespace files, and log files must be backed up via snapshot. For a hot backup, we only back up the tablespace files via snapshot; special considerations are made for the control file and archive logs.

Since each filesystem must be snapped only once, we use grep to select the entries we want and weed out the ones we don't. Following the rules above, the script runs df on each database file, writing the filesystem name to a temporary file. A sort -u command then removes the duplicates, leaving the list of filesystems to snapshot:

# HOT: only the tablespace datafiles need to be snapped; COLD: everything
# except the tempfiles (controlfiles, tablespaces, and logfiles).
if [ "$hot_or_cold" = "HOT" ]; then
        pathlist=$(grep ^"TABLESPACE:" $tmpfile1 | nawk -F: '{print $NF}')
else    pathlist=$(grep -v "TEMPFILE:" $tmpfile1 | nawk -F: '{print $NF}')
fi
# Map each file to its filesystem, collecting the file names per filesystem
for path in $pathlist; do
    filesys=$(df $path|cut -f1 -d'('|tr -d ' ')
    outfile=$(echo $filesys|tr / _)
    echo $path >> $tmpname.$outfile
    echo $filesys $tmpname.$outfile >> $tmpfile2
done

# sort and de-dup the temp file
sort -u -o $filesystems $tmpfile2
${rm} $tmpfile2

Before we snapshot the filesystems, the database must be "readied for backup". In the case of a cold backup, we do another switch user to Oracle and issue a "shutdown immediate". All the commands are in stream with another here document, as shown in the previous example. The process is a bit different for a hot backup; we need to put each tablespace in "backup mode". Because the list of files isn't hard-coded, we use awk to build a couple of "command files" containing the standard commands, a complete set of BEGIN and END BACKUP tablespace commands, and a couple of commands to print the status of the archive logs. Note that for this method to work, the DBA must ensure that the database is running in "archive log" mode.
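
As an illustration of that step (the file names here are placeholders and the awk in Listing 1 differs in detail), the BEGIN/END BACKUP command files could be built from the spooled TABLESPACE lines like this:

# Build the hot-backup command files from the spooled "TABLESPACE:name:file"
# lines: one BEGIN and one END statement per distinct tablespace.
grep "^TABLESPACE:" $tmpfile1 | nawk -F: '{print $2}' | sort -u | \
    nawk '{printf("alter tablespace %s begin backup;\n", $1)}' > $begin_cmds
grep "^TABLESPACE:" $tmpfile1 | nawk -F: '{print $2}' | sort -u | \
    nawk '{printf("alter tablespace %s end backup;\n", $1)}' > $end_cmds

# Record the archive-log position before and after, and force a log switch
# at the end so the logs covering the backup window can be identified.
echo "archive log list" >> $begin_cmds
echo "alter system switch logfile;" >> $end_cmds
echo "archive log list" >> $end_cmds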

Once we've prepared the database, we run through the list of filesystems, taking a snapshot of each one. Things get sticky if any of the snapshots fail. You can't just quit and leave the database in backup mode, or worse, down. Users and DBAs don't like that, and you can't just leave the snapshots hanging there. I coded the script so that if a snapshot error is encountered, it stops trying to take the others, deletes the ones already taken, and readies the database for normal operations. This rarely happens, but it's best to be prepared:

chunksize=$(expr $blksize / 1024)k
while read filesystem pathnames
do
    outfile=$(echo $filesystem|tr / _)
    snapshot=$(${fssnap} -F ufs \
        -o backing-store=$bstmpname.$outfile.bs,chunksize=$chunksize $filesystem)
    rc=$?
    if [ $rc -ne 0 ]; then
        logger -ip local1.notice $0": Error" $rc "doing snapshot for:" $filesystem
        echo "Error" $rc "doing snapshot for:" $filesystem
        # don't exit here; the database must be returned to service first,
        # and the cleanup below removes any snapshots already taken
        break
    else  echo $filesystem "snapshot taken, Ok"
        logger -ip local1.notice $0":" $filesystem "snapshot taken, Ok"
    fi
    echo $filesystem $pathnames $snapshot >> $tmpfile2
done < $filesystems
...
if [ $rc -ne 0 ]; then
    if [ -f $filesystems ]; then
        while read filesystem pathnames snapshot
        do
            logger -ip local1.notice $0": Removing snapshot" $snapshot
            echo "Removing snapshot" $snapshot
            if [ ! -z "$snapshot" ]; then
                # log the snapshot details, then delete it
                ${fssnap} -i -o mountpoint,backing-store,backing-store-len $filesystem >> $log1
                ${fssnap} -d $filesystem
            fi
        done < $filesystems
    fi
fi

After making the snapshots, the script returns the database to service, either by restarting it (cold) or by ending backup mode (hot). At this point, the backup is done as far as the database is concerned; this part of the process takes less than two minutes on the systems using this technique. The remainder of the task is to start child processes that mount the snapshots, back up the files, and delete the snapshots when done. By spawning child processes to perform the actual backup, we multi-stream the task and reduce the wall clock time considerably.
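
A rough sketch of that dispatch loop follows; the $streams and $desc variables and the simple batching shown here are illustrative, not necessarily how Listing 1 handles it:

# Start up to $streams child backups at a time, one per snapped filesystem,
# then wait for the last batch to finish before collecting the logs.
count=0
while read filesystem pathnames snapshot
do
    ./backup.child $filesystem $pathnames $snapshot $desc &
    count=$(expr $count + 1)
    if [ $count -ge $streams ]; then
        wait        # let the current batch drain before starting more
        count=0
    fi
done < $tmpfile2
wait                # make sure every child has completed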

There are additional benefits. For example, when all the files in the file system have been backed up, the snapshot actually becomes a liability. Not only does it waste space, but until it's deleted, any write activity in the snapped filesystem will cause the snapshot to grow. When the child process finishes and deletes the snapshot, the used space is returned and becomes available for other filesystem snapshots to grow into. By starting multiple child processes and simultaneously backing up multiple filesystems, the lifespan of each snapshot is shorter, the eventual size (when the backup is complete) is smaller, and thus the total space requirement is reduced.

Benefits for Hot Backups

I occasionally hear complaints about hot backups. Most often, the issue is maintaining the archive logs, particularly keeping them together as part of a "backup set". There is also concern about what the Oracle Backup & Recovery Handbook (by Rama Velpuri, 1997, Oracle Press) describes as the "split block phenomenon". Split blocks are (logical) blocks where the front half doesn't match the back half; this can occur when a block is written while it is being read by the backup task.

To ensure that the complete block is in place following a restore, logs must be replayed during the "recover" operation. This can be a complicated process if your backup has taken hours to run, and you can wind up with a significant amount of additional data to maintain. Herein lies an additional benefit of using UFS snapshots. Because the database is only in backup mode during the moments the snapshots are being taken, you have minimal archive log data to keep, and because it only takes a moment to make a snapshot, the only possible split blocks are those being written at the instant the snapshots were actually made.

The Oracle Backup & Recovery Handbook indicates that tablespaces in hot backup mode should be backed up one by one and not batched together. We deviate from that rule because, to perform the backup properly via snapshot, all the files in a filesystem must be in backup mode before the snapshot is taken. The "backup" is then logically made (from the database's point of view) when the snapshot occurs, not while it is being copied to tape. This is important to remember, because the backup is not truly complete until all the data has been copied to tape, and there are pluses and minuses to this. If a snapshot fails (the backing store filesystem fills up, the system crashes, or the machine reboots, since snapshots don't persist across a reboot), you must start over. But if the backup server fails during the backup, you can pick up where you left off, manually (or by script) backing up the remaining files when you get into the office -- provided that you built your snapshot file system with sufficient free space.

Profiling and Space Needs

The considerations for using this technique (and these two scripts) include how you "profile" your Oracle environment, the space needed to hold the snapshots, and when the snapshots are taken. Our site uses the "oracle" userid and a "newdb" prep script to set up the environment. For the purposes of these scripts, I've included the following example of what is needed, namely the PATH information and database SID. If you use another id, or profile things differently, you will need either to add this to the environment in the Oracle userid's profile, to source it in from a script, or to make some slight changes to the "parent" script. The backup.parent script (Listing 1) contains a ${prep_script} variable (located right below the Oracle userid variable), which is used to call the setup routine. If your situation doesn't need this, blank it out. An example of what is needed in the setup routine is:

export ORACLE_SID=$1
export ORACLE_HOME=/opt/oracle8/product/8.1.7
export PATH=${PATH}:${ORACLE_HOME}/bin

The space required to hold the snapshots is based on the rate of change in the filesystems being snapped (including changes to non-database files). In my experience, allocating a separate filesystem of approximately 5 to 10 percent of the size of the combined files should be sufficient to start. The only way to determine what is right for your site is to try it. Fortunately, you don't actually need to perform a backup -- you can manually make a snapshot, estimate the time it would take to back it up, then wait and query the snapshot for its "backing store" size.
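
For example, the measurement can be taken by hand like this (assuming root privileges and a /filesys1 filesystem to experiment with):

# Snapshot the filesystem by hand ...
fssnap -F ufs -o backing-store=/fssnap/filesys1.bs /filesys1

# ... let normal write activity run for roughly the length of a backup
# window, then see how large the backing store has grown.
fssnap -i -o mountpoint,backing-store,backing-store-len /filesys1

# clean up when done
fssnap -d /filesys1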

If you have a "logical volume manager" and the necessary free space, I recommend using that to build a permanent backing store filesystem and keep nothing there but snapshots. The scripts use a /fssnap file system; be sure that you change that to suit your installation. Because they are transient objects -- only good for the life of the backup -- the backing store filesystem itself can be transient if it contains only snapshots. You can create and delete it at will, and it probably does not need to be RAID protected. (There is the slight chance your backup could fail, but you must determine whether that justifies the additional disk I/O overhead or space requirements.) And, if the space requirements change, you can easily destroy and recreate it without needing to back up the files contained within.

The most important consideration when allocating space for a snapshot filesystem is the scheduling of the backup job. If the job is run off-shift, outside of a window used for updating the database, the space requirements will probably be minimal. If the database in question has a significant amount of write activity during normal operations, you will need to test the process to determine the space requirements. In my experience, the more write operations, the faster the snapshots grow and the slower the backup runs, leading to something that looks like a geometric progression of space requirements.

Specifying the chunk size properly can help minimize the space needs. Spreading the database over several file systems helps even more, because as each file system is backed up and the associated snapshot is deleted, that space becomes available for the remaining ones to grow into. The scripts will calculate and set chunk size and manage the snapshots, but the DBA and admin are the ones responsible for selecting the schedule. See the sidebar, "Managing the Snapshots".

Figures 1 and 2 show the sizes and distribution of the 500 snapshots taken for the backup jobs run during a six-month period. Less than 20% of them exceeded 10 MB in size, and roughly 5% exceeded 1 GB. The few occasions where they did grow into the gigabyte range were periods where the backup either ran into early-morning (prior to 8:30 AM) user activity or was run during the day. Given that there are 7 filesystems holding the database files, getting 95% of the backups done with less than 7 GB of backing store space makes this approach really cost effective.

TSM Considerations

As I mentioned, we used IBM's TSM product for our backups, and we chose the archive command rather than the backup command to more easily manage the cycles and retention. (Archives are point-in-time copies kept until their specified expiration date, unless deliberately deleted.)

There was one parameter that we changed from the defaults: "resourceutilization 10" was added to the client dsm.sys file. This parameter allows more streams (sockets) between the client and server; you may or may not need it. The documentation indicates that specifying "10" allows up to eight simultaneous client sessions with the server. The script sets the number of concurrent child processes equal to the number of processors in the host machine, but also provides a command-line option to override that default. If your environment pushes that envelope, or you just want to throttle it, you can either pass in the number of child processes to run or modify the script.
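
For reference, the option goes into the server stanza of the client's dsm.sys; the server name and address below are placeholders for your own environment:

* dsm.sys -- TSM client system options file
SErvername          TSMSERVER1
   COMMMethod       TCPip
   TCPServeraddress tsm.example.com
   RESOURceutilization 10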

I have other backup products at my disposal and have deliberately coded the scripts so that the preferred backup product can be used. There are three areas where this comes into play: the first, at startup, checks that the backup server is functioning; the second (immediately following) verifies that the retention period for the "archive" is valid (these two could be modified or omitted, if desired); and the third (in the child script) is the actual backup itself.

You can substitute the appropriate syntax for Legato Networker's "save", Veritas NetBackup's "bpbackup", or any other network backup product you want in the "child" script. The scripts generate a unique name for each backup run and use it in the TSM archive command's "description" field. The description is based on the database SID, the type of backup (hot or cold), the date and time, and the process id of the parent task; it is also incorporated directly into the file name of the log. This makes for simpler management of the backups when performing a query or restore. For other products, this would translate to Networker's "save set id" field and NetBackup's "keyword" field.
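
For example, the description could be built along these lines (the variable names and exact format in Listing 1 may differ slightly):

# unique per-run description: SID, backup type, date, time, and the parent's pid
desc="${ORACLE_SID}.${hot_or_cold}.$(date '+%Y%m%d.%H%M%S').$$"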

Restore Operations

To restore a tablespace (or the entire database), remember that the files were backed up from the mount point where the snapshot was mounted, not from where the tablespaces live. When doing the restore, you must specify both locations; for TSM, they are the "file backed up" and the "destination". Networker uses a relocate option, and NetBackup has a "rename file" in which the "backup file" and "restore name" are separated by a single space. With TSM, restoring a single file is a relatively straightforward task. Restoring the whole database is a bit more complex; however, the syntax is still simple, you just need to specify each filesystem on a separate command line:

dsmc retrieve -replace=yes -desc=dbname.type.date.time.pid "/snapshot/filesys1/*" "/filesys1/"
dsmc retrieve -replace=yes -desc=dbname.type.date.time.pid "/snapshot/filesys2/*" "/filesys2/"
...
dsmc retrieve -replace=yes -desc=dbname.type.date.time.pid "/snapshot/filesysN/*" "/filesysN/"

While I have done this for small databases, I have not yet had to restore an entire large database, so I can't accurately estimate the time to do so. Keep in mind that TSM (and probably other products) tends to store files in an online storage pool during the backup process and then migrate them to tape when resources permit. Thus, while you may be able to start multiple restores, many of the restore requests will likely queue up against the same physical tape media. If you do implement this process, ensure that you set a reasonable expectation of when such a restore might complete.

Conclusion

While no process is foolproof, these scripts can simplify the task and allow it to be automated to run at the best times for performing the backup. The scripts determine which files are to be backed up, execute the appropriate commands to ready the database (be it shutdown for cold, or backup mode for hot), snapshot the filesystems, bring the database back online, and run the backup tasks in parallel to minimize the wall clock time. (The latter presumes the target system is doing some data compression and has multiple CPUs with which to do so.) Diagnostic information is written by both scripts into the /tmp filesystem, then collected at the end of the run and moved to a log directory (the default being adjacent to the location in which the scripts are placed). During the run, interesting messages are written to the system log using the logger command, provided that you have first added a "local1.notice" facility entry to the /etc/syslog.conf file.
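
The syslog entry itself is a one-liner; the log file name below is only an example, and note that syslog.conf fields must be separated by tabs and the target file must exist before syslogd will write to it:

# /etc/syslog.conf addition (separate the two fields with tabs)
local1.notice                           /var/adm/oracle_backup.log

# then create the log file and signal syslogd:
#   touch /var/adm/oracle_backup.log
#   pkill -HUP syslogd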

When doing some tuning, I've found the following information helpful:

  • netstat -k has a "biostats" section that reports on hits and lookups in the cache.
  • kstat -p unix::var:v_bufhwm seems to provide the upper limit in K.
  • mdb -k with a bfreelist$<buf command returns the number of buffers not yet allocated in the bufsize field.
  • sar -b will produce statistics on the buffer cache.

Rick Moakley is a Unix Systems Administrator with Epsilon Data Mgmt. Over the past 25 years, he has served in Operations, Systems Programming, Software Engineering, and Process Improvement teams, with an array of roles from managing and maintaining mainframes and Unix systems to developing applications in System/390 assembler, C, Lotus Notes, SAS, and more. During his free time, he likes "to putter around" with home projects and spend time with his fiancé.