
System V Init Staged on an RS/6000 SP Platform

Bill McLean

Every administrator has to manage user applications, and on an SP complex with a large number of nodes that can be quite an ordeal. To help manage them, consider using System V init. Briefly, System V init uses the runlevel state of the init daemon: for each runlevel, the inittab calls the rc command with the runlevel as a parameter, and rc in turn sends commands to all the scripts needed for that init state. This article describes a project that uses System V init and a tool set to provide easy management of all your user applications across an SP complex. It simplifies application control links, secured application control, application listings, and status checks.

Installation

You will need to create a globally shared directory on all the nodes. In an SP complex, this is easily done on the home filesystem, but you may use any NFS filesystem mounted on all the nodes. You will also need to create a local directory on each node to house the local "rc" scripts. The local directory is necessary to avoid depending on any NFS filesystems at system boot (Figure 1). The next step is the creation of an "rc.script" standard template and naming scheme. Additionally, I will provide a set of Perl management tools with sample configuration files that you can adapt to your environment.

The "definition of standards" is critical to any System V init process. You have to decide content and local location of the application "rc" scripts. The first step is to create an "rc" script template. It's a simple case statement in ksh skeleton designed to accept the start and stop commands from the rc command. You may expand the template to include whatever you desire, but it must have start and stop functionality to be compatible with the "/etc/rc.d/rc" (Listing 1).

For my example, I added "status" to the template; rc will not send "status", but my management tools will. It is a good idea to include status functionality because it provides an easy way to verify whether an application is running. You will also have to settle on the "good" and "bad" status messages for your application scripts. I used "Running" and "Not running" for my case-sensitive status messages, but any binary convention will work. The standardization of the application scripts is the basis of the entire System V init implementation, so this is where you will need to impose restrictions and guidelines.
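Listing 1 contains the actual template; as a rough sketch of the idea (not the listing itself), a minimal ksh skeleton with start, stop, and status might look like the following. The application name, commands, and process check are placeholders to replace with your own.

#!/bin/ksh
# Minimal rc script skeleton (sketch only) -- "myapp" and its commands are placeholders
case "$1" in
start)
        print "Starting myapp"
        /opt/myapp/bin/start_myapp      # replace with your application's start command
        ;;
stop)
        print "Stopping myapp"
        /opt/myapp/bin/stop_myapp       # replace with your application's stop command
        ;;
status)
        # report the case-sensitive status strings the management tools look for
        if ps -ef | grep -v grep | grep myapp >/dev/null 2>&1 ; then
                print "Running"
        else
                print "Not running"
        fi
        ;;
*)
        print "Usage: $0 {start|stop|status}"
        exit 1
        ;;
esac
exit 0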

The next step is to decide on a naming standard for all the application scripts. My naming standard consisted of two script types -- a global script and a local script. The global application name was used in the global shared directory, and the local application resided locally on the node with an added field for startup ordering (Listing 2a).

Basically, the Global Application name and Local Application name are the same, except that locally I add the "Start order" field at the beginning (Listing 2a and Listing 2b). The field delimiter for both types of scripts is a ".". Each field may contain any character other than the field delimiter and may be of any length. The heart of each script name is the application name. Each application name must be unique so that all node scripts can reside together in the global shared directory, which helps ease management. Although you may decide on any naming standard, all my scripts are designed around this name and field structure and will need to be adjusted if you deviate from it.
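As a purely hypothetical illustration (the actual format is shown in Listings 2a and 2b, and your field layout may differ), a global script and its local counterpart might be named:

Global script:  tsm.rc          (application name "tsm")
Local script:   S20.tsm.rc      (start order field "S20" prepended)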

After the script and naming standards have been chosen, you must create and build the application scripts. For each application, insert its start, stop, and status code into the case skeletons, then establish your global script directory. In my case, I used "/u/shared/appscripts" and created the local script directory "/appscripts" on each node (Figure 1). I recommend not placing the local scripts under /etc; keeping them outside of /etc lets you assign rights to application or operations people so they can read and debug the application scripts. Of course, we will be linking to the "runlevel" directories under "/etc/rc.d/rc#.d", so the scripts' actual location is not critical to System V init.
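The directory setup itself is just a couple of mkdir calls; here is a sketch using my example paths, assuming PSSP's dsh with -a to reach all nodes:

# On the control workstation: the globally shared script directory
mkdir -p /u/shared/appscripts

# On every node: the local script directory (dsh -a is a PSSP command)
dsh -a 'mkdir -p /appscripts'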

Implementation

To begin, create all the necessary directories under "/etc/rc.d/rc#.d" to support the runlevels you will be using; normally, AIX uses runlevel 2. If you want to separate the kill and start scripts into different directories, you must edit the "rccfg.config" tools file. It's important to note that the "shutdown" command will not call the kill scripts, so put a hook into /etc/rc.shutdown to call "rccfg stop All" before the node goes down. Splitting the links is nice here because you can call "rc #" against a runlevel directory where only the stop links reside. You must also create the necessary inittab entries. Depending on which version of AIX you use, they may already be there as:

l2:2:wait:/etc/rc.d/rc 2
Of course, I would add a log so that you can see what messages the scripts reported when called on to start and stop. For example:

l2:2:wait:/etc/rc.d/rc 2 >/tmp/rc2.log 2>&1
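A sketch of this step, assuming runlevel 2 with combined start and kill links:

# create the runlevel directory that rc will read
mkdir -p /etc/rc.d/rc2.d

# hook for /etc/rc.shutdown -- add a line like this before any final exit,
# so every application script is sent "stop" before the node goes down:
/etc/rc.d/rccfg stop All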
Next, create all the necessary directories. You will need a directory on the control workstation to serve as the global script directory. This directory will house all the site's scripts in global name form (Listing 2b). You will also need a local directory on all the nodes to house the local application rc.scripts; these must be local, not on an NFS filesystem. After directory creation, you must prepare the "rccfg.config" file with the correct directory parameters (Listing 5). This configuration file is used by "rccfg" and all the provided tools. Here is a list of the parameters needed in the configuration file:

gscripts /appscripts/global -- Global script location
lscripts /appscripts/local -- Local script location
stlink /etc/rc.d/rc2.d -- Start link location
stplink /etc/rc.d/rc2.d -- Stop link location
rclog /tmp/rclog -- rccfg log location
secaccess /etc -- Optional parameter for the secured version

Once the parameters have been set, copy the file to all nodes that will be running "rccfg". The config file must reside in "/etc".

Move all the script tools locally to the nodes. I typically put the tools in /etc/rc.d; the configuration file is expected in the /etc directory. Once "rccfg" is on the node, I like to make a link to it from /usr/local/bin, but that's not necessary.
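A sketch of pushing the tools and config file out from the control workstation, assuming PSSP's pcp and dsh and the locations above:

# copy the management tools and config file to every node (pcp is a PSSP command)
pcp -a /etc/rc.d/rccfg /etc/rc.d/rccfg
pcp -a /etc/rc.d/rcndupdt /etc/rc.d/rcndupdt
pcp -a /etc/rccfg.config /etc/rccfg.config

# optional convenience link on each node
dsh -a 'ln -s /etc/rc.d/rccfg /usr/local/bin/rccfg'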

Last, copy your scripts to the global application script directory, then to the local application directories on the nodes. When placing the local scripts on a node, make sure to prepend the start order field "SXX" to each filename. This field controls the order in which the links are called by rc, and thereby the startup and shutdown order of the applications. You should now be able to run the "rccfg" tool to create the links on that node:

< /etc/rc.d/rccfg build All >
If it works, it created your links in the correct rc directories based on the "rccfg.config" file; if not, a common failure is a missing or inaccurate parameter in the config file. Next, check that the scripts run correctly on their own. If you can't get "rccfg build All" to work, review all of the steps above.
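A quick sanity check, assuming runlevel 2 and my hypothetical "tsm" script name:

# the start and kill links should now exist in the runlevel directory
ls -l /etc/rc.d/rc2.d

# each application script should also respond on its own
/appscripts/S20.tsm.rc status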

Management Tools

Now let's review the tools that help you use these scripts. The main tool is "rccfg" (Listing 3). It's used to call all scripts with start, stop, or status. It will additionally manage link creation and removal based on the application names of the local scripts. Keep in mind that "rccfg" creates the start links in the order of the "SXX" field, and the "KXX" kill links in reverse order:

<rccfg>:
rccfg <command> All -- Send command to all start scripts.
rccfg <app name> <command> -- Send command to the script with the matching app name.
rccfg <command> <app name> -- Send command to the script with the matching app name.
rccfg list [All | appname] -- List local application script names.

rccfg Valid Commands:
start -- Send "start" argument to the script.
stop -- Send "stop" argument to the script.
status -- Send "status" argument to the script.
noauto -- Remove the start link for an application; the local script name will still carry its "S##" field.
build -- Remove and recreate the stop and start links; restricted to the "All" argument.
list -- List all local application script names.

Examples:
rccfg start tsm -- Send start to tsm application script.
rccfg stop tsm -- Send stop to tsm application.
rccfg noauto tsm -- Remove the start link in the start link run-level directory for tsm.
rccfg stop all -- Send stop to all applications.
rccfg noauto all -- Remove all application start links.
rccfg tsm noauto -- Remove start link for the tsm application only.
rccfg list -- Will list all local application script names.
rccfg build All -- Will remove and recreate all stop and start links.

The main benefit of this tool is that you can "dsh" it out from the control workstation to many nodes. You can easily stop all applications from coming up on the next boot without editing an rc file or the inittab. Additionally, if you want to check the status of the entire complex, you can run:

<dsh ' rccfg status All ' >
and grep for your status message. Another feature of "rccfg" is that it can also be farmed out as a secured tool. I have included all the security code in the original script. Just change the name from "rccfg" to any other name ("opscfg", for example) and it will trigger its security protocols. To use a secured version of the script, you must create an /etc/<scriptname>.access file according to your "rccfg.config" parameter. I have included a sample of the file needed for secure operation (Listing 4). The main limitation of the secured version is that any start or stop call for an application name that is not in the access config file will fail: the attempt is logged and the script exits with a message. This feature enabled me to add one command to the sudoers file and allow an entire account team to control all of their applications.
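As an illustration of that last point, here is a hypothetical sudoers entry (the group name and the renamed script are assumptions) plus the call an account team member would make:

# /etc/sudoers -- let members of group appteam run the secured copy as root
%appteam ALL = NOPASSWD: /etc/rc.d/opscfg

# a team member could then run, for example:
sudo /etc/rc.d/opscfg stop tsm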

The next tool I include updates the local scripts from the global directory (Listing 6). This command uses the same config file as "rccfg", /etc/rccfg.config. The tool is run locally on the node or "dsh'd" out to the nodes. If the modification time stamp of the global script matching a local script's application name is newer than the local copy, it will push the global copy to the local node. It will also create a "/tmp/<scriptname>" backup of the overwritten script, just in case:

<rcndupdt>
<dsh ' /etc/rc.d/rcndupdt '>
rcndupdt will update all the local scripts on a node from the global script directory.

Script Promotion

Promoting a script to a node is quite easy. If an application script already exists, you just update the script in the global directory and run rcndupdt on the node to pull it down. If the application is new, you must create a new application script, copy it into the global application script directory, copy it into the local application directory with the start order field added as necessary, and run "rccfg build All" to recreate the local node's links.
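Pulling that together, a sketch of promoting a hypothetical new "tsm" script, using my example paths and start order:

# 1. place the new script in the global shared directory
cp tsm.rc /u/shared/appscripts/tsm.rc

# 2. copy it to the node's local directory with the start order field prepended
cp /u/shared/appscripts/tsm.rc /appscripts/S20.tsm.rc

# 3. rebuild the links on that node
/etc/rc.d/rccfg build All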

Conclusion

Once the standards are created and the scripts are put in place, the System V init approach is very beneficial. You will no longer have to edit the inittab or an rc.local script to stop an application from starting on reboot. Also, if you need to check an application, you can just use rccfg status <appname>, which really helps when checking application status across multiple nodes after an outage. It has helped standardize the way people start and stop applications on my nodes. The feature that management likes best is the logging of all application control calls by rccfg and the secured versions. The example here was presented for an SP complex, but it can easily be applied to standalone systems as well. Standalones will not have dsh -- it's a PSSP command -- but rexec will work, and the rccfg script could easily be converted if you do not have Perl on your systems.

Bill McLean is currently a systems administrator for IBM's Scaleable Processor systems on the Viewpointe Oneline Archive Services account. Bill has been in the computing field for 11 years, since graduating from the University of Arkansas in 1992. For the past seven years, he has specialized in AIX and Unix server administration.