
Embedding man Pages in Shell Scripts with kshdoc

Michael Wang and Ed Schaefer

Generally, large programs (such as Solaris, Oracle, Perl, and ksh) come with extensive documentation: books, technical papers, man pages, and so on. However, the shell scripts that we write every day usually aren't that big, and maintaining separate documentation for them is overhead. Furthermore, shell programmers and administrators are notorious for not providing documentation. Why not simplify the task? One method is to maintain online documentation within the code so that when the code changes, the documentation is updated in the same place. This article presents kshdoc (Listing 1), a Korn shell function that prints documentation embedded within a Korn shell script.

Perl programmers conveniently embed documentation in Perl scripts using the POD (plain old documentation) format. The kshdoc function uses the perldoc utility to process POD-formatted documentation embedded in Korn shell scripts. This article briefly describes a modified POD format, shows how to use kshdoc with the perldoc and pod2man utilities, and gives a thorough kshdoc code review.

POD Format -- ksh Style

The documentation is entered in scripts using the POD format, preceding each line with "## ", where the space can be omitted for empty lines. The "## =" signifies the beginning of the POD.

kshdoc supports splitting documentation, so you can include comments and code in between ksh-style POD format.

Generally, Unix man pages consist of highlighted NAME, SYNOPSIS, and DESCRIPTION sections:

## =head1 NAME
##
## B<kshdocexample.ss> - test the I<kshdoc function>
##
#* where B<> will embolden text, and I<> will italicize or underline
#* depending on the terminal.
#*
#* The above explains not only what B<> and I<> are, but also shows
#* how to use comments inside ksh style POD.  Lines not starting with "##"
#* are not part of modified POD, and, therefore, will not be processed.
#*
## =head1 SYNOPSIS
## ...
## =head1 DESCRIPTION
## ...

For more information on POD directives, see the man pages for perldoc, pod2man, perlsyn, or Chapter 26 of Programming Perl (by Larry Wall, Tom Christiansen, and Jon Orwant, O'Reilly & Associates).

Using "##", we differentiate ksh-style POD from normal shell comments while also maintaining 100% legal ksh code because ksh-style POD are still comment lines.

At run-time, anything after the exit command is not interpreted by the shell, so it is possible to include "bare" POD inside ksh scripts. However, this approach confuses the shell parser. For example, the two-line script:

exit 0
)

produces the following error when parsed with "ksh -n", which checks the syntax but does not execute:

syntax error at line 2: ')' unexpected

That is why we insist on 100% legal ksh syntax.
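The difference can be checked with a short experiment. The following is a minimal sketch, not part of kshdoc: it writes the two-line script in both "bare" and commented forms and runs a syntax-only check on each. The helper name syntax_ok and the CHECK_SHELL variable are made up for this illustration, and sh is assumed as a stand-in wherever ksh is not installed.

```shell
# Shell used for the syntax check. Assumption: sh is a stand-in;
# set CHECK_SHELL=ksh to reproduce the article's "ksh -n" test.
check_shell=${CHECK_SHELL:-sh}

# Succeed if the file parses cleanly; -n parses without executing.
syntax_ok() {
    "$check_shell" -n "$1" 2>/dev/null
}

demo_dir=$(mktemp -d)
printf 'exit 0\n)\n'    > "$demo_dir/bare"       # bare text after exit
printf 'exit 0\n## )\n' > "$demo_dir/commented"  # same text as a comment

for script in bare commented; do
    if syntax_ok "$demo_dir/$script"; then
        echo "$script: syntax OK"
    else
        echo "$script: syntax error"
    fi
done

rm -rf "$demo_dir"
```

The bare version should be rejected by the parser even though the offending line can never execute, while the commented version passes.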

Executing kshdoc

Execute the kshdoc function either from a shell script or from the command line:

kshdoc  help=y|ps|pdf  [file=...]  [ path=... ]

kshdoc  show=y  [file=...]  [ path=... ]

Obviously, kshdoc may be embedded in any ksh shell script, but if it's called from the command line or used across the enterprise, it's best to autoload the function (i.e., load when it is first referenced).

The Korn shell automatically loads a function the first time it's called, provided the function resides in a directory in the shell variable FPATH (the format of FPATH is the same as PATH), and the command of the same name cannot be found, or the function is typeset with the -u option:

typeset -fu kshdoc

Executing from the Command Line

From the command line, process the POD of kshdoc itself when kshdoc is in FPATH, or when kshdoc is in the current directory:

kshdoc file=kshdoc

or simply

kshdoc

or, if kshdoc is not autoloaded from FPATH:

kshdoc file=/path/to/kshdoc

Process the POD of the foo file in the current directory, the foo program in PATH, or the foo function in FPATH:

kshdoc file=foo

Except for the FPATH lookup, which is ksh-specific, foo is not limited to a ksh script. It can be any file or command with embedded modified POD.

Process the POD of the foo script located in /path/to:

kshdoc file=/path/to/foo

In the above examples, the default "help=y" is omitted. "help=y" displays a man page, "help=ps" generates a PostScript document, and "help=pdf" generates a PDF document.

Executing from a Script

The kshdoctest.ss (Listing 2) is a typical example of calling kshdoc from a script. Executing:

kshdoctest.ss help=y

outputs the man page converted from POD to standard output.

The following lines from within the test script:

my_getopts "$@"
[[ $HELP = Y ]] && { kshdoc file=$0; exit 0; }

set HELP="Y", call kshdoc to process the POD, and exit the script. (For more on the my_getopts function, which evaluates the command line, see: http://www.unixreview.com/documents/s=1344/uni1042138723500/.)
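For readers without my_getopts at hand, the idea of turning a help=... argument into an uppercase HELP variable can be sketched as follows. This is a hypothetical stand-in, not the real my_getopts: the function name parse_help_arg is invented here, and it recognizes only the help keyword.

```shell
# Hypothetical stand-in for my_getopts: scan name=value arguments and
# store the value of help=... in HELP, folded to upper case.
parse_help_arg() {
    HELP=
    for arg in "$@"; do
        case $arg in
            help=*)
                HELP=$(printf '%s' "${arg#help=}" | tr '[:lower:]' '[:upper:]')
                ;;
        esac
    done
}

# Demo: other arguments are ignored, help=ps becomes HELP=PS.
parse_help_arg user=root help=ps
echo "HELP=$HELP"
```

A script would call parse_help_arg "$@" near the top and then test $HELP exactly as in the lines above.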

The argument file=$0 instructs kshdoc to search the present file for POD. Unfortunately, the argument is necessary since $0 inside a function in ksh93 is the function name, instead of the calling program name. Passing the program name works in both ksh88 and ksh93.

Execute:

kshdoctest.ss

without the help=y command-line argument to skip printing the documentation and continue normal program flow. Of course, production scripts may have other command-line arguments.

You don't need to use my_getopts. Any command-line argument processing sensing that help should be printed can be used:

[[ $1 = "-h" ]] && { kshdoc file=$0; exit 0; }

The above command executes kshdoc if argument 1 equals "-h".

So far, the examples have all displayed man pages. Sometimes you want to generate PS or PDF files for publishing. All you need to do is pass the desired format to the kshdoc function through the help=... argument.

Here is an example:

[[ $HELP = @(Y|PS|PDF) ]] && { kshdoc file=$0 help=$HELP; exit 0; }

Retrieving Clean Source

The show=y argument of kshdoc prints the function itself to standard output in a form suitable for static inclusion in a shell program. The usage is:

If kshdoc is autoloaded from FPATH or from the current directory:

kshdoc show=y file=kshdoc

or simply:

kshdoc show=y

If kshdoc is not autoloaded:

kshdoc show=y file=/path/to/kshdoc

You can embed the kshdoc source from within the vi editor by executing this command:

:r!kshdoc show=y [file=/path/to/kshdoc]

The show=y option prints every line of the function that does not end with "##". In the above example, the new copy of kshdoc lacks the show capability, since the lines implementing it in the original code end with "##"; the result is leaner code. The show=y argument also implies help=n, which disables POD processing.
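The filtering idea behind show=y can be sketched with a one-line sed command. This is an illustration under assumptions, not the code from Listing 1: the function name strip_marked_lines is invented, and "ending with ##" is taken to mean that "##" is the last thing on the line with no trailing white space.

```shell
# Delete every line whose last two characters are "##".
strip_marked_lines() {
    sed '/##$/d'
}

# Demo: the marked line is dropped, the rest passes through unchanged.
strip_marked_lines <<'EOF'
function demo {
    [[ $show = Y ]] && { print "show mode"; return 0; }  ##
    print "normal work"
}
EOF
```

Note that a POD line such as "## =head1 NAME" survives this filter because it merely begins with "##".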

Utilities Used and Passing the Path

kshdoc executes correctly only if the function can find the perldoc and pod2man utilities and the Unix commands sed, mkdir, and rm. groff is needed for creating PostScript documents with the help=ps option, and ps2pdf is needed for converting PostScript to PDF documents when specifying the help=pdf option.

perldoc and pod2man are part of the Perl package. groff is a GNU utility, while ps2pdf is part of the Ghostscript package.

The PATH defined inside the function defaults to the PATH of the calling program. If the calling program -- login shell or the shell program -- includes perldoc in its PATH, the path= option is not required. You can choose to create a different PATH and pass it to kshdoc via the path=... option.

kshdoc -- Code Review

The Korn shell supports two types of functions:

POSIX style -- name() {}
ksh style -- function name {}

kshdoc (Listing 1) is a ksh-style function since it allows defining local variables with typeset. Local variables (lines 2-5) inside the function, and variables of the same name outside, do not affect each other, making the function modular and safe to include in other programs.

In addition to being declared local with typeset, help and show are given the uppercase attribute, which makes their values easier to evaluate: help=ps and help=PS are treated alike. If the help argument isn't passed, it defaults to "Y" (line 7).

kshdoc uses a localized PATH, so a different PATH may be used inside the function and not affect the rest of the program.

In line 9, $PATH is expanded first. If "path" is not specified, PATH is assigned the value of $PATH defined outside of the function. The path=... option provides the function a way to override this default behavior. PATH is exported since perldoc needs PATH to locate pod2man in line 43.

On Solaris, perldoc and the other Perl binaries are installed in a less well-known place -- /usr/perl5/bin. Unfortunately, many users do not have this location in their PATH. The function therefore tests whether perldoc is in the current PATH. If it is, nothing is done. If it's not, the function tests whether perldoc is in /usr/perl5/bin. If it is, /usr/perl5/bin is appended to PATH; if it is not, the function returns an error. (See lines 11-17.)
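The fallback logic just described can be sketched as a small helper. This is a minimal illustration, not the code from Listing 1: the function name ensure_tool_in_path is invented, and command -v is assumed as the portable way to ask whether a command is reachable.

```shell
# Make sure a tool is reachable: do nothing if it is already in PATH,
# otherwise append fallback_dir when the tool lives there.
ensure_tool_in_path() {
    tool=$1
    fallback_dir=$2
    if command -v "$tool" >/dev/null 2>&1; then
        return 0                      # already reachable, leave PATH alone
    fi
    if [ -x "$fallback_dir/$tool" ]; then
        PATH=$PATH:$fallback_dir      # append, so existing entries win
        return 0
    fi
    echo "$tool not found in PATH or $fallback_dir" >&2
    return 1
}

# Demo with the Solaris location mentioned in the text.
ensure_tool_in_path perldoc /usr/perl5/bin || echo "perldoc is unavailable"
```

Appending rather than prepending keeps any perldoc the user already prefers ahead of the fallback directory.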

If no file=... option is specified, the file defaults to kshdoc. "kshdoc" is spelled out for portability. In ksh88, the value of "$0" inside a function is the calling program name, and in ksh93, it is the function name (line 19).

Lines 21-32 try to locate the file specified via the file=... option. If the file is not readable as given (the [[ -r $file ]] test fails), the function checks whether it exists as an executable in PATH. If it still can't be located, the function checks whether it exists as a function in $FPATH (lines 24-26). Finally, if the file is still not located, the function errors out (lines 28 and 29).

Since $FPATH is colon-separated, set IFS=: to check each directory within FPATH. Since IFS is redefined in a subshell, the parent isn't affected.
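The lookup order and the IFS technique can be sketched as follows. This is an illustration, not the code from Listing 1: the function names find_in_path_list and locate_source are invented, and for simplicity the sketch tests for a readable file in both PATH and FPATH rather than for an executable in PATH.

```shell
# Print the first readable dir/name found in a colon-separated list.
find_in_path_list() {
    name=$1
    list=$2
    (
        IFS=:      # split on colons; the change is confined to this subshell
        set -f     # no filename expansion on the unquoted $list
        for dir in $list; do
            if [ -r "${dir:-.}/$name" ]; then
                printf '%s\n' "${dir:-.}/$name"
                exit 0
            fi
        done
        exit 1
    )
}

# Try the name as given, then PATH, then FPATH.
locate_source() {
    file=$1
    if [ -r "$file" ]; then
        printf '%s\n' "$file"
    else
        find_in_path_list "$file" "$PATH" ||
            find_in_path_list "$file" "${FPATH:-}"
    fi
}

# Demo: where does the name "sh" resolve to?
locate_source sh || echo "sh: not found"
```

Because the loop runs inside parentheses, both the IFS assignment and set -f disappear when the subshell exits, so the caller's word splitting is untouched.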

show=y generates the function source. Because show=y and help=... are mutually exclusive, show=y implies help=n, and you need not specify both (line 35).

The heart of the function (lines 37-47) involves processing the modified POD with the perldoc or pod2man utilities. Since not all versions of perldoc and pod2man support standard input, convert the commented POD to native POD in a temporary file.

Since perldoc displays the program name, use the base name of the file as the POD file name with a "pod" extension. Set the temp directory name to the basename of the POD file name, a dot, the process id, another dot, and a RANDOM number. The trap (line 40) removes the directory when the function exits.
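The naming and cleanup scheme can be sketched as below. This is a sketch under assumptions, not the code from Listing 1: the function name make_pod_workspace is invented, the directory is assumed to live under ${TMPDIR:-/tmp}, and the directory name is assumed to start with the base name of the source file. In ksh, an EXIT trap set inside a "function name" style function fires when the function returns; the sketch uses a subshell instead so that the same cleanup happens in any POSIX shell.

```shell
# Create a private directory holding an (empty) POD file, print the
# POD file's path, and remove the directory when the work is done.
make_pod_workspace() {
    source_file=$1
    base=$(basename "$source_file")
    pod_name=$base.pod
    # Base name, process ID, and a random number (0 if the shell
    # has no RANDOM variable).
    tmp_dir=${TMPDIR:-/tmp}/$base.$$.${RANDOM:-0}
    (
        trap 'rm -rf "$tmp_dir"' EXIT     # cleanup when the subshell exits
        mkdir "$tmp_dir" || exit 1        # fails if the name is already taken
        : > "$tmp_dir/$pod_name"          # the converted POD would go here
        printf '%s\n' "$tmp_dir/$pod_name"
    )
}

make_pod_workspace ./kshdoctest.ss
```

Naming the file kshdoctest.ss.pod means perldoc shows a recognizable program name rather than a random temporary name.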

In line 41, sed starts processing the POD at the first line beginning with "## =". From there on, for each line beginning with "##", it strips the comment characters and at most one following space, and redirects the resulting lines to a file.
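The conversion can be sketched with a single sed command. This is a minimal illustration of the technique, not necessarily the exact command in Listing 1; the function name commented_pod_to_pod and the sample input are made up.

```shell
# Convert commented POD on standard input to native POD on standard
# output. From the first line starting with "## =" to the end of input,
# strip "##" plus at most one space and print only the lines that matched.
commented_pod_to_pod() {
    sed -n '/^## =/,$ s/^## \{0,1\}//p'
}

commented_pod_to_pod <<'EOF'
#!/bin/ksh
## =head1 NAME
##
## B<demo.ss> - a hypothetical demo script
##
#* an ordinary comment, ignored by the filter
print "hello"
## =head1 SYNOPSIS
EOF
```

The -n option together with the p flag means that code lines and "#*" comments are dropped, while a bare "##" becomes the empty line that POD requires between paragraphs.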

At this point, what happens depends on what you want to do:

Specify help=Y, and perldoc processes the POD, sending the man page to the screen.
Specify help=PS, and groff converts the POD to PostScript.
Specify help=PDF, and groff/ps2pdf converts the POD to PDF.
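The three cases above can be sketched as a dispatch on the help value. This is a hedged illustration, not the code from Listing 1: the function name render_pod is invented, the pipelines are assumptions about how the utilities fit together (pod2man producing man-page source for groff), and output is assumed to go to standard output.

```shell
# Render a native POD file according to the requested mode.
render_pod() {
    mode=$1
    pod_file=$2
    case $mode in
        Y)   perldoc -F "$pod_file" ;;                    # man page on screen
        PS)  pod2man "$pod_file" | groff -man -Tps ;;     # PostScript
        PDF) pod2man "$pod_file" | groff -man -Tps | ps2pdf - - ;;  # PDF
        *)   echo "render_pod: unknown mode '$mode'" >&2
             return 2 ;;
    esac
}

# Demo: an unsupported mode is rejected without running any utility.
render_pod TXT /dev/null || echo "unsupported mode"
```

A caller would typically redirect the PS or PDF output to a file, for example render_pod PDF foo.pod > foo.pdf.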

show=y prints out the entire function, except lines ending with ## (lines 49-51). At the end of the function, using return rather than exit allows the calling program to continue processing (line 53).

Future versions of kshdoc, if any, can be found at:

http://www.unixlabplus.com/unix-prog/kshdoc

Conclusion

We've presented the Korn shell function kshdoc, which uses Perl utilities to generate professional-looking man pages, PostScript, and PDF documents. The "commented POD format" is suitable for any ksh script, other source file, or configuration file that uses "#" for a comment.

References

Wall, Larry, Tom Christiansen, and Jon Orwant. 2000. Programming Perl. Sebastopol, CA: O'Reilly & Associates, Inc.

Bolsky, Morris, and David Korn. 1995. The New KornShell Command and Programming Language. Upper Saddle River, NJ: Prentice Hall PTR.

Michael Wang earned Master's degrees in Physics (Peking) and Statistics (Columbia). Currently, he is applying his knowledge to Unix systems administration, database management, and programming. He can be contacted at: xw73@columbia.edu.

Ed Schaefer is a frequent contributor to Sys Admin. He is a Software Developer and DBA for Intel's Factory Integrated Information Systems, FIIS, in Aloha, OR. Ed also hosts the monthly UnixReview.com Shell Corner column. He can be contacted at: shellcorner@comcast.net.