feb2003.tar

Unifying Web Clusters with Spread

John David Duncan

The most cost-effective way to scale up a Web application is to move it from a single Web server to a cluster of inexpensive servers. However, moving an application from one server to many can pose some challenges for systems administrators. It might be easy to analyze log files or monitor traffic on a single server, for example, but it's not nearly as simple to gather complete and meaningful information from multiple log files scattered across many servers. The standard syslog utility, useful for distributing many other sorts of logging operations, turns out to be less than ideal for managing Web logs due to a variety of design decisions, including its use of an unreliable network protocol (UDP) for transporting data.

The Spread Toolkit

While searching for an open source solution to this problem, I found Spread, a network group messaging toolkit developed at Johns Hopkins University that is available from http://www.spread.org. Spread is a mature project, now five years old, and was originally licensed under other terms but has been available under an open source license since June 2001. With Spread, I can develop distributed logging and monitoring applications running on many Web servers just as easily as single-server applications.

Spread itself is a multi-purpose networking toolkit, usable for many applications besides the one I describe here. It provides a simple API allowing an application to join a message group, to send a message to a message group, or to receive the messages sent to a group by other members. Whenever an application sends a message, the API passes it to a spread daemon running on the local machine, and the daemon distributes it to other the machines in the network. The underlying transport can vary depending on the network topology of a segment; Spread commonly uses a combination of UDP multicast within a LAN and unicast between LANs. The Spread protocol can provide "reliable ordered multicast" -- semantics that guarantee an application that every group member has received its messages, and received them in the correct order. If that guarantee is not needed, however, the application can ask Spread to send a message without it and obtain better performance.

Spread APIs are available for C, C++, Java, Perl, PHP, Ruby, Python, and a few other languages. Spread has been used to provide database replication in Postgres and replicated storage for Zope. The Backhand project (http://www.backhand.org) provides two useful applications built on top of Spread. One is a utility called Wackamole, used to manage a virtual IP address for high availability, moving it from one server to another if the first server fails. The other is an Apache module named mod_log_spread.

Mod_log_spread was the first Spread application to really catch my eye. It allows a Web server to write its access log to a Spread group rather than a local file. Any number of Web servers can send log messages using mod_log_spread; somewhere in the network, another machine runs a logging daemon, spreadlogd, which simply joins the groups and writes all of the messages into a single log file. For example, if your site needed to keep duplicate logs for redundancy, you could simply add a second log server, turn on spreadlogd, and start writing another copy. The module is actually distributed as a patch to Apache's mod_log_config. An alternative implementation by the same developer, George Schlossnagle, for mod_perl users is the Apache::Log::Spread module, available from CPAN.

At my site, various Web applications write their own specific log files. The site is developed in Perl, so I built an object-oriented layer on top of Spread's Perl API to provide a simple logging facility. The application programmers modified each Perl script to use this Log::SpreadLog module, and now the spreadlogd daemon retrieves each log message and writes it to its appropriate file on a central logging server.

Since each application sends messages to a particular Spread group, a sys admin can monitor any aspect of the site by joining the appropriate group and watching the messages flow by. For more systematic monitoring, the messages can be automatically counted and graphed. A graph of Spread messages can be used to measure the activity of a whole Web cluster in real time.

Configuring Spread

Spread is available from http://www.spread.org, and at the time of writing, version 3.17.1 is soon to be released. Version 3.17 is the first major release to use GNU autoconf and can be built on most UNIX machines using just "configure" and "make". Binaries are also available for other platforms, including Windows NT and 2000. A default "make install" will install the Spread binary in /usr/local/sbin, the Spread header files in /usr/local/include, and both the thread-safe Spread library libtspread and the unsafe library libspread in /usr/local/lib. (Through version 3.16, these two libraries were known as "libtsp" and "libsp". The new naming scheme is an improvement, but the change can cause a linker error when building applications against Spread 3.17 that were written using older versions. This problem is likely to affect any source code dated before September 2002.)

Spread is configured using a file named spread.conf, usually found in /etc or /usr/local/etc. Here is a sample spread.conf file for three machines on a private network. The machines communicate using the local multicast address 239.0.0.1 and Spread's default port, 4803 (the could alternatively have used a broadcast address like 192.168.1.255). RFC 2365 designates the multicast addresses from 239.0.0.0 to 239.255.255.255 for private use inside a single organization, so it is usually good to assign a Spread segment an address within that range.

Spread_Segment  239.0.0.1:4803 {
     larry   192.168.1.1
     moe     192.168.1.2
     curly   192.168.1.3
}

Once you have created the spread.conf file, you can start the spread daemon on each machine. To deploy Spread for production use, start the daemon automatically at boot time, and consider running it with enhanced priority (using the nice command) or even with real-time priority. Real-time priority is available on FreeBSD machines by running spread from the rtprio command. Under Linux, use the Scheduler module, available at http://www.omniti.com/~george/mod_log_spread/. This allows the Spread networking code to run closer to kernel-level priority -- where networking code usually runs -- rather than user-level, and helps keep a Spread segment stable even if a particular machine gets bogged down by cpu-intensive processes.

Once the daemon is running, you can test the network using spuser, Spread's command-line client. Start spuser running on two different machines, and use the "j" command to join a group, followed by the "s" command to send a message. If this all works, it looks something like the following example:

larry% spuser
User> j testgroup
User> s testgroup
enter message: Come here, Watson.  I need you.

curly% spuser
User> j testgroup

User>
============================
received SAFE message from #user#larry, of type 1, (endian 0) to 1 groups
(32 bytes): Come here, Watson.  I need you.

Configuring Spreadlogd

Spreadlogd can be found in the mod_log_spread package at:

http://www.backhand.org

It requires its own configuration file -- /etc/spreadlogd.conf -- specifying how to connect to Spread, which groups to listen to, and where to write each group's messages. Here is a sample /etc/spreadlogd.conf file:

Spread {
        Port = 4803
        Log {
                Group = "group1"
                File = /var/log/spread/test1.log
        }
        Log {
                Group = "group2"
                File = /var/log/spread/apache.log
        }
}

Once you have installed spreadlogd, created the /var/log/spread directory, and edited /etc/spreadlogd.conf, it should be possible to run and test spreadlogd. To test the configuration shown here, you could use spuser on any machine in the segment to send a message to the group "group1". The message should show up in the file /var/log/spread/test1.log on the spreadlogd server.

If you also run Apache and decide to try mod_log_spread, the package includes a file called INSTALL with instructions for building the module and inserting it into the Apache server.

Using the Perl API

Spread's Perl API is implemented by the Spread.pm module. The module is included with the Spread source in the directory perl/Spread. I built it using the typical procedure for building a Perl module -- perl Makefile.pl ; make ; make install.

Spread's Perl API is quite similar to its C API, and is well documented in Jonathan Stanton's "Users' Guide to Spread," available from http://www.spread.org/docs/guide/ in PDF and PostScript formats. The Log::SpreadLog package (Listing 1) is intended to hide the details of Spread's operation and allow an application to send a message using only three lines of code. The programmer initializes a Log::SpreadLog object using the name of a Spread group, and then sends a message to the group by calling the object's write() method. Internally, Log::SpreadLog calls Spread::Connect to establish an mbox (a Spread object similar to a socket) and Spread::multicast to send the message. The object does not provide any facilities for receiving messages, but it allows a distributed Web application to send the messages that will get logged by spreadlogd. At my site, we log a variety of information this way, including ad impressions, cookies, and errant search queries:

use Log::SpreadLog;

$log=new Log::SpreadLog("mygroup");
$log->write("this is a message");

Counting and Graphing

The spcount utility (at http://www.efn.org/~jdd/projects/spread) is designed to join a Spread group, count the messages, and output a periodic total. Counting the messages has several uses. For some groups, a total of zero messages in a given time period is a sure sign of failure, and can be used to generate an alert. At my site, we send a Spread message every time the server delivers an ad impression, and we know how many ads are on a page, so we count the number of ads served and use that number to determine the current rate of page views per minute in real time. It is more accurate to count ads, which cannot be cached, than to count the pages themselves, which are cacheable and therefore not necessarily served by one of our back-end servers. We create real-time graphs of site activity by running spcount as a daemon and directing its output to a file. Counting messages in five-minute intervals creates a file suitable for input into the graphing program MRTG. We also count in one-minute intervals, and display the output with a custom CGI script similar to the one included in the spcount distribution. A sample plot is shown in Figure 1.

Conclusion

Having a simple way to send a message to a named group, record a group's messages in log files with spreadlogd, eavesdrop on them with spuser, and count them using spcount has made my cluster of Web servers much more manageable. Spread allows me to measure and assess the group of servers as a single entity. It is supported by a diverse group of people on the spread-users mailing list at lists.spread.org, each using Spread for something slightly different from the next.

John David Duncan is a freelance Web programmer, DBA, and UNIX sys admin in San Francisco. He's currently the staff systems administrator at GreatSchools.net, and on the side he manages the election night database for a major news Web site. He can be contacted at: jdd@efn.org.