Cover V12, I02

Tuning Apache Servers

Mohammed Kabir

Customers often ask my company to help them with their Web server performance. The solutions vary depending on the budget, complexity, size, and type (dynamic or static) of the Web content. For customers with moderate to large budgets and high traffic requirements (hundreds of thousands to millions of hits per day), we recommend deploying load-balanced Web networks that use hardware load balancers or even round-robin DNS. These customers often throw "fault tolerance" into the same bucket as "performance" and therefore require hardware and software redundancy. It always helps to have more RAM, faster disks, and faster CPUs. However, not everyone is ready to fork out money for new hardware. Customers often want to get more out of what they already have instead of buying slim, stylish, single-unit network appliances or servers.

In most cases, the Web platform is Apache on Linux, and Apache has been installed from an RPM package or built from the source. Almost always, the Apache server is a vanilla install and can be customized for both performance and security. In this article, I will share some techniques that we often use to get more juice out of Apache.

Getting the Fat off Your Apache Server

Downloading and installing a binary version of Apache from the Internet means you are stuck with someone else's idea of what goes inside the Apache server, and you also risk security issues if the download site is not reputable. Therefore, the very first step is to download the Apache source from http://www.apache.org or one of its mirror sites, then compile and install from the source.

Unfortunately, many people follow this advice without thinking about what they need or do not need. A vanilla install from a downloaded source distribution looks like this:

# cd /usr/src/httpd-2.0.43
# ./configure --prefix=/usr/local/apache
# make
# make install
This gets Apache installed all right, but it also includes module fat that you might not want. To find out what modules are built into your Apache, run the httpd -l command from the bin directory of your Apache installation. For example, if you have installed Apache in /usr/local/apache, you can run:

# /usr/local/apache/bin/httpd -l
This will show all the modules compiled into your Apache binary. The vanilla compilation and installation of the 2.0.43 source distribution includes the following compiled-in modules:

core.c
mod_access.c
mod_auth.c
mod_include.c
mod_log_config.c
mod_env.c
mod_setenvif.c
prefork.c
http_core.c
mod_mime.c
mod_status.c
mod_autoindex.c
mod_asis.c
mod_cgi.c
mod_negotiation.c
mod_dir.c
mod_imap.c
mod_actions.c
mod_userdir.c
mod_alias.c
mod_so.c

Many of the modules installed by default are probably not needed for your site. For example, if you do not use CGI scripts but use PHP instead, you do not need the mod_cgi module. Similarly, if you do not use Server Side Includes (SSI), you can do without the mod_include module. To find out what a particular module does, consult the module documentation at:

http://httpd.apache.org/docs-2.0/mod/

Once you know which modules you do not need, rerun the configure script with the matching --disable options and rebuild:

# cd /usr/src/httpd-2.0.43
# ./configure --prefix=/usr/local/apache --disable-cgi \
              --disable-include  --disable-status \
              --disable-negotiation --disable-imap --disable-userdir
# make
# make install
Here I have disabled CGI script support, SSI support, status support, content negotiation support, internal image map support, and personal Web directory support.
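
To confirm that the rebuild dropped what you meant to drop, you can compare the new httpd -l listing against the list of modules you disabled. The following is a minimal sketch; the function name still_compiled_in is my own, and it assumes the listing arrives on standard input in the one-file-per-line form shown earlier.

```shell
# Print any module from the argument list that still appears in
# `httpd -l` output read from standard input.
# (still_compiled_in is a hypothetical helper name, not an Apache tool.)
still_compiled_in() {
    listing=$(cat)
    for mod in "$@"; do
        # httpd -l prints source file names such as mod_cgi.c
        if printf '%s\n' "$listing" | grep -q "^[[:space:]]*${mod}\.c\$"; then
            echo "$mod"
        fi
    done
}
```

For example, /usr/local/apache/bin/httpd -l | still_compiled_in mod_cgi mod_include mod_status mod_negotiation mod_imap mod_userdir should print nothing if every one of those modules is gone.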

Once compiled and installed, the Apache binary (httpd) will be smaller. I have reduced the binary size by one megabyte or so in many cases. You can shrink the binary on disk further by stripping off the symbols using the following command:

# strip /usr/local/apache/bin/httpd
If you use dynamic shared object (DSO) modules for loading mod_php or mod_perl or other modules, be sure mod_so is installed as a static module in your Apache binary. You can enable mod_so support using the "--enable-so" option with the configure script. DSO modules have a great advantage over statically linked modules: you can disable a module by just commenting out its LoadModule line in the httpd.conf file in your conf directory. Whenever you use a DSO module, be sure you use the IfModule container as shown below:

<IfModule mod_your_module.c>
    # Apache 2.0 expects the module's source file name here
    # your module specific configuration
    # goes here
</IfModule>
This container isolates your module-specific configurations within itself, which ensures that you will not get a configuration error when you comment out the appropriate LoadModule directive to disable that module temporarily or permanently.
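
Commenting out a LoadModule line can also be scripted. The following is a minimal sketch, assuming the line has the usual "LoadModule name path" form; disable_module is a hypothetical helper, and php4_module in the usage note is only an example identifier.

```shell
# Comment out the LoadModule line for one module, keeping a backup.
# Usage: disable_module <module_identifier> <path-to-httpd.conf>
# (disable_module is a hypothetical helper name.)
disable_module() {
    module=$1
    conf=$2
    cp "$conf" "$conf.bak" || return 1
    # Prefix the matching LoadModule line with "#"; all other lines
    # pass through unchanged.
    sed "s/^[[:space:]]*LoadModule[[:space:]][[:space:]]*${module}[[:space:]]/#&/" \
        "$conf.bak" > "$conf"
}
```

You might call it as disable_module php4_module /usr/local/apache/conf/httpd.conf and then run /usr/local/apache/bin/httpd -t to check the configuration syntax before restarting the server.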

Tuning Apache Configuration

Once you have a lean and mean Apache binary installed, you need to investigate your httpd.conf file for tuning issues that relate to Apache configuration. I will discuss a few common issues.

Set Simultaneous Hit Limits

To begin, look at your MaxClients directive value. Depending on your Multi-Processing Module (MPM) choice (prefork, worker, etc.), the meaning of MaxClients will change. In the prefork MPM model, the MaxClients value represents the maximum number of Apache child processes that will be run to service requests.

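In the worker MPM model, this directive value means the maximum number of simultaneous client connections, that is, the total number of threads across all child processes. The sample httpd.conf sets it to 150 for both prefork and worker. If you want to service more simultaneous requests in either model, you need to increase this value. In prefork, going beyond 256 also requires raising the ServerLimit directive. In worker, MaxClients cannot exceed ServerLimit multiplied by ThreadsPerChild, so you need to raise ServerLimit only if you want more than the default 16 child processes. For example: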

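<IfModule worker.c>
   # Apache 2.0 applies these in order, so set the limits before MaxClients
   ServerLimit         15
   ThreadsPerChild     20
   # 15 processes x 20 threads = 300 simultaneous clients
   MaxClients          300
   StartServers        10
   MinSpareThreads     50
   MaxSpareThreads     75
   MaxRequestsPerChild 0
</IfModule>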
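
In the worker MPM, MaxClients counts threads, and the Apache documentation describes it as bounded by ServerLimit multiplied by ThreadsPerChild. The arithmetic for the smallest ServerLimit that reaches a given MaxClients can be sketched as follows; worker_server_limit is a hypothetical helper name, not an Apache tool.

```shell
# Smallest ServerLimit that lets the worker MPM reach a given MaxClients:
# ceil(MaxClients / ThreadsPerChild), using integer arithmetic.
worker_server_limit() {
    max_clients=$1
    threads_per_child=$2
    echo $(( (max_clients + threads_per_child - 1) / threads_per_child ))
}

worker_server_limit 300 20   # prints 15
```

Keeping ServerLimit close to this figure avoids reserving shared memory for child processes that will never be started.
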
If you have buggy, bulky Web applications that have a tendency to leak memory, consider using a nonzero value for the MaxRequestsPerChild directive so that each child process exits after serving that many requests and is replaced by a fresh one. This keeps leaked memory from accumulating and gives your Apache server better uptime.

Reduce Disk I/O

Say you request the page http://server/foo/bar/index.html from an Apache server. You would expect the server to read that index.html file from the document root directory and return the contents via the network to your Web browser and be done with it, right? Well, Apache has to do more than that. If the document root is /usr/local/httpd/htdocs, by default, it checks for the existence of the following files:

/.htaccess
/usr/.htaccess
/usr/local/.htaccess
/usr/local/httpd/.htaccess
/usr/local/httpd/htdocs/.htaccess
/usr/local/httpd/htdocs/foo/.htaccess
/usr/local/httpd/htdocs/foo/bar/.htaccess
If it finds any of these files, Apache reads them to see whether the request you made can be serviced. A URL that requests a single file requires many disk accesses, which are a performance drain for high-volume sites with many static Web pages. In such cases, the best choice is to disable .htaccess file checking altogether. This can be done by the following configuration:

<Directory />
   AllowOverride None
</Directory>
When the above configuration is used, Apache skips the .htaccess lookups and goes straight to the requested static file, which improves performance in high-volume access scenarios.
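
To get a feel for how many lookups a single request implies, the directory walk described above can be reproduced with a few lines of shell. This is only a sketch of the path arithmetic, not of Apache's internals; htaccess_candidates is a hypothetical helper, and it assumes an absolute document root.

```shell
# Print every .htaccess path implied by walking from the filesystem root
# down to the directory that holds the requested file.
# Usage: htaccess_candidates <absolute-docroot> <url-path>
# (htaccess_candidates is a hypothetical helper name.)
htaccess_candidates() {
    docroot=$1
    urlpath=$2
    case $docroot in
        /*) ;;
        *) echo "docroot must be absolute" >&2; return 1 ;;
    esac
    # Full file path, then drop the file name to get its directory.
    dir=$(dirname "$docroot$urlpath")
    paths=""
    while :; do
        if [ "$dir" = "/" ]; then
            paths="/.htaccess
$paths"
            break
        fi
        paths="$dir/.htaccess
$paths"
        dir=$(dirname "$dir")
    done
    printf '%s' "$paths"
}
```

Running htaccess_candidates /usr/local/httpd/htdocs /foo/bar/index.html lists the seven candidate files for the request above, one per line.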

Similarly, if you have instructed Apache in httpd.conf not to follow symbolic links (for good security reasons), using a configuration such as this:

<Directory />
  Options -FollowSymLinks
</Directory>
you would be paying a performance penalty for your good security practice. Every time a file is requested, Apache must issue extra system calls, one lstat per component of the file path, to ensure that it is not violating your symbolic link restriction.

If this performance price is too high, but you still want good security, you can decide not to use symbolic links at all in your document tree and then turn FollowSymLinks on so that Apache skips the checks:

<Directory />
  Options FollowSymLinks
</Directory>
You should only enable this once you have ensured there are no symbolic links that could result in exposure to Web visitors. The best choice is to completely remove all symbolic links and replace them with a new directory structure free of links. To find what links you have on your current document tree, run the following command from your document root directory:

# find . -type l -print
This will show all the symbolic links in your Web space.
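
The links that matter most are the ones that point outside the document tree. The following is a minimal sketch that narrows the listing to those; links_leaving_docroot is a hypothetical helper, it assumes GNU readlink with the -f option (as found on Linux), and it does not handle file names that contain newlines.

```shell
# Print symbolic links under a document root whose targets resolve to
# somewhere outside it, as "link -> resolved target".
# (links_leaving_docroot is a hypothetical helper name.)
links_leaving_docroot() {
    root=$(readlink -f "$1") || return 1
    find "$root" -type l -print | while read -r link; do
        target=$(readlink -f "$link")
        case $target in
            "$root"|"$root"/*) ;;    # target stays inside the tree
            *) printf '%s -> %s\n' "$link" "$target" ;;
        esac
    done
}
```

For example, links_leaving_docroot /usr/local/httpd/htdocs prints nothing when every link stays inside the tree. A link whose target cannot be resolved is printed with an empty target so that you can inspect it by hand.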

Stop Asking for Names!

Do not set the HostnameLookups directive to On to find the host names of your Web visitors. This will slow down your Web server significantly, because it has to query the DNS subsystem for the hostname of the IP address behind each Web request.
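
If you still want host names in your reports, you can resolve them offline from the access log instead. The following is a minimal sketch that looks up each distinct client address once; it assumes the client address is the first field of each log line, and the function names and the RESOLVER variable are my own. The default resolver uses getent hosts, which is an assumption about your system.

```shell
# Default resolver: getent prints "IP name [aliases]"; keep the name and
# fall back to the IP itself when nothing is found.
resolve_one() {
    name=$(getent hosts "$1" 2>/dev/null | awk '{ print $2; exit }')
    echo "${name:-$1}"
}

# Read an access log on standard input and print "IP name" once per
# distinct client address. Set RESOLVER to swap in another lookup command.
resolve_log_clients() {
    resolver=${RESOLVER:-resolve_one}
    awk '{ print $1 }' | sort -u | while read -r ip; do
        printf '%s %s\n' "$ip" "$("$resolver" "$ip")"
    done
}
```

You might run it as resolve_log_clients < /usr/local/apache/logs/access_log on a machine other than the Web server. Apache also ships a logresolve utility in its bin directory that serves the same purpose.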

Speak Many Languages Fluently

Using automatic content negotiation sounds wonderful, but it comes with a performance price. If you store multiple language pages (e.g., index.html.en (English), index.html.fr (French), index.html.jp (Japanese), etc.) in your document tree and allow Apache to determine the right contents for the Web browser based on content-negotiation headers, responses will slow down because Apache must make decisions about which file is appropriate each time it gets a request.

If you must serve content in multiple languages, the better approach is to keep each language in its own directory and redirect the user to the right directory on the first request. For example, as soon as a Web visitor hits the main Web server (http://yourserver/), a PHP or Perl script would detect the preferred language from the client-supplied headers and switch the user to http://yourserver/fr/ for French pages, http://yourserver/jp/ for Japanese pages, etc. In each language-specific directory, keep only that language's version of the content. This way, with automatic content negotiation disabled (with "--disable-negotiation"), the Apache server will serve the pages from within the correct language directory and work more efficiently.
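
The selection logic in such a script can be quite small. The following sketch shows it in shell rather than PHP or Perl, purely to illustrate the idea. It is a simplification: it takes the first supported language in the order the browser lists them and ignores q-values. The function name pick_lang_dir, the set of directories, and the English default are all assumptions.

```shell
# Map an Accept-Language header value to a language directory.
# Simplification: the first supported language in header order wins and
# q-values are ignored. Defaults to /en/ (an assumption).
pick_lang_dir() {
    dir=$(printf '%s\n' "$1" | tr 'A-Z' 'a-z' | tr ',' '\n' |
          sed -e 's/[;-].*//' -e 's/[[:space:]]//g' |
          while read -r lang; do
              case $lang in
                  en) echo /en/; break ;;
                  fr) echo /fr/; break ;;
                  ja) echo /jp/; break ;;   # "ja" is the language tag; /jp/ follows the directory naming above
              esac
          done)
    echo "${dir:-/en/}"
}
```

For instance, pick_lang_dir "fr-CA,fr;q=0.8,en;q=0.5" selects /fr/, and the calling script would then send a redirect to that directory.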

There might be other directives that you need to tune to get more juice out of your Apache server. The rule of thumb in configuring Apache is to know your directives well. Read all directive documentation and make sure you know what penalties a directive might have on your performance or security.

Measure Your Efforts

Once you have tuned Apache by recompiling and reconfiguring it as mentioned above, you can measure how it performs with the nifty ab (ApacheBench) tool that comes with the Apache distribution. By default, ab is installed in the bin directory. The ab tool allows you to perform stress tests on your server. For example, you can run it as follows:

# ./ab -n 1000 http://yourserver/
from the Apache bin directory to send 1000 requests to your Web server. However, running this tool from the Apache server itself will skew your results because the ab tool alone will consume a lot of your system resources. So, for best results, always run ab from another system in the same network. A sample, abridged result of the above command is shown below:

Finished 1000 requests
Server Software:        Apache/2.0.43
Server Hostname:        localhost
Server Port:            80
Document Path:          /
Document Length:        1456 bytes
Concurrency Level:      1
Time taken for tests:   1.427968 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      1722000 bytes
HTML transferred:       1456000 bytes
Requests per second:    700.30 [#/sec] (mean)
Time per request:       1.428 [ms] (mean)
Time per request:       1.428 [ms] (mean, across all concurrent requests)
Transfer rate:          1177.20 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   2.5      0      57
Processing:     0    0   5.0      0      71
Waiting:        0    0   4.2      0      71
Total:          0    0   5.6      0      71

Percentage of the requests served within a certain time (ms)
  50%      0
  66%      0
  75%      0
  80%      0
  90%      0
  95%      1
  98%      1
  99%      3
 100%     71 (longest request)
You can run many simultaneous connections to simulate a particular scenario. For example:

# ./ab -n 1000 -c 50 http://yourserver/
Here ab will run 50 simultaneous requests to complete 1000 requests for the home page of the named Web server. You can run a few of these tests to see how your Web server performs.
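
When you repeat the test at several concurrency levels, it helps to pull out just the headline figure from each run. The following is a minimal sketch; ab_rps and ab_sweep are hypothetical helper names, the AB variable is my own way of pointing at the ab binary, and the parsing assumes the "Requests per second" line has the form shown in the sample output above.

```shell
# Pull the "Requests per second" figure out of ab output on standard input.
ab_rps() {
    awk '/^Requests per second:/ { print $4 }'
}

# Run ab at several concurrency levels and print "level rps" pairs.
# Usage: ab_sweep <url> <concurrency>...
# AB may be set to the full path of the ab binary (defaults to "ab").
ab_sweep() {
    url=$1
    shift
    for c in "$@"; do
        printf '%s %s\n' "$c" \
            "$("${AB:-ab}" -n 1000 -c "$c" "$url" 2>/dev/null | ab_rps)"
    done
}
```

A call such as ab_sweep http://yourserver/ 1 10 50, run from another machine on the same network, gives one line per concurrency level that you can compare before and after each tuning change.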

Performance tuning is not a quick-and-easy job; it takes time to get it right. And when you do, you will find out your organization has still more load demands and more complex requirements for you to meet. So treat it as an ongoing process and improve it on a regular basis. You must have a good understanding of how your server is used by Web visitors and where the bottlenecks are. I hope this article will get you started in the right direction.

Mohammed J. Kabir (prefers to be called Kabir) is the founder and CEO of EVOKNOW, Inc. Kabir is a strong believer in process-centric software development. He has spent many years in software development for startups and large US companies such as Lucent, BlueShield, Go America, AMI News, Outdoors.net, etc. He also served as the chief technologist for many US companies where he managed numerous software development projects. Kabir is also an IT author. Some of his most recent books include: Red Hat Linux Security and Optimization (Red Hat Press), Red Hat Linux Survival Guide (Red Hat Press), Red Hat Linux Server 7 (Hungry Minds (IDG Worldwide)), Red Hat Linux Server 6 (Hungry Minds), Red Hat Linux Server Administrator's Handbook (IDG Worldwide), Apache Server 2 (Hungry Minds), Apache Server Bible (IDG Worldwide), Apache Server Administrator's Handbook (IDG Worldwide), The SuSE Linux Server (M&T Books), and CGI Primer Plus for Windows (Macmillan (Waite Group)). Kabir can be contacted at: kabir@evoknow.com.