

Writing Apache Modules with Perl and C
By:   Lincoln Stein and Doug MacEachern
Published:   O'Reilly & Associates, Inc.  - March 1999

Copyright © 1999 by O'Reilly & Associates, Inc.


 



Chapter 3 - The Apache Module Architecture and API / The Handler API
Perl API Configuration Directives

This section lists the configuration directives that the Perl API makes available. Most of these directives install handlers, but there are a few that affect the Perl engine in other ways.

PerlRequire
PerlModule

These directives are used to load Perl modules and files from disk. Both are implemented using the Perl built-in require operator. However, there are subtle differences between the two. A PerlModule must be a "bareword," that is, a package name without any path information. Perl will search the @INC paths for a .pm file that matches the name.

Example:

PerlModule Apache::Plotter

This will do the same as either of the following Perl language statements:

require Apache::Plotter;
use Apache::Plotter ();

In contrast, the PerlRequire directive expects an absolute or relative path to a file. The Perl API will enclose the path in quotes, then pass it to the require function. If you use a relative path, Perl will search through the @INC list for a match.

Examples:

PerlRequire /opt/www/lib/directory_colorizer.pl
PerlRequire scripts/delete_temporary_files.pl

This will do the same as the following Perl language statements:

require '/opt/www/lib/directory_colorizer.pl';
require 'scripts/delete_temporary_files.pl';

As with modules and files pulled in directly by the require operator, files loaded by PerlRequire and PerlModule must return a true value (usually 1) to indicate that they were evaluated successfully. Like require, these directives add each file to the %INC hash so that it will not be evaluated more than once. The Apache::StatINC module and the PerlFreshRestart directive can alter this behavior so modules can be reloaded.
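The %INC bookkeeping described above can be observed in plain Perl, outside the server. The following is a minimal sketch; File::Basename is used only because it ships with Perl, not because mod_perl needs it.

```perl
use strict;
use warnings;

# Load a core module by bareword, much as "PerlModule File::Basename" would.
require File::Basename;

# require records each loaded file in %INC, keyed by its @INC-relative path.
my $loaded_from = $INC{'File/Basename.pm'};
print "File::Basename was loaded from $loaded_from\n";

# A second require finds the %INC entry and returns without
# evaluating the file again.
require File::Basename;
```

A file of your own that is loaded this way must end with a true expression, conventionally a final line reading 1;.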

Both directives will accept any number of modules and files:

PerlModule CGI LWP::Simple Apache::Plotter
PerlRequire scripts/startup.pl scripts/config.pl

All PerlModule and PerlRequire files will be loaded during server startup by mod_perl during the module_init phase. The value of the ServerRoot directive is added to the @INC paths by mod_perl as an added convenience.

Remember that all the code that is run at server initialization time is run with root privileges when the server is bound to a privileged port, such as the default 80. This means that anyone who has write access to one of the server configuration files, or who has write access to a script or module that is loaded by PerlModule or PerlRequire, effectively has superuser access to the system. There is a new PerlOpmask directive and PERL_OPMASK_DEFAULT compile time option, currently in the experimental stages, for disabling possibly dangerous operators.

The PerlModule and PerlRequire directives are also permitted in .htaccess files. They will be loaded at request time and be run as the unprivileged web user.

PerlChildInitHandler

This directive installs a handler that is called immediately after a child process is launched. On Unix systems, it is called every time the parent process forks a new child to add to the flock of listening daemons. The handler is called only once in the Win32 version of Apache because that server uses a single-process model.

In contrast to the server initialization phase, the child will be running as an unprivileged user when this handler is called. All child_init handlers will be called unless one aborts by logging an error message and calling exit() to terminate the process.

Example:

PerlChildInitHandler Apache::DBLogin

This directive can appear in the main configuration files and within virtual host sections, but not within <Directory>, <Location>, or <Files> sections or within .htaccess files.
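As a rough illustration of a child_init handler, the sketch below reseeds Perl's random number generator in each new child, since forked children otherwise start from whatever state they inherited from the parent. It assumes mod_perl 1.x; the package name Apache::SeedRandom is hypothetical and would be installed with PerlChildInitHandler Apache::SeedRandom.

```perl
package Apache::SeedRandom;   # hypothetical module name

use strict;
use Apache::Constants qw(OK);

sub handler {
    # Mix the child's process ID with the current time so that each
    # child process ends up with a different random sequence.
    srand($$ ^ time);
    return OK;
}

1;
```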

PerlPostReadRequestHandler

The post_read_request handler is called every time an Apache process receives an incoming request, at the point at which the server has read the incoming request's data and parsed the HTTP header fields but before the server has translated the URI to a filename. It is called once per transaction and is intended to allow modules to step in and perform special processing on the incoming data. However, because there's no way for modules to step in and actually contribute to the parsing of the HTTP header, this phase is more often used just as a convenient place to do processing that must occur once per transaction. All post_read_request handlers will be called unless one aborts by returning an error code or terminating the phase with DONE.

Example:

PerlPostReadRequestHandler Apache::StartTimer

This directive can appear in the main configuration files and within virtual host sections but not within <Directory>, <Location>, or <Files> sections or within .htaccess files. The reason for this restriction is simply that the request has not yet been associated with a particular filename or directory.
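Because this phase runs once per transaction, it is a natural place to record when a request arrived. A minimal sketch, assuming mod_perl 1.x: the package name Apache::StampStart and the notes key start_time are hypothetical and are not the Apache::StartTimer module named in the example.

```perl
package Apache::StampStart;   # hypothetical module name

use strict;
use Apache::Constants qw(OK);

sub handler {
    my $r = shift;

    # Stash the arrival time in the request's notes table so that a
    # later phase can read it back with $r->notes('start_time').
    $r->notes(start_time => time);

    return OK;
}

1;
```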

PerlInitHandler

When found at the top level of a configuration file, that is, outside of any <Location>, <Directory>, or <Files> sections, this handler is an alias for PerlPostReadRequestHandler. When found inside one of these containers, this handler is an alias for PerlHeaderParserHandler described later. Its name makes it easy to remember that this is the first handler invoked when serving an HTTP request.

PerlTransHandler

The uri_translate handler is invoked after Apache has parsed out the request. Its job is to take the request, which is in the form of a partial URI, and transform it into a filename.

The handler can also step in to alter the URI itself, to change the request method, or to install new handlers based on the URI. The URI translation phase is often used to recognize and handle proxy requests; we give examples in Chapter 7.

Example:

PerlTransHandler Apache::AdBlocker

Apache will walk through the registered uri_translate handlers until one returns a status other than DECLINED. This is in contrast to most of the other phases, for which Apache will continue to invoke registered handlers even after one has returned OK.

Like PerlPostReadRequestHandler, the PerlTransHandler directive may appear in the main configuration files and within virtual host sections but not within <Directory>, <Location>, or <Files> sections or within .htaccess files. This is because the request has not yet been associated with a particular file or directory.
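To make the DECLINED/OK convention concrete, here is a minimal sketch of a uri_translate handler, assuming mod_perl 1.x. The package name Apache::MapDocs, the /docs/ URI prefix, and the /opt/www/docs directory are all hypothetical.

```perl
package Apache::MapDocs;   # hypothetical module name

use strict;
use Apache::Constants qw(OK DECLINED);

sub handler {
    my $r = shift;

    # Only claim URIs under /docs/; returning DECLINED passes every
    # other request on to the next registered translation handler.
    return DECLINED unless $r->uri =~ m{^/docs/(.*)$};

    # Map the remainder of the URI onto a directory on disk
    # (hypothetical location).
    $r->filename("/opt/www/docs/$1");

    # OK ends the translation phase; no later handler is consulted.
    return OK;
}

1;
```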

PerlHeaderParserHandler

After the URI translation phase, Apache gives you another chance to examine the request headers and to take special action in the header_parser phase. Unlike the post_read_request phase, at this point the URI has been mapped to a physical pathname. Therefore PerlHeaderParserHandler is the first handler directive that can appear within <Directory>, <Location>, or <Files> sections or within .htaccess files.

The header_parser phase is free to examine and change request fields in the HTTP header, or even to abort the transaction entirely. For this reason, it's common to use this phase to block abusive robots before they start chewing into the resources that may be required in the phases that follow. All registered header_parser handlers will be run unless one returns an error code or DONE.

Example:

PerlHeaderParserHandler Apache::BlockRobots
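A handler of this kind might look like the following sketch, which assumes mod_perl 1.x. It is not the Apache::BlockRobots module named above; the package name Apache::BlockAgent and the User-Agent patterns are hypothetical.

```perl
package Apache::BlockAgent;   # hypothetical module name

use strict;
use Apache::Constants qw(OK FORBIDDEN);

# Hypothetical User-Agent patterns to turn away.
my @BAD_AGENTS = (qr/EmailSiphon/i, qr/^WebZIP/i);

sub handler {
    my $r = shift;
    my $agent = $r->header_in('User-Agent') || '';

    # An error status aborts the transaction before any of the
    # more expensive phases that follow have a chance to run.
    for my $pattern (@BAD_AGENTS) {
        return FORBIDDEN if $agent =~ $pattern;
    }

    return OK;
}

1;
```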

PerlAccessHandler

The access_checker handler is the first of three handlers that are involved in authentication and authorization. We go into this topic in greater depth in Chapter 6.

The access_checker handler is designed to do simple access control based on the browser's IP address, hostname, phase of the moon, or other aspects of the transaction that have nothing to do with the remote user's identity. The handler is expected to return OK to allow the transaction to continue, FORBIDDEN to abort the transaction with an unauthorized access error, or DECLINED to punt the decision to the next handler. Apache will continue to step through all registered access handlers until one returns a code other than DECLINED or OK.

Example:

PerlAccessHandler Apache::DayLimit

The PerlAccessHandler directive can occur anywhere, including <Directory> sections and .htaccess files.
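The sketch below shows an access_checker handler that bases its decision on something unrelated to the user's identity, here the day of the week. It assumes mod_perl 1.x and is not the Apache::DayLimit module from the example; the package name Apache::WeekdayOnly and the weekend rule are hypothetical.

```perl
package Apache::WeekdayOnly;   # hypothetical module name

use strict;
use Apache::Constants qw(OK FORBIDDEN);

sub handler {
    my $r = shift;

    # localtime's seventh element is the day of the week:
    # 0 is Sunday, 6 is Saturday.
    my $wday = (localtime)[6];

    # Refuse access at weekends, allow it on every other day.
    return FORBIDDEN if $wday == 0 || $wday == 6;
    return OK;
}

1;
```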

PerlAuthenHandler

The authentication handler (sometimes referred to in the Apache documentation as check_user_id) is called whenever the requested file or directory is password-protected. This, in turn, requires that the directory be associated with AuthName, AuthType, and at least one require directive. The interactions among these directives are covered more fully in Chapter 6.

It is the job of the authentication handler to check a user's identification credentials, usually by checking the username and password against a database. If the credentials check out, the handler should return OK. Otherwise the handler returns AUTH_REQUIRED to indicate that the user has not authenticated successfully. When Apache sends the HTTP header with this code, the browser will normally pop up a dialog box that prompts the user for login information.

Apache will call all registered authentication handlers, only ending the phase after the last handler has had a chance to weigh in on the decision or when a handler aborts the transaction by returning AUTH_REQUIRED or another error code. As usual, handlers may also return DECLINED to defer the decision to the next handler in line.

Example:

PerlAuthenHandler Apache::AuthAnon

PerlAuthenHandler can occur anywhere in the server configuration or in .htaccess files.
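The return codes described above fit together roughly as in this sketch of a Basic authentication handler, assuming mod_perl 1.x. The package name Apache::AuthSketch and the in-memory password table are hypothetical; a real handler would consult a database and would not hold plaintext passwords.

```perl
package Apache::AuthSketch;   # hypothetical module name

use strict;
use Apache::Constants qw(OK AUTH_REQUIRED);

# Hypothetical credential table, in plaintext only for brevity.
my %PASSWORD = (guest => 'guest');

sub handler {
    my $r = shift;

    # Fetch the password the browser sent. A status other than OK
    # (for example, no credentials yet) is passed straight back.
    my ($status, $sent_pw) = $r->get_basic_auth_pw;
    return $status unless $status == OK;

    my $user = $r->connection->user;
    unless (defined $user
            && exists $PASSWORD{$user}
            && $PASSWORD{$user} eq $sent_pw) {
        # Ask Apache to send the challenge header so the browser
        # prompts for login information again.
        $r->note_basic_auth_failure;
        return AUTH_REQUIRED;
    }

    return OK;
}

1;
```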

PerlAuthzHandler

Provided that the authentication handler has successfully verified the user's identity, the transaction passes into the authorization handler, where the server determines whether the authenticated user is authorized to access the requested URI. This is often used in conjunction with databases to restrict access to a document based on the user's membership in a particular group. However, the authorization handler can base its decision on anything that can be derived from the user's name, such as the user's position in an organizational chart or the user's gender.

Handlers for the authorization phase are only called when the file or directory is password-protected, using the same criteria described earlier for authentication. The handler is expected to return DECLINED to defer the decision, OK to indicate its acceptance of the user's authorization, or AUTH_REQUIRED to indicate that the user is not authorized to access the requested document. As in the authentication phase, Apache will try all the authorization handlers in turn until one returns AUTH_REQUIRED or another error code.

The authorization handler interacts with the require directive in a way described fully in Chapter 6.

Example:

PerlAuthzHandler Apache::AuthzGender

The PerlAuthzHandler directive can occur anywhere in the server configuration files or in individual .htaccess files.
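An authorization handler can be as small as the following sketch, which assumes mod_perl 1.x and that an earlier authentication handler has already verified the username. The package name Apache::AuthzSketch and the list of permitted users are hypothetical.

```perl
package Apache::AuthzSketch;   # hypothetical module name

use strict;
use Apache::Constants qw(OK AUTH_REQUIRED);

# Hypothetical group of users allowed to see the document.
my %EDITOR = map { $_ => 1 } qw(alice bob);

sub handler {
    my $r = shift;

    # The username was established during the authentication phase.
    my $user = $r->connection->user;
    return OK if defined $user && $EDITOR{$user};

    # Not in the group: let the browser prompt for a different login.
    $r->note_basic_auth_failure;
    return AUTH_REQUIRED;
}

1;
```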

PerlTypeHandler

After the optional access control and authentication phases, Apache enters the type_checker phase. It is the responsibility of the type_checker handler to assign a provisional MIME type to the requested document. The assigned MIME type will be taken into consideration when Apache decides what content handler to call to generate the body of the document. Because content handlers are free to change the MIME types of the documents they process, the MIME type chosen during the type checking phase is not necessarily the same MIME type that is ultimately sent to the browser. The type checker is also used by Apache's automatic directory indexing routines to decide what icon to display next to the filename.

The default Apache type checker generally just looks up the filename extension in a table of MIME types. By declaring a custom type checker, you can replace this with something more sophisticated, such as looking up the file's MIME type in a document management database.

Because it makes no sense to have multiple handlers trying to set the MIME type of a file according to different sets of rules, the type checker handlers behave like content handlers and URI translation handlers. Apache steps through each registered handler in turn until one returns OK or aborts with an error code. The phase finishes as soon as one module indicates that it has successfully handled the transaction.

Example:

PerlTypeHandler Apache::MimeDBI

The PerlTypeHandler directive can occur anywhere in the server configuration or in .htaccess files.
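As a sketch of the idea, the handler below assigns a MIME type from a small in-memory table rather than a database. It assumes mod_perl 1.x; the package name Apache::TypeTable and the table contents are hypothetical and stand in for whatever lookup a real module would perform.

```perl
package Apache::TypeTable;   # hypothetical module name

use strict;
use Apache::Constants qw(OK DECLINED);

# Hypothetical extension-to-type table.
my %TYPE = (
    txt  => 'text/plain',
    html => 'text/html',
    png  => 'image/png',
);

sub handler {
    my $r = shift;

    # Pull the extension off the translated filename, if there is one.
    my ($ext) = $r->filename =~ /\.(\w+)$/;
    return DECLINED unless defined $ext;

    # Unknown extensions are left for the next type checker.
    my $type = $TYPE{lc $ext} or return DECLINED;

    # Returning OK ends the phase with this provisional type in place.
    $r->content_type($type);
    return OK;
}

1;
```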

PerlFixupHandler

After the type_checker phase but before the content handling phase is an odd beast called the fixup phase. This phase is a chance to make any last-minute changes to the transaction before the response is sent. The fixup handler's job is like that of the restaurant prep cook who gets all the ingredients cut, sorted, and put in their proper places before the chef goes to work. As an example alluded to earlier, mod_env defines a fixup handler to add variables to the environment from configured SetEnv and PassEnv directives. These variables are put to use by several different modules in the upcoming response phase, including mod_cgi, mod_include, and mod_perl.

All fixup handlers are run during an HTTP request, stopping only when a module aborts with an error code.

Example:

PerlFixupHandler Apache::HTTP::Equiv

The PerlFixupHandler directive can occur anywhere in the server configuration files or in .htaccess files.
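In the spirit of the mod_env example, a Perl fixup handler can add a variable to the environment that later modules will see. This is a minimal sketch assuming mod_perl 1.x; the package name Apache::FixupEnv and the variable name REQUEST_STARTED are hypothetical.

```perl
package Apache::FixupEnv;   # hypothetical module name

use strict;
use Apache::Constants qw(OK);

sub handler {
    my $r = shift;

    # Add a variable to the table from which the environment of
    # CGI scripts and server-side includes is later built.
    $r->subprocess_env(REQUEST_STARTED => scalar localtime);

    return OK;
}

1;
```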

PerlHandler

The next step is the content generation, or response phase, installed by the generic-sounding PerlHandler directive. Because of its importance, probably 90 percent of the modules you'll write will handle this part of the transaction. The content handler is the master chef of the Apache kitchen, taking all the ingredients assembled by the previous phases--the URI, the translated pathname, the provisional MIME type, and the parsed HTTP headers--whipping them up into a tasty document and serving the result to the browser.

Apache chooses the content handler according to a set of rules governed by the SetHandler, AddHandler, AddType, and ForceType directives. We go into the details in Chapter 4. For historical reasons as much as anything else, the idiom for installing a Perl content handler uses a combination of the SetHandler and PerlHandler directives:

<Directory /home/http/htdocs/compressed>
  SetHandler  perl-script
  PerlHandler Apache::Uncompress
</Directory>

The SetHandler directive tells Apache that the Perl interpreter will be the official content handler for all documents in this directory. The PerlHandler directive in turn tells Perl to hand off responsibility for the phase to the handler() subroutine in the Apache::Uncompress package. If no PerlHandler directive is specified, Perl will return an empty document.

It is also possible to use the <Files> and <FilesMatch> directives to assign mod_perl content handlers selectively to individual files based on their names. In this example, all files ending with the suffix .gz are passed through Apache::Uncompress:

<FilesMatch "\.gz$">
  SetHandler  perl-script
  PerlHandler Apache::Uncompress
</FilesMatch>

There can be only one master chef in a kitchen, and so it is with Apache content handlers. If multiple modules have registered their desire to be the content handler for a request, Apache will try them each in turn until one returns OK or aborts the transaction with an error code. If a handler returns DECLINED, Apache moves on to the next module in the list.

The Perl API relaxes this restriction somewhat, allowing several content handlers to collaborate to build up a composite document using a technique called "chaining." We show you how to take advantage of this feature in the next chapter.

The PerlHandler directive can appear anywhere in Apache's configuration files, including virtual host sections, <Location> sections, <Directory> sections, and <Files> sections. It can also appear in .htaccess files.
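For orientation, a content handler in its simplest form looks something like the sketch below, assuming mod_perl 1.x. The package name Apache::Hello is hypothetical; it would be installed with the SetHandler perl-script and PerlHandler pair shown earlier.

```perl
package Apache::Hello;   # hypothetical module name

use strict;
use Apache::Constants qw(OK);

sub handler {
    my $r = shift;

    # Set the final MIME type and send the HTTP header.
    $r->content_type('text/plain');
    $r->send_http_header;

    # A HEAD request wants the header only, not the body.
    return OK if $r->header_only;

    $r->print("Hello from ", __PACKAGE__, "\n");
    return OK;
}

1;
```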

PerlLogHandler

Just before entering the cleanup phase, the log handler will be called in the logging phase. This is true regardless of whether the transaction was successfully completed or was aborted somewhere along the way with an error. Everything known about the transaction, including the original request, the translated file name, the MIME type, the number of bytes sent and received, the length of time the transaction took, and the status code returned by the last handler to be called, is passed to the log handler in the request record. The handler typically records the information in some way, either by writing the information to a file, as the standard logging modules do, or by storing the information into a relational database. Log handlers can of course do whatever they like with the information, such as keeping a running total of the number of bytes transferred and throwing out the rest. We show several practical examples of log handlers in Chapter 7.

All registered log handlers are called in turn, even after one of them returns OK. If a log handler returns an HTTP error status, it and all the log handlers that ordinarily follow it, including the built-in ones, will be aborted. This should be avoided unless you really want to prevent some transactions from being logged.

Example:

PerlLogHandler  Apache::LogMail

The PerlLogHandler directive can occur anywhere in the server configuration files or in .htaccess files.
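The running-total idea mentioned above might be sketched as follows, assuming mod_perl 1.x. The package name Apache::LogBytes is hypothetical, the total is kept separately in each child process, and the line is written to the server error log only to keep the example short.

```perl
package Apache::LogBytes;   # hypothetical module name

use strict;
use Apache::Constants qw(OK);

# Running total of bytes sent, kept separately by each child process.
my $total = 0;

sub handler {
    my $r = shift;

    $total += $r->bytes_sent;

    # Record the request, its final status, and the running total.
    $r->log_error(sprintf "%s %s -> %d; %d bytes sent by this child so far",
                  $r->method, $r->uri, $r->status, $total);

    # Returning OK lets the remaining log handlers run as usual.
    return OK;
}

1;
```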

PerlCleanupHandler

After each transaction is done, Apache cleans up. During this phase any module that has registered a cleanup handler will be called. This gives the module a chance to deallocate shared memory structures, close databases, clean up temporary files, or do whatever other housekeeping tasks it needs to perform. This phase is always invoked after logging, even if some previous handlers aborted the request handling process by returning some error code.

Internally the cleanup phase is different from the other phases we've discussed. In fact, there isn't really a cleanup phase per se. In the C API, modules that need to perform post-transaction housekeeping tasks register one or more function callbacks with the resource pool that they are passed during initialization. Before the resource pool is deallocated, Apache calls each of the module's callbacks in turn. For this reason, the structure of a cleanup handler routine in the C API is somewhat different from the standard handler. It has this function prototype:

void cleanup_handler (void* data);

We discuss how to register and use C-language cleanup handlers in Chapter 10.

The Perl API simplifies the situation by making cleanup handlers look and act like other handlers. The PerlCleanupHandler directive installs a Perl subroutine as a cleanup handler. Modules may also use the register_cleanup() call to install cleanup handlers themselves. Like other handlers in the Perl API, the cleanup subroutine will be called with the Apache request object as its argument. Unlike other handlers, however, a cleanup handler doesn't have to return a function result. If it does return a result code, Apache will ignore the value. An important implication of this is that all registered cleanup functions are always called, despite the status code returned by previous handlers.

Example:

PerlCleanupHandler  Apache::Plotter::clean_ink_cartridges

The PerlCleanupHandler directive can occur anywhere in the server configuration files or in .htaccess files.
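The register_cleanup() route can be sketched like this, assuming mod_perl 1.x: a content handler creates a scratch file and arranges for it to be removed once the transaction is over. The package name Apache::ScratchFile and the /tmp path are hypothetical, and a predictable temporary filename like this one would not be safe in production.

```perl
package Apache::ScratchFile;   # hypothetical module name

use strict;
use Apache::Constants qw(OK SERVER_ERROR);

sub handler {
    my $r = shift;

    # Hypothetical scratch file, named after the child's process ID.
    my $scratch = "/tmp/scratch.$$";
    open(my $fh, '>', $scratch) or return SERVER_ERROR;
    print $fh "working data\n";
    close $fh;

    # The closure runs after logging, whatever status this handler
    # or any later one returns.
    $r->register_cleanup(sub { unlink $scratch });

    $r->content_type('text/plain');
    $r->send_http_header;
    $r->print("scratch file written\n") unless $r->header_only;
    return OK;
}

1;
```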

PerlChildExitHandler

The last handler to be called is the child exit handler. This is called just before the child server process dies. On Unix systems the child exit handler will be called multiple times (but only once per process). On NT systems, the exit handler is called just once before the server itself exits.

Example:

PerlChildExitHandler  Apache::Plotter::close_driver

PerlFreshRestart

When this directive is set to On, mod_perl will reload all the modules found in %INC whenever the server is restarted. This feature is very useful during module development because otherwise, changes to .pm files would not take effect until the server was completely stopped and restarted.

The standard Apache::Registry module also respects the value of PerlFreshRestart by flushing its cache and reloading all scripts when the server is restarted.

This directive can only appear in the main part of the configuration files or in <VirtualHost> sections.

PerlDispatchHandler
PerlRestartHandler

These two handlers are not part of the Apache API, but pseudophases added by mod_perl to give programmers the ability to fine-tune the Perl API. They are rarely used but handy for certain specialized applications.

The PerlDispatchHandler callback, if defined, takes over the process of loading and executing handler code. Instead of processing the Perl*Handler directives directly, mod_perl will invoke the routine pointed to by PerlDispatchHandler and pass it the Apache request object and a second argument indicating the handler that would ordinarily be invoked to process this phase. If the handler has already been compiled, then the second argument is a CODE reference. Otherwise, it is the name of the handler's module or subroutine.

The dispatch handler should handle the request, which it will usually do by running the passed module's handler() method. The Apache::Safe module, currently under development, takes advantage of PerlDispatchHandler to put handlers into a restricted execution space using Malcolm Beattie's Safe library.

Unlike other Perl*Handler directives, PerlDispatchHandler must always point to a subroutine name, not to a module name. This means that the dispatch module must be preloaded using PerlModule:

PerlModule Apache::Safe
<Files *.shtml>
 PerlDispatchHandler Apache::Safe::handler
</Files>

PerlRestartHandler points to a routine that is called when the server is restarted. This gives you the chance to step in and perform any cleanup required to tweak the Perl interpreter. For example, you could use this opportunity to trim the global @INC path or collect statistics about the modules that have been loaded.
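The two housekeeping tasks just mentioned can be sketched in plain Perl. The package name My::RestartTidy is hypothetical, and the handler's return value of 0 (the numeric value of OK) is an assumption about what mod_perl expects from a restart handler; the tidy_inc() helper itself runs anywhere.

```perl
package My::RestartTidy;   # hypothetical module name

use strict;
use warnings;

# Remove duplicate directories from @INC, keeping first occurrences,
# and return the number of files currently recorded in %INC.
sub tidy_inc {
    my %seen;
    @INC = grep { !$seen{$_}++ } @INC;
    return scalar keys %INC;
}

sub handler {
    my $count = tidy_inc();
    warn "restart: $count files loaded, ", scalar(@INC), " search directories\n";
    return 0;   # assumed: numeric value of OK
}

1;
```

Under these assumptions the routine would be installed with PerlModule My::RestartTidy followed by PerlRestartHandler My::RestartTidy::handler.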
