Writing Apache Modules with Perl and C

By:	Lincoln Stein and Doug MacEachern
Published:	O'Reilly & Associates, Inc. - March 1999

Chapter 7 - Other Request Phases / The Header Parser Phase
Implementing an Unsupported HTTP Method

One nontrivial use for the header parser phase is to implement an unsupported HTTP request method. The Apache server handles the most common HTTP methods, such as GET, HEAD, and POST. Apache also provides hooks for managing the less commonly used PUT and DELETE methods, but the work of processing the method is left to third-party modules to implement. In addition to these methods, there are certain methods that are part of the HTTP/1.1 draft that are not supported by Apache at this time. One such method is PATCH, which is used to change the contents of a document on the server side by applying a "diff" file provided by the client.²

This section will show how to extend the Apache server to support the PATCH method. The same techniques can be used to experiment with other parts of HTTP drafts or customize the HTTP protocol for special applications.

If you've never worked with patch files, you'll be surprised at how insanely useful they are. Say you have two versions of a large file, an older version named file.1.html and a newer version named file.2.html. You can use the Unix diff command to compute the difference between the two, like this:

% diff file.1.html file.2.html > file.diff

When diff is finished, the output file, file.diff, will contain only the lines that have changed between the two files, along with information indicating the positions of the changed lines in the files. You can examine a diff file in a text editor to see how the two files differ. More interestingly, however, you can use Larry Wall's patch program to apply the diff to file.1.html, transforming it into a new file identical to file.2.html. patch is simple to use:

% patch file.1.html < file.diff

Because two versions of the same file tend to be more similar than they are different, diff files are usually short, making it much more efficient to send the diff file around than the entire new version. This is the rationale for the HTTP/1.1 PATCH method. It complements PUT, which is used to transmit a whole new document to the server, by sending what should be changed between an existing document and a new one. When a client requests a document with the PATCH method, the URI it provides corresponds to the file to be patched, and the request's content is the diff file to be applied.

Example 7-5 gives the code for the PATCH handler, appropriately named Apache::PATCH. It defines both the server-side routines for accepting PATCH documents, and a small client-side program to use for submitting patch files to the server.

package Apache::PATCH;
# file: Apache/PATCH.pm

use strict;
use vars qw($VERSION @EXPORT @ISA);
use Apache::Constants qw(:common BAD_REQUEST);
use Apache::File ();
use File::Basename 'dirname';

@ISA = qw(Exporter);
@EXPORT = qw(PATCH);
$VERSION = '1.00';

use constant PATCH_TYPE => 'application/diff';
my $PATCH_CMD = "/usr/local/bin/patch";

We begin by pulling in the required modules, including Apache::File and File::Basename. We also inherit from the Exporter module. This is not used by the server-side routines but is needed by the client-side library to export the PATCH() subroutine. We then declare a constant for the MIME type of the submitted patch files and a variable that holds the location of the patch program on our system.

The main entry point to server-side routines is through a header parsing phase handler named handler(). It detects whether the request uses the PATCH method and, if so, installs a custom response handler to deal with it. This means we install the patch routines with this configuration directive:

PerlHeaderParserHandler Apache::PATCH

The rationale for installing the patch handler with the PerlHeaderParserHandler directive rather than PerlTransHandler is that we can use the former directive within directory sections and .htaccess files, allowing us to make the PATCH method active only for certain parts of the document tree.

The definition of handler() is simple:

sub handler {
   my $r = shift;
   return DECLINED unless $r->method eq 'PATCH';
   unless ($r->some_auth_required) {
      $r->log_reason("Apache::PATCH requires access control");
      return FORBIDDEN;
   }
   $r->handler("perl-script");
   $r->push_handlers(PerlHandler => \&patch_handler);
   return OK;
}

We recover the request object and call method() to determine whether the request method equals PATCH. If not, we decline the transaction. Next we perform a simple but important security check. We call some_auth_required() to determine whether the requested URI is under password protection. If the document is not protected, we log an error and return a result code of FORBIDDEN. This is hardwired insurance that the file to be patched is protected in some way by one of the many authentication modules available to Apache (see Chapter 6, Authentication and Authorization, for a few).

If the request passes the checks, we adjust the content handler to be the patch_handler() subroutine by calling the request object's handler() and push_handlers() methods. This done, we return OK, allowing other installed header parsers to process the request.
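Because handler() calls only a handful of request methods, its decision logic can be exercised outside Apache. The following is a minimal sketch and not part of Apache::PATCH: MockRequest is a hypothetical stand-in for the real request object, patch_handler() is a placeholder, and the status constants are defined locally with the numeric values Apache 1.3 uses rather than imported from Apache::Constants.

```perl
use strict;
use warnings;

# Numeric values of the Apache 1.3 status codes. Under mod_perl these
# are imported from Apache::Constants rather than defined by hand.
use constant { OK => 0, DECLINED => -1, FORBIDDEN => 403 };

# Hypothetical stand-in for the request object: it implements only the
# methods that handler() calls and records what was done to it.
package MockRequest;

sub new {
    my ($class, %args) = @_;
    return bless { handlers => [], log => [], %args }, $class;
}
sub method             { $_[0]{method} }
sub some_auth_required { $_[0]{auth} }
sub log_reason         { push @{ $_[0]{log} }, $_[1] }
sub handler            { $_[0]{handler} = $_[1] }
sub push_handlers      { push @{ $_[0]{handlers} }, [ @_[1, 2] ] }

package main;

sub patch_handler { return OK }    # placeholder for the real content handler

# Same logic as the header parser handler in the listing.
sub handler {
    my $r = shift;
    return DECLINED unless $r->method eq 'PATCH';
    unless ($r->some_auth_required) {
        $r->log_reason("Apache::PATCH requires access control");
        return FORBIDDEN;
    }
    $r->handler("perl-script");
    $r->push_handlers(PerlHandler => \&patch_handler);
    return OK;
}

# Try the three interesting cases: wrong method, no auth, and success.
for my $case ([ GET => 1 ], [ PATCH => 0 ], [ PATCH => 1 ]) {
    my ($method, $auth) = @$case;
    my $r = MockRequest->new(method => $method, auth => $auth);
    printf "%-5s auth=%d -> %d\n", $method, $auth, handler($r);
}
```

The three cases return DECLINED (-1), FORBIDDEN (403), and OK (0), respectively. Only in the last case is the content handler installed.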

The true work of the module is done in the patch_handler() subroutine, which is called during the response phase:

sub patch_handler {
   my $r = shift;
   return BAD_REQUEST
      unless lc($r->header_in("Content-type")) eq PATCH_TYPE;

This subroutine recovers the request object and immediately checks the content type of the submitted data. Unless the submitted data has MIME type application/diff, indicating a diff file, we return a result code of BAD_REQUEST.

   # get file to patch
   my $filename = $r->filename;
   my $dirname = dirname($filename);
   my $reason;
   # a bare block counts as a loop that runs once, so "last" can leave it
   # early; "last" cannot be used to exit a do {} block
   {
      -e $r->finfo or $reason = "$filename does not exist", last;
      -w _         or $reason = "$filename is not writable", last;
      -w $dirname  or $reason = "$filename directory is not writable", last;
   }
   if ($reason) {
      $r->log_reason($reason);
      return FORBIDDEN;
   }

Next we check whether the patch operation is likely to succeed. In order for the patch program to work properly, both the file to be patched and the directory that contains it must be writable by the current process.³ This is because patch creates a temporary file while processing the diff and renames it when it has successfully completed its task. We recover the filename corresponding to the request and the name of the directory that contains it. We then subject the two to a series of file tests. If any of the tests fails, we log the error and return FORBIDDEN.
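The same series of file tests can be tried from the command line. This is a minimal sketch under a few assumptions: patch_precheck() is a hypothetical helper, it tests a plain filename where the handler uses $r->finfo, and the temporary file exists only to give the demonstration something to test. It uses a bare block, which Perl treats as a loop that runs once, so that last can leave it at the first failed test. The special _ filehandle reuses the stat information from the preceding -e test instead of calling stat() again.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# Hypothetical helper: return the reason a file cannot be patched,
# or undef if all three tests pass.
sub patch_precheck {
    my ($filename) = @_;
    my $dirname = dirname($filename);
    my $reason;
    {
        -e $filename or $reason = "$filename does not exist", last;
        -w _         or $reason = "$filename is not writable", last;
        -w $dirname  or $reason = "$dirname is not writable", last;
    }
    return $reason;
}

# One file that exists and one that does not.
my ($fh, $existing) = tempfile(UNLINK => 1);
for my $file ($existing, "$existing.missing") {
    my $reason = patch_precheck($file);
    print defined $reason ? "refused: $reason\n" : "ok: $file\n";
}
```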

   # get patch data
   my $patch;
   $r->read($patch, $r->header_in("Content-length"));

   # new temporary file to hold output of patch command
   my($tmpname, $patch_out) = Apache::File->tmpfile;
   unless($patch_out) {
      $r->log_reason("can't create temporary output file: $!");
      return FORBIDDEN;
   }

The next job is to retrieve the patch data from the request. We do this using the request object's read() method to copy Content-length bytes of patch data from the request to a local variable named $patch. We are about to call the patch command, but before we do so we must arrange for its output (both standard output and standard error) to be saved to a temporary file so that we can relay the output to the user. We call the Apache::File method tmpfile() to create a uniquely named temporary file. It returns the file's name and an open filehandle, which we store in variables named $tmpname and $patch_out, respectively. If for some reason tmpfile() is unable to open a temporary file, it returns an empty list. In that case we log the error and return FORBIDDEN.

   # redirect child processes stdout and stderr to temporary file
   open STDOUT, ">&=" . fileno($patch_out);

We want the output from patch to go to the temporary file rather than to standard output (which was closed by the parent server long, long ago). So we reopen STDOUT, using the >&= notation to open it on the same file descriptor as $patch_out.⁴ See the description of open() in the perlfunc manual page for a more detailed description of this facility.
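The effect of the >&= notation can be seen in a standalone script. This is a minimal sketch rather than the module's code: capture_child_output() is a hypothetical helper, File::Temp stands in for Apache::File->tmpfile, and, unlike the handler, the script saves and restores the original STDOUT because it still needs it afterward.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: run a command with its standard output sent to a
# temporary file, then return whatever the command wrote.
sub capture_child_output {
    my @cmd = @_;
    my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);

    # Keep a copy of the real STDOUT so it can be restored afterward.
    open(my $saved_stdout, ">&", \*STDOUT) or die "can't dup STDOUT: $!";

    # Reopen STDOUT on the temporary file's descriptor. Child processes
    # inherit STDOUT, so their output lands in the temporary file.
    open(STDOUT, ">&=" . fileno($tmp_fh))
        or die "can't redirect STDOUT: $!";

    system(@cmd);

    # Put the original STDOUT back.
    open(STDOUT, ">&", $saved_stdout) or die "can't restore STDOUT: $!";
    close $saved_stdout;
    close $tmp_fh;

    # Read back what the child wrote.
    open(my $in, "<", $tmp_name) or die "can't read $tmp_name: $!";
    local $/;
    my $output = <$in>;
    close $in;
    return $output;
}

# $^X is the path of the running perl, used here as a portable child.
print capture_child_output($^X, "-e", 'print "hello from child\n"');
```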

   # open a pipe to the patch command
   local $ENV{PATH}; #keep -T happy
   my $patch_in = Apache::File->new("| $PATCH_CMD $filename 2>&1");
   unless ($patch_in) {
      $r->log_reason("can't open pipe to $PATCH_CMD: $!");
      return FORBIDDEN;
   }

At this point we open up a pipe to the patch command and store the pipe in a new filehandle named $patch_in. We call patch with a single command-line argument, the name of the file to change, which is stored in $filename. The piped open command also uses the 2>&1 notation, which is the Bourne shell's arcane way of indicating that standard error should be redirected to the same place that standard output is directed, which in this case is to the temporary file. If we can't open the pipe for some reason, we log the error and return FORBIDDEN.
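The piped open can be tried on its own with Perl's built-in open(). The following is a sketch under stated assumptions: feed_command() is a hypothetical helper, and a small child Perl program stands in for patch. Checking the exit status after close() is an addition that the listing does not make.

```perl
use strict;
use warnings;

# Hypothetical helper: feed $data to a shell command's standard input
# and return the command's exit status.
sub feed_command {
    my ($command, $data) = @_;

    # String-form piped open, as in the handler: the shell runs $command,
    # and 2>&1 sends its standard error wherever standard output goes.
    open(my $pipe, "| $command 2>&1")
        or die "can't open pipe to $command: $!";
    print {$pipe} $data;

    # close() waits for the child to finish and sets $?.
    close $pipe;
    return $? >> 8;
}

# Stand-in for patch: a child Perl that exits 0 only if it reads two lines.
my $child  = q{my @lines = <STDIN>; exit(@lines == 2 ? 0 : 1)};
my $status = feed_command(qq{"$^X" -e '$child'}, "line one\nline two\n");
print "child exit status: $status\n";
```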

   # write data to the patch command
   print $patch_in $patch;
   close $patch_in;
   close $patch_out;

We now print the diff file to the patch pipe. patch will process the diff file and write its output to the temporary file. After printing, we close the command pipe and the temporary filehandle.

   $patch_out = Apache::File->new($tmpname);

   # send the result to the user
   $r->send_http_header("text/plain");
   $r->send_fd($patch_out);
   close $patch_out;
   return OK;
}

The last task is to send the patch output back to the client. We send the HTTP header, using the convenient form that allows us to set the MIME type in a single step. We then send the contents of the temporary file using the request object's send_fd() method. Our work done, we close the temporary filehandle and return OK.⁵

Example 7-5. Implementing the PATCH Method

package Apache::PATCH;
# file: Apache/PATCH.pm

use strict;
use vars qw($VERSION @EXPORT @ISA);
use Apache::Constants qw(:common BAD_REQUEST);
use Apache::File ();
use File::Basename 'dirname';

@ISA = qw(Exporter);
@EXPORT = qw(PATCH);
$VERSION = '1.00';

use constant PATCH_TYPE => 'application/diff';
my $PATCH_CMD = "/usr/local/bin/patch";

sub handler {
   my $r = shift;
   return DECLINED unless $r->method eq 'PATCH';
   unless ($r->some_auth_required) {
      $r->log_reason("Apache::PATCH requires access control");
      return FORBIDDEN;
   }
   $r->handler("perl-script");
   $r->push_handlers(PerlHandler => \&patch_handler);
   return OK;
}

sub patch_handler {
   my $r = shift;
   return BAD_REQUEST
      unless lc($r->header_in("Content-type")) eq PATCH_TYPE;

   # get file to patch
   my $filename = $r->filename;
   my $dirname = dirname($filename);
   my $reason;
   # a bare block counts as a loop that runs once, so "last" can leave it
   # early; "last" cannot be used to exit a do {} block
   {
      -e $r->finfo or $reason = "$filename does not exist", last;
      -w _         or $reason = "$filename is not writable", last;
      -w $dirname  or $reason = "$filename directory is not writable", last;
   }
   if ($reason) {
      $r->log_reason($reason);
      return FORBIDDEN;
   }

   # get patch data
   my $patch;
   $r->read($patch, $r->header_in("Content-length"));

   # new temporary file to hold output of patch command
   my($tmpname, $patch_out) = Apache::File->tmpfile;
   unless($patch_out) {
      $r->log_reason("can't create temporary output file: $!");
      return FORBIDDEN;
   }

   # redirect child processes stdout and stderr to temporary file
   open STDOUT, ">&=" . fileno($patch_out);

   # open a pipe to the patch command
   local $ENV{PATH}; #keep -T happy
   my $patch_in = Apache::File->new("| $PATCH_CMD $filename 2>&1");
   unless ($patch_in) {
      $r->log_reason("can't open pipe to $PATCH_CMD: $!");
      return FORBIDDEN;
   }
   # write data to the patch command
   print $patch_in $patch;
   close $patch_in;
   close $patch_out;

   $patch_out = Apache::File->new($tmpname);

   # send the result to the user
   $r->send_http_header("text/plain");
   $r->send_fd($patch_out);
   close $patch_out;

   return OK;
}

# This part is for command-line invocation only.
my $opt_C;

sub PATCH {
   require LWP::UserAgent;
   @Apache::PATCH::ISA = qw(LWP::UserAgent);

   my $ua = __PACKAGE__->new;
   my $url;
   my $args = @_ ? \@_ : \@ARGV;

   while (my $arg = shift @$args) {
      $opt_C = shift @$args, next if $arg eq "-C";
      $url = $arg;
   }

   my $req = HTTP::Request->new('PATCH' => $url);

   my $patch = join '', <STDIN>;
   $req->content(\$patch);
   $req->header('Content-length' => length $patch);
   $req->header('Content-type'   => PATCH_TYPE);
   my $res = $ua->request($req);

   if($res->is_success) {
      print $res->content;
   }
   else {
      print $res->as_string;
   }
}

sub get_basic_credentials {
   my($self, $realm, $uri) = @_;
   return split ':', $opt_C, 2;
}

1;
__END__

At the time this chapter was written, no web browser or publishing system had actually implemented the PATCH method. The remainder of the listing contains code for implementing a PATCH client. You can use this code from the command line to send patch files to servers that have the PATCH handler installed and watch the documents change in front of your eyes.

The PATCH client is simple, thanks to the LWP library. Its main entry point is an exported subroutine named PATCH():

sub PATCH {
   require LWP::UserAgent;
   @Apache::PATCH::ISA = qw(LWP::UserAgent);

   my $ua = __PACKAGE__->new;
   my $url;
   my $args = @_ ? \@_ : \@ARGV;

   while (my $arg = shift @$args) {
      $opt_C = shift @$args, next if $arg eq "-C";
      $url = $arg;
   }

PATCH() starts by creating a new LWP user agent using the subclassing technique discussed later in the Apache::AdBlocker module (see "Handling Proxy Requests" in this chapter). It recovers the authentication username and password from the command line by looking for a -C (credentials) switch, whose value is stored in a file-scoped lexical named $opt_C. The subroutine then shifts the URL of the document to patch off the command line and stores it in $url.

   my $req = HTTP::Request->new('PATCH' => $url);
   my $patch = join '', <STDIN>;
   $req->content(\$patch);
   $req->header('Content-length' => length $patch);
   $req->header('Content-type'   => PATCH_TYPE);
   my $res = $ua->request($req);

The subroutine now creates a new HTTP::Request object that specifies PATCH as its request method and sets its content to the diff file read in from STDIN. It also sets the Content-length and Content-type HTTP headers to the length of the diff file and application/diff, respectively. Having set up the request, the subroutine sends the request to the remote server by calling the user agent's request() method.

   if($res->is_success) {
      print $res->content;
   }
   else {
      print $res->as_string;
   }
}

If the response indicates success (is_success() returns true), we print out the text of the server's response. Otherwise, the routine prints the entire response, including the error status line, as returned by the response object's as_string() method.

sub get_basic_credentials {
   my($self, $realm, $uri) = @_;
   return split ':', $opt_C, 2;
}

The get_basic_credentials() method, defined at the bottom of the source listing, is actually an override of an LWP::UserAgent method. When LWP::UserAgent tries to access a document that is password-protected, it invokes this method to return the username and password required to fetch the resource. By subclassing LWP::UserAgent into our own package and then defining a get_basic_credentials() method, we're able to provide our parent class with the contents of the $opt_C command-line switch.
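The third argument to split() in get_basic_credentials() matters when the password itself contains a colon. A small sketch, using a hypothetical parse_credentials() helper and made-up credentials, shows the effect of the limit:

```perl
use strict;
use warnings;

# Hypothetical helper with the same split() call as get_basic_credentials().
# The limit of 2 stops splitting after the first colon, so any further
# colons stay in the password.
sub parse_credentials {
    my ($opt) = @_;
    return split ':', $opt, 2;
}

my ($user, $password) = parse_credentials("lincoln:open:sesame");
print "user=$user password=$password\n";   # prints "user=lincoln password=open:sesame"
```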

To run the client from the command line, invoke it like this:

% perl -MApache::PATCH -e PATCH -- -C username:password\
  http://www.modperl.com/index.html < index.html.diff

Hmm...  Looks like a new-style context diff to me...
The text leading up to this was:
--------------------------
|*** index.html.new    Mon Aug 24 21:52:29 1998
|--- index.html        Mon Aug 24 21:51:06 1998
--------------------------
Patching file /home/httpd/htdocs/index.html using Plan A...
Hunk #1 succeeded at 8.
done

A tiny script named PATCH that uses the module can save some typing:

#!/usr/local/bin/perl

use Apache::PATCH;
PATCH;

__END__

Now the command looks like this:

% PATCH -C username:password \
http://www.modperl.com/index.html < index.html.diff

Footnotes

²Just two weeks prior to the production stage of this book, Script support for the PATCH method was added in Apache 1.3.4-dev.

³ In order for the PATCH method to work you will have to make the files and directories to be patched writable by the web server process. You can do this either by making the directories world-writable, or by changing their user or group ownerships so that the web server has write permission. This has security implications, as it allows buggy CGI scripts and other web server security holes to alter the document tree. A more secure solution would be to implement PATCH using a conventional CGI script running under the standard Apache suexec extension, or the sbox CGI wrapper (http://stein.cshl.org/WWW/software/sbox).

⁴ Why not just redirect the output of patch to the temporary file by invoking patch with the >$tmpname notation? Because this leaves us exposed to a race condition in which some other process replaces the temporary file with a link to a more important file. When patch writes to this file, it inadvertently clobbers it. Arranging for patch to write directly to the filehandle returned by tmpfile() avoids this trap.

⁵ Users interested in the HTTP PATCH method should also be aware of the IETF WebDAV (Distributed Authoring and Versioning) standard at http://www.ics.uci.edu/pub/ietf/webdav/ and Greg Stein's Apache module implementation of these protocol extensions at http://www.lyra.org/greg/mod_dav/.