Technology

Case Insensitive Last Resort 404 Page

| #404 | #code | #linux | #nginx | #php | #server | #windows |


I’ve been running my own webserver for a very long time, using Aprelium’s Abyss webserver on a Windows box. Recently I started making the transition to a Linux server, and while most of the changeover has been relatively painless, suddenly having to deal with twenty years of links that didn’t care about upper or lower case in URLs is a hassle. The fuck do you mean Nintendo.jpg isn’t the same as nintendo.jpg!? The fuck not!?

Anyway. I can’t just rename the file every time, because that won’t help when two referring links use two different casings of the same target, and I can’t change every referring link because I’m not Sisyphus. I’d always figured that, when the time came, I could put some last-resort logic in the 404 page, so that if the webserver (nginx in this case) couldn’t find the file based on the strict reading of the client request, the 404 page could try to sort it out.
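For anyone wiring this up, here is a rough sketch of how nginx could hand its 404s to a PHP page. It is not the config from this site: the script location (/system/404.php) and the PHP-FPM socket path are both assumptions, so swap in your own. Going by the nginx documentation for error_page, the `=` is the important bit: it lets the status code returned by the script (301 or 404) replace the original 404.

```nginx
# Assumed location of the 404 script; change to wherever yours lives.
error_page 404 = /system/404.php;

location = /system/404.php {
    internal;                                   # only reachable via error_page, not by direct request
    include fastcgi_params;                     # passes REQUEST_URI (the original request) through to PHP
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass unix:/run/php/php-fpm.sock;    # assumed PHP-FPM socket path
}
```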

This script is the result. I knocked it together in an hour or two last night (and another two hours this morning… dammit) and it seems to be working perfectly. In a nutshell, it breaks the client request (/dir1/dir2/file.ext) into chunks separated by the forward slash and iterates through each chunk, trying to match it to an existing directory or file. For each chunk, it first checks for an exact match; failing that, it compares the lower case request against a lower case list of directories and files.
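The chunking step on its own looks roughly like this. It's a minimal sketch, not a piece lifted from the script: `request_chunks()` is a made-up helper name, and it also drops any query string, since that isn't part of the file path.

```php
<?php
// Hypothetical helper: turn a raw request URI into a list of path chunks.
function request_chunks(string $uri): array
{
    // Keep only the path part, so "?query=strings" never reach the filesystem lookup.
    $path = parse_url($uri, PHP_URL_PATH);
    if (!is_string($path)) {
        return [];                               // Unparseable request: nothing to match
    }

    $chunks = explode('/', rawurldecode($path)); // "%20" becomes a real space before splitting

    // The leading slash (and any doubled slash) produces empty chunks; drop them.
    return array_values(array_filter($chunks, function ($chunk) {
        return $chunk !== '';
    }));
}

print_r(request_chunks('/Dir1/dir2/My%20File.JPG?size=big'));
```

For that request the helper returns the three chunks `Dir1`, `dir2` and `My File.JPG`, ready to be matched one at a time.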

Every time it makes a match, it appends the matched chunk to the known good path so far, and continues to the next chunk. For example, if the client has asked for a file that’s .JPG instead of .jpg, it’ll quickly match /dir1 and /dir2, then get a list of files in dir2, and compare their lower case names against the lower case request chunk. If it finds a match, it tacks the successfully matched name (with proper case) onto the working URL, and proceeds with the rest of the matches. If no match can be found based on the current directory’s list, it breaks out of the compare.
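Matching a single chunk against a directory listing can be sketched like so. This is an illustration under assumptions: `find_case_insensitive()` is a hypothetical name, and it uses scandir() with strcasecmp() rather than the glob()/strtolower() pair the script uses. The idea is the same: compare while ignoring case, then hand back the name as it really exists on disk.

```php
<?php
// Hypothetical helper: find the real on-disk name for $chunk inside $dir, ignoring case.
// Returns the correctly cased name, or null when nothing matches.
function find_case_insensitive(string $dir, string $chunk): ?string
{
    $entries = is_dir($dir) ? scandir($dir) : false;
    if ($entries === false) {
        return null;                             // Not a readable directory
    }
    foreach ($entries as $entry) {
        if ($entry === '.' || $entry === '..') {
            continue;                            // Never match the directory pseudo-entries
        }
        if (strcasecmp($entry, $chunk) === 0) {  // Case-insensitive comparison (ASCII letters)
            return $entry;
        }
    }
    return null;
}

// Demo against a throwaway directory so the sketch runs anywhere.
$demo = sys_get_temp_dir() . '/case404_' . uniqid();
mkdir($demo);
touch($demo . '/Nintendo.jpg');

var_dump(find_case_insensitive($demo, 'nintendo.JPG'));  // the real name, Nintendo.jpg
var_dump(find_case_insensitive($demo, 'sega.jpg'));      // no such file, so null

unlink($demo . '/Nintendo.jpg');
rmdir($demo);
```

Repeating this for each chunk in turn, and appending whatever comes back to the path so far, gives the walk described above.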

Before each search, it assumes failure ($giveup = 1) but resets this to zero if it finds a match, so processing can continue to the next chunk. When the processing is done, if the failure flag is 1 the 301 (moved permanently) is skipped, delivering the 404 (not found) header and the 404 error page where required. If the failure flag is still zero, we know a match was found and a 301 redirect is served instead.
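The final decision boils down to something like the sketch below. `decide_response()` is a hypothetical helper, and it uses null in place of the $giveup flag to mean "no match". Percent-encoding each path segment is an extra precaution so that names with spaces stay valid in the Location header.

```php
<?php
// Hypothetical helper: decide what to send once path matching has finished.
// $resolved is the corrected web path (e.g. "/dir1/dir2/File.jpg"), or null if matching gave up.
function decide_response(?string $resolved, string $scheme, string $host): array
{
    if ($resolved === null) {
        return ['status' => 404, 'location' => null];   // Real 404: show the error page
    }

    // Encode each segment separately so the slashes between them survive.
    $encoded = implode('/', array_map('rawurlencode', explode('/', $resolved)));

    return ['status' => 301, 'location' => $scheme . '://' . $host . $encoded];
}

$result = decide_response('/dir1/dir2/My File.jpg', 'https', 'example.com');
if ($result['status'] === 301) {
    echo 'Location: ' . $result['location'] . "\n";
}
```

In a live 404 page the status and location would go out through header() instead of echo; the echo is only there so the sketch can be run from the command line.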

If you’ve got a better method, or ways to improve this one, please leave a comment. If you want to leave snarky case-in-filesystems or Windows-v-Linux religious blatherings, I’ll politely ask you to fuck off into the ocean first. ^_^

<?php
 
// VERSION 1.4
//   - Now handles filenames with spaces
 
//  A bit of PHP magic to hunt down files when the requested case doesn't match the path or file.
//  It was a bit necessary when moving a 22 year old website from a Windows server to a Linux server.
//  Suddenly case mattered.  =(
 
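$froot = rtrim($_SERVER['DOCUMENT_ROOT'], '/');  // Get the file root, no trailing slash ( /var/www/whatever )
$source = strtok($_SERVER['REQUEST_URI'], '?');  // Get the actual request, minus any query string
$source = rawurldecode($source);               // Decode %20 etc. but leave + alone ( /dir1/dir2/file.ext )
$path = $froot.'/';                            // Start with the base file path ( /var/www/whatever/ )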
$partcounter = 0;                              // Count the matched parts of request path
 
$filereq = explode ("/",$source);              // Break the request into chunks ( /dir1 /dir2 etc)
$source = '';                                  // Blank the source for a re-created request path
$giveup = 0;                                   // Making sure we start off with optimism!
 
foreach ($filereq as $chunk) {
 
  if ($giveup == 1) break;                     // Give up if the last process failed to reset the flag
 
  $chunk = rtrim($chunk, '/') ;                // Remove the trailing slash if it exists
 
  if ($chunk !== '') {                         // The first part of the request is blank, skip it.
 
    $giveup = 1;
 
    if (file_exists($path.$chunk)) {           // Does the current path match a file/dir ?
 
      $giveup = 0;                             // Carry on processing, we're good so far
      $path = $path.$chunk;                    // Update the working path
 
      if (is_dir($path)) {                     // If it's a dir, make sure it ends in /
        $path = $path.'/';
      };
 
    } else {                                   // No match, so:
 
      $list = glob($path.'*',GLOB_NOSORT);     // Get a file list from the current path
      if ($list === false) $list = array();    // glob() returns false on error
 
      foreach ($list as $checkfile) {          // Check every file found
 
                                               // Next line: force lower case for both, and compare:
        if (strtolower($path.$chunk) == strtolower($checkfile)) {
 
          $path = $checkfile;                  // Update $path: $checkfile is our known good path so far
 
          if (is_dir($path)) {                 // If it's a dir, make sure it ends in /
            $path = $path.'/';
          };
 
          $giveup = 0;                         // NEVER GIVE UP, NEVER SURRENDER
          break;                               // Break out of the compare, no point going farther
 
        }  // EndIf match files
      }  // End ForEach files in dir
    }  // EndIf request no match
  }  // EndIf Chunk != ''
 
  if ($giveup == 1)  break;                    // We couldn't match a file or dir, real 404 detected
 
}  // End Processing
 
 
if ($giveup == 0) {                            // NEVER GAVE UP, NEVER SURRENDERED
 
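  $path = substr($path, strlen($froot));       // This is our new re-created web path, minus the file root
 
                                               // Re-encode each segment so spaces are valid in the header
  $path = implode('/', array_map('rawurlencode', explode('/', $path)));
 
  header($_SERVER["SERVER_PROTOCOL"]." 301 Moved Permanently");
  header("Location: ".$_SERVER['REQUEST_SCHEME'].'://'.$_SERVER['HTTP_HOST'].$path);
 
} else {
 
  header($_SERVER["SERVER_PROTOCOL"]." 404 Not Found");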
?>
<html>
<head>
<title> 404 Error Page - File or Document Not Found </title>
</head>
<body style="background-color: black; color: white;">
<img src="/system/404.gif">
</body>
</html>
 
<?php } ?>
--NFG
[ Mar 18 2018 ]