Proxying a WordPress Blog on a Sub Path instead of a Sub Domain

We recently migrated the website portion of Ziggeo to AWS CloudFront and S3. For SEO purposes, we wanted our WordPress-based blog to live under /blog instead of under blog.ziggeo.com, so we needed to proxy it through CloudFront.

We host the blog on WPEngine, which currently does not support hosting WordPress blogs under sub paths like /blog, so we had to solve this challenge as well.

This post might help you set up a WordPress blog on a sub path in a similar environment. It does not have to be via CloudFront; that is just an additional complication in our scenario.

In our CloudFront configuration, we set up the WPEngine origin like so:

As you can see, the blog originally is accessible via / - which brings us to one of the first issues. We don't want this domain to be publicly accessible, because that would create duplicate content and hurt SEO.

For this reason, the servers running / need to be able to distinguish between requests coming from CloudFront and requests coming from anywhere else on the internet. We solve this by having CloudFront set the referer on requests to the origin to ziggeoAWS, so we can later identify those requests easily as coming from CloudFront.

As for the CloudFront behavior, we have to set it up like so:

A few settings here are worthwhile to note:

  1. The path pattern allows us to have multiple "main" routes on / and we only "proxy" routes beginning with /blog to the WordPress blog.
  2. We allow all HTTP Methods. This is not surprising per se as we need methods like POST and PUT on the WordPress dashboard, but it is interesting to note that CloudFront fully forwards HTTP request payloads to the origin (which is what we need, of course).
  3. We need to make sure that caching is based on all request headers instead of none.
  4. We obviously need to forward cookies to allow for WordPress session credential management.

Note that we didn't specify an origin path in the Origin Settings - this would only help if we want to e.g. map / to /blog but not the other way around. There is no built-in way in CloudFront to map a sub path on the main domain to an origin without the sub path, so we needed to create a Lambda@Edge function that does this mapping for us:

The Lambda@Edge origin request function has the following code:

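'use strict';

exports.handler = (event, context, callback) => {
   // The request CloudFront is about to send to the origin
   const request = event.Records[0].cf.request;

   // Edge case: /blog without a trailing slash is redirected to /blog/
   if (request.uri === "/blog") {
       callback(null, {
           status: 301,
           headers: {
               location: [{
                   key: "Location",
                   value: "/blog/"
               }]
           }
       });
       return;
   }

   // Strip the /blog prefix so the origin sees the path it actually serves
   request.uri = request.uri.replace(/^\/blog/,'');
   return callback(null, request);
};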

The main objective of this function is in the third to last line: removing the /blog part from the request URI. We also handle an edge case where somebody requests exactly /blog but should really be requesting /blog/. WordPress normalizes URLs to end in /, so we should respect that as well (more on that later).
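To make the mapping concrete, here is a minimal sketch that isolates the same URI logic in a plain function and runs it against a few sample paths. The name mapBlogUri and the sample paths are made up for illustration; they are not part of the Lambda@Edge API or of our deployed function.

```javascript
'use strict';

// Hypothetical helper mirroring the handler's URI logic, so it can be
// tried locally without a CloudFront event.
function mapBlogUri(uri) {
    // Exactly /blog: the handler answers with a 301 to /blog/
    if (uri === '/blog') {
        return { redirect: '/blog/' };
    }
    // Everything else: drop the leading /blog before forwarding to the origin
    return { uri: uri.replace(/^\/blog/, '') };
}

// Sample paths (assumed for illustration)
for (const uri of ['/blog', '/blog/', '/blog/some-post/', '/blog/wp-admin/post.php']) {
    console.log(uri, '->', JSON.stringify(mapBlogUri(uri)));
}
```

One thing the sketch makes visible: the pattern only anchors on the prefix, so a path such as /blogroll would be rewritten to roll if the behavior's path pattern lets it through. Anchoring on a slash boundary, for example /^\/blog(?=\/)/, would be one way to guard against that.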

Next, we'll have a look at the .htaccess file we put on the Apache WPEngine server running WordPress:

RewriteEngine On
RewriteBase /

# If a browser is requesting / instead of AWS, redirect to /blog
RewriteCond %{HTTP_REFERER} !ziggeoAWS [NC]
RewriteRule ^(.*)$ /blog/$1 [L,R=301]

# If the URL is GET-requested and is not a file/directory and does not end in /, redirect to use /
RewriteCond %{REQUEST_METHOD} GET
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*[^/])$ /blog/$1/ [L,R=301]

# If the URL ends in ziggeoAWS/, remove it by redirection (happens when deleting items)
RewriteRule ^(.*)/ziggeoAWS/$ /blog/$1/ [L,R=301]

# Redirect date posts
RewriteRule ^\d\d\d\d/\d\d/\d\d/(.*)$ /blog/$1 [L,R=301]

# Standard WordPress stuff
RewriteRule ^index\.php$ - [L]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]

Let's dissect the different logical blocks one by one:

RewriteCond %{HTTP_REFERER} !ziggeoAWS [NC]
RewriteRule ^(.*)$ /blog/$1 [L,R=301]

This refers back to the origin settings discussed earlier, where we set the request referer to ziggeoAWS so that the origin can identify calls coming from CloudFront. If the request is not coming from CloudFront, we redirect it to /blog/*.

All further blocks assume that the request has been forwarded by CloudFront.

RewriteCond %{REQUEST_METHOD} GET
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*[^/])$ /blog/$1/ [L,R=301]

This block handles URL normalization with trailing slashes. We only normalize GET requests that do not reference existing files (like assets) or directories, and we add a trailing slash if one is not already present.

RewriteRule ^(.*)/ziggeoAWS/$ /blog/$1/ [L,R=301]

This next rule works around a consequence of overwriting the referer in CloudFront. Some WordPress actions (deleting items, for example) trigger a redirect URL that includes the referer, which in our case results in a 404 because ziggeoAWS is not a valid URL. We catch this case and redirect back to the actual blog URL.

RewriteRule ^\d\d\d\d/\d\d/\d\d/(.*)$ /blog/$1 [L,R=301]

In our particular case, we also wanted to normalize old routes with date-like URLs in order to avoid duplicated content.

The remainder of the .htaccess file is pretty standard for a WordPress installation.
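The order of the redirect blocks matters, so it can help to trace a few requests through them. The following is a rough JavaScript model of the four redirect rules, not a replacement for the .htaccess file. The function name resolveRedirect and the isFileOrDir flag (standing in for the -f and -d filesystem checks) are assumptions made for this sketch, and paths are given without a leading slash, which is how rules in a .htaccess file see them.

```javascript
'use strict';

// Simplified model of the redirect rules above. Returns the redirect
// target, or null when the request falls through to standard WordPress handling.
function resolveRedirect({ method, path, referer, isFileOrDir }) {
    // Block 1: request did not come through CloudFront (referer lacks ziggeoAWS)
    if (!/ziggeoAWS/i.test(referer || '')) {
        return '/blog/' + path;
    }

    // Block 2: GET for something that is not a file or directory and lacks a trailing slash
    if (method === 'GET' && !isFileOrDir) {
        const noSlash = /^(.*[^/])$/.exec(path);
        if (noSlash) {
            return '/blog/' + noSlash[1] + '/';
        }
    }

    // Block 3: referer value leaked into the URL
    const leaked = /^(.*)\/ziggeoAWS\/$/.exec(path);
    if (leaked) {
        return '/blog/' + leaked[1] + '/';
    }

    // Block 4: old date-based post URLs
    const dated = /^\d\d\d\d\/\d\d\/\d\d\/(.*)$/.exec(path);
    if (dated) {
        return '/blog/' + dated[1];
    }

    return null;
}

// Sample requests (paths are made up for illustration)
const viaCloudFront = { method: 'GET', referer: 'ziggeoAWS', isFileOrDir: false };
console.log(resolveRedirect({ ...viaCloudFront, path: 'my-post' }));             // /blog/my-post/
console.log(resolveRedirect({ ...viaCloudFront, path: '2017/05/03/my-post/' })); // /blog/my-post/
console.log(resolveRedirect({ ...viaCloudFront, path: 'my-post/' }));            // null
```

A request for a dated URL without a trailing slash hits block 2 first, so it takes two redirects to reach its final address.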

Within the WordPress dashboard itself, we make sure that both the Site Address and the WordPress Address point to /blog.

This almost makes everything work, except for issues arising from the internal WordPress function wp_admin_canonical_url, which uses the request host (in our case /) instead of the configured addresses. You can read up more on this issue here. A workaround for this is to (manually) install this plugin on your WordPress blog.
