Serving Index Pages From non-root Locations With AWS CloudFront

Note: Adapted from someguyontheinter.net, I grabbed the content from web caches as the site appears to have been taken offline, but I did find it useful, so thought it might be worth re-creating.

So, I was doing a quick experiment with host this site in static form in AWS S3, details on how that works are readily available, so I’ll not go into that here. Once you’ve got a static website it’s not hard to add a CloudFront distribution in front of it for content caching and other CDN stuff.

Once setup and with the DNS entries in place, the Cloudfront distribution will present cached copies of your website in S3, and if you’ve got a flat site structure, such as this example below;

http://website-bucket.s3-website-eu-west-1.amazonaws.com/content.html

this will work fine.

However, if you have data in subfolders, ie. non-root locations, for example if there was a folder in the bucket called, “subfolder” such as the example here;

http://website-bucket.s3-website-eu-west-1.amazonaws.com/subfolder/

and you want to be able to browse to

https://your-site.tld/subfolder/

and have the server automatically serve out the index page from within this folder, you’ll find you get a 403 error from CloudFront. This problem comes about as S3 doesn’t really have a folder structure, but rather has a flat structure of keys and values with lots of cleverness that enables it to simulate a hierarchical folder structure. So your request to CloudFront gets converted into, “hey S3, give me the object whose key is subfolder/“, to which S3 correctly replies, “that doesn’t exist”.

When you enable S3’s static website hosting mode, however, some additional transformations are performed on inbound requests; these transformations include the ability to translate requests for a “directory” to requests for the default index page inside that “directory”, which is what we want to happen, and this is the key to the solution.

In brief: when setting up your CloudFront distribution, don’t set the origin to the name of the S3 bucket; instead, set the origin to the static website endpoint that corresponds to that S3 bucket. Amazon are clear there is a difference here, between REST API endpoints and static website endpoints, but they’re only looking at 403 errors coming from the root in that document.

So, assuming you’ve already created the static site in S3 and that can be accessed on the usual http://website-bucket.s3-website-eu-west-1.amazonaws.com URL, it’s example time;

  1. Create a new CloudFront distribution.
  2. When creating the CloudFront distribution, set the origin hostname to the static website endpoint and do NOT let the AWS console autocomplete a S3 bucket name for you, and do not follow the instructions that say “For example, for an Amazon S3 bucket, type the name in the format bucketname.s3.amazonaws.com”.
  3. Also, do not configure a default root object for the CloudFront distribution, we’ll let S3 handle this
  4. Configure the desired hostname for your site, such as your-site.tld as an alternate domain name for the CloudFront distribution.
  5. Finish creating the CloudFront distribution; you’ll know you’ve done it correctly if the Origin Type of the origin is listed as “Custom Origin”, not “S3 Origin”.
  6. While the CloudFront distribution is deploying, set up the necessary DNS entries, either directly to the CloudFront distribution in Route 53 or as a CNAME in whatever DNS provider is hosting the zone for your domain.

Once your distribution is fully deployed and the A record has propagated, browse around in your site and you should see all of your content, and it’ll be served out from CloudFront. Essentially what’s happening is CloudFront is acting as a simple caching reverse proxy, and all of the request routing logic is being implemented at S3, so you get the best of both worlds.

Note: nothing comes without a cost, and in this case the cost is that you must make all of your content visible to the public Internet, as though you were serving direct from S3, which means that it will be possible for others to bypass the CloudFront CDN and pull content directly from S3. So be careful to not put anything in the S3 bucket that you don’t want to publish.

If you need to use the feature of CloudFront that enables you to leave your S3 bucket with restricted access, using CloudFront as the only point of entry, then this method will not work for you.