Mark Gilbert's Tech Blog

Backing Up Data Centre Hosted Data To AWS

Currently at work, we back up the majority of our critical on-premises data to Azure, with some local retention onsite, as the majority of restores are needed for data within the last week or so. This is done using a combination of Microsoft Azure Backup Server (MABS) and the standalone Microsoft Azure Recovery Services (MARS) agent.

In time the majority of the on-prem data is likely to move to the cloud, but this takes time, with various products and business functions to move, so for now we still have a large data set that we need to back up from our data centres. The cloud all this on-prem data is moving to is, and will continue to be, AWS, but the backups go into Azure, mainly for historical reasons.

We started looking at whether we could move this from Azure to AWS to reduce complexity, simplify billing and potentially reduce cost, because as I said, a lot of our estate runs in AWS already. We fairly quickly found out that the actual “AWS Backup” solution doesn’t really cover on-premises data directly, and from there things started to get complicated.

So we looked at the various iterations of Storage Gateway, including file gateway and volume gateway.

Tape gateways are pretty much ruled out as we don’t have any enterprise backup software that will write to tape storage, so would incur additional cost to purchase licences for that.

File Gateway does most of what we need, as we can write backup output from things like MSSQL or MySQL servers running on-prem to the volume presented by the file gateway, and have that written back into S3 and backed up from there. However, this can’t be throttled in terms of bandwidth and we don’t have a Direct Connect available for it, so we can’t risk it annihilating our data centre egress bandwidth.

Volume gateways would do what we need in terms of being able to present storage to a VM that backups are written to, and that can then be throttled and sent into S3. From there we’d have to pick that data up and move it via AWS Backup into a proper backup with proper retention policies attached. However, as this bills as EBS rather than S3 storage, when we priced it all out it worked out considerably more expensive than our current solution of backing up into Azure, which again pretty much rules this out as an option.

Now, if only Amazon would add an on-prem option for AWS Backup, we’d be laughing – oh well, we can dream.

Serving Index Pages From non-root Locations With AWS CloudFront

Note: Adapted from someguyontheinter.net, I grabbed the content from web caches as the site appears to have been taken offline, but I did find it useful, so thought it might be worth re-creating.

So, I was doing a quick experiment with hosting this site in static form in AWS S3. Details on how that works are readily available, so I’ll not go into that here. Once you’ve got a static website it’s not hard to add a CloudFront distribution in front of it for content caching and other CDN stuff.

Once set up and with the DNS entries in place, the CloudFront distribution will present cached copies of your website in S3, and if you’ve got a flat site structure, such as this example below;

http://website-bucket.s3-website-eu-west-1.amazonaws.com/content.html

this will work fine.

However, if you have data in subfolders, i.e. non-root locations, for example if there was a folder in the bucket called “subfolder”, such as the example here;

http://website-bucket.s3-website-eu-west-1.amazonaws.com/subfolder/

and you want to be able to browse to

https://your-site.tld/subfolder/

and have the server automatically serve out the index page from within this folder, you’ll find you get a 403 error from CloudFront. This problem comes about as S3 doesn’t really have a folder structure, but rather has a flat structure of keys and values with lots of cleverness that enables it to simulate a hierarchical folder structure. So your request to CloudFront gets converted into, “hey S3, give me the object whose key is subfolder/”, to which S3 correctly replies, “that doesn’t exist”.

When you enable S3’s static website hosting mode, however, some additional transformations are performed on inbound requests; these transformations include the ability to translate requests for a “directory” to requests for the default index page inside that “directory”, which is what we want to happen, and this is the key to the solution.
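As a rough illustration of the difference, here is a minimal Python sketch. It is not how S3 is implemented; the dictionary standing in for the bucket, the function names and the index.html index document name are all assumptions made for the example. It only shows why an exact key lookup for “subfolder/” finds nothing, while the same request with the “directory” translation applied finds the index page.

```python
# Hypothetical stand-in for a bucket: a flat mapping of keys to object bodies.
BUCKET = {
    "index.html": "<h1>home</h1>",
    "content.html": "<h1>content</h1>",
    "subfolder/index.html": "<h1>subfolder</h1>",
}

INDEX_DOCUMENT = "index.html"  # assumed index document name


def rest_style_get(path):
    """Look the key up exactly as requested, with no translation."""
    key = path.lstrip("/")
    return BUCKET.get(key)  # None stands in for the error response


def website_style_get(path):
    """Translate a request for a 'directory' into its index document first."""
    key = path.lstrip("/")
    if key == "" or key.endswith("/"):
        key += INDEX_DOCUMENT
    return BUCKET.get(key)


if __name__ == "__main__":
    print(rest_style_get("/subfolder/"))     # no object has the key "subfolder/"
    print(website_style_get("/subfolder/"))  # resolves to "subfolder/index.html"
```

The flat site case works either way, because “content.html” is a real key in both styles of lookup.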

In brief: when setting up your CloudFront distribution, don’t set the origin to the name of the S3 bucket; instead, set the origin to the static website endpoint that corresponds to that S3 bucket. Amazon are clear there is a difference here, between REST API endpoints and static website endpoints, but they’re only looking at 403 errors coming from the root in that document.

So, assuming you’ve already created the static site in S3 and that can be accessed on the usual http://website-bucket.s3-website-eu-west-1.amazonaws.com URL, it’s example time;

Create a new CloudFront distribution.
When creating the CloudFront distribution, set the origin hostname to the static website endpoint and do NOT let the AWS console autocomplete an S3 bucket name for you, and do not follow the instructions that say “For example, for an Amazon S3 bucket, type the name in the format bucketname.s3.amazonaws.com”.
Also, do not configure a default root object for the CloudFront distribution; we’ll let S3 handle this.
Configure the desired hostname for your site, such as your-site.tld as an alternate domain name for the CloudFront distribution.
Finish creating the CloudFront distribution; you’ll know you’ve done it correctly if the Origin Type of the origin is listed as “Custom Origin”, not “S3 Origin”.
While the CloudFront distribution is deploying, set up the necessary DNS entries, either directly to the CloudFront distribution in Route 53 or as a CNAME in whatever DNS provider is hosting the zone for your domain.
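
If you want a quick sanity check on the origin hostname before pasting it into the console, something like the following Python sketch might help. The function name is made up for this example, and the pattern only covers the two website endpoint styles I know of under amazonaws.com (with a dash or a dot after “s3-website”), so treat it as a heuristic rather than an official validation.

```python
import re

# Website endpoints look like "<bucket>.s3-website-<region>.amazonaws.com"
# or "<bucket>.s3-website.<region>.amazonaws.com". Anything else under
# amazonaws.com (e.g. "<bucket>.s3.amazonaws.com") is treated as a REST endpoint.
WEBSITE_ENDPOINT = re.compile(
    r"^(?P<bucket>.+)\.s3-website[.-](?P<region>[a-z0-9-]+)\.amazonaws\.com$"
)


def is_website_endpoint(hostname):
    """Return True if the hostname looks like an S3 static website endpoint."""
    return WEBSITE_ENDPOINT.match(hostname.lower()) is not None


if __name__ == "__main__":
    for host in (
        "website-bucket.s3-website-eu-west-1.amazonaws.com",
        "website-bucket.s3.amazonaws.com",
    ):
        kind = "website endpoint" if is_website_endpoint(host) else "not a website endpoint"
        print(f"{host}: {kind}")
```

Only the first hostname is the kind of origin the steps above are asking for; the second is the bucket-name format the console will try to autocomplete.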

Once your distribution is fully deployed and the A record has propagated, browse around in your site and you should see all of your content, and it’ll be served out from CloudFront. Essentially what’s happening is CloudFront is acting as a simple caching reverse proxy, and all of the request routing logic is being implemented at S3, so you get the best of both worlds.

Note: nothing comes without a cost, and in this case the cost is that you must make all of your content visible to the public Internet, as though you were serving direct from S3, which means that it will be possible for others to bypass the CloudFront CDN and pull content directly from S3. So be careful to not put anything in the S3 bucket that you don’t want to publish.

If you need to use the feature of CloudFront that enables you to leave your S3 bucket with restricted access, using CloudFront as the only point of entry, then this method will not work for you.

My Experiences with consumer mesh wireless devices

Update 11/11/2020: I’ve followed up this post with two others, Ubiquiti Home Network Setup – part 1 and Ubiquiti Home Network Setup – part 2

UPDATE 15/08/2020: I’m currently testing Ubiquiti hardware, which initially seems much better, although I’ve not got as far as multiple AP’s yet as I’ve been away from home. The current setup for this is a Ubiquiti UDM, and I’ll add access points further down the line, as well as do a bit of a write-up of the setup.

Over the previous 8 weeks I’ve been experimenting with various mesh wireless systems, and thought it might be worth at least recording some of my thoughts on them.

The Problem to be solved

My house isn’t huge, but unfortunately as it was built in a time before internet connectivity was a consideration the main BT socket is in the hallway, where it’s completely useless as having the router there would either involve wall mounting it, which I don’t want to do, or having a small table in the hall right at the bottom of the stairs in front of the living room door, which I also don’t want to do. Thankfully the previous owner had taken care of this little trouble by drilling various holes through walls, meaning I can run the telephone cable through holes and under carpets to get to the router, which is in the office room at the front of the house.

So far so normal, but this essentially means the router could not be any further towards the front of the house without it being outside on the drive. To further complicate things, the way the room is set up in terms of desk layout for me and my partner to work, we have a large wine fridge next to the desk, so the router is immediately obscured by a large metal object.

Because of the direction the garden faces we have our outside table set up down at the far end of the garden, putting it as far away from the router as it could be while still being on my property, and by the time you get down there the line of sight to the router is obstructed by a fridge, three internal walls, and an external wall.

Reception down there was patchy at best, and video calls were out of the question, which meant working from outside in the garden wasn’t an option.

We also wanted either me or my partner to be able to move quickly and work in the living room temporarily if we both ended up having calls at the same time, and her work systems are a little flaky when it comes to switching wireless networks, involving long periods of reconnecting and Cisco IP Communicator being a real pain with this.

On top of this, we’ve got various Google Home devices around the house, Sonos speakers, multiple phones, laptops, tablets, and other smart home devices.

Essentially though I’m only looking for internet access everywhere, not throwing large files around at as close to gigabit LAN speeds as possible, so that seems like a fairly straightforward setup. My router in use here is either a Draytek 2860n or a Billion 8800NL, and I list two for reasons that will become apparent later.

The solution

I tried four different mesh wireless systems over the previous few months, with various degrees of success. All systems have been three device systems, with the main device going in the office at the front of the house, a second in the living room at the back of the house, and a third upstairs in the main bedroom at the front of the house.

BT Whole Home WiFi Premium

I got the BT Premium whole home wifi system out of the box and set up, configured to essentially act as a wireless AP mesh, leaving my Draytek router doing all the heavy lifting for the network, including DHCP, DNS and other services.

The setup on the app reported the connection between all the BT devices was good and at first, all seemed well, I migrated all my devices to the new network and they all seemed happy enough. I did some performance testing from various places in the house and out in the garden, and all was good for internet access.

After a few hours though I started to notice a problem, where if a device was turned off, rebooted, or had its wireless disconnected by leaving the house or something, it couldn’t then reconnect to the network. If I restarted the BT mesh system, it worked again, until the same thing occurred again a few hours later. After various testing and trawling the BT product forums, I came to the conclusion that for some reason the devices stop passing DHCP packets through. If I assigned an IP manually, all was good; with DHCP, the wireless connection happened, but no address was given. I tried resetting the whole thing and starting again, but couldn’t solve this. I did read from someone who had more time to look at this than I did that he thought they were fragmenting DHCP packets for some reason. – So back in their box they went.

Amazon eero

The next contender was the Amazon eero mesh wireless system. I’ll start by saying the eero doesn’t do PPPoE, so forget that as an option here. I initially set this up in router mode, and swapped the Draytek for the Billion, as the Billion will run in a mode I’ve seen various sources call PPP half-bridge mode, where the WAN IP received from the ISP is passed through to the device underneath it, eliminating the double NAT problem, although this is ultimately trickery to get around the eero devices’ lack of PPPoE capability.

So the main eero had the WAN IP assigned by my ISP on its external interface, albeit via a half cut method, so good? At this point I very quickly discovered that the eero devices in “router” mode did not want to get an IPv6 address from the Billion device above it. It wanted the address handed out via DHCPv6 and Prefix Delegation, rather than using SLAAC. I did try every conceivable setup on the Billion, but could not get this to work, so no IPv6 connectivity. Not a big deal you may think, but it bothers me. – Again, back in their box they went.
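
For anyone unsure what Prefix Delegation actually hands over, here is a small Python sketch using the standard ipaddress module. The /56 below is taken from the IPv6 documentation range and is purely an assumed example, not anything my ISP or the eero uses. The idea is that a downstream router is delegated a whole prefix and carves /64 networks out of it for its own LAN, whereas with SLAAC a device simply builds an address for itself inside a /64 that has been advertised to it.

```python
import ipaddress

# Hypothetical delegated prefix (2001:db8::/32 is reserved for documentation).
delegated = ipaddress.IPv6Network("2001:db8:1200::/56")

# A router receiving this prefix can split it into /64 networks for its LAN side.
lan_subnets = list(delegated.subnets(new_prefix=64))

if __name__ == "__main__":
    print(f"{delegated} yields {len(lan_subnets)} possible /64 LAN networks")
    print(f"first: {lan_subnets[0]}")
    print(f"last:  {lan_subnets[-1]}")
```

A device acting as a router needs a prefix like this to give to the clients behind it, which is presumably why a single SLAAC address on the WAN side was not enough for the eero.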

Google Nest WiFi

Next contender was the new Google Nest Wifi. Again, the setup was very simple and all seemed to work well; there was even an option to enable IPv6, which I enabled and which worked well. However, after a few hours IPv6 just stopped working across all devices on the network, meaning it had to be turned off and back on on the Nest WiFi devices, and then it would work again for another few hours.

We also noticed that when we were playing music through the Sonos devices they kept all dropping off the network. It didn’t seem to be any specific Sonos speaker, they just all dropped, then 5 minutes later all reconnected. This happened quite often, sometimes a few times a day, and enough to wind me up.

So I looked at putting the Google Nest Wifi devices into AP mode to see if this helped, except you can’t. Well, technically you can, but they don’t act as a mesh, so what exactly is the point? – Back in their box they went.

Linksys Velop AC6600

Next challenger was the Linksys Velop system, which was certainly the worst looking of the bunch. I had them in white, but in black I’d imagine you could mistake them for the monolith from “2001: A Space Odyssey”.

The app was also more clunky than the other devices, just as a point to note. Perfectly functional, just a bit more clunky.

Everything set up OK, but within 24 hours I had problems with the Linksys devices dropping their connectivity to each other, and I was getting fairly irritated with these sorts of problems by this point. – I’m sure you can probably guess, back in their box they went.

Amazon eero – Another attempt

I decided, since they were still back in their box where I left them, to give the eero another final shot before I returned them. This time I went for what eero call “bridge mode”, where my router does the work, and they act as mesh access points. So back in with the Draytek this time, and they seem….ok.

We’ve now been running off them for a few weeks, Sonos devices seem fine, and everything works ok. I don’t get the fancy features of the eero devices, but then I’ve got what I need on the Draytek router, so all seemed good.

I have had a few instances where the two satellite devices get disconnected from the main unit, which stays connected and working during this time. Mostly they reconnect pretty fast, but sadly they did this twice in the space of 15 minutes this morning when I was on a video call, so that did irk me somewhat, and enough to question whether I should get them back in their box and have another go with something else.

Asus ZenWiFi XT8

I tried these as I was really at the end of my patience with the various other options. I also brought in a Draytek 130, to act as a pure modem for this setup.

Everything was plugged in and configured, and everything seemed fine on initial inspection, until I did some performance testing, at which point I found that wired in to the Asus router unit, I got what I would call “full speed” for my connection. However, move to wireless, and the upload speed drops to somewhere between dire and unacceptable. Download speeds stay normal, but upload performance dies a terrible death.

Conclusion?

So at this point I really have to wonder two things; What have I done to deserve this, and why is home mesh wireless kit so absolutely crap? It’s not a case of “there’s no perfect kit”, there’s just no kit that didn’t have some fairly critical functionality flaw.

I honestly don’t know why this has been so problematic. It seems like a fairly simple problem to solve as I’m not after huge throughput, just stability at range and standard UK broadband speeds on wireless. All the mesh systems I’ve tried seem to have their own unique problems: the DHCP issue on the BT one, devices falling off the network and IPv6 failing to work on the Google devices, performance problems on the ASUS, or just random disconnects between the units, which the eero and Linksys systems seem to suffer from.

I’m not sure of the advice I’d give here after all this. Perhaps don’t bother with mesh wireless is the right conclusion? All the ones I tested seemed flaky in one way or another, although reading the reviews you get the usual mixed bag, they’re terrible and don’t work at all; they’re the best thing since sliced bread. My personal experience says there’s a reason why the ethernet cable is still king.

My problem now is I still need a mesh wireless system, and I know there are a few brands I haven’t tested, Netgear’s Orbi system, and a few other less known devices, like the Plume kit, but after having tried so many, would I expect them to be any better? I simply don’t know what to actually do to solve this beyond smashing my walls to bits and running cables.

Footnote 29/07/2020: TP-Link Deco P9

As a note, my parents have a set of TP-Link Deco P9 devices. They have a fairly large house to cover and so have four of these set up in various places. My experience of these is that the app isn’t as slick as the eero or Google apps, isn’t very feature rich, and is a bit flaky at times. My biggest gripe with these is that, since they support ethernet, wireless or powerline for backhaul, you have no way to pick which they’re using, and no way to see which the devices themselves have picked. My parents had a very similar problem to what I’ve seen with other mesh devices I’ve tested, in which the mesh devices disconnected from the main device, but in their case seemed to stay disconnected. This could be related to the backhaul method chosen; I have no idea, as you can’t control it or see what’s being used. UPDATE: This little niggle has apparently now been improved slightly in a later firmware update and you can now see the connection quality of the various backhaul methods, but you still can’t pick which one is being used, or see which one the devices have chosen, and the quality of the firmware seems questionable. There are a lot of errors appearing in the log files that TP-Link are aware of, and are working on fixing at some point. From my experience, the Deco devices constantly go “offline” in the app, when in reality they’re actually still online and available. So, as for the Deco devices, I’d also avoid them.

Experimenting with & Moving to AWS – Part 2

This is a follow up to my previous post – Experimenting with moving to AWS

All went well with AWS Lightsail, it’s a very serviceable VPS solution, but now I’ve had a bit of time in AWS I’ve migrated the site further to EC2. It was a simple enough process: I took a snapshot of the Lightsail machine, exported that as an EC2 AMI and EBS snapshot, and then cloned the whole lot from London to Ireland. The move of regions was because I have some other data already in Ireland and wanted to keep the site in the same region now.

Off the back of all that I’ve got my IPv6 connectivity back to the site again, as Lightsail does not support IPv6 addressing, which is a bit of a negative point there of Lightsail. EC2 instances however, most certainly do support IPv6.

I’ve also gone as far as migrating DNS management into Route53 from Google Domains, mainly to simplify managing the domain zone.

The instance type the site is now running on is also one of the newer AMD EPYC EC2 instance types, which work out slightly cheaper than the equivalent Intel instances, so keep an eye on the instances suffixed with “a”, as you can save a bit of money there.
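
As a rough way of spotting those instance types in a list, here is a small Python sketch. The function name and the regular expression are my own assumptions based on the usual family, generation, attributes and size naming pattern (for example t3a.micro or m5a.large), so treat it as a heuristic and check anything it flags against the AWS documentation.

```python
import re

# Assumed naming pattern: family letters, generation number, optional
# attribute letters, then ".size" (e.g. "m5ad.xlarge" -> m / 5 / ad / xlarge).
INSTANCE_TYPE = re.compile(
    r"^(?P<family>[a-z]+)(?P<generation>\d+)(?P<attributes>[a-z-]*)\.(?P<size>.+)$"
)


def is_amd_instance(instance_type):
    """Return True if the attribute letters after the generation include 'a'."""
    match = INSTANCE_TYPE.match(instance_type.lower())
    if match is None:
        return False  # names that do not fit the assumed pattern are not flagged
    return "a" in match.group("attributes")


if __name__ == "__main__":
    for name in ("t3.micro", "t3a.micro", "m5a.large", "a1.medium"):
        print(name, is_amd_instance(name))
```

Note that the check looks only at the letters after the generation number, so a family that happens to start with “a”, such as a1, is not mistaken for an AMD type.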

Experimenting with & Moving to AWS – Part 1

So, in my work environment, I’ve been heavily based in the VMware and “traditional data centre” world, covering all the usual stuff, as well as some very modern technologies like VSAN.

However, a need has now arisen for me to start skilling up in AWS technologies. So as of last week, my journey into cloud technologies has begun, and I’ve been using the fantastic A Cloud Guru site for their great courses on AWS. I’m starting from the ground up, with very little experience of AWS, so it should be an interesting path for me.

On a related note, for an easy way in to AWS, I’ve migrated this site to now live in AWS via their Lightsail platform. For what you get, it’s very cheap and has allowed me to start to experiment with AWS technologies. I’d recommend it to anyone looking at self-hosting WordPress sites. Give it a go, you can get a free month and try things out. Overall, even though the specs of the basic entry-level server look very diminutive, I’ve found the performance to be great in reality.

I’ll report back when I’m a little further on with the learning, but just for your information, the path I’ve started down is the AWS Certified Solutions Architect, starting with the associate level and hopefully working up to professional level eventually.

Wish me luck!!!

VxRail host upgrades failing? Try an iDRAC reset

Just thought I’d post this little nugget of information, in case some other poor soul is having a similar problem. I’ve recently been upgrading two VxRail clusters to the latest code and suffered the same problem in both clusters. The code update was to the latest, 4.7.211, and some hosts in the cluster were repeatedly failing to upgrade their firmware. The host ESXi software was upgrading without any problems; some hosts were just failing on firmware.

After a lot of head-scratching and a call to Dell support a reset to the iDRAC was advised. I assumed that’ll never work, it’s only out of band management, why would that fix a firmware update problem? Well, it appears that the iDRAC reset can, on occasion, fix this sort of problem, as after the first cluster upgrade was complete and I came to the second cluster I ran into the same problem. Lo and behold, I tried the same trick without Dell support this time, and to my surprise, it worked.

I guess the iDRAC is more intimately involved in the firmware upgrade process than I assumed. So, keep this in mind for problems of this sort, I know I will.

Microsoft January 2019 KB4480970 Patch – KMS Activation Errors – UPDATED

I’ve seen a few cases of this now in the wild within my organisation, where previously activated Windows 7 devices would suddenly report that they were no longer activated. On running “slmgr /dlv” I could see that the client reported as unlicenced, with the notification reason as “0xc004f200 (non-genuine)”.

This appears to be another instance of the infamous KB971033, which has caused this in the past and seems to have resurfaced as part of the January 2019 KB4480970 rollup update and KB4480960 security-only update.

Listed under known issues is;

KMS Activation error, "Not Genuine", 0xc004f200 on Windows 7 devices.

So, it would appear that this is the cause of the activation problem in this case. The fix is as follows;

rem Silently uninstall the KB971033 update
wusa /uninstall /kb:971033 /quiet
rem Stop the Software Protection service before touching its files
net stop sppsvc /y
rem Delete the hidden activation files and the token and cache stores
del %windir%\system32\7B296FB0-376B-497e-B012-9C450E1B7327-5P-0.C7483456-A289-439d-8115-601632D005A0 /ah
del %windir%\system32\7B296FB0-376B-497e-B012-9C450E1B7327-5P-1.C7483456-A289-439d-8115-601632D005A0 /ah
del %windir%\ServiceProfiles\NetworkService\AppData\Roaming\Microsoft\SoftwareProtectionPlatform\tokens.dat
del %windir%\ServiceProfiles\NetworkService\AppData\Roaming\Microsoft\SoftwareProtectionPlatform\cache\cache.dat
rem Start the service again, install the KMS client key, then activate
net start sppsvc
cscript %windir%\system32\slmgr.vbs /ipk 33PXH-7Y6KF-2VJC9-XBBR8-HVTHH
cscript %windir%\system32\slmgr.vbs /ato

Don’t forget that the Windows 7 key in my example above is for Windows 7 Enterprise; grab the right key for your edition of Windows 7 from Microsoft’s KMS Keys Page.
This should remove the offending update and re-activate the copy of Windows against your KMS server.

UPDATE
Microsoft have confirmed that the Windows activation problem is, in fact, unrelated to the January 2019 update, and was instead caused by a separate update to Microsoft Activation and Validation, which has since been reverted by them.

VMware VUM Error in Firefox Since 6.7 U1

I came across this error today, when using the HTML5 client and VMware Update Manager (VUM);

Response with status: 401 OK for URL: https://<FQDN of VUM server>ui/vum-ui/rest/vcobjects/urn:vmomi:HostSystem:host-10:478c8cfc-c88e-4fdb-9e1a-93d899697bf7/isUpdateSupported

Turns out this is something that only affects Firefox since 6.7 U1 and VMware have a KB article on it here; https://kb.vmware.com/s/article/59696

UPDATE – VMware have now fixed this in 6.7 U2 of vCenter, which is available from VMware Downloads.

OVF Template Failing To Deploy

When trying to deploy an exported OVF template into another vCenter and cluster, I was presented with a strange error which seemed to indicate that I’d not specified which datastore to deploy to, which was odd because I most certainly had. The error I saw is below;

Failed to deploy OVF package. ThrowableProxy.cause A specified parameter was not correct: Target datastore must be specified in order to deploy the OVF template to the vSphere DRS disabled cluster

A quick Google search for this gave nothing, which is usually a bad sign, but the fix for this was rather simple. It starts at the point where you’re presented with the storage you want to deploy to and you pick the relevant datastore.

If you then go and click on advanced, you’ll find that, for some reason, only one of the disk groups has been allocated to the datastore you picked.

If you then click edit on the disk group that doesn’t have a datastore, you should then be able to pick a datastore for that and the OVF will then deploy. This was all seen on vCenter 6.5 Update 2 (build 6.5.0.20000) but it may affect other vCenter versions.

Quite Impressed With Microsoft’s Hyper-V

I realise I’ve not posted for a while, and I’ll try and atone for that going forward, I’ve been a busy server guy at work, but onto the good stuff.

I know Hyper-V, I’m not new to the existence of Hyper-V, but I’ve only ever briefly touched it in lab environments, until recently.

I had cause to do a number of small site deployments in the US on Hyper-V, my first choice being for a proper VMware setup with vCenter and a shared storage platform, but for one reason or another, my hand was forced and I had to go in guns blazing with Hyper-V, no shared storage and no Microsoft System Centre Virtual Machine Manager (SCVMM) either.
For those that haven’t looked at Server Core installs yet, firstly, why not? Secondly, please do, it’s a great feature of Windows Server for enterprise and business setups and something that Windows Server nerds everywhere should be doing more of.
I’m happy with PowerShell, so deployment wasn’t hard, that’s not to say I’m 100% au-fait with every set of cmdlets on offer and know the whole she-bang inside out, but it’s pretty easy to get going. Once the install was done and networking configured (loving the native NIC teaming since Server 2012), Hyper-V role installed and servers fully patched it was time to start actually configuring the thing. Again, Microsoft have made the whole thing fairly straightforward via the Hyper-V console and since I’m not a PowerShell martyr, I use it where things are easier but use the GUI where it makes sense for some things, I was happy to proceed in the GUI for some of the configs.

So, onto what I liked. The live migration with shared nothing is great, although it’s something I know ESXi also has. The replication of a VM between hosts, and the ability for one of those replicas to effectively be powered on in an unplanned failover scenario, and then failed back when things are working as planned again, is great. Ok, I know it’s not a full HA setup in the sense that it requires intervention for it to work, but it’s a step up from what you get with just ESXi without vCenter.
Hot-add of memory is now available in Hyper-V since the Server 2016 version, as well as hot-add of network adapters, which brings it a lot of the way towards VMware’s offering in terms of hot-add features. PowerShell Direct is amazing, the ability to have a PowerShell based session to a guest OS from the host regardless of networking or firewall is great.
Obviously there are some things missing from Hyper-V still, vCPU hot-add being one, but not one that I personally use too often. The HTML5 interface of the later iterations of VMware’s product is also great, no need for an installed application to manage the thing is always good news.

Hyper-V can easily suffice for small scale deployments and is well worth a look these days. In its current evolution, it’s a big leap from where it was in its Server 2008 days. As time goes by, there really is less and less between Hyper-V and ESXi, and that can only be good. No one benefits from a monopoly position, with the exception of the monopoly holder, so it’s good to have some healthy competition in the market and I look forward to seeing what Microsoft can do with the platform in the future.