The problem

Over the past year, I've been gradually moving a large, heavily customised SharePoint environment to Azure. While everything has gone pretty smoothly on the Azure side, as we approached the go-live date I ran various load tests and came across an issue. Users were experiencing slow page loads across large swathes of the site: where the previous environment averaged around 1 second per page load, the Azure environment exceeded 4 seconds in many areas.

Digging into the slowness, I discovered it was being caused by slow search queries. Further investigation pointed to IO contention on the query servers.

The solution...

At first I thought this would be perfectly easy to solve. Having seen the recent announcements for the new D series VMs with SSDs, it appeared to be a case of simply switching to a D series and moving the query index onto an SSD. While this was the answer, the journey to it was a bit more than a simple switch.

It turns out that, because I was an early adopter of Azure IaaS, all of my machines and services were locked to the features available when they were first created. When looking at the machine sizes, the D series was not even listed.

I spent a couple of days searching the internet for an easy way to get access to the D series VMs. Unfortunately, it transpired that the only solution currently is to recreate the lot.

The implementation

Firstly, let me just point out that this is not the best way to do this. I didn't have time to be mucking around with PowerShell scripts, and was aware that all the required actions could be achieved via the management portal (incidentally, not via the preview portal).

Step 1: Backups

As anyone who's had the misfortune to work with large SharePoint environments will know, the amount of configuration is obscene, especially in customised solutions. Given it had taken us almost a year to get everything running correctly, it was imperative that the machines be reused. To that end I shut down all of the VMs and took a "Capture" of each one, making sure not to tick the Sysprep box.

Step 2: New Network

Next I set up a new virtual network for the machines, with a config matching the original. Make sure that you create the new network in the same region as the one you're going to restore the machines from, as otherwise you won't be able to see the images. The new network was needed because using the same network locates your VMs in the same "racks", and thus you still won't get access to the D series.

Step 3: Recreate VMs from backups

Next I went through each machine and created a new VM, pointing to the backups from the "My Images" gallery and making sure to recreate each one with the required cloud services and availability sets (note that you can't add the machines to load-balanced endpoints during this step).
I took care to ensure that the machines were created in the same order as before, so that they all received the same internal IPs. If your system relies on internal IPs, doing this in a different order will cause a right mess.
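
If you'd rather not work out the creation order by hand, a rough Python sketch like the one below can do it from a list of the old addresses. It assumes, as my experience above suggests, that the network hands out internal IPs sequentially in creation order. The VM names and addresses are made up for illustration, and this was not part of what I actually ran.

```python
import ipaddress

# Hypothetical inventory: VM name -> internal IP it held in the old network.
OLD_IPS = {
    "sp-query-01": "10.0.0.7",
    "sp-sql-01": "10.0.0.4",
    "sp-wfe-01": "10.0.0.6",
    "sp-app-01": "10.0.0.5",
}


def creation_order(old_ips):
    """Return VM names sorted by their old internal IP, lowest first.

    Sorting is numeric, so 10.0.0.9 comes before 10.0.0.10.
    """
    return sorted(old_ips, key=lambda name: ipaddress.ip_address(old_ips[name]))


def find_gaps(old_ips):
    """Return unused addresses between the lowest and highest old IP.

    A gap means sequential allocation alone will not reproduce the old
    layout (assumption: addresses are handed out in creation order).
    """
    used = {ipaddress.ip_address(ip) for ip in old_ips.values()}
    lowest, highest = int(min(used)), int(max(used))
    return [
        str(ipaddress.ip_address(n))
        for n in range(lowest, highest + 1)
        if ipaddress.ip_address(n) not in used
    ]


if __name__ == "__main__":
    for position, name in enumerate(creation_order(OLD_IPS), start=1):
        print(position, name, OLD_IPS[name])
    print("gaps:", find_gaps(OLD_IPS))
```

If `find_gaps` reports anything, the old environment had a hole in its address range (a deleted machine, for example), and simply recreating the VMs in order would shift every address after the hole.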

Step 4: Recreate endpoints

Finally I went through each of the new machines and ensured that they had matching endpoints to the previous machines, including load balancing etc., taking special care to ensure the SQL endpoints were recreated with "Direct server return" selected.
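
Checking endpoints by eye across a lot of machines is error prone, so here is a small Python sketch of how the comparison could be automated. It assumes you have already noted down the old and new endpoint settings in some form; the `Endpoint` fields and the sample data are hypothetical and do not come from any Azure API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """One endpoint as noted down from the portal (hypothetical fields)."""
    name: str
    protocol: str
    public_port: int
    private_port: int
    lb_set: str = ""  # empty string means not load balanced
    direct_server_return: bool = False


def endpoint_diff(old, new):
    """Compare endpoints per VM.

    Returns a dict of VM name -> {"missing": set, "unexpected": set},
    containing only the VMs whose endpoints differ in any field.
    """
    report = {}
    for vm in sorted(set(old) | set(new)):
        before = set(old.get(vm, []))
        after = set(new.get(vm, []))
        if before != after:
            report[vm] = {"missing": before - after, "unexpected": after - before}
    return report


if __name__ == "__main__":
    old = {"sp-sql-01": [Endpoint("SQL", "tcp", 1433, 1433, "sql-lb", True)]}
    # Same endpoint, but "Direct server return" was forgotten on the new VM.
    new = {"sp-sql-01": [Endpoint("SQL", "tcp", 1433, 1433, "sql-lb", False)]}
    for vm, diff in endpoint_diff(old, new).items():
        print(vm, "missing:", diff["missing"], "unexpected:", diff["unexpected"])
```

Because every field takes part in the comparison, an endpoint that matches on ports but has "Direct server return" unticked shows up as both missing and unexpected, which is exactly the sort of slip that is easy to miss in the portal.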

It's Alive!

After a couple of hours, all the machines were recreated, and my index servers are now using the D series VMs, resulting in a much faster site overall.