Server 2003 to Service Fabric: Reflections on a modernisation journey
It's been over a year since my last ramblings! During that time there have been many stories I've wanted to share, but due to the workload of migrating hundreds of systems (across thousands of servers) to run in Azure before a hard deadline, I simply didn't find the time. Now that the hardest parts are out of the way, I wanted to share some of my experiences over the past year. This post focuses on porting many ASP Classic and .NET 1 era sites to run in containers on Azure Service Fabric. If you are considering or already undertaking such a journey, I hope my early adopter experience can make your path a little smoother.
ASP Classic? Are you joking?
I kid you not: the oldest site I was tasked with migrating was written in 1999 and last edited in 2005! While "if it ain't broke, don't fix it" has merit, these sites had taken the idea a little too literally.
So why change now?
Almost all these sites were running on Server 2003. Due to delays in the projects to fully replace these systems, we had to find a way to keep them running for a while longer, whilst adhering to compliance requirements that meant Server 2003 had to go. As Server 2008R2 is nearing the end itself, and who wants to deploy Server 2012 unless you must, we would already be making the migration to Server 2016 regardless. If the sites can run on Server Core, why not do it in containers?
Was it worth it?
Absolutely! While the direct benefit for the ASP Classic sites was certainly lower, the gains for the ASP.NET sites were huge.
Since going live with our first production systems in November 2017, we have made over 330 deployments with fixes and improvements to these sites, which, to put it in context, is more than they received in the preceding 15 years!
To ensure we got good density, I spent some time tweaking the sites to reduce RAM usage and the like, which led to one system going from being spread across 40 VMs to just 5 containers. That reduced the operational cost by more than 60%, while at the same time reducing page load times!
What's the most important thing to do?
Monitor everything! I can't stress enough how important good monitoring and logging are to this journey. As a long-time fan of Application Insights, I was keen to add it to the sites anyway, but it has proved invaluable with systems that you can't just log in to and 'have a look'. It also brought the bonus of finding various bugs in the sites which had gone unnoticed, some of which had been present for almost a decade!
You said the ASP Classic benefit was lower, what gives?
I'll be straight here: while it is technically possible, and we did decide it was worthwhile for us, your mileage may vary. Firstly, our sites depend on 3rd party COM libraries, which it turns out can be a real pain to register properly in the container build process.
Support for monitoring tools and logging in ASP Classic is also very limited, so when you do hit issues, figuring out what's going wrong can be a painful process. If possible, I would highly recommend porting ASP Classic sites to .NET Core instead. We have done this for a few smaller sites, and found it to be a faster process than expected.
Why didn't you use Kubernetes instead?
When I first started this phase of the project just over a year ago, Kubernetes hardly even worked on Windows. Whilst it has come on in leaps and bounds since then, and I would certainly suggest looking at it for "Cloud Native" greenfield projects, the reality is (in my opinion) that Service Fabric is the only production-grade orchestrator for legacy Windows workloads.
Ok so I've added monitoring, now what?
Next on the list: it is critical that you have a fully automated CI/CD pipeline. While it is possible to push straight from Visual Studio or the command line (seriously, don't ever do this; as Damian Brady would say, "Friends don't let friends right-click publish!"), you won't gain half of the agility if you stick to these outdated methods.
As an example, previously one of these sites would take almost 6 weeks just to go through the manual testing and approval process. The other day, a user reported an issue which, thanks to the monitoring, we had already been alerted to and were investigating. Five minutes later a fix had been created and checked in, after which it automatically rolled to testing, was approved and was then deployed into production, all within 20 minutes of the original alert from the monitoring system.
Any other tips?
Make sure you don't have any hard-coded configuration
You know that setting that 'Bob' used to tweak by hand? That, and any other configuration that relies on similar human interaction, has to go. When I started this journey, there was still no direct support in the full .NET Framework for reading configuration from environment variables (which is how Service Fabric supplies it), so we had to write our own; since then, a package has been released to support this. As with the monitoring, the process of ensuring dynamic configuration worked revealed a few places in almost every single site which had configuration values hard-coded.
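To illustrate the idea, here is a minimal sketch in Python (used purely for brevity; our real sites are .NET, and this is not the code we wrote). It resolves a setting from an environment variable and falls back to a default only when one is explicitly supplied. The function name get_setting and the variable name REPORT_TIMEOUT_SECONDS are hypothetical.

```python
import os


def get_setting(name, default=None, cast=str, environ=None):
    """Return a setting from the environment, falling back to a default.

    `environ` can be any mapping; it defaults to the real process
    environment, which is where an orchestrator would inject values.
    """
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None or raw == "":
        if default is None:
            # Fail loudly rather than silently using a hard-coded value.
            raise KeyError(f"Missing required setting: {name}")
        return default
    return cast(raw)


if __name__ == "__main__":
    # Hypothetical setting name, for illustration only.
    timeout = get_setting("REPORT_TIMEOUT_SECONDS", default=30, cast=int)
    print(f"Report timeout: {timeout}s")
```

Failing on a missing required value, rather than quietly falling back, is one way to flush out the hard-coded values mentioned above.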
Delete stuff you don't need!
Server Core containers are already large, so you want to make sure that the resulting container images are as small as possible. In one of our sites, we found someone had checked in almost 1GB of SDK files for a product that was no longer even in use.
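If you want a quick way to spot this kind of dead weight before building an image, something along these lines can help. This is a rough Python sketch, not a tool we used; the function name largest_files is hypothetical.

```python
import os


def largest_files(root, top=10):
    """Return up to `top` (size_in_bytes, path) pairs, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # Skip files that vanish or cannot be read mid-scan.
                continue
    sizes.sort(reverse=True)
    return sizes[:top]


if __name__ == "__main__":
    # Scan the current directory and report sizes in megabytes.
    for size, path in largest_files("."):
        print(f"{size / (1024 * 1024):10.1f} MB  {path}")
```

Run it from the root of the site's source tree and look hard at anything near the top of the list.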
Don't store uploaded/downloaded files in the containers.
A couple of our sites were storing downloaded reports in the web directory. As containers are immutable, anything written inside them got removed whenever the container was restarted. We found it simple enough to replace all of these with either volume mounts or, in most cases, a slight change to store the files in Azure Files instead.
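The change itself is usually small: stop writing relative to the web directory and write to a location supplied by configuration, which can then point at a mounted volume or file share. A minimal Python sketch of that pattern follows; the environment variable REPORT_STORAGE_DIR and the function names are hypothetical, and our actual sites are .NET.

```python
import os
from pathlib import Path


def report_path(filename, environ=None):
    """Build a path for a report under the configured storage directory."""
    env = os.environ if environ is None else environ
    # Hypothetical variable; it would point at a mounted volume or share.
    root = Path(env.get("REPORT_STORAGE_DIR", "reports"))
    root.mkdir(parents=True, exist_ok=True)
    # Keep only the final name component so callers cannot escape `root`.
    name = Path(filename).name
    if not name:
        raise ValueError("filename must include a file name")
    return root / name


def save_report(filename, data, environ=None):
    """Write report bytes outside the container's own file system."""
    path = report_path(filename, environ)
    path.write_bytes(data)
    return path
```

With the directory coming from configuration, switching between a local folder in development and a mounted share in production needs no code change.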
The Journey continues
While this has been a high-level snapshot of the things that spring to mind, we are by no means done on our journey. I still have some of the largest and most ingrained sites to tackle, and we are constantly looking to refine the process we have come to so far.
Going forward I will try to capture more of our journey here, but if you have any specific questions, let me know on Twitter and I'll answer any that I can.