When things catch fire: Identity for Office 365
I’ll make a bold move and start a series. Let’s discuss some scenarios in which the cloud may come in handy when your on-premises environment catches fire, has an outage or gets disrupted by a disaster. How can you leverage cloud technologies to take over, so that you keep business continuity while you recover?
I’ve heard from various customers that plan for business continuity of their most critical assets using the cloud. Consider their scenario: they invested in Office 365, meaning that they have a considerable amount of business data in SharePoint, e-mail and calendar in the cloud, as well as instant messaging and, by now, a cloud-hosted conferencing solution. The majority of users (>90%) use these capabilities, while a handful of users have to stay on-premises, for example for regulatory reasons. These customers manage access to the cloud through the traditional setup: they have an on-premises Windows Active Directory (AD), which is the main directory for authentication and coarse-grained access. They also have an identity provider that issues access tokens for the cloud, and a directory synchronization engine that transports identity data from the on-premises AD into the cloud directory, which in their case is Azure AD (AAD).
You see, there are still some components hosted on-premises that are required to use these business applications successfully and to make users’ daily work possible.
Now, one of these customers was looking at their business continuity plan and, as part of that, made an inventory of their most valuable assets. The idea of this analysis is quite simple: they need to be able to bring back the most valuable assets as fast as possible, to contain damage to the company and to continue or resume functioning. As part of the investigation, they identified instant messaging (IM) and e-mail as critical components. You would agree that these may not be the assets that contain company secrets or valuable manufacturing data, although those databases were of course also on the list. No, e-mail and IM were identified as critical in order to restore clear communication paths between employees who seek help and corporate communications while the disaster is ongoing or en route to resolution.
Clearly, IM and e-mail currently depend on a number of on-premises components (Windows AD, Identity Provider, Directory Synchronization) and, in the cloud, on Azure AD. How do you make these components resilient and disaster-proof?
We were thinking of two approaches:
- Replicate the whole environment somewhere else, in a place that is only loosely connected to your environment and that you can easily attach and detach: build all of the required components (Windows AD, Identity Provider, Directory Sync) in another cloud environment using IaaS.
- Change the authentication configuration and switch from on-premises Windows AD authentication to simple Azure AD authentication – with help from Password Hash Synchronization.
These are two different approaches that vary in speed of recovery, in complexity to build, maintain and operate, and in security aspects:
| |Replicate whole environment|Change authN configuration|
|---|---|---|
|Speed to recover|low|low|
|Complexity to recover|low|low|
|Cost to build|high|low|
|Cost in maintaining|medium|low|
|Complexity to build|medium|low|
Of course, there will be security requirements, especially when it comes to password hash synchronization. In this particular example, the customer already has a good footprint in the cloud, which of course helps in trusting the cloud provider. Also, to put things into perspective: in this example, the entire on-premises infrastructure died, crashed, went offline, and you need to bring the business back online again, in a secure way.
Moving forward, the customer accepted password hash synchronization as the fallback option and configured it according to Microsoft recommendations. The hashes of the password hashes flow to Azure AD as a fallback, while the primary authentication, for normal operation, still happens through the Identity Provider and Windows AD. Only in case of an emergency, when their Windows AD has died or been compromised or anything of that magnitude, do they switch to password-based sign-on directly in Azure AD. That “flip over” from IdP-based SSO to passwords in Azure AD is one funky PowerShell command, provided the password hash sync has completed successfully beforehand. In the tests the customer has gone through, the switch took only a couple of seconds.
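To make the fallback logic concrete, here is a minimal sketch of the decision the customer faces during an outage. It is not the actual switch command, which this post does not name, and it talks to no real service. The component names come from the post; the mode names, the dependency sets and the functions `can_sign_in` and `choose_mode` are illustrative assumptions. In particular, the sketch assumes that directory synchronization is not needed at sign-in time, only to keep the cloud directory up to date.

```python
# Hypothetical model of the sign-in fallback described above.
# All names and dependency sets are assumptions made for illustration.

ON_PREM = {"Windows AD", "Identity Provider", "Directory Sync"}

# Components each sign-in mode needs at authentication time (assumed).
SIGN_IN_DEPENDENCIES = {
    # Normal operation: the on-premises IdP issues tokens backed by Windows AD.
    "federated": {"Windows AD", "Identity Provider", "Azure AD"},
    # Fallback: Azure AD validates the password against the synced hashes.
    "managed": {"Azure AD"},
}


def can_sign_in(mode, failed):
    """Return True if no component required by this sign-in mode has failed."""
    return SIGN_IN_DEPENDENCIES[mode].isdisjoint(failed)


def choose_mode(failed, hashes_synced):
    """Pick the sign-in mode to run with, or None if users cannot sign in."""
    if can_sign_in("federated", failed):
        return "federated"  # primary path is healthy, no need to flip
    if hashes_synced and can_sign_in("managed", failed):
        return "managed"  # flip over to password sign-on in Azure AD
    return None  # no synced hashes, or Azure AD itself is unavailable


if __name__ == "__main__":
    print(choose_mode(set(), hashes_synced=True))
    print(choose_mode(ON_PREM, hashes_synced=True))
    print(choose_mode(ON_PREM, hashes_synced=False))
```

The third call illustrates why the sync has to be in place before the disaster: if the hashes never reached Azure AD, there is nothing to flip over to.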
Long story short: you may or may not like Password Hash Synchronization for a number of reasons, and you may prefer building a lot of additional infrastructure for the case of a disaster (that hopefully never happens!), but what we’d like you to take away is:
- Look at your precious assets and make a list of what’s essential to your business – and what MUST be restored ASAP in case something happens.
- Look at these assets not only from a business app angle: how do you communicate with the workforce and provide help, assistance and guidance in this crisis? Would a WhatsApp group be an appropriate alternative, perhaps?
- Consider the cloud footprint you already have: what are the on-premises dependencies of the core services that are hosted in the cloud or by other vendors?
- What is your willingness to invest in a disaster recovery “setup”? Consider the complexity (to build, maintain, operate), cost (resources to build, maintain, operate, boxes, VMs, …) and security (considering you are in a very grave situation). Will you be able to keep all environments in sync?
- Once you have a plan: how good are your chances of executing it (is it documented, practiced, are the instructions clear and up to date?) when things are on fire and your hands are shaking?
Fun reality checks, huh? By the way, have you checked Azure Site Recovery (ASR)?