When things catch fire: Identity for Office 365

I’ll make a bold move and start a series. Let’s discuss some scenarios in which the cloud may come in handy when your on-premises environment catches fire, suffers an outage, or gets disrupted by a disaster. How can you leverage cloud technologies to take over, so that you benefit from business continuity while you recover?

I’ve heard from various customers who plan business continuity for their most critical assets using the cloud. Consider their scenario: they invested in Office 365, meaning that they have a considerable amount of business data in SharePoint, they have e-mail and calendar in the cloud, as well as instant messaging and, by now, also a cloud-hosted conferencing solution. The majority of users (>90%) use these capabilities, while a handful of users must stay on-premises, for example for regulatory reasons. These customers manage access to the cloud through the traditional setup: they have an on-premises Windows Active Directory (AD), which is the main directory for authentication and coarse-grained access. They also have an identity provider that issues access tokens for the cloud, and a directory synchronization engine that transports identity data from their on-premises directory (AD) into the cloud directory, which in their case is Azure AD (AAD).

You see, there are still some components hosted on-premises that are required to make successful use of these business applications, for making users’ daily work possible.

Now, one of these customers was looking at their business continuity plan and, as part of that, made an inventory of their most valuable assets. The idea of this analysis is quite simple: they need to be able to bring back the most valuable assets as fast as possible, to contain damage to the company and to continue or resume functioning. As part of the investigation, they identified instant messaging (IM) and e-mail as critical components. You would agree that these may not be the assets that contain company secrets or valuable manufacturing data (those databases were, of course, also on the list). No, e-mail and IM were identified as critical because they restore clear communication paths: between employees who seek help, and for corporate communications while the disaster is ongoing or on its way to resolution.

Clearly, IM and e-mail currently depend on a number of on-premises components (Windows AD, the Identity Provider, Directory Synchronization) and, in the cloud, on Azure AD. How do you make these components resilient and disaster-proof?

We were thinking of two approaches:

  • Replicate the whole environment somewhere else, loosely coupled so that you can easily attach it to and detach it from your environment: build all of the required components (Windows AD, Identity Provider, Directory Sync) in another cloud environment using IaaS.
  • Change the authentication configuration and switch from on-premises Windows AD authentication to simple Azure AD authentication – with help from Password Hash Synchronization.
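To make the dependency argument concrete, here is a minimal Python sketch (an illustration, not the customer’s tooling) that models the sign-in path as a small dependency graph. The component names come from the scenario above; the graph edges, the `ON_PREMISES` set and the function names are our own assumptions. It checks which services stay usable when every on-premises component is down, under federated sign-in versus managed sign-in with password hash sync.

```python
# Hypothetical model of the sign-in path in the scenario above.
# Component names mirror the article; the edges are an illustrative assumption.
SIGN_IN_DEPENDENCIES = {
    # Federated: Azure AD hands sign-in to the on-premises identity provider,
    # which validates credentials against Windows AD.
    "federated": {
        "E-mail": ["Azure AD"],
        "IM": ["Azure AD"],
        "Azure AD": ["Identity Provider"],
        "Identity Provider": ["Windows AD"],
    },
    # Managed (password hash sync): Azure AD checks the password itself against
    # hashes synced earlier, so Directory Sync is a prerequisite, not a live hop.
    "managed": {
        "E-mail": ["Azure AD"],
        "IM": ["Azure AD"],
    },
}

ON_PREMISES = {"Windows AD", "Identity Provider", "Directory Sync"}


def required_components(service: str, mode: str) -> set:
    """Return every component the service transitively needs for sign-in."""
    graph = SIGN_IN_DEPENDENCIES[mode]
    required = set()
    stack = list(graph.get(service, []))
    while stack:
        component = stack.pop()
        if component not in required:
            required.add(component)
            stack.extend(graph.get(component, []))
    return required


def is_available(service: str, mode: str, down: set) -> bool:
    """A service stays usable only if it and all of its dependencies are up."""
    return service not in down and not (required_components(service, mode) & down)


if __name__ == "__main__":
    for mode in ("federated", "managed"):
        status = {s: is_available(s, mode, ON_PREMISES) for s in ("E-mail", "IM")}
        print(mode, status)
```

Replicating the whole environment (the first approach) leaves the federated graph unchanged and instead moves its nodes out of the failed site; the second approach removes the on-premises nodes from the live sign-in path altogether, provided the hashes were synced before the outage.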

The two approaches differ in speed of recovery, in complexity to build, maintain and operate, and in their security aspects:

Aspect                  Replicate whole environment   Change authN configuration
Speed to recover        low                           low
Complexity to recover   low                           low
Cost to build           high                          low
Cost in maintaining     medium                        low
Complexity to build     medium                        low
User experience         low                           low

In case this makes you cringe, consider that passwords never leave Windows AD in clear text, and that the password hashes from Windows AD are hashed another 1,000 times before they are securely transported to the cloud.
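For readers who want to see what “hashed another 1,000 times” means, here is a simplified Python sketch based on Microsoft’s public description of password hash synchronization: the MD4 (NT) hash that Windows AD already stores is salted with a per-user 10-byte salt and run through PBKDF2 with HMAC-SHA256 for 1,000 iterations. The exact byte encoding before PBKDF2 and the function name `protect_nt_hash` are assumptions for illustration; this is not the sync engine’s actual code.

```python
import hashlib
import os
from typing import Optional, Tuple

ITERATIONS = 1000  # iteration count cited in the article
SALT_BYTES = 10    # per-user salt size (assumption based on Microsoft's PHS description)


def protect_nt_hash(nt_hash: bytes, salt: Optional[bytes] = None) -> Tuple[bytes, bytes, int]:
    """Derive the value that leaves the premises instead of the password.

    nt_hash is the 16-byte MD4 hash Windows AD already stores; the clear-text
    password is never an input. Hex-encoding and then UTF-16LE encoding is an
    assumed reading of the documented "hex string, back to binary" step.
    """
    if len(nt_hash) != 16:
        raise ValueError("expected a 16-byte MD4 (NT) hash")
    if salt is None:
        salt = os.urandom(SALT_BYTES)  # fresh random per-user salt
    material = nt_hash.hex().encode("utf-16-le")
    # PBKDF2 with HMAC-SHA256; the default output length is 32 bytes.
    derived = hashlib.pbkdf2_hmac("sha256", material, salt, ITERATIONS)
    # Derived hash, salt and iteration count travel together to the cloud.
    return derived, salt, ITERATIONS


if __name__ == "__main__":
    fake_nt_hash = bytes(range(16))  # placeholder, not a real account's hash
    derived, salt, rounds = protect_nt_hash(fake_nt_hash)
    print(len(derived), len(salt), rounds)  # 32 10 1000
```

Because the salt is per user and the iteration count is fixed, the same NT hash yields different cloud-side values for different salts, and the value in Azure AD cannot be replayed against Windows AD as an NT hash.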

Of course, there will be security requirements, especially where password hash synchronization is concerned. In this particular example, the customer already has a substantial footprint in the cloud, which of course helps them trust the cloud provider. Also, to put things into perspective: in this example, the entire on-premises infrastructure has died, crashed or gone offline, and you need to bring the business back online again, in a secure way.

Moving forward, the customer accepted password hash synchronization as the fallback option and configured it according to Microsoft’s recommendations. The hashes of the password hashes flow to Azure AD as a fallback, while primary authentication during normal operation still happens through the Identity Provider and Windows AD. Only in an emergency, when their Windows AD dies, gets compromised, or something of that magnitude happens, do they switch to password-based sign-on directly in Azure AD. That “flip over” from IdP-based SSO to passwords in Azure AD is one funky PowerShell command, once password hash sync has completed successfully. In the customer’s tests, the switch took only a couple of seconds.
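The post doesn’t name the exact command the customer used. As a hedged illustration, the same flip can be expressed against the Microsoft Graph v1.0 `domains` resource by setting its `authenticationType` property to `Managed`. The Python sketch below only builds that PATCH request and does not send it; the function name, the `contoso.com` domain and the token are placeholders, and acquiring a token with sufficient admin permissions is out of scope.

```python
import json
import urllib.request

# Assumption: Microsoft Graph v1.0 domain resource, updated via PATCH.
GRAPH_DOMAINS = "https://graph.microsoft.com/v1.0/domains/"


def build_auth_switch_request(domain: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that switches a domain to managed
    authentication, i.e. cloud sign-in against synced password hashes.

    Hypothetical helper for illustration; verify against current Graph docs.
    """
    body = json.dumps({"authenticationType": "Managed"}).encode("utf-8")
    return urllib.request.Request(
        GRAPH_DOMAINS + domain,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder domain and token; urllib.request.urlopen(req) would send it.
    req = build_auth_switch_request("contoso.com", "<access-token>")
    print(req.get_method(), req.full_url)
```

Keeping the request construction separate from sending it makes the switch easy to review and rehearse as part of a runbook before the day you actually need it.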

Long story short: you may or may not like Password Hash Synchronization, for a number of reasons, and you may prefer building a lot of additional infrastructure for the disaster case (which hopefully never happens!). Either way, here is what we’d like you to take away:

  • Look at your precious assets and make a list of what’s essential to your business – and what MUST be restored ASAP in case something happens.
  • Look at these assets not only from a business-application angle: how do you communicate with the workforce and provide help, assistance and guidance during the crisis? Would a WhatsApp group be an appropriate alternative? Probably.
  • Consider the cloud footprint you already have: what are the on-premises dependencies of core services that are hosted in the cloud or by other vendors?
  • How willing are you to invest in a disaster recovery “setup”? Consider the complexity (to build, maintain, operate), the cost (resources to build, maintain and operate; boxes, VMs, …) and security (considering you are in a very grave situation). Will you be able to keep all environments in sync?
  • Once you have a plan: how good are your chances of executing it (documentation, practice, clear and up-to-date instructions) when things are on fire and your hands are shaking?

Fun reality checks, huh? By the way, have you checked out Azure Site Recovery (ASR)?

Florian is a Program Manager in the Customer Experience Group (CXP) Engineering team for Azure Identity. He works with customers and partners throughout Europe to ensure that their cloud identity, built on Microsoft’s Azure AD technologies, is successfully planned, designed and delivered securely. Florian helps customers remove technical blockers, and gathers customer sentiment, feedback and deployment experiences to feed back to product engineering.
