The process of developing an effective contingency plan can be broken down into six key stages:
Every organisation is different. This might sound obvious, but a common failing when developing a disaster management strategy is to look for a one-size-fits-all solution. Starting out with a clear examination of the specific requirements of your organisation helps to make sure the plan you develop is the one you actually need.
It can help clarify your thinking to create a contingency planning policy statement as part of your analysis. The statement should briefly set out the following information in as clear a manner as possible:
Once these objectives have been defined, it's important to ensure that the various departments of the organisation that may be affected by the plan are brought on board. These may include the IT and HR departments, those responsible for physical security and emergency readiness, and the decision-makers responsible for these areas. This shouldn't be regarded as a comprehensive list; the circumstances of a particular organisation must be individually assessed.
Before a meaningful plan can be developed, it's necessary to know precisely what is at stake. In short, unless an organization has a clear understanding of the relative value of its assets, there is a serious risk of targeting resources inappropriately, and of providing inadequate protection where it is most required. An impact assessment can help minimize this risk, by establishing a clear hierarchy of the priorities the contingency plan will address.
The basic approach to impact assessment is first, to identify the organization's essential IT resources; second, to identify the likely impact of disruptions to these resources, with particular emphasis on defining acceptable downtimes; and third, to establish a hierarchy of recovery priorities.
Resource identification involves analyzing the various IT resources to establish which elements of the infrastructure perform which critical functions. When resources have been accurately correlated with their functions, it will become clear which elements of the infrastructure are critical, and which are less so.
Disruption impact analysis takes the data gathered in the previous step and evaluates the impact of disruption to the critical resources, both over time and across related dependent systems. From this analysis, the planning architect can identify the point at which the cost of disruption becomes greater than the cost of preventative measures. This assists greatly in determining how long the organization can afford to accept system downtime, and in assigning appropriate financial resources to implement the plan.
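The break-even point described above can be illustrated with a short calculation. This is a minimal sketch, not a prescribed method: the function name `break_even_hour` and all of the cost figures are hypothetical, and it assumes the impact analysis has produced an estimated loss for each successive hour of downtime.

```python
def break_even_hour(hourly_losses, prevention_cost):
    """Return the first hour of an outage at which the cumulative cost of
    disruption exceeds the cost of preventing it, or None if it never does."""
    cumulative = 0.0
    for hour, loss in enumerate(hourly_losses, start=1):
        cumulative += loss
        if cumulative > prevention_cost:
            return hour
    return None


# Hypothetical figures: losses grow as the outage spreads to dependent systems.
hourly_losses = [500, 500, 1_000, 2_000, 4_000, 8_000]
print(break_even_hour(hourly_losses, prevention_cost=5_000))
```

With these illustrative numbers the cumulative loss passes the prevention cost during the fifth hour, which would suggest an acceptable downtime of less than five hours for this resource.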
Recovery priorities will usually be self-evident if accurate data has been gathered and collated in the first two steps. The planning architect will assign relatively greater resources to the recovery of more critical components within the overall framework.
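As a rough illustration of how the data from the first two steps might translate into a recovery order, the sketch below ranks resources by their acceptable downtime. The `Resource` fields, the tie-breaking rule and the sample inventory are all assumptions made for the example; a real assessment would weigh more factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    max_downtime_hours: float  # acceptable downtime from the impact analysis
    dependents: int            # number of systems that rely on this resource


def recovery_order(resources):
    """Shortest acceptable downtime first; ties go to the resource
    with more dependent systems."""
    return sorted(resources, key=lambda r: (r.max_downtime_hours, -r.dependents))


# Hypothetical inventory.
inventory = [
    Resource("payroll", 72, 1),
    Resource("email", 8, 3),
    Resource("order database", 2, 5),
    Resource("file server", 8, 6),
]

for rank, resource in enumerate(recovery_order(inventory), start=1):
    print(rank, resource.name)
```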
The impact assessment stage will help to identify areas where disruption can be significantly reduced by implementing preventative measures. Although in principle prevention is always better than cure, whether this is true in any specific case will come down to a cost assessment. For example, if the financial cost to the business of system downtime is greater than the cost of generators that would virtually eliminate the risk of it, the generators are a sound investment. However, in many cases, the lower outlay required to ensure rapid restoration will be more than adequate. Many such decisions will need to be made by the planning architect during the development of the plan.
A wide variety of preventative controls, at various levels of expense, are worth considering. These will typically include, but are by no means limited to, UPS devices, generators, fire detection and suppression systems, water sensors and off-site storage arrangements for backup media.
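One way to frame the generator decision is to compare expected yearly losses with and without the control. The sketch below is illustrative only: the function names, outage frequencies, durations and costs are invented for the example, and it assumes losses scale linearly with downtime.

```python
def annual_expected_loss(outages_per_year, hours_per_outage, cost_per_hour):
    """Expected yearly cost of downtime under a simple linear model."""
    return outages_per_year * hours_per_outage * cost_per_hour


def control_is_worthwhile(loss_without, loss_with, annual_control_cost):
    """A control pays for itself when the loss it avoids exceeds its cost."""
    return (loss_without - loss_with) > annual_control_cost


# Hypothetical figures for a power-related outage.
baseline = annual_expected_loss(outages_per_year=2, hours_per_outage=6, cost_per_hour=3_000)
with_generator = annual_expected_loss(outages_per_year=2, hours_per_outage=0.25, cost_per_hour=3_000)

print(control_is_worthwhile(baseline, with_generator, annual_control_cost=12_000))
```

If the same comparison is run with a cheaper rapid-restoration option in place of the generator, the planning architect can see which outlay gives the better return.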
Recovery strategies enable operations to be rapidly normalized in the event of a disruption. Strategies should be based on the data gathered in the impact assessment, in order to ensure that they are appropriate to the organization's core requirements. They should also take into account the full range of possible incidents and disruptions. Among the specific strategies that should be considered are:
Backup strategy. This will specify the frequency and type of backups, the data sets to be backed up, file naming policy, backup storage locations, procedures for transport to off-site locations, etc. It will also specify the type or types of backup media and the frequency of rotation and renewal, based on the volume of data, and its integrity and availability requirements.
Alternate sites. In extreme circumstances, it may be desirable or necessary to transfer operations in whole or in part to an alternate location. The contingency plan should identify such locations, which must be capable of supporting the operations they may be required to accommodate. The plan should also specify the circumstances under which relocation is to be undertaken and address the logistical considerations of such a move.
Equipment renewal. In the event of major damage or theft, it may be necessary to replace items of IT infrastructure at short notice. If the planning architect considers it necessary, the contingency plan should specify emergency arrangements for the procurement, delivery and commissioning of replacement hardware. This may involve arrangements with suppliers to supply equipment at short notice, the advance purchase and off-site storage of critical items of equipment, and plans for the contingency use of suitable equipment already owned by the organization.
Roles and responsibilities. The plan should specify teams and individuals and the areas for which they are responsible in an emergency situation. The people involved must understand their roles and the expectations that these roles place upon them, and they must be fully prepared to carry out their responsibilities at short notice when required.
Emergency procedures: The plan must include the emergency procedures that must be implemented in the case of an incident. For example:
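Returning to the backup strategy item in the list above, its elements (data sets, backup type and frequency, file naming, media and storage location) lend themselves to a structured record. The sketch below is one possible shape, not a standard: the `BackupPolicy` class, its field names, the weekly-full/daily-incremental schedule and the naming pattern are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BackupPolicy:
    data_set: str
    full_backup_weekday: int  # 0 = Monday ... 6 = Sunday
    media: str
    offsite_location: str
    retention_days: int

    def backup_type(self, day: date) -> str:
        """Full backup on the chosen weekday, incremental otherwise."""
        return "full" if day.weekday() == self.full_backup_weekday else "incremental"

    def file_name(self, day: date) -> str:
        """Apply a consistent naming policy: data set, ISO date, backup type."""
        return f"{self.data_set}_{day.isoformat()}_{self.backup_type(day)}.bak"


# Hypothetical policy for a single data set.
policy = BackupPolicy("finance-db", 6, "LTO tape", "offsite vault A", 90)
print(policy.file_name(date(2024, 3, 3)))  # a Sunday
print(policy.file_name(date(2024, 3, 4)))  # a Monday
```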
The plan development phase involves pulling together all the information gathered in the previous steps into clear and precise outlines of the actions to be taken under various emergency conditions. The plan should be laid out in a simple and straightforward manner, to assist people to locate relevant information quickly and easily. An emergency is not the time for individuals to have to wade through thousands of words trying to find the bits that matter to them. The plan should also be simple to execute under emergency conditions.
A useful approach to laying out the plan is to structure it according to the various emergency circumstances that have been envisaged. Within these sections, step-by-step workflows and checklists targeted at individuals or teams make it easy for people to know what they are supposed to be doing.
It is typical to specify three phases of response for each identified emergency situation. The first of these is the activation phase and consists of the procedures for communicating the existence of the emergency, assessing the damage and activating the plan. The second phase is the recovery phase, during which the recovery procedures are initiated and carried out. This is followed by the reconstitution phase when the original infrastructure is restored and tested, and the emergency procedures wound up.
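The three phases can be modelled as an ordered sequence, each with its own checklist. The following is a minimal sketch under that assumption; the `Phase` enum, the `next_phase` helper and the checklist wording are hypothetical and simply paraphrase the phase descriptions above.

```python
from enum import Enum


class Phase(Enum):
    ACTIVATION = "activation"
    RECOVERY = "recovery"
    RECONSTITUTION = "reconstitution"


# Illustrative checklists; a real plan would be far more specific.
CHECKLISTS = {
    Phase.ACTIVATION: ["communicate the emergency", "assess the damage", "activate the plan"],
    Phase.RECOVERY: ["initiate recovery procedures", "carry out recovery procedures"],
    Phase.RECONSTITUTION: [
        "restore the original infrastructure",
        "test the restored infrastructure",
        "wind up the emergency procedures",
    ],
}


def next_phase(current):
    """Return the phase that follows `current`, or None after the last one."""
    order = list(Phase)  # Enum members iterate in definition order
    position = order.index(current)
    return order[position + 1] if position + 1 < len(order) else None


phase = Phase.ACTIVATION
while phase is not None:
    print(phase.value.upper())
    for step in CHECKLISTS[phase]:
        print("  [ ]", step)
    phase = next_phase(phase)
```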
Testing, training and maintenance are essential follow-up activities that must be carried out after the completion of the plan. It is vital that the plan is thoroughly tested in all its aspects. An untested plan is worthless, as there is a high probability of it failing under the pressure of an actual emergency. For this reason, planning standards generally recommend a structured and comprehensive testing schedule covering at least the following areas:
Coordination of responsible parties. Testing should demonstrate that responsible teams and individuals understand and can carry out their assigned roles in an emergency. Given the human aspect to this, it is advisable that at least some testing is carried out under pressure situations to ensure that people are able to function under the stress of an actual emergency.
Notification procedures. The communication elements of the plan are easy to overlook but are in fact critical. Testing should ensure that communication procedures are viable and effective, and able to function properly under emergency conditions.
Training is essential to ensure that individuals are aware of the plan, its possible impact on them and their role within it. This does not apply only to active participants in emergency processes. In an emergency, it is not unusual for almost every individual in an organisation to be affected in some way. Some form of training is, therefore, necessary for everybody. Training usually consists of a combination of classroom and practical exercises designed to imitate real-life scenarios as convincingly as reasonably possible. Practical exercises should ideally involve simulations of anticipated disruptions.
Maintenance is another frequently overlooked aspect of contingency planning. In a dynamic environment such as a typical business, frequent reappraisal and review are essential. This part of the process shouldn't be left to chance: the maintenance schedule should be formally specified as part of the plan. As a general rule, the plan should be subject to annual review, although in many cases more frequent or even continuous review will be desirable. In all cases, procedures must be in place to update constantly changing details such as contact information on an as-needed basis.
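A formally specified review schedule can be checked mechanically. The sketch below assumes the annual interval mentioned above as a default; the function name `review_overdue` and the sample dates are illustrative.

```python
from datetime import date, timedelta


def review_overdue(last_review, today, interval_days=365):
    """True when more than `interval_days` have passed since the last review."""
    return today - last_review > timedelta(days=interval_days)


# Hypothetical dates.
print(review_overdue(date(2023, 1, 10), today=date(2024, 2, 1)))
print(review_overdue(date(2023, 6, 1), today=date(2024, 2, 1)))
```

A shorter `interval_days` can be passed for plans that warrant more frequent review.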
Finally, the plan itself should be subject to the organisation's security processes. It is in itself a critical and sensitive document. Its distribution should therefore be appropriately controlled, backup copies should be stored off-site, and the plan itself must be protected against the kinds of disaster it addresses.