October 2007 | Volume 2, Issue 6
Disaster Recovery 101
Effective disaster recovery planning is essential to providing and maintaining technology services for end users. Disaster recovery planning has three main components: response, recovery, and restoration. Response is the action taken immediately after a disaster occurs; recovery restores data access and agency operations; and restoration repairs the facility that suffered the disaster. Before the data center services (DCS) contract, the DCS agencies had varying levels of disaster recovery planning and capability. One of the key goals of the DCS contract is to provide all agencies with robust disaster recovery plans aligned with business needs.
Team for Texas has seven disaster recovery coordinators (DRCs) working to understand and update agency disaster recovery planning. DRCs work closely with Team for Texas Service Delivery Managers (SDMs) at the agencies. They also have access to IBM’s Business Continuity and Resiliency Services, which provides technical expertise in disaster recovery and a framework for the DCS failover strategy.
The DRCs are developing two types of disaster recovery plans: interim and final. Interim disaster recovery plans cover the period before an agency is consolidated, while final disaster recovery plans apply once the agency has been consolidated into the Austin or San Angelo data center. Team for Texas has developed six draft plans for agencies with no prior disaster recovery planning. The agencies are reviewing these draft plans with the SDMs before final acceptance and implementation.
The Team for Texas approach to restoration uses a dual-site solution, also called a sister-site methodology, in which each data center uses the other as a backup in case of a disaster. Each data center maintains the ability to restore critical data and applications from its sister site. Both data centers have the bandwidth capacity to support restoration while maintaining current operations.
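As a rough illustration of the sister-site idea, the pairing can be written as a two-way lookup. This is a minimal sketch, not the Team for Texas failover design: the function name `recovery_site` and the `site_is_up` status dictionary are hypothetical, and the rule "use the sister site only when the primary is down" is an assumption made for the example.

```python
# The two consolidated data centers named in the article.
# Each one lists the other as its sister (backup) site.
SISTER_SITE = {"Austin": "San Angelo", "San Angelo": "Austin"}


def recovery_site(primary, site_is_up):
    """Pick the site where workloads homed at `primary` should run.

    `site_is_up` is a hypothetical status map such as
    {"Austin": False, "San Angelo": True}; a missing entry counts as down.
    """
    if primary not in SISTER_SITE:
        raise KeyError(f"unknown data center: {primary!r}")
    if site_is_up.get(primary, False):
        return primary  # no disaster at the primary: stay put
    sister = SISTER_SITE[primary]
    if site_is_up.get(sister, False):
        return sister  # primary is down: recover at the sister site
    raise RuntimeError("neither the primary nor its sister site is available")
```

For example, calling `recovery_site("Austin", {"Austin": False, "San Angelo": True})` selects San Angelo, and the same call with the roles reversed selects Austin.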
Agencies assigned each application covered by the DCS infrastructure to one of five recovery time objective (RTO) classifications (D0, D1, D2, D3, or D4) based on its criticality to business operations. The RTO is the time within which systems, applications, or functions must be recovered after an outage. RTOs form the basis for recovery strategies and determine whether to implement one or more recovery strategies during a disaster. The RTOs will be the primary basis for future updates and evolution of disaster recovery plans.
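To show how RTO classifications might drive recovery sequencing, here is a minimal sketch that sorts applications by tier. The article names the five tiers but does not state their time limits or which tier is most urgent, so the code assumes D0 is recovered first; the application names and the `recovery_order` helper are hypothetical.

```python
from dataclasses import dataclass

# The five RTO classifications named in the article.
# ASSUMPTION: D0 is the most time-critical tier and D4 the least.
RTO_TIERS = ("D0", "D1", "D2", "D3", "D4")


@dataclass(frozen=True)
class Application:
    name: str      # hypothetical application name
    rto_tier: str  # one of RTO_TIERS


def recovery_order(apps):
    """Return the applications sorted so the most urgent tier comes first."""
    for app in apps:
        if app.rto_tier not in RTO_TIERS:
            raise ValueError(f"unknown RTO tier: {app.rto_tier!r}")
    # sorted() is stable, so applications in the same tier keep their input order.
    return sorted(apps, key=lambda app: RTO_TIERS.index(app.rto_tier))


# Hypothetical inventory for illustration only.
apps = [
    Application("archive-search", "D3"),
    Application("payroll", "D0"),
    Application("intranet", "D2"),
]
ordered_names = [app.name for app in recovery_order(apps)]
```

Under the stated assumption, `ordered_names` lists the D0 application first and the D3 application last. A real plan would also attach the agreed time limit to each tier, which the article does not give.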
Team for Texas will conduct periodic simulation exercises to test the effectiveness of the disaster recovery response once the dual site solution is completed. All lessons learned from these exercises will be incorporated into the disaster recovery plans.
TSLAC Migration
The Texas State Library and Archives Commission (TSLAC), an agency focused on preserving historically significant records and providing access to public government information, is the first DCS agency to migrate data center operations. TSLAC, which has 250 agency users, began to transition hardware and software on August 11, 2007. The agency has moved 22 servers to the Austin Data Center and is continuing to migrate software platforms.
TSLAC’s move date was prompted by plans to renovate the agency’s Capitol site, the Lorenzo de Zavala State Archives and Library Building. The first phase of construction included the server room, so the decision to be first in the transformation process supported not only the data center initiative but also the construction schedule.
Planning for the move began in mid-April 2007. The transformation planning activities included identifying the specific milestones and risks that were critical to determining the priority and order of equipment and applications scheduled for transition. Though some outages could not be completely avoided in such a move, TSLAC planning and project management focused on minimizing downtime for users of the agency's services.
When the move began in mid-August, approximately 20 TSLAC employees were involved in migration activities, including verifying server counts and disk space requirements, providing documentation, and testing hardware and network connectivity. TSLAC employees provided a knowledge base of agency-specific hardware and software. Team for Texas resources and TSLAC employees worked together to order new equipment, set up the new environment, migrate servers, and test applications moved to the Austin Data Center. The combination of teams ensured that the necessary manpower and knowledge base were available to rapidly move TSLAC’s hardware and software.
Michael Ford, the agency’s DCS Customer Representative, has worked for TSLAC for ten years. He says that while moving has been a challenge, he hopes that TSLAC can be an example for other agencies as they prepare for their DCS program.
Ford concludes, “It is not enough to simply know the operating system and hardware. Knowledge of the specific agency environment and applications is critical to ensuring that a move to this degree is successful.”