Completion Report
Project Summary:
This project was initiated to build on the deployment of the new application and database tiers by implementing ways to provide more resilience beyond our current basic manual disaster recovery failover provision. Most systems can be failed over to a secondary site in the event of a major problem, but this requires manual, attended intervention.
During the project we looked at technologies such as VMware Site Recovery Manager (SRM), but because the IT Infrastructure strategy changed to delivering a stretched metro cluster, we decided not to implement SRM. Instead, the project focused on 'quick wins' where gains could be made with little effort.
One of these 'quick wins' was to implement and test facilities that allow applications to keep working when their databases fail over, with no manual intervention required. This has been implemented for two top-priority services, MyEd and Central Wiki, with no impact on service encountered.
Scope
To look at a specific service and implement a failover facility to manage planned or unplanned events in the infrastructure or application layer, e.g. Oracle, SQL or MySQL.
Objectives, Deliverables, Benefits, Success Criteria
Objectives | Deliverables | Benefits | Success criteria |
| | Within the University of Edinburgh, staff and students have been operating more and more on a 24x7x365 basis. Service availability is key to supporting this. While our infrastructure is robust, issues do arise, and as we do not have personnel on hand over the same timeframe, a failover facility is required. Benefits include: | |
Out of Scope
- Following discussions with the project team, the original candidate applications were ruled out: using Events Booking would result in failover of AppsDEV and AppsTest, which would impact a number of applications, and EBIS Online / Web Central had complexities with SAMBA drives. We therefore decided to look at other applications and agreed to use the Central Wiki service as the candidate application.
- The project was asked to incorporate the SRM solution following discussions with ITI. However, due to issues encountered in implementing the technology, the project agreed to remove it, as a preferred solution is being introduced using a VM Stretch Cluster, which is not expected to be available until Summer 2017.
- It was also agreed not to progress the analysis for an automated failover process, because of the future introduction of the Stretch Cluster infrastructure.
Analysis of Resource Usage:
Staff Usage Estimate: 100 days
Staff Usage Actual: 28 days
Staff Usage Variance: (72%)
Other Resource Estimate: 0 days
Other Resource Actual: 0 days
Other Resource Variance: 0%
Explanation for variance:
- 16 days removed as unused days in the 15/16 financial year
- 23 days returned to the INF programme following removal of tasks aligned to SRM activity. Following discussions with the Programme Manager and Project Sponsor as well as ITI, the SRM solution was de-scoped from this project. The main reasons are:
- ITI encountered difficulties implementing the SRM technology
- The SRM technology does not deliver automated failover capabilities. A proposal between IS Applications and ITI is under way to deliver a VM Metro stretched cluster, which will deliver an automated, seamless cross-site storage and server solution. This solution will be rolled out by Summer 2017 and is not ready for this project to use.
The focus of the project is now to deliver a solution where applications are database agnostic. This means that when databases move sites, the application does not require manual intervention to “point” to the new database; instead, the application will by itself “know” where the database is served from. This has already been implemented and tested for MyEd. This project will establish other services, such as Central Wiki, where this is being implemented and tested.
- 25 days' effort returned to the INF programme following re-estimation of remaining activity.
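The database-agnostic behaviour described above, where an application finds its database without being manually re-pointed, can be illustrated with a minimal sketch. The report does not say how this was implemented for MyEd or Central Wiki, so everything here is an assumption for illustration only: the endpoint names, the `connect_with_failover` helper and the `fake_connect` stand-in are hypothetical and do not represent the project's actual configuration. The idea shown is simply that the application holds an ordered list of database endpoints and uses the first one that accepts a connection.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def connect_with_failover(endpoints: Sequence[str],
                          connect: Callable[[str], T]) -> T:
    """Return a connection from the first endpoint that accepts one.

    `connect` is whatever function the application normally uses to open
    a database connection (hypothetical here); it is expected to raise
    ConnectionError when an endpoint cannot be reached.
    """
    errors: List[str] = []
    for endpoint in endpoints:
        try:
            return connect(endpoint)
        except ConnectionError as exc:
            # Remember why this site failed, then try the next one.
            errors.append(f"{endpoint}: {exc}")
    raise ConnectionError("no database endpoint reachable: " + "; ".join(errors))


def fake_connect(endpoint: str) -> str:
    """Stand-in for a real driver call: pretends that site A is down."""
    if endpoint == "db-site-a.example":
        raise ConnectionError("site A is down")
    return f"connected to {endpoint}"


if __name__ == "__main__":
    # Hypothetical endpoint names; the primary site is listed first.
    sites = ["db-site-a.example", "db-site-b.example"]
    print(connect_with_failover(sites, fake_connect))
```

In practice this kind of endpoint list is often handled by the database driver or connection string rather than by application code, so a sketch like this mainly serves to show the behaviour being tested: the application reaches the secondary site without anyone editing its configuration.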
Key Learning Points:
- These types of project can be difficult to resource, as key stakeholders are required from Production Management and their time can be limited: they need to give production first priority to limit any potential impact on service
- With the infrastructure landscape changing and evolving, the project team was able to use this situational awareness to make the correct decision to descope SRM, allowing unused days to be transferred back to the INF programme for use on other projects
- The INF Programme Manager thanked the project team for being pragmatic and scoping sensibly
Outstanding issues:
- None