Overview
Background
Staff and students increasingly expect services to operate 24x7, 365 days a year. Whilst the current infrastructure is robust and the service as such is reliable, some parts of the service have single points of failure and require manual intervention when failures occur.
The Wiki service is a top-priority service and currently has no automated site resilience. The Wiki consists of two tiers: an application tier and a database tier. The overall aim of this Project is to move both tiers to the newly implemented metro stretched cluster.
The Project is dependent on the Wiki upgrade Project, COM039, which will require the application and databases involved in this Project in order to perform the annual Wiki upgrade. Following a meeting with the Wiki upgrade project, we have agreed to begin with Phase 1, moving the application tier VM to the metro stretched cluster for resilience by mid-February 2018, and then to hand over to the upgrade team with the necessary testing results and documentation. Once the Wiki upgrade project has closed, the application servers and database servers will be handed back to our Project to proceed with Phase 2, although the planning and design work will be completed beforehand.
Phase 1 - Wiki Application Tier - The application tier is a single application server because licensing a horizontally scaled application tier would be very expensive. This solution provides resilience without the additional licence costs. The application tier already runs on virtual infrastructure, so this phase of the project involves moving it onto the virtual metro stretched cluster to provide site resilience, using the established DR testing and performance testing methodologies. This approach will provide automated resilience across sites in the event of an unplanned site failure, removing the need for manual intervention.
Phase 2 - Wiki Database Tier - The aim is to provide the database server with site resilience and automated recovery in the event of site failure. The database tier currently runs on physical infrastructure. It will be the first transactional database to move to the VM environment and will therefore move in two stages.
Stage 1 - Move the database to virtual infrastructure, including the necessary DR and performance testing to ensure stability.
Stage 2 - Move the database VMs to the metro stretched cluster, only if the technical teams deem this essential following the Stage 1 outcomes.
Scope
Phase 1 - Move the Wiki Application service, which is currently on a Virtual Machine, to the virtual metro stretched cluster. This will be completed by mid-February 2018.
Phase 2 - Move the database tier to VMs in Stage 1 and to the metro stretched cluster (MSC) in Stage 2.
Stage 1 - Move the Wiki database tier to a VM, including DR testing and load testing. This is a MUST in order to provide recovery from an unplanned data centre failure. As this will be the first transactional system to move to a VM environment, testing and planning with key technical stakeholders and business owners is crucial to its success. At the end of this stage, based on the testing results, a decision will be made on whether Stage 2 should go ahead as planned or whether further information is needed.
Stage 2 - Move the Wiki database VMs to the metro stretched cluster. This has not been done before within the University infrastructure, so careful planning and extensive testing will be a key part of the Project.
Out of Scope
This project will not move the GenoWiki, which was separated from the Central Wiki last year.
Objectives and Deliverables
| Phase | No | Description | Priority (MoSCoW) | Owner |
|---|---|---|---|---|
| | O1 | Provision of fully automated resilience for the top-priority Wiki service, ensuring service excellence for end users | | Alain Forrester |
| 1 | D1 | Move the Wiki application tier VM onto the metro stretched cluster | Must | Ana Heyn |
| 2.1 | D2 | Move the Wiki database to virtual machines | Must | Heather Larnach, Mark Lang, Martin Campbell |
| 2.2 | D3 | Move the Wiki database VMs to the metro stretched cluster | Should | Heather Larnach, Mark Lang, Martin Campbell |
Benefits
- Improved service availability, in line with the University Strategic Plan and Vision 2025, providing a 24x7, 365-day service to staff and students.
- Reduced manual intervention required to resolve unplanned downtime.
- Ability to perform site maintenance by failing the service over to the secondary site.
Success Criteria
- Availability is measured and reviewed annually: http://reports.is.ed.ac.uk/alerts/reports/
- In summer 2018, site maintenance involving weekend outages will be performed; success means the Wiki service experiences no outage during that maintenance.
Project Milestones
| Target Date | Title | Stage | Complete |
|---|---|---|---|
| 19-Jan-2018 | Planning Complete | Plan | No |
| 26-Jan-2018 | Phase 1 - Wiki App Tier to metro stretched cluster - Design complete | Design | No |
| 29-Jan-2018 | Phase 1 - Wiki App Tier to metro stretched cluster - Deploy to DEV | Integrate | No |
| 31-Jan-2018 | Phase 1 - Wiki App Tier to metro stretched cluster - Deploy to TEST | Integrate | No |
| 09-Feb-2018 | Phase 1 - Wiki App Tier to metro stretched cluster - Deploy to LIVE | Deliver | No |
| 16-Feb-2018 | Phase 1 - Wiki App Tier to metro stretched cluster - Acceptance | Accept | No |
| 16-Feb-2018 | Phase 1 - Closure | Deliver | No |
| 02-Mar-2018 | Phase 2 - Database move to VM & MSC - Analysis & Design Complete | Design | No |
| 26-Oct-2018 | Phase 2 - Stage 1 - Database move to VM - Delivery Complete | Deliver | No |
| 14-Dec-2018 | Phase 2 - Stage 2 - Wiki Database to MSC - Delivery Complete | Deliver | No |
| 21-Dec-2018 | Project Close Complete | Close | No |
