Overview

Background

Staff and students increasingly expect services to operate on a 24x7, 365-day basis. Whilst the current infrastructure is robust and the service is reliable, parts of the service have single points of failure and require manual intervention when failures occur.

The Wiki service is a top priority service and currently has no automated site resilience. The Wiki consists of two tiers: an application tier and a database tier. The overall aim of this project is to move both tiers to the newly implemented metro stretched cluster.

The project is dependent on the Wiki upgrade project, COM039, which will require the application and database servers involved in this project in order to perform the annual Wiki upgrade. Following a meeting with the Wiki upgrade project, we have agreed to commence with Phase 1, moving the application tier to the virtual metro stretched cluster for resilience by mid-February 2018, and then to hand over to that project's team with the necessary test results and documentation. Once the Wiki upgrade project has closed, the application servers and database servers will be handed over to this project so that Phase 2 can begin; the planning and design work for Phase 2 will be completed beforehand.

Phase 1 - Wiki Application Tier - The application tier is a single application server because the licence costs for a horizontally scaled application tier are very high. This solution provides resilience without the additional costs. The application tier is already on virtual infrastructure, so this phase of the project will involve moving the application tier onto the virtual metro stretched cluster in order to provide 'site' resilience, including the necessary DR testing and performance testing using the methodologies already in place. This approach will give automated resilience across sites in the event of an 'unplanned' site failure, removing the need for manual intervention.

Phase 2 - Wiki Database Tier - The aim is to provide the database server with site resilience and automated recovery in the event of site failure. The database tier is currently on physical infrastructure. This will be the first transactional database to move to the VM environment and will therefore move in two stages.

Stage 1 - Move the database to virtual infrastructure, including the necessary DR and performance testing to ensure stability.

Stage 2 - Move the database VMs to the metro stretched cluster, only if the technical teams deem this to be essential following the Stage 1 outcomes.

 

Scope

Phase 1 - Move the Wiki Application service, which is currently on a Virtual Machine, to the virtual metro stretched cluster. This will be completed by mid-February 2018.

Phase 2 - Move the database tier to VMs (Stage 1) and then to the metro stretched cluster (MSC) (Stage 2).

Stage 1 - Move the Wiki database tier to a VM, including DR testing and load testing. This is a MUST in order to provide recovery from an 'unplanned' data centre failure. As this will be the first 'transactional' system to move to a VM environment, testing and planning with key technical stakeholders and business owners is crucial to its success. A decision will be made at the end of this stage on whether the move to Stage 2 should go ahead as planned or whether further information is needed, based on the results of the testing.

Stage 2 - Move the Wiki database VMs to the metro stretched cluster. This has not been done before within the University infrastructure, so careful planning and extensive testing will be a key part of the project.

Out of Scope

This project will not move the GenoWiki, which was separated from the Central Wiki last year.

 

Objectives and Deliverables

Phase   No   Description   Priority (MoSCoW)   Owner
-   O1   Provision of fully automated resilience for the top priority Wiki service, ensuring service excellence for end users   -   Alain Forrester
1   D1   Move the Wiki application tier VM onto the metro stretched cluster   Must   Ana Heyn
2.1   D2   Move the Wiki database to virtual machines   Must   Heather Larnach, Mark Lang, Martin Campbell
2.2   D3   Move the Wiki database VMs to the metro stretched cluster   Should   Heather Larnach, Mark Lang, Martin Campbell

 

 

 

Benefits

  • Improved service availability, in line with the University Strategic Plan and Vision 2025, to provide a 24x7, 365-day service to staff and students.
  • Reduction in the manual intervention required to resolve instances of 'unplanned' downtime.
  • Ability to perform site maintenance by failing the service over to the secondary site.

 

Success Criteria

  • Availability is measured and reviewed on an annual basis: http://reports.is.ed.ac.uk/alerts/reports/
  • In summer 2018, site maintenance will be performed with outages over weekends; success will be the Wiki service experiencing no 'outage' during this maintenance.

Project Milestones

Target Date   Title   Stage   Complete
19-Jan-2018   Planning Complete   Plan   No
26-Jan-2018   Phase 1 - Wiki App Tier to metro stretched cluster - Design complete   Design   No
29-Jan-2018   Phase 1 - Wiki App Tier to metro stretched cluster - Deploy to DEV   Integrate   No
31-Jan-2018   Phase 1 - Wiki App Tier to metro stretched cluster - Deploy to TEST   Integrate   No
09-Feb-2018   Phase 1 - Wiki App Tier to metro stretched cluster - Deploy to LIVE   Deliver   No
16-Feb-2018   Phase 1 - Wiki App Tier to metro stretched cluster - Acceptance   Accept   No
16-Feb-2018   Phase 1 - Closure   Deliver   No
02-Mar-2018   Phase 2 - Database move to VM & MSC - Analysis & Design complete   Design   No
26-Oct-2018   Phase 2 - Stage 1 - Database move to VM - Delivery complete   Deliver   No
14-Dec-2018   Phase 2 - Stage 2 - Wiki Database to MSC - Delivery complete   Deliver   No
21-Dec-2018   Project Close complete   Close   No

Project Info

Project: Wiki Database Resilience Improvement
Code: INF129
Programme: ISG - IS Applications Infrastructure (INF)
Management Office: ISG PMO
Project Manager: David Watters
Project Sponsor: Stefan Kaempf
Current Stage: Close
Status: Closed
Project Classification: Grow
Start Date: 02-Oct-2017
Planning Date: 19-Jan-2018
Delivery Date: 29-Mar-2019
Close Date: 12-Apr-2019
Programme Priority: 5
Overall Priority: Higher
Category: Compliance