Closure Report
Project Summary
This project was intended to pilot the move of a database and an application to virtual machines on the metro stretched cluster. If successful, then this would provide a proven approach to improving the resilience of our databases and applications so that, for example, we can recover from a data centre failure. The pilot was designed to use the Wiki application, one of our priority services, as the model for this work.
The project was expected to move the Wiki Application service, which was already on a virtual machine, to the virtual Metro stretched cluster. The project would then move the Database Tier to a virtual machine - this would be the first 'transactional' system to move to a VM environment. If successful, the project was expected to then move that virtual machine to the Metro stretched cluster - this was another aspect that had not been done before within the University infrastructure.
In the end, the project has ended up looking quite different from what was envisioned in the published brief, with some deliverables tweaked and the original estimates revised more than once. However, what has emerged from these changes is a solution that should enable us to migrate a large number of Oracle databases to virtual machines, and a greater understanding across our technical teams of how this can be done.
Project Scope
[Copied from Project Brief]
Phase 1 - Move the Wiki Application service, which was on a Virtual Machine, to the virtual metro stretched cluster.
Phase 2 - Move the Database Tier to VMs for Stage 1, and then to the Metro stretched cluster (MSC) for Stage 2.
Stage 1 - move the Wiki database tier to a VM, including DR testing and load testing. This was a MUST, in order to provide recovery from any 'unplanned' data centre failure. As this would be the first 'transactional' system to move to a VM environment, testing and planning with key technical stakeholders and business owners were crucial to success. A decision was expected at the end of this stage: whether the move to Stage 2 should go ahead as planned, or whether further information was needed based on the results of the testing.
Stage 2 - move the Wiki database VMs to the metro stretched cluster. This had not been done before within the University infrastructure, so careful planning and extensive testing were a key part of this stage of the project.
Analysis of Resource Usage:
| Resource | Estimate | Actual | Variance |
|---|---|---|---|
| Staff Usage (days) | 50 | 110 | 120% |
| Other Resources | n/a | n/a | n/a |
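For clarity, the staff usage variance quoted above is computed as (actual - estimate) / estimate, expressed as a percentage. A minimal sketch of the calculation, using the figures from the summary above:

```python
# Staff usage variance as a percentage of the original estimate.
# Figures are taken from the resource usage summary above.
estimate_days = 50
actual_days = 110

variance_pct = (actual_days - estimate_days) / estimate_days * 100
print(f"Staff usage variance: {variance_pct:.0f}%")  # prints "Staff usage variance: 120%"
```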
Outcome
| No. | Description | Priority (MoSCoW) | Outcome |
|---|---|---|---|
| O1 | Provision of full automated resilience for the top priority Wiki Service that will ensure service excellence to the end users | | |
| D1 | Move WIKI application Tier VM onto the metro stretched cluster | Must Have | Achieved (but later removed) |
| D2 | WIKI database move to Virtual Machines | Must Have | Partially Achieved (DEV only) |
| D3 | Move WIKI database VMs to the Metro stretched cluster | Should Have | Not achieved |
It has also been pointed out that the project produced an additional, unplanned deliverable: a design and process for migrating all remaining databases to a VM infrastructure.
Explanation for variances
Effort
The original published effort in the project brief was 50 days, although the estimation spreadsheet shows a total of 54 days; the breakdown is available in the estimation logs. However, in the first six months of the project, before it was suspended because of network issues and the lack of a replacement project manager, only 11 days had been used. The project restarted in August 2018, and it was thought that the major deliverable of the project - a VM-based database - might be completed within the bounds of the remaining effort.
This proved more difficult in practice, with the design and build of a VM solution (particularly the Puppet work) requiring an extra 40 days of effort (Issue 14). The project saw two further increases, to 100 days and then to 110 days, before the deliverable of a VM-based Oracle database was finally completed.
The table below lists the original estimated effort for each stage, with the breakdown across the original phases given in brackets (where applicable). These phases were not followed after the early stages of the project because the focus became the design and build of a VM for Oracle databases. The major differences between estimated and actual effort lie in the design and build of the VM solution (for a database) and the replanning work that this element of the project required. The final total is roughly double the early estimates, but this project was very much entering 'terra incognita': because this kind of work had not been attempted before, the estimates could not be based on past experience. Even in the latter stages of the project, estimating effort and duration proved difficult because of the unknowns around the development, testing and eventual deployment needed to deliver a solution for hosting an Oracle database on a virtual machine.
| Stage/Task | Estimate (days) | Actual | Difference |
|---|---|---|---|
| Project Management/QA | 12.0 | 23.0 | 11.0 |
| Planning | 4.0 | 4.1 | 0.1 |
| Analysis & Design | 2.0 | 1.5 | (0.5) |
| Build | 10.0 (3/2/5) | 9.1 | (0.9) |
| Integration | 10.0 (3/4/3) | 3.8 | (6.2) |
| Acceptance | 7.0 (3/2/2) | | (7.0) |
| Deployment | 6.0 (2/2/2) | | (6.0) |
| Replanning (Oct/Nov 18) | | 13.5 | 13.5 |
| Design & Build of VM solution | | 53.4 | 53.4 |
| Handover | 2.0 | | (2.0) |
| Unplanned | | 0.6 | 0.6 |
| Closure | 1.0 | 1.0 | 0.0 |
| Total | 54.0 | 110.0 | 56.0 |
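As a cross-check on the table above, the per-stage figures can be summed to confirm the totals. A small sketch, with blank cells treated as zero:

```python
# Cross-check of the effort table: per-stage actuals should sum to the
# 110-day total, and the differences to the 56-day overall variance.
rows = [
    # (stage, estimate, actual)
    ("Project Management/QA", 12.0, 23.0),
    ("Planning", 4.0, 4.1),
    ("Analysis & Design", 2.0, 1.5),
    ("Build", 10.0, 9.1),
    ("Integration", 10.0, 3.8),
    ("Acceptance", 7.0, 0.0),
    ("Deployment", 6.0, 0.0),
    ("Replanning (Oct/Nov 18)", 0.0, 13.5),
    ("Design & Build of VM solution", 0.0, 53.4),
    ("Handover", 2.0, 0.0),
    ("Unplanned", 0.0, 0.6),
    ("Closure", 1.0, 1.0),
]

total_estimate = sum(estimate for _, estimate, _ in rows)
total_actual = sum(actual for _, _, actual in rows)
difference = total_actual - total_estimate

print(round(total_estimate, 1), round(total_actual, 1), round(difference, 1))
# prints: 54.0 110.0 56.0
```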
Time
The planning stage was signed off in January 2018, and the first few milestones for the project (covering the work of getting the application tier onto the metro stretched cluster) were met as scheduled. However, the project then hit network issues that prevented the final migration of the LIVE application to the MSC. This milestone was eventually withdrawn as it became apparent that the migration was not going to take place, and the migrations for DEV and TEST were reversed.
While the project team waited for a resolution to the reported network issue, the milestones for the latter stages of the project were also revised because of a combination of leave and the start of the related Wiki upgrade project. The original PM left the University around this time, and the decision was made to suspend the project until a new PM could be assigned. This happened in August 2018, and a series of meetings followed to try to pin down new milestones. As noted in the section above, this proved difficult to estimate and plan, but the development work continued. It was eventually decided that the Wiki database should be ready for migration to a VM by the end of January 2019; after a short delay, the VM-based database was delivered in March. The closure of the project has also been delayed in recent weeks because of annual leave and the workload of the PM.
Scope
The main change here was the removal of the Wiki application from the metro stretched cluster because of the reported problems with Brocade switches. ITI investigated, but no solution was forthcoming, so the DEV and TEST migrations were rolled back and the LIVE migration postponed. It was also decided that the database tier did not have to be migrated to the MSC, so this deliverable was withdrawn.
Finally, given the extended duration of the project, the scope was narrowed to the migration of only the DEV Wiki database to a virtual machine. It was believed that the successful development of this solution would allow the project to close, with the TEST and LIVE migrations, and further development of the deliverables, picked up by the follow-on project INF142.
Key Learning Points
One key point made by a member of the project team, given that the project 'evolved' as time passed, was that, "whenever we have large new technologies, we should be open to allow scope change as quite often we do not know all the technical details at the beginning. We need to be flexible and have options to change scope." This is a pertinent point for R & D-type projects: there can be a number of unknowns around how things will go, what decisions will be made, and how much effort might be required. Some of the techniques and technologies may have been developed as solutions elsewhere, but we (the University of Edinburgh) have our own requirements around our infrastructure and the technical solutions that we develop, so this type of work is not always straightforward and we may not be able to use off-the-shelf answers in response to our investigations.
Further feedback from those close to the project has raised questions about its cost-effectiveness - balancing the final effort used against what was delivered - and whether the solution developed was the best one. Looking back, it may have helped to suspend the project earlier and await a resolution of the network issues. However, establishing how we could migrate databases to VMs became pressing, and this became the focus of the project, especially given the impending need to migrate Oracle databases to virtual machines in the next 12 months. On the second point, some team members acknowledged that the solution developed (using Puppet) could have been approached differently, but no concrete alternatives have been suggested.
Finally, this project involved a lot of work with the ITI Enterprise Unix team, and the arrangement has been largely positive. Feedback from colleagues has highlighted the benefits of working closely with each other on this project, and the mutual understanding of what each team wanted from the solution, built up through regular meetings and discussions.
Outstanding Issues
There are no outstanding issues.
