6/9/13 - WIKI resilience improvements

Attending:

Morna Findlay, Iain Fiddes, Ana Heyn, Riky Harris

Expectations of a priority one service

Ana explained that the resilience requirements of a priority one service have changed, and that the current wiki falls short of the resilience that could be achieved.

A priority one service should be robust, resilient and able to withstand failure. In the event of the loss of a whole site (AT or KB), the service should be able to continue without manual intervention.

Improving resilience of WIKI service

Iain sketched out two different approaches to improving the resilience of the wiki service.

clustered solution

This would mean having three or more applications at each site which would continually update each other.

This would be resource-heavy to implement and would make patching and planned outages more difficult.

This could be a good solution if increased capacity is required.

"observer" solution

This solution would ensure that the loss of a site could be handled resiliently, using an "observer" process sited outwith AT and KB. This solution has been proven to work but has not been implemented in any production service.

Devtech recommend this approach as the best solution for improving the resilience of the wiki service.
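
As an illustration only, the decision an "observer" might make can be sketched as below. This was not presented at the meeting: the function name choose_active_site, the site labels and the idea that the observer receives a simple reachable/unreachable flag per site are all assumptions made for the sketch, and a real implementation would also need to deal with health checking, data replication and redirecting users.

```python
from typing import Dict, Optional


def choose_active_site(current: str, reachable: Dict[str, bool]) -> Optional[str]:
    """Hypothetical observer decision: which site should serve the wiki.

    `current` is the site serving now; `reachable` maps each site name
    to whether the observer can currently reach it (assumed inputs).
    """
    # Stay on the current site while the observer can still reach it.
    if reachable.get(current, False):
        return current
    # Otherwise fail over to the first other site that is reachable.
    for site, ok in reachable.items():
        if ok:
            return site
    # No site is reachable, so nothing can be promoted automatically.
    return None


if __name__ == "__main__":
    # AT is lost, KB is still up: the observer would switch to KB.
    print(choose_active_site("AT", {"AT": False, "KB": True}))
```

Because the observer sits at neither AT nor KB, the loss of either site does not take the decision-maker down with it, which is what allows the switch to happen without manual intervention.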

ACTIONS:

Iain will provide an estimate for the "observer" solution.

Morna will invite devtech and production to a wiki team meeting to present their recommendations so that the project team can consider adding improved resilience to the project scope.

This article was published on Oct 18, 2016