Closure Report

Background 

University of Edinburgh Distance Learning at Scale (DLAS) courses are to be delivered via a partner platform.

Learner data from this platform needs to be accessed for:

  • Learning analytics tools that provide coaching feedback to learners based on actions taken in the MicroMasters modules.

  • Course-based administration processes (e.g. identification of learners who have passed all modules in the MicroMasters and are eligible for the capstone).

  • Reports on course usage and performance.

This project will design and implement an automated process and infrastructure to download and decrypt the data package and make it available via a Relational Database Management System (RDBMS).

Scope

The scope of the project is to:

  • Design the automated process by which data is downloaded, decrypted and made available via an RDBMS. 

  • Build the infrastructure and workflows that will deliver this process resiliently, in a form that can be mirrored on both Test and Live environments, using the data supplied by edX.

  • Document the data flow processes and ensure they meet legal requirements and university policy. 

The scope of the project does not include: 

  • Building the environments for the OnTask learning analytics tool or identifying how the use of OnTask should be incorporated into pilot DLAS programmes. 

  • The gathering of student data for UoE admissions systems during registration for the capstone assessment. (We assume that the data from edX on learners is so minimal that the benefits of part-populating UoE systems from edX data will not be worth the effort required). 

Project Summary

The project has delivered:

  • A MariaDB environment for Dev, Test and Live across two sites
  • A MongoDB environment for Dev, Test and Live at one site
  • An automated process to download data from the edX Amazon S3 storage to secure UoE databases. The process includes automated:
    • decryption of data packages based on the existing Data Czar data key
    • upload of weekly edX institutional SQL data to a new MariaDB environment
    • upload of daily JSON clickstream/event log data to a new NoSQL MongoDB environment
  • A number of summary views relating to course activity
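
The overall shape of the automated process (download, decrypt, then route each package to the appropriate database) can be sketched as below. This is a minimal illustration under stated assumptions, not the project's implementation: the function and class names, the file-suffix routing rule and the loader callables are all hypothetical, and the download and decryption steps are passed in as plain callables so that the sketch does not depend on any particular S3 or encryption library. Failures are flagged per package so that the run completes, in line with deliverable D2.8.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass
class RunReport:
    """Outcome of one run: packages loaded and packages flagged with errors."""
    loaded: List[str] = field(default_factory=list)
    errors: List[Tuple[str, str]] = field(default_factory=list)


def run_pipeline(
    package_names: Iterable[str],
    download: Callable[[str], bytes],           # hypothetical: fetch from cloud storage
    decrypt: Callable[[bytes], bytes],          # hypothetical: decrypt with the data key
    load_sql: Callable[[str, bytes], None],     # hypothetical: import into the RDBMS
    load_events: Callable[[str, bytes], None],  # hypothetical: import into the document store
) -> RunReport:
    """Process every package, flagging failures instead of stopping the run."""
    report = RunReport()
    for name in package_names:
        try:
            plaintext = decrypt(download(name))
            # Assumed routing rule: SQL snapshots go to the relational
            # database, JSON event logs go to the document store.
            if name.endswith(".sql"):
                load_sql(name, plaintext)
            elif name.endswith(".json"):
                load_events(name, plaintext)
            else:
                raise ValueError("unrecognised package type")
            report.loaded.append(name)
        except Exception as exc:
            # Record the error and carry on with the remaining packages.
            report.errors.append((name, str(exc)))
    return report
```

In a sketch like this, the report returned at the end of each run is what an operator or alerting job would inspect to see which packages need attention.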

In delivering the above, it should be recognised that this project was required to implement a number of new technologies and processes:

  • The preferred database environment for the weekly course data was a Windows SQL environment, which would have used MySQL. However, in recognition of the Applications Directorate strategy of migrating from MySQL to MariaDB, the project became one of the early adopters of the new MariaDB environments.
  • The preferred database environment for the daily clickstream data was MongoDB, based on the need to process JSON files. This was a new technology for the Applications Directorate.
  • To simulate the live environment of downloading data from an Amazon S3 environment, an Amazon S3 storage area was commissioned by Applications Directorate to enable the download process to be fully tested.
  • Additionally, the development process was made more difficult because the data structures did not mirror the technical documentation supplied by the partner platform.

Given the effort required to deliver the underlying infrastructure, the project manager would like to acknowledge the efforts of Peter Pratt, Peter Jackson and Nigel Binns, and of the contract developer, Ciaran Doherty. In addition, the project manager would like to acknowledge the patience and understanding demonstrated by the business partners, Myles Blaney and David Thresher.

 

Objectives 

Objective | Achieved
O1. Ensure Data Protection requirements are identified | Yes
O2. Deliver a Platform Data Store for DLAS | Yes

 

Deliverables

Deliverable | Priority | Achieved
D1.1 Produce a Data Privacy Impact Assessment (DPIA) plus any other documentation identified as being required as a result of writing the DPIA. | Must | Yes. The DPIA was created and verified by the Data Protection Officer.
D2.1 Creation of both Test and Live production environments. | Must | Yes. Dev, Test and Live environments were created for both the MariaDB and MongoDB database environments.
D2.2 A resilient infrastructure ensuring data is always available. | Must | Mostly. The MariaDB environment has resilience, with primary and secondary database environments set up. MongoDB runs in a single environment only, as the database can be rebuilt from the platform supplier if necessary.
D2.3 An automated mechanism for downloading and storage of encrypted data packages (weekly database snapshot of EdinburghX courses and daily clickstream/event log data) from edx.org Amazon S3 cloud storage to a UoE IS locally-hosted infrastructure. | Must | Yes
D2.4 An automated decryption of data packages based on the existing Data Czar data key. | Must | Yes
D2.5 An automated process for importing the weekly database snapshot into an RDBMS. | Must | Yes
D2.6 Creation of additional tables of designated event log/clickstream data. | Must | Yes
D2.7 An automated process for importing designated event log/clickstream data. | Must | Yes
D2.8 Error handling mechanism for automated decryption and import processes of data packages (which allows the process to complete with any errors flagged). | Must | Yes
D2.9 Documentation of the database schema, automated processes and error handling processes. | Must | Yes
D2.10 Consultation and creation of basic reporting of the EdinburghX data (e.g. user x is enrolled on what courses, course x has the following users enrolled). | Must | Yes
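
The kind of basic reporting described in D2.10 can be illustrated with a small summary view. The sketch below is an assumption-laden example rather than the delivered schema: the table, column and view names (learner, enrolment, learner_course) are hypothetical, the sample rows are invented for illustration, and an in-memory SQLite database stands in for MariaDB so that the example is self-contained.

```python
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical tables and a summary view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE learner (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
        CREATE TABLE enrolment (
            learner_id INTEGER NOT NULL REFERENCES learner(id),
            course_id  TEXT NOT NULL
        );
        -- Summary view joining learners to the courses they are enrolled on.
        CREATE VIEW learner_course AS
            SELECT l.username, e.course_id
            FROM enrolment AS e
            JOIN learner AS l ON l.id = e.learner_id;

        -- Invented sample data, for illustration only.
        INSERT INTO learner (id, username) VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO enrolment (learner_id, course_id) VALUES
            (1, 'COURSE-A'), (1, 'COURSE-B'), (2, 'COURSE-A');
        """
    )
    return conn


def courses_for_user(conn: sqlite3.Connection, username: str) -> List[str]:
    """Answer 'user x is enrolled on what courses'."""
    rows = conn.execute(
        "SELECT course_id FROM learner_course WHERE username = ? ORDER BY course_id",
        (username,),
    )
    return [row[0] for row in rows]


def users_on_course(conn: sqlite3.Connection, course_id: str) -> List[str]:
    """Answer 'course x has the following users enrolled'."""
    rows = conn.execute(
        "SELECT username FROM learner_course WHERE course_id = ? ORDER BY username",
        (course_id,),
    )
    return [row[0] for row in rows]
```

Both questions are answered from the same view, so report consumers need no knowledge of the underlying join.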

 

Analysis of Resource Usage:

Staff Usage Estimate:

Project Proposal: 50 days

After Planning: 105 days 

Staff Usage Actual: 116 days

Other Resource Estimate: £xxx

Other Resource Actual: £xxx

Other Resource Variance: xx%

 

Breakdown by Team 

Team | Estimate (After Planning) | Actual | Difference | Reason for Difference
Project Services | 17 | 24 | +7 days | Elongated project timelines
Development Technology | 30.5 | 20 | -10.5 days | Level of contingency built into the estimate to cater for the unknown impact of the technical design
Software Development | 28.5 | 67 | +38.5 days | Additional effort required to complete the development of the download process, due to: working with edX data; the new technologies of MariaDB and MongoDB; the requirement to incorporate Amazon S3 cloud storage into the Dev and Test environments to protect the edX security keys; incorporation of the edX security key into the download process; and the management of the contracted software developer. For information, the contract software developer accounted for 35 days.
Applications Management | 19.5 | 2 | -17.5 days | Combination of overestimation of actual activity and, because the download process design did not require a user interface, no requirement to be involved in user acceptance testing
Technology Management | 9.5 | 2 | -7.5 days | Overestimation of actual activity
Service Management | 0 | 0.2 | +0.2 days |
Total | 105 | 116 | +11 days |

 

Explanation for variance

From a budget perspective

  • The original budget greatly underestimated the work involved. This was recognised at the end of the planning phase, when the budget was increased to 105 days. This change is detailed in piccl item 1.
  • During the design phase of the project, it was noted that additional effort would be required to incorporate the processing of the clickstream data, resulting in a further 15 days being requested to increase the budget to 120 days. This is reflected in piccl item 4.

From a time perspective

  • The delivery date had to be moved from April to June (ref piccl items 3 and 4) for the following reasons:
    • The lack of available developer resource resulted in the need to employ a contract developer
    • There was a delay whilst approval was given to utilise the MariaDB and MongoDB database environments
  • To guard against further resourcing delays, the priority of the project was increased to 'Higher' in May (ref piccl item 6).
  • The delivery date then had to be moved again, from June to August (ref piccl item 7), as a result of:
    • The preparation of the Dev and Test environments for both MariaDB and MongoDB
    • Issues with edX data
    • Additional effort to create a UoE environment within the Amazon cloud environment
    • Additional effort placed on the Development Lead after the contractor left
  • Whilst the deployment to live was actioned on 14th August, unexpected issues with firewalls and memory utilisation on the Python server were encountered. These were resolved and, after confirming that the download would run unhindered during the day, the download was scheduled to run overnight and has been running successfully since 25th August. As a result, project closure had to be rescheduled from August to September. This delay is referenced in piccl item 8.

Key Learning Points

  • Assumptions about the project team's level of understanding of the data being received proved to be a little optimistic. In hindsight, the project would have benefited from a more detailed analysis of the data at the outset, as this might have influenced the overall design regarding the level of data extracted.
  • In employing a contract developer for this project, the level of oversight required from the Development Lead was greatly underestimated.
  • Given that new technologies were to be employed, namely MariaDB, MongoDB and Amazon S3 storage, additional time should have been factored in to the delivery, which would have reduced the pressure across the development teams.
  • Technical briefing updates need to be communicated fully and regularly across the project team.

 

Outstanding Issues

 

Project Info

Project: Establishing Platform Data Store for DLAS
Code: DLAS012
Programme: Distance Learning at Scale (DLAS)
Management Office: ISG PMO
Project Manager: Andrew Stewart
Project Sponsor: Anne-Marie Scott
Current Stage: Close
Status: Closed
Project Classification: Grow
Start Date: 03-Sep-2018
Planning Date: 11-Jan-2019
Delivery Date: 23-Aug-2019
Close Date: 29-Nov-2019
Overall Priority: Higher
Category: Discretionary
