Closure Report
Background
University of Edinburgh Distance Learning at Scale courses are to be delivered via a partner platform.
Learner data from this platform needs to be accessed for:
- Learning analytics tools that provide coaching feedback to learners based on actions taken in the MicroMasters modules
- Course-based administration processes (e.g. identification of learners who have passed all modules in the MicroMasters and are eligible for the capstone)
- Reports on course usage and performance
This project will design and implement an automated process and infrastructure to download and decrypt the data package and make it available via a Relational Database Management System (RDBMS).
Scope
The scope of the project is to:
- Design the automated process by which data is downloaded, decrypted and made available via an RDBMS.
- Build the infrastructure and workflows that will deliver this process on a resilient infrastructure that can be mirrored on both Test and Live environments, using the data supplied by edX.
- Document the data flow processes and ensure they meet legal requirements and university policy.
The scope of the project does not include:
- Building the environments for the OnTask learning analytics tool or identifying how the use of OnTask should be incorporated into pilot DLAS programmes.
- The gathering of student data for UoE admissions systems during registration for the capstone assessment. (We assume that the data from edX on learners is so minimal that the benefits of part-populating UoE systems from edX data will not be worth the effort required.)
Project Summary
The project has delivered
- The provision of a MariaDB environment for Dev, Test and Live across two sites
- The provision of a MongoDB environment for Dev, Test and Live at one site
- An automated process to download data from the edX Amazon S3 storage to secure UoE databases. The process includes automated:
  - decryption of data packages based on the existing Data Czar data key
  - upload of weekly edX institutional SQL data to a new MariaDB environment
  - upload of daily JSON clickstream\event log data to a new NoSQL MongoDB environment
- A number of summary views relating to course activity
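The overall shape of the automated process listed above can be sketched as follows. This is an illustration only, not the project's implementation: the step functions `download`, `decrypt` and `load` are hypothetical placeholders supplied by the caller, and `RunReport` is an invented structure. The only behaviour taken from this report is that packages are downloaded, decrypted and loaded, and that the run completes with any errors flagged.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class RunReport:
    """Outcome of one pipeline run (hypothetical structure)."""
    loaded: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)


def run_pipeline(
    packages: Iterable[str],
    download: Callable[[str], bytes],
    decrypt: Callable[[bytes], bytes],
    load: Callable[[str, bytes], None],
) -> RunReport:
    """Process each package in turn; record failures and carry on."""
    report = RunReport()
    for name in packages:
        try:
            encrypted = download(name)   # e.g. fetch from cloud storage
            plain = decrypt(encrypted)   # e.g. decrypt with the data key
            load(name, plain)            # e.g. import into a database
            report.loaded.append(name)
        except Exception as exc:
            # Flag the error and continue with the next package.
            report.errors.append(f"{name}: {exc}")
    return report
```

Passing the steps in as functions keeps the sketch independent of any particular storage, encryption or database library, and lets each step be replaced by a stub when testing.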
In delivering the above, it should be recognised that this project has been required to implement a number of new technologies and processes
- The preferred database environment for the weekly course data was a Windows SQL environment, which would have used MySQL. However, in recognition of the Applications Directorate strategy of migrating from MySQL to MariaDB, the project became one of the early adopters of the new MariaDB environments
- The preferred database environment for the daily clickstream was MongoDB based on the need to process JSON files. This was a new technology for the Applications Directorate
- To simulate the live environment of downloading data from an Amazon S3 environment, an Amazon S3 storage area was commissioned by Applications Directorate to enable the download process to be fully tested
- Additionally, it should be noted that the development process was made more difficult because the data structures did not mirror the technical documentation supplied by the partner platform
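Where incoming data does not match its documentation, one defensive approach is to read the event log tolerantly and record which lines could not be parsed. The sketch below is an assumption-laden illustration rather than the project's code: it assumes the log holds one JSON object per line, and the function name `parse_event_lines` and the field names in the usage example are hypothetical.

```python
import json
from typing import Iterable, List, Tuple


def parse_event_lines(lines: Iterable[str]) -> Tuple[List[dict], List[int]]:
    """Parse one JSON object per line; return (events, bad line numbers)."""
    events: List[dict] = []
    bad: List[int] = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad.append(number)  # malformed JSON: flag it and move on
            continue
        if isinstance(record, dict):
            events.append(record)
        else:
            bad.append(number)  # valid JSON, but not an object
    return events, bad
```

For example, `parse_event_lines(['{"event_type": "play_video"}', 'not json'])` returns the single parsed event together with `[2]` as the list of rejected line numbers, which can then be reported without stopping the import.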
Given the effort required to deliver the underlying infrastructure, the project manager would like to acknowledge the efforts of Peter Pratt, Peter Jackson and Nigel Binns and the contract developer Ciaran Doherty, as well as the patience and understanding demonstrated by the business partners, Myles Blaney and David Thresher.
Objectives
| Objective | Achieved |
| O1. Ensure Data Protection requirements are identified | Yes |
| O2. Deliver a Platform Data Store for DLAS | Yes |
Deliverables
| Deliverable | Priority | Achieved |
| D1.1 Produce a Data Privacy Impact Assessment (DPIA) plus any other documentation identified as being required as a result of writing the DPIA. | Must | Yes. The DPIA was created and verified by the Data Protection Officer |
| D2.1 The creation of both Test and Live production environments | Must | Yes. Dev, Test and Live environments were created for both the MariaDB and MongoDB database environments |
| D2.2 A resilient infrastructure ensuring data is always available | Must | Mostly. The MariaDB environment is resilient, with primary and secondary database environments having been set up. MongoDB runs in a single environment only, as the database can be rebuilt from the platform supplier's data if necessary |
| D2.3 An automated mechanism for downloading and storing encrypted data packages (weekly database snapshot of EdinburghX courses and daily clickstream\event log data) from edx.org Amazon S3 cloud storage to a UoE IS locally-hosted infrastructure. | Must | Yes |
| D2.4 An automated decryption of data packages based on the existing Data Czar data key. | Must | Yes |
| D2.5 An automated process for importing the weekly database snapshot into an RDBMS. | Must | Yes |
| D2.6 Creation of additional tables of designated event log \ clickstream data | Must | Yes |
| D2.7 An automated process for importing designated event log \ clickstream data | Must | Yes |
| D2.8 Error handling mechanism for automated decryption and import processes of data packages (which allows the process to complete with any errors flagged). | Must | Yes |
| D2.9 Documentation of the database schema, automated processes and error handling processes. | Must | Yes |
| D2.10 Consultation & creation of basic reporting of the EdinburghX data (e.g. user x is enrolled on what courses, course x has the following user enrolled) | Must | Yes |
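The kind of basic reporting described in D2.10 can be illustrated with a small summary view. This is a sketch under stated assumptions, not the delivered schema: it uses an in-memory SQLite database as a stand-in for the MariaDB environment, and the table, column and view names (`learner`, `enrolment`, `active_enrolment`) and the sample rows are all hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the MariaDB environment (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE learner (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE enrolment (user_id INTEGER, course_id TEXT, is_active INTEGER);
INSERT INTO learner VALUES (1, 'alice'), (2, 'bob');
INSERT INTO enrolment VALUES
    (1, 'course-A', 1), (1, 'course-B', 1), (2, 'course-A', 0);
-- Summary view: one row per active enrolment.
CREATE VIEW active_enrolment AS
SELECT l.username, e.course_id
FROM enrolment AS e
JOIN learner AS l ON l.id = e.user_id
WHERE e.is_active = 1;
""")


def courses_for(username: str) -> list:
    """Which courses is this user enrolled on?"""
    rows = conn.execute(
        "SELECT course_id FROM active_enrolment "
        "WHERE username = ? ORDER BY course_id",
        (username,),
    )
    return [row[0] for row in rows]


def learners_on(course_id: str) -> list:
    """Which users are enrolled on this course?"""
    rows = conn.execute(
        "SELECT username FROM active_enrolment "
        "WHERE course_id = ? ORDER BY username",
        (course_id,),
    )
    return [row[0] for row in rows]
```

Defining the join once in a view means both report directions (courses per user, users per course) query the same definition of an active enrolment.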
Analysis of Resource Usage:
Staff Usage Estimate:
Project Proposal: 50 days
After Planning: 105 days
Staff Usage Actual: 116 days
Other Resource Estimate: £xxx
Other Resource Actual: £xxx
Other Resource Variance: xx%
Breakdown by Team
| Team | Estimate (After Planning) | Actual | Difference | Reason for Difference |
| Project Services | 17 | 24 | +7 days | Elongated project time lines |
| Development Technology | 30.5 | 20 | -10.5 days | Level of contingency built into the estimate to cater for the unknown impact of the technical design |
| Software Development | 28.5 | 67 | +36.5 days | Additional effort required to complete the development of the download process. For information, the contract software developer accounted for 35 days |
| Applications Management | 19.5 | 2 | -17.5 days | Combination of |
| Technology Management | 9.5 | 2 | -7.5 days | Over estimation of actual activity |
| Service Management | 0 | 0.2 | +0.2 days | |
| Total | 105 | 116 | +11 days | |
Explanation for variance
From a budget perspective
- The original proposal budget of 50 days greatly underestimated the work involved. This was reflected at the end of the planning phase, when the budget was increased to 105 days. This change is detailed in piccl item 1
- During the design phase of the project, it was noted that additional effort would be required to incorporate the processing of the clickstream data, resulting in a further 15 days being requested to increase the budget to 120 days. This is reflected in piccl item 4
From a time perspective
- There was the requirement to move the delivery date from April to June due to the following reasons;
- In an attempt to combat further delays from a resourcing perspective, the priority of the project was increased to 'Higher' in May (ref piccl item 6)
- There was a further requirement to move the delivery date from June to August as a result of:
  - The preparation of the Dev and Test environments for both MariaDB and MongoDB
  - Issues with edX data
  - Additional effort to create a UoE environment within the Amazon cloud environment
  - Additional effort placed on the Development Lead after the contractor left
- These delays are referenced in piccl item 7
- Whilst the deployment to Live was actioned on 14th August, unexpected issues with firewalls and memory utilisation on the Python server were encountered. These were resolved and, after confirming that the download would run unhindered during the day, the download was scheduled to run overnight and has been running successfully since 25th August. As a result, project closure had to be rescheduled from August to September. This delay is referenced in piccl item 8
Key Learning Points
- Assumptions regarding the level of understanding within the project team regarding the data being received proved to be a little optimistic. In hindsight, the project would have benefited from a more detailed analysis of the data at the outset, as this might have influenced the overall design regarding the level of data extracted
- In employing a contract developer for this project, the level of oversight required by the Development Lead was greatly underestimated
- In recognising that new technologies were to be employed, namely MariaDB, MongoDB and Amazon S3 storage, additional time should have been factored in to the delivery to reduce the pressure across the development teams
- Technical briefing updates need to be communicated fully and regularly across the project team
Outstanding Issues
