Background
This project is the Annual Research Infrastructure Refresh Cycle. It replaces, and where possible improves, infrastructure that is out of support. This is generally, but not exclusively, Dell kit that is over 5 years old.
Scope
The high-level scope of this year's refresh is as follows.
- Mellanox Switch and Network deployment.
- Replace the existing switches and network that are going out of support.
- Specify and order kit and peripherals. All required kit will be onsite by the end of June.
- All cabling required from Alces will be implemented by the end of June.
- All switches will be tested and onsite by the end of June. We do not yet have agreement on when they will go live.
- The scope of the project runs until the new switches are successfully commissioned.
- Eddie Node Replacement.
- Buy a number of Eddie compute nodes to replace some of those that have gone out of warranty.
- We have issues with power in the Data Centre (ACF CR2): the room has no available UPS capacity. We are still in discussions with Data Centre Management on how to overcome this. To this end we have delivered a Power Utilization Chart of the room before and after our proposed changes.
- Once we get this agreement, we can start thinking about deployment plans.
- Clarify and agree budget available.
- The scope of the project will run until the nodes are racked, stacked and have an operating system installed.
- Incorporating them into the New Scheduler is out of scope.
- VMware Servers.
- The Research Services VMware servers have reached end of life.
- These will be replaced and improved on.
- The scope of the project runs from definition through purchase, racking, stacking and power-up to bringing the servers into service.
- Tape Drives in the Primary Tape Rack.
- We currently have 14 tape drives in the TS3500 at the ACF. These are almost fully utilised, so we will increase this number to 20, the current maximum. This will shorten the time it takes to restructure the TSM Library and, in the longer term, add to the general capacity of the Service.
- Scope is to specify, purchase and implement the drives.
Objectives
Mellanox.
The primary objective of the Mellanox deployment is to enable the platform to continue to function, because the existing switches and network are going out of support.
There is a lesser objective to increase network throughput. This can potentially be quantified at a later stage if the change goes ahead.
Eddie Nodes
The primary objective is to replace a volume of out-of-support nodes. If we can improve or increase the capacity of the service in any way while we do this, that is a bonus.
We have another objective from the ACF: to stay within, or reduce, the (UPS) power footprint that we currently use. This footprint has never been clearly defined, though.
Deliverables
Mellanox.
- 15 Mellanox Switches and related networking.
- We will get a detailed Services document from the primary contractor.
- We will get out-of-rack networking done by an ACF preferred contractor.
- We will create an implementation plan once we can agree an implementation window.
Eddie Nodes.
- We will buy an agreed number of nodes to an agreed budget.
- These nodes will be racked, stacked and powered up at the ACF after being tested by a third-party contractor (Alces).
- The nodes will consume no more UPS power than is currently used at the ACF.
- The nodes will have an operating system installed on them and be powered up, ready to be included in the Scheduler.
VMware Servers.
- The purchase and bringing into service of VMware servers to replace the kit that has gone out of warranty.
- A migration plan agreed between the Enterprise and Research Services teams.
Additional Tape Drives
- The purchase and implementation of additional tape drives in the TS3500 at the ACF.
Benefits
- Mellanox Switch and Network deployment.
- The primary benefit is that the network and switches will be under warranty.
- There are additional benefits around faster throughput and extensibility. These are secondary, though, and not clearly defined.
- Eddie Node Replacement.
- As above, the primary benefit is to replace existing kit that is out of warranty, thus giving us a supportable, robust platform.
- Again, there are additional benefits, such as an increased number of cores. These are not defined as requirements, though, and will not be measurable.
- VMware Servers.
- Replace existing kit that is out of warranty, thus giving us a supportable, robust platform.
- Additional Tape Drives
- Increase the capability of the Tape Backup Service to keep up with increased use.
Success Criteria
Mellanox.
- All kit is ordered and delivered before the end of June.
- All switches and related network are under support for 5 years.
- Network performance is as good as or better than that of the kit that was replaced.
Eddie Nodes.
- All kit is ordered and delivered before the end of June.
- All replacement nodes are under warranty for 5 years.
- Replacement nodes fall within the agreed power envelope in the Data Centre.
- Node performance is as good as or better than that of the kit being replaced.
VMware Servers.
- All kit is ordered and delivered before the end of June.
- All replacement kit is under warranty for 5 years.
- Migration onto the new kit is complete by the end of July 2019.
Additional Tape Drives.
- All kit is ordered and delivered before the end of June.
- All replacement kit is under warranty for 5 years.
- Drives are installed in the Tape Rack by the end of July 2019.
Current project status
| Report Date | RAG | Budget | Effort Completed | Effort to Complete |
|---|---|---|---|---|
| July 2019 | BLUE | 0.0 days | 0.0 days | 0.0 days |
