Data Engineering Lead
**ACTIVE SC CLEARANCE REQUIRED TO APPLY**
- Fully remote working, anywhere in the UK
- Initially 3 months, to extend
- Self-employed contractor role, Inside IR35
- Python, PySpark, SQL, Hive, ETL
- Deadline: Tuesday 6th April 2021 at Noon
- Central Government
Description of Requirement:
You will provide statistical analysis on the COVID-19 pandemic for Government and Research purposes. The Data Processing Pipeline (DPP) receives data from multiple sources, processes this data in a variety of ways, then provides the cleaned and linked data to several Analysis teams for their investigation.
The roles are responsible for the technical development and implementation of the processing pipelines specified and designed in conjunction with the COVID analysis teams. You will be working with a small coding team and a wider analysis team to identify the best ways of ingesting and processing the data received.
Throughout, you will drive the delivery of products and services, whilst coaching team members and others to apply agile and lean principles to their deliverables. You will:
* Work with others to implement the designed pipeline, from the ETL of the data at ingest through to the creation of the final analytical dataset.
* As far as possible, ensure consistency between pipelines and that any associated standards are followed.
* Ensure pipelines can be adapted and re-run should data quality issues be identified.
* Liaise with Analysis teams to ensure that requirements and specifications for data structure and outputs are understood and incorporated in the pipeline.
* Address blockers, actively seeking solutions to remove them and proposing alternative routes to delivery.
* Promote a frequent, incremental delivery approach.
* Test the systems developed, both leading the team's testing and carrying out testing yourself.
Relevant Skills and Experience:
* Data engineering - technical coding skills in Python, PySpark, etc.; design of algorithms; multicore/distributed processing; SQL (Hive & Impala) and NoSQL (HBase) database systems; statistical analysis languages and tools E
* Data engineering - development of ETL routines for slowly changing dimensions E
* Systems engineering - applied to data/information/intelligence challenges in complex environments E
* Data management/curation - such as the manipulation and analysis of complex, high volume and high dimension data, data management, interoperability, standardisation of data, metadata E
Please apply if you are interested!