Data Engineer x5 (Junior to Mid-level)
- SC Clearance before starting - you must have lived in the UK for the past 5 years to apply
- Fully remote working anywhere in the UK - overseas applications will not be considered
- £24.50 to £27.50 per hour, 37 hours a week
- Duration of engagement 6 months
- Contractor position (LTD or Umbrella)
- Deadline 12 noon 16th August 2021
This post would suit someone with the appropriate skills from a specialist Operational Delivery, Statistics, Social Research, Economics, Operational Research, or Digital, Data and Technology background.
A key aspect of the role will be the ability to quickly digest, troubleshoot and understand a range of large administrative and wider non-survey data series, working with the senior lead in the area. The post will also provide a core coding resource to meet the rising demand for the underlying processes and code that validate and build key administrative data series. This will involve specifying history, validating data and building coherent and accessible data structures on a series-by-series basis for distribution to users inside and outside the ONS.
The role also offers the opportunity to use software tools such as Python, PySpark, SQL, Impala, Hive and Spark for processing, analysing and validating data in a Cloudera distributed computing environment.
- Support the administrative data engineering strategy, working closely with managers and other teams in the ONS Data Architecture Division to develop and implement standard data validation, structuring and standardisation protocols for non-survey data. This is to enable business-wide roll-out of the data and ensure it is used consistently across ONS.
- Develop, test and document the end-to-end process for building ETL data pipelines that ingest and transform large administrative and commercial data flows as they come into the ONS.
- Support other areas of ONS (if you are an experienced user of Cloudera tools), working collaboratively with teams to validate and link key non-survey data sources. The key goal is to contribute to the data's broader development and alignment by enabling business areas to integrate the data to produce better and more timely statistical outputs.
Skills and Experience
- Ability to communicate at all levels with both technical and non-technical audiences
- Experience in handling and manipulating very large data sets
- Experience in using a variety of statistical or data engineering programming tools and techniques, especially in distributed computing with Hadoop
- Ability to share information appropriately and build supportive, trusting and professional relationships
- Experience with one (or preferably more) of the following programming languages:
  - Python
  - SQL (Hive and Impala)
  - Spark (PySpark or Scala)
- Desirable experience in:
  - version control using Git and GitHub
  - using the Hadoop Distributed File System