Data Engineer - Central Government
Current SC Clearance required to apply - highly sensitive data
- Fully Remote Working
- Until 31st March 2021 initially, likely to extend
- Deadline: Thursday 14th January 2021 at Noon
- Tech stack: Hadoop Cloudera, Hive, Spark, Impala
We are looking for a data engineer for a hands-on coding role. Day-to-day work will involve writing PySpark and SQL to explore data, design data cleaning and standardisation methods, derive variables, stack datasets together, and package code into pipelines and packages, along with other hands-on data engineering tasks.
- Ability to work within a busy team
- Ability to communicate at all levels to both technical and non-technical audiences
- Ability to manage and support both self and team in delivering agreed goals and objectives
- Experienced in handling and manipulating very large data sets
- Experienced in using a variety of tools and techniques (such as those common to a Hadoop environment)
- Experienced in Python in the context of PySpark
- Experienced in SQL in the context of big data
- Experience in building pipelines and packaged code
- Experience with SQL in the context of Hadoop
- Version control using Git and GitHub
- Experience of working in CDSW, Hive, Impala and DAP
- Experience of tasks related to data linkage, especially cleaning and standardising datasets to be linked
As a Data Development Specialist within Data Architecture, you will provide the working-level lead on developing code and processes to standardise, prepare for linkage, and more broadly exploit key strategic external data for a variety of wider business products and outputs. The core focus is on developing externally-provided operational data so that it can be used consistently across the business. This involves using key programming languages such as Python and SQL, and tools such as Spark, Hive and Impala, to ensure that these data are fit for purpose and in a suitable format for inclusion in a wide range of downstream statistical and analytical products.
Also central to the role is identifying the business rules and operational definitions our suppliers use and integrating them into these feeds before they are rolled out across the wider business. This is a key part of Data Architecture's responsibility in standardising and preparing externally-acquired data. The post holder will work with stakeholders inside Data Architecture, in other business areas, in other government departments and at external data suppliers to develop the technical processes that underpin the data engineering and development of key data.