Data Engineer - Central Government
Current SC Clearance required to apply - highly sensitive data
- Fully Remote Working
- Until 31st March 2021 initially, likely to extend
- Deadline: Thursday 14th January 2021 at Noon
- Tech stack: Hadoop Cloudera, Hive, Spark, Impala
This is a 3-month temporary contract role, with an immediate start date, for suitably skilled data engineers. The role provides central support to the Core Data Engineering Team in developing and processing data deliveries from another government department. It also provides programming and data engineering support for the delivery of several elements of business survey redevelopment, intended to make surveys more responsive to the economic disruption arising from contemporary events.
This will involve developing operational and data to form the basis of a series of products, as well as applying established data engineering and data modelling methods to the data for the main stage of the product build. This involves using key programming languages such as Python (through Spark), Scala (through Spark) and SQL (through Hive and Impala) in a big data context. The contractor will need to develop an understanding of key government operational data and quickly become familiar with the ONS Business Register data and collection methods. Specifically, the role will involve:
1. Development of ETL methods for a range of internal and external data sources, converting them from unstructured formats to dynamic tables and views in Hive and Impala.
2. Providing coding support and coaching to a growing team of data engineers, sharing best practice and established methods.
3. Assisting in the development of analytical layers of data from raw HDFS files for use in producing a range of outputs.
As indicated above, the role requires experience of manipulating big data using Hadoop-based tools such as Hive, Impala and Spark, so experience of data manipulation and querying using Python and SQL in such tools is a core element of the position.
- Collaborating with key members of the Data Engineering team to develop automated coding solutions for a range of ETL, data cleaning, structuring and validation processes.
- Working with large semi-structured datasets to construct linked datasets derived from multiple underlying sources as well as supporting the wider team in delivering a range of data profiles across key strategic administrative data flows.
- Working with area leads across the broader Data Architecture Division, providing ad-hoc coding support on a range of projects underway in Data Architecture that use cross-government data;
- Assisting in a range of ETL and warehousing design projects in migrating data from a number of legacy environments;
- Providing training and coaching to new members of staff across the Data Engineering team.
Person Specification
- Good inter-personal and communication skills;
- Self-starter, e.g. solving problems and taking the initiative to resolve issues;
- A quick learner, able to assimilate the complexities of unfamiliar data sources and business data requirements;
- Flexibility in being able to work on several projects at the same time.
Skills and Experience
- Extensive proven experience of data engineering and architectural techniques, including data wrangling, data profiling, data preparation, metadata development, and data upload/download;
- Proven experience of 'big data' environments, including the Hadoop stack (Cloudera), covering data ingestion, processing and storage using HDFS, Spark, Hive and Impala;
- Extensive hands-on experience of developing ETL functionality in a cloud or on-premise environment;
- Experience of using tools such as Python and SQL (in Spark) to profile, query and structure large-volume data;
- Proven experience of using Cloud Services particularly in the context of Hadoop;
- Experience of developing/utilising programming and query languages, e.g. SQL (Hive and Impala specifically), Python (through Spark), Scala;
- SC-level clearance valid for at least 6 months on commencement of the contract. PLEASE NOTE APPLICATIONS NOT MEETING THIS CRITERION AT THE APPLICATION STAGE WILL NOT BE CONSIDERED;
- Understanding of databases and applying data models in relational database formats;
- Experience of coaching and training others in programming and ETL techniques;
- Experience of UK Government, particularly HMRC Administrative Data.