If you have any questions before applying, please call Zoe on 01173320836.
This will involve developing operational and administrative data to form the basis of a series of products, as well as applying established data engineering and data modelling methods to the data for the main stage of the product build.
This involves using key programming languages such as Python (through Spark), Scala (through Spark) and SQL (through Hive and Impala) in a big data context.
This role will require the contractor to develop an understanding of key government operational data and to quickly become familiar with Census data and collection methods. Specifically, the role will involve:
- Development of ETL methods for a range of internal ONS and external data sources, converting these from unstructured formats to dynamic tables and views in Hive and Impala (illustrated in the sketch after this list);
- Supporting the development and transformation of raw 2019 Census Rehearsal data, making these available to core ONS users in near real-time;
- Providing coding support and coaching to a growing team of data engineers, sharing best practice and established methods;
- Assisting in the development of analytical layers of data from raw HDFS files for use in producing a range of outputs;
- Collaborating with key members of the Data Engineering team to develop automated coding solutions for a range of ETL, data cleaning, structuring and validation processes;
- Working with area leads across the broader Data Architecture Division, providing ad-hoc coding support on a range of projects underway in Data Architecture utilising cross-government data;
- Forming part of a joint project team with the Census group to deliver a number of primary data outputs in support of the 2019 Census Rehearsal;
- Providing training and coaching to new members of staff across the Data Engineering team.
As indicated above, the role requires experience of manipulating big data using Hadoop-based tools such as Hive, Impala and Spark, so experience of data manipulation and querying using Python and SQL in such tools is a core element of the position.
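For illustration only, the PySpark sketch below shows the kind of ETL step described in the bullets above: reading a raw extract from HDFS, applying light structuring and validation, and exposing the result as a Hive table and view. The file path, database, table and column names (census_rehearsal, uprn, region_code and so on) are hypothetical placeholders, not taken from the role description.

```python
# Minimal ETL sketch (illustrative only): load a raw HDFS extract, apply
# light cleaning, and expose the result as a Hive table plus a view.
# All paths, column names and database names below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("rehearsal-etl-sketch")
    .enableHiveSupport()   # register outputs in the Hive metastore
    .getOrCreate()
)

# Read a raw, loosely structured delimited file from HDFS (hypothetical path).
raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("/data/raw/rehearsal/addresses.csv")
)

# Basic structuring and validation: trim string columns, standardise empty
# strings to nulls, and drop records with no usable identifier.
clean = raw
for name, dtype in raw.dtypes:
    if dtype == "string":
        clean = clean.withColumn(name, F.trim(F.col(name)))
clean = clean.replace("", None).dropna(subset=["uprn"])

# Persist as a managed Hive table registered in the metastore.
clean.write.mode("overwrite").saveAsTable("census_rehearsal.addresses_clean")

# A simple analytical view layered on top of the cleaned table.
spark.sql("""
    CREATE OR REPLACE VIEW census_rehearsal.addresses_by_region AS
    SELECT region_code, COUNT(*) AS address_count
    FROM census_rehearsal.addresses_clean
    GROUP BY region_code
""")
```

Writing the cleaned output into the Hive metastore is one way of making it queryable from Hive and Impala without further Spark involvement, which is the pattern the bullets above describe.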
The following skills and experience are sought:
- Extensive proven experience of data engineering and architectural techniques, including data wrangling, data profiling, data preparation, metadata development, and data upload/download;
- Proven experience of 'big data' environments, including the Hadoop stack (Cloudera), covering data ingestion, processing and storage using HDFS, Spark, Hive and Impala;
- Extensive hands-on experience of developing ETL functionality in a cloud or on-premise environment;
- Experience of using tools such as Python and SQL (in Spark) to profile, query and structure large-volume data (see the sketch below);
- Proven experience of using cloud services, particularly in the context of Hadoop;
- Experience of developing/utilising programming and query languages, e.g. SQL (Hive and Impala specifically), Python (through Spark), Scala;
- SC-level clearance valid for at least 1 year on commencement of the contract;
- Understanding of databases and applying data models;
- Experience of coaching and training others in programming and ETL techniques;
- Experience of UK Government Administrative Data.
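As a simple, hedged illustration of the profiling and querying skills listed above, the PySpark sketch below computes a row count, per-column null rates and a small SQL aggregation against the same hypothetical table used in the earlier sketch; none of the names are taken from the role description.

```python
# Illustrative profiling sketch (hypothetical table and column names):
# row count, per-column null rates and a simple SQL aggregation of the
# kind used to profile and query large-volume data held in Hive/Impala.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

df = spark.table("census_rehearsal.addresses_clean")   # hypothetical table
print(f"total rows: {df.count()}")

# Null rate per column, computed in a single pass over the data.
null_rates = df.select(
    [F.avg(F.col(c).isNull().cast("double")).alias(c) for c in df.columns]
)
null_rates.show(truncate=False)

# The same data can be queried directly in SQL; the dialect is close to
# what Hive and Impala users would run against the same table.
spark.sql("""
    SELECT region_code,
           COUNT(*)                 AS rows_in_region,
           COUNT(DISTINCT postcode) AS distinct_postcodes
    FROM census_rehearsal.addresses_clean
    GROUP BY region_code
    ORDER BY rows_in_region DESC
""").show(20, truncate=False)
```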