Clerical Matcher x8
Contract until 23rd November 2021
£14 per hour
Titchfield - Office based
This is an exciting opportunity to work within a project that will help deliver quality outputs for Census 2021. The role supports Census by comparing information within data records from Census returns and a follow-up survey called the Census Coverage Survey. By making these comparisons you will help ONS work out how many people have been missed, and find anyone who may have completed a questionnaire twice. The comparisons you make will inform decisions on planning and funding services in your community.
Interested? Join the team and make a lasting impact.
We are looking for people who can work to a very high level of accuracy to ensure correct decisions are made - the more similarities and differences that are found correctly, the better the estimation process for Census outputs will be.
This is a 3-month temporary contract role with an immediate start date for suitably skilled data engineers. The post provides central support to the Core Data Engineering Team around the development and processing of data deliveries from another government department, as well as providing programming and data engineering support for the delivery of a number of elements of business survey redevelopment, making surveys more responsive to the economic disruption arising from contemporary events.
This will involve developing operational and ONS data to form the basis of a series of products, as well as applying established data engineering and data modelling methods to the data for the main stage of the product build. Key programming languages used in a big data context include Python (through Spark), Scala (through Spark) and SQL (through Hive and Impala). The contractor will need to develop an understanding of key government operational data and quickly build familiarity with the ONS Business Register data and collection methods. Specifically, the role will involve:
- Development of ETL methods for a range of internal ONS and external data sources, converting these from unstructured formats to dynamic tables and views in Hive and Impala;
- Providing coding support and coaching to a growing team of data engineers, sharing best practice and established methods;
- Assisting in the development of analytical layers of data from raw HDFS files for use in producing a range of ONS outputs.
As indicated above, the role requires experience of manipulating big data using Hadoop-based tools such as Hive, Impala and Spark; experience of data manipulation and querying using Python and SQL in these tools is therefore a core element of the position. Further responsibilities include:
- Collaborating with key members of the Data Engineering team to develop automated coding solutions for a range of ETL, data cleaning, structuring and validation processes;
- Working with large semi-structured datasets to construct linked datasets derived from multiple underlying sources, as well as supporting the wider team in delivering a range of data profiles across key strategic administrative data flows;
- Working with area leads across the broader Data Architecture Division, providing ad-hoc coding support on a range of projects underway in Data Architecture utilising cross-government data;
- Assisting in a range of ETL and warehousing design projects, migrating data from a number of legacy ONS environments;
- Providing training and coaching to new members of staff across the Data Engineering team.
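The ETL pattern described above - extracting semi-structured records, cleaning and deduplicating them, then loading them into a queryable analytical layer - can be sketched in miniature. This is a minimal, self-contained illustration using Python's built-in sqlite3 as a stand-in for Hive/Impala; in the role itself this work would use PySpark against HDFS, and the record layout, field names and table name here are illustrative assumptions, not ONS schemas.

```python
import json
import sqlite3

# Extract: raw, semi-structured records as they might arrive in a data delivery.
raw_records = [
    '{"business_id": "B001", "turnover": "125000", "region": "South East"}',
    '{"business_id": "B002", "turnover": "", "region": "Wales"}',             # missing value
    '{"business_id": "B001", "turnover": "125000", "region": "South East"}',  # duplicate
]

# Transform: parse, deduplicate on business_id, coerce types, drop unusable rows.
seen = set()
clean_rows = []
for line in raw_records:
    rec = json.loads(line)
    key = rec["business_id"]
    if key in seen or not rec["turnover"]:
        continue
    seen.add(key)
    clean_rows.append((key, int(rec["turnover"]), rec["region"]))

# Load: create a structured, queryable table (analogous to a Hive/Impala view).
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE business_returns (business_id TEXT, turnover INTEGER, region TEXT)"
)
con.executemany("INSERT INTO business_returns VALUES (?, ?, ?)", clean_rows)

# Query the analytical layer with SQL, as the role would through Hive or Impala.
total = con.execute("SELECT SUM(turnover) FROM business_returns").fetchone()[0]
print(total)  # 125000
```

The same extract-clean-load-query shape applies whether the sink is sqlite, Hive or Impala; only the scale of the data and the execution engine change.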
The ideal candidate will also have:
- Good interpersonal and communication skills;
- A self-starter attitude, e.g. solving problems and taking the initiative to resolve issues;
- The ability to learn quickly, assimilating the complexities of unfamiliar data sources and business data requirements;
- The flexibility to work on several projects at the same time.
Skills and Experience
- Extensive proven experience of data engineering and architectural techniques, including data wrangling, data profiling, data preparation, metadata development, and data upload/download;
- Proven experience of 'big data' environments, including the Hadoop stack (Cloudera), covering data ingestion, processing and storage using HDFS, Spark, Hive and Impala;
- Extensive hands-on experience of developing ETL functionality in a cloud or on-premise environment;
- Experience of using tools such as Python and SQL (in Spark) to profile, query and structure large-volume data;
- Proven experience of using cloud services, particularly in the context of Hadoop;
- Experience of developing/utilising programming and query languages, e.g. SQL (specifically Hive and Impala), Python (through Spark) and Scala;
- Understanding of databases and applying data models in relational database formats;
- Experience of coaching and training others in programming and ETL techniques;
- Experience of UK Government data, particularly HMRC administrative data.