Data Handling Specialist
- SC cleared before starting
- Fully remote working anywhere in the UK
- Rate £18.36 - £20.40 per hour, Inside IR35 (LTD or Umbrella)
- 6 months
- Deadline 12 noon 11th August 2021
TYPICAL ROLE RESPONSIBILITIES
A key aspect of the role is the ability to quickly digest, troubleshoot and understand a range of large administrative and wider non-survey data series, working with the senior lead in the area. The post will also provide a core coding resource to meet the rising demand for the underlying processes and code that validate and build key administrative data series.
This will involve specifying history, validating data and building coherent and accessible data structures on a series-by-series basis for distribution to users inside and outside of the business.
The role also offers the use of software tools such as Python, PySpark, SQL, Impala, Hive and Spark for processing, analysing and validating data in a Cloudera distributed computing environment.
- Support the administrative data engineering strategy, working closely with managers and other teams in the business's Data Architecture Division to implement and develop standard data validation, structuring and standardisation protocols for non-survey data. This is to enable business-wide roll-out of the data and ensure it is used consistently across the business.
- Develop, test and document the end-to-end process for building ETL data pipelines that ingest and transform large administrative and commercial data flows as they come into the business.
- Support other areas of the business (if an experienced user of Cloudera tools), working collaboratively with teams to validate and link key non-survey data sources. The key goal is to contribute to the data's broader development and alignment by enabling business areas to integrate the data to produce better and more timely statistical outputs.
SKILLS AND EXPERIENCE
- Ability to work within a busy team
- Ability to communicate at all levels with both technical and non-technical audiences
- Experience in handling and manipulating very large data sets
- Experience in using a variety of statistical or data engineering programming tools and techniques, especially in distributed computing with Hadoop
- Ability to share information appropriately and build supportive, trusting and professional relationships
- The applicant should also be experienced with one or (preferably) more of the following programming languages:
- SQL (Hive and Impala);
- Spark (PySpark or Scala).