Data Administrator (Data Handling Specialist)
- SC cleared while in role
- Fully remote working anywhere in the UK
- Self-Employed Contractor positions, Inside IR35
- £18 - £20 per hour, 37 hours a week
- Duration: 6 months initially
- Deadline: 12pm, 9th June 2021
The post will provide pivotal programming support to a team developing and delivering the programmes to import, validate and structure the key administrative data that will flow in following the Digital Economy Act (DEA). The role is based in a busy, expanding team that supports key business areas from Social Surveys to the Economic Accounts.
TYPICAL ROLE RESPONSIBILITIES
A key aspect of the role will be the ability to quickly digest, understand and troubleshoot a range of large administrative and wider non-survey data series, working with the senior lead in the area. The post will also provide a core coding resource to meet the rising demand for the underlying processes and code that validate and build key administrative data series. This will involve specifying history, validating data and building coherent and accessible data structures on a series-by-series basis for distribution to users.
The role also offers the use of software tools such as Python, PySpark, SQL, Impala, Hive and Spark for processing, analysing and validating data in a Cloudera distributed computing environment.
* Support the administrative data engineering strategy, working closely with managers and other teams in the Data Architecture Division to develop and implement standard data validation, structuring and standardisation protocols for non-survey data. This is to enable business-wide roll-out of the data and ensure it is used consistently.
* Develop, test and document the end-to-end process for building ETL data pipelines that ingest and transform large administrative and commercial data flows as they arrive.
* Support other areas (if an experienced user of Cloudera tools), working collaboratively with teams to validate and link key non-survey data sources. The key goal is to contribute to the data's broader development and alignment by enabling business areas to integrate the data to produce better and more timely statistical outputs.
Essential Skills and Experience:
* Ability to work within a busy team
* Ability to communicate at all levels with both technical and non-technical audiences
* Experience in handling and manipulating very large data sets
* Experience in using a variety of statistical or data engineering programming tools and techniques, especially in distributed computing with Hadoop
* Ability to share information appropriately and build supportive, trusting and professional relationships
Desirable Experience in:
* version control using git and GitHub
* using the Hadoop Distributed File System