Senior Data Scientist
Remote role - UK based
Government client - SC clearance preferred.
Initially until 30th April 2022
INSIDE IR35 - Client open to daily rates.
- Please note that good knowledge of Python, Google Cloud Platform and building GUIs (dashboards) is required
- Experience working with complex metadata would greatly benefit the project.
- Develop a codebase for collating information about multiple projects hosted in GitHub, SharePoint and other platforms
- Build workflows and dashboards to meet user requirements for regular reporting
- Test, maintain, and improve the codebase iteratively during development
- Help with the installation, configuration, and development of a dedicated instance of a processing environment within Google Cloud Platform
- Help capture user requirements in the form of user stories and deliverable backlog tasks
- Interact with GitHub REST API to obtain authorised metadata
- Apply SQL (BigQuery/BigTable) and Python for processing GitHub metadata
- Provide a dashboard using DataStudio, PowerBI or another suitable platform
- Use Terraform to enable infrastructure as code
- Iterative development following a tailored agile approach
- Regular reporting to the project team and on-demand to the Delivery Team
- Deliver well tested and dependable code
- Deliver clear code documentation
- Deliver a system that can be handed over and maintained
- Deliver a production-quality processing pipeline
- Investigate issues and anomalies in the GitHub data
- Recommend measures to improve the security of data and code as required
- Deliver version controlled, well documented, clean, maintainable code
- Support platform users and engage with stakeholders
This project aims to deliver an improvement upon a prototype system in which initial user requirements have been implemented and further ones explored. The skills required align with the needs of the project, so the candidate will be expected to demonstrate them.
- Data engineering - such as the design of algorithms, implementation of cloud-hosted solutions, multi-core/distributed processing, SQL and data lake systems, statistical analysis languages and tooling
- Software engineering - software installation and distribution, algorithm and implementation optimisation, design for maintainability (SOLID), Continuous Integration/Continuous Delivery, implementation of user interfaces, user requirements capture and the software lifecycle
- Data management/curation - such as the manipulation and analysis of complex, high-volume and high-dimensionality data, distributed processing, relational and non-relational databases, cloud storage and data management, interoperability and standardisation for data, metadata management
- Storytelling and data visualisation - including the visualisation of insights drawn from data and the building of data-driven products
- SyOPs and GitHub training will be mandatory before access can be given to collaboration tools
- A portfolio of open-source projects and contributions
- Any other tools, techniques or programming languages, e.g. Apache Spark, Docker, Kubernetes, Bash
- Data analytics - such as supervised and unsupervised machine learning, natural language processing, geospatial analysis, econometrics and regression, microdata, and causal inference
- SC clearance
Some travel to Campus/Government locations will be required as and when restrictions allow.