Remote role - UK based
Government client - SC clearance preferable but not required.
INSIDE IR35 - £500 to £550 per day.
Job Description Goals
We are looking for a data scientist with strong coding skills to help improve the existing statistical method of trade data allocation for complex UK companies. The person will need to quickly develop a full understanding of the existing methods, assumptions, and underlying statistical principles. As part of this work, the person will use some modules of an existing statistical production pipeline and create a separate analytical environment for code and data. To assess the quality of different flavours of the method, they will need to modify the existing code, run it multiple times, use advanced analytics to understand the significance of method changes, and provide recommendations on the best method, depending on the end user requirements.
The person should have previous experience in managing big data and cloud storage. They should demonstrate the ability to write efficient analytic programs in Python and SQL scripts, with an understanding of functional and object-oriented coding and of code sharing within a team. The essential part of the work is statistical analysis of big, multi-dimensional datasets of economic data, ranking and selecting the most significant statistical features, and presenting the findings to audiences with mixed technical backgrounds.
The person will work in close co-operation with professional software engineers, method specifiers and experts in National Accounts and Business Register, and should be able to communicate with them effectively.
TYPICAL ROLE RESPONSIBILITIES
Skills and knowledge
* Python, proficient user. Libraries: pandas, numpy, matplotlib. We need a person who can write their own programs with confidence and observe best coding practices for style, structure, and efficiency.
* Spark: PySpark, dataframes, aggregation, window functions, i/o to HDFS; Hive SQL. We need a person who has experience in writing analytic scripts in Spark, preferably PySpark, and who understands the principles of data storage and management in HDFS.
* The person should be experienced in writing programs using functions and methods, structuring their code into modules, and reusing previously built modules.
* The person would have to demonstrate good skills in commenting and annotating their scripts.
* They should demonstrate the ability to test their code at various levels, awareness of automated unit-testing and continuous integration, and ability to produce synthetic test data.
* The analytical work will be a spin-off of a large production pipeline, so the candidate has to demonstrate previous experience in using such systems, or, ideally, developing elements of them.
* As part of the analytics, the candidate should demonstrate a grasp of the principles of producing multiple versions of data outputs for comparison and selection.
* The candidate must show that they have experience in managing data versions, efficient logging and record-keeping.
* The key responsibility of the candidate is to find the most significant features of the alternative datasets, so that an informed decision can be made. They should be able to apply filtering, ranking, outlier detection and calculation of confidence intervals.
* We expect the candidate to have previous experience in creating statistical summaries of big data using contemporary visualisation techniques, and a strong fundamental background in descriptive statistics.
* They should demonstrate previous experience in "drilling down" into selected data outputs and following the entire process of data transformation from the raw files to the outputs.
* A desirable but non-essential skill is previous experience in mathematical dimensionality reduction methods.
* We need the candidate to have experience of working with multi-level hierarchical data trees, and performing grouping and aggregations at various levels of such classifications.
* The candidate needs to know the principles of de-aggregation down the hierarchy using weights.
* The candidate should know how to estimate the accuracy of their findings by calculating standard errors and confidence intervals.
* The successful candidate should be aware of basic economic concepts: turnover, trade value, employment count, productivity.
* It is desirable, although not essential, to have previous knowledge of standard classifications of products and industry types.
* It is desirable to be aware of Supply Use balancing as part of National Accounts.
* It would be an advantage if the candidate has worked with a business register and understands the structure of complex businesses.