Data Scientist - FM/00022/22

Cape Town - Fixed-Term Contract, due to specific project support

Anova is an NGO that empowers people and changes lives. Good health and quality of life are what motivate us to provide healthcare solutions and support for those who need it most.

The successful candidate will be required to support the Western Cape’s Provincial Health Data Centre (PHDC). The incumbent will use their ETL skills, knowledge of SQL and analytics experience to ensure high-quality data architecture, processes and analytics for the HIV cascade and other data cascades in the PHDC.

Key duties and responsibilities:

Using domain knowledge, analytic and software expertise, work through all stages of the data acquisition, pre-processing and processing cycles within the Provincial Health Data Centre (PHDC).

  • Integrate data from multiple data sources.
  • Preparation – manipulate data into a form suitable for further analysis and processing.
  • Apply data extraction, transformation and loading (ETL) techniques with appropriate automation as required.
  • Conduct data cleaning, harmonization and curation to ensure every data point is appropriately harmonized, then mapped to standard ontologies, flagged for curation, or appropriately classified
  • Support data reduction: minimise the amount of data that needs to be stored.
  • Utilize appropriate tools for pre-processing large file-based datasets prior to database take-on
  • Author SQL statements and clauses to retrieve data, merge data, perform group and nested case queries and automation procedures.
  • Processing – manipulate data to generate an output or interpretation about the data that has been taken on.
  • Storage – ensure data and metadata are held for future use.
  • Architect data environment changes for optimal data management, retrieval and reporting, with appropriate consideration of performance, data security, health ontologies and normative standards, and support for application lifecycle management (ALM).

Data beneficiation, interpretation, inference and analysis

  • Assemble and optimise large, complex derived data sets that meet functional and non-functional business requirements
  • Use major programming/scripting languages, e.g. SQL, Java, Ruby, Python and R, and data science approaches, to ensure maximal value, provenance and usability of processed data.
  • Work efficiently with very large datasets, with appropriate approaches to query optimisation.
  • Perform root cause analysis on internal and external data and processes to answer specific business questions, address data errors or deficiencies and identify opportunities for improvement.
  • Schedule and automate data processing to ensure timely availability of processed data.
  • Apply analytic skills when working with structured and unstructured data sets.
  • Flexibly acquire new skills and knowledge as required to deal with new domain, platform, computational or other challenges or questions.
  • Output and interpretation: ensure processed information is suitable for stakeholder reporting purposes.

Presentation of outputs, reporting and data visualisation

  • Ensure outputs resulting from data beneficiation and analysis are converted into sensible formats for presentation both internally for PHDC data analysts as well as for external stakeholders.
  • Utilise a range of reporting and visualisation tools, and assimilate new tools as required, to optimise reporting against business requirements.
  • Ensure that reports that roll up curated data into aggregated form are maintained, refactored, extended and improved where possible.
  • Ensure reporting processes adhere to all standards around patient protection, data governance and auditability.
  • Constantly review report architecture for optimised and efficient rendering.
  • Take responsibility for sophisticated output visualisation, potentially as PHDC dashboards.

PHDC technical management

  • Domain management
    • Take full responsibility for a domain area in the PHDC, ensuring integrity, timeliness, evolution and innovation.
    • Put in place tests and validations to identify major and minor incidents and appropriate remediation processes.
    • Represent the PHDC in appropriate forums and WCG meetings with respect to the domain.
  • Environment leadership
    • Support technical excellence by instituting, supporting or driving peer review processes and quality documentation.
    • Support capacity development through internal processes aimed at capacity development or onboarding of new skills.
    • Develop and support systems, tools and processes which contribute to technical excellence, good data practice and efficient operations, including agile development and associated processes.
  • People management
    • Provide mentorship to junior and mid-level data scientists, interns and elective students and assist with induction and training of staff
  • Stakeholder management
    • Work with stakeholders to address data-related technical issues
    • Represent the PHDC generally in senior stakeholder meetings, being able to bring detailed domain knowledge to engagements in order to definitively plan solutions that are functionally and technically sound.

Any other tasks as agreed with line manager.

Minimum qualifications and experience:

  • A relevant degree (computer science, epidemiology, information systems or science) is essential.
  • Minimum of six years’ experience focused on data analytics.
  • Working knowledge of South Africa’s health information systems and monitoring and evaluation processes as they relate to data collection for performance-based reporting on the NSP.

Advantageous Qualifications, Experience and Skills

  • A Master’s degree in management, monitoring and evaluation, analytics or a related field would be an added advantage.
  • At least 8 years of progressively responsible work experience.

Skills, competencies, and attributes:

  • Advanced computer skills, especially SQL, Java, Ruby, Python, R, MS Excel
  • Ability to portray complex data sets in easy-to-understand formats including visualizations.
  • Well-organized with an ability to handle a large workload, deliver on short deadlines and work independently.
  • Great interpersonal and team working skills.
  • Self-motivated, persevering and inquisitive, with strong problem-resolution skills.
  • Attention to detail.

Please specify the above reference number in the subject line of your email for a quicker response. Good luck!

Submit your CV and application letter to Fuzile Madikane at

Closing date: 29 June 2022

In accordance with our Employment Equity goals and plan, preference will be given to suitable applicants from designated groups as defined in the Employment Equity Act 55 of 1998 and subsequent amendments thereto.

Anova Health Institute is a provider of essential health services and therefore has a mandatory vaccination policy. Should your application be successful, you will be required to submit your proof of vaccination before commencing employment in the role.

Applicants who have not been contacted within 4 weeks of submitting their application should assume that they have not been successful.

Agencies that submit unsolicited CVs will not be paid agency fees should their candidate be placed at Anova.

For more information on Anova visit our website:

Copyright Anova Health Institute - All Rights Reserved