SARA: Innovations to improve efficiency and enhance privacy in health data research
Semi-Automated Risk Assessment of Data Provenance and Clinical Free-Text in TREs (SARA) was one of the five DARE UK Phase 1 Driver Projects funded by UK Research and Innovation (UKRI) following an open call for proposals between October and December 2022. Over a nine-month period, from February to October 2023, the DARE UK Driver Projects investigated the requirements of what will be a UK-wide network of Trusted Research Environments (TREs) in line with
The SARA project, a collaboration of academic partners and Scottish TREs led by researchers at the University of Edinburgh and DataLoch, investigated data privacy risk assessment and efficiency. The project was funded to address the difficulties faced by health data providers, especially those within TREs, in manually ensuring the accuracy and representativeness of linked data while minimising the risk of patient identification.
The goal was to deliver semi-automated tools to optimise two critical areas of risk assessment and monitoring: data provenance, improving the trustworthiness of how datasets are received, processed and linked to ensure they are compliant for research; and privacy assessment, reducing the risk of identifiable information in clinical free-text records (e.g., GP letters, discharge summaries).
Outputs and Achievements
Working closely with TREs, academic partners, and members of the public, the SARA project delivered a comprehensive set of tools and approaches designed to improve the efficiency of health data audit, linkage, de-identification, and access between TREs and researchers:
- A privacy risk assessment approach: The SARA team developed an innovative approach to exploring and understanding privacy risks in clinical free-text, with the potential to apply this approach to future data in other TREs.
- A visualisation dashboard: The project team developed a prototype visualisation dashboard enabling exploration of privacy risk in clinical free-text, providing a powerful tool for data analysis and decision-making.
- An open-source data provenance framework: The SARA project team built an open-source framework for data provenance tracing within TREs, covering the entire data production workflow.
- A front-end dashboard for transparency: The SARA team built a front-end dashboard, allowing analysts, researchers, and information governance teams in TREs to inspect each step of the data workflow, ensuring quality assurance and transparency.
- Public involvement and engagement: Recognising the importance of public confidence in this innovation, the SARA project team conducted a series of activities gathering public perspectives to inform their work.
SARA’s Public Involvement and Engagement Efforts
Public involvement and engagement played a pivotal role in the SARA project’s delivery process. The team sought and incorporated public perspectives from the outset through workshops and surveys to understand the public’s concerns and priorities regarding privacy risk assessment in health data research:
- Public consultation: The SARA project adopted a robust public consultation strategy involving deliberative workshops and a survey with a representative sample of adults in Scotland. Deliberative workshops were designed and delivered to explore perspectives around risk assessment and semi-automation within health data research services, ensuring active participation and valuable insights.
- Inclusive and accessible approach: The SARA project took deliberate steps to make its public involvement and engagement (PIE) activities inclusive, accessible, and collaborative. This involved working with Ipsos Scotland, an independent organisation experienced in designing and delivering deliberative workshops. Accessibility considerations, including scheduling workshops outside standard working hours and providing incentives for participation, were integral to the project’s approach.
The SARA project represents a leap forward in health data research innovation. Through its collaborative efforts, the project team has developed cutting-edge tools and set a precedent for transparent and inclusive approaches to healthcare data management for better health and care outcomes.
Visit the SARA Driver Project page to explore the final reports and outputs: SARA: Semi-Automated Risk Assessment of Data Provenance and Clinical Free-text in trusted research environments.