Multi-party trusted research environment federation: Establishing infrastructure for secure analysis across different clinical-genomic datasets
What we aimed to achieve
Combining data assets for research has historically involved moving data between organisations. Trusted research environments (TREs) now offer an alternative: rather than sharing data, they provide secure environments in which approved researchers can access and analyse it. However, data held in different TREs still generally cannot be analysed together, even where researchers have permission to use data from both.
This project aimed to enable the TREs of the National Institute for Health and Care Research (NIHR) Cambridge Biomedical Research Centre (BRC) and Genomics England to ‘talk to each other’. The goal was for researchers to safely access and work with both databases without moving the original data, a process known as ‘federation’. The project is the UK’s first known demonstration of federation of genomic data.
Led by the Genomic Medicine BRC theme at the University of Cambridge, the project combined the expertise of platform and technology innovation enterprise Lifebit with project management support from Eastern AHSN (Academic Health Science Network) and Cambridge University Health Partners (CUHP). Patients and members of the public were involved throughout.
“Increasing numbers of organisations, including NHS England, are adopting the TRE model to provide research access to health data while not allowing its distribution, thereby increasing oversight and transparency while protecting privacy. Developing methods to jointly analyse data held in separate TREs is therefore critical to maximising research insights. The output from this DARE UK Sprint project is an important practical demonstration of federated analysis of genomic data with freely available code that can be built on by the global community.”
Professor Tim Hubbard, Professor of Bioinformatics, Head of Department of Medical & Molecular Genetics at King’s College London, Associate Director at Health Data Research UK and Senior Advisor at Genomics England
How we did it
Both TREs involved in the project hold rich, secure and governed sources of fully consented clinical-genomic data. Over the eight-month Sprint, we designed and implemented a system to enable federation of data between the two TREs. In a live demonstration in July, we showed that a researcher could query data within the two separate TREs to find individual records and build a group, or ‘cohort’, of interest. An example is patients with the same mutations in their tumours. Federated analysis was then run across this joint cohort.
The anonymous, non-patient-level results were combined in a secure environment known as a ‘Safe Haven’. They were then released to the researcher through an ‘Airlock’, which checked that individuals could not be re-identified from the results. We also developed and tested a set of APIs (Application Programming Interfaces) that enable TREs to talk to each other. These APIs align with the Global Alliance for Genomics and Health (GA4GH) standards for interoperability. They are open source, meaning they are available free of charge to the global research community.
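The flow described here can be illustrated with a small Python sketch. In this flow, each TRE answers a cohort query with an aggregate count only. The Safe Haven combines those counts, and the Airlock decides what may be released. This is a minimal illustration under stated assumptions. It is not the project’s open-source APIs or any GA4GH interface. The class and function names (`TRE`, `safe_haven_combine`, `airlock_release`) are hypothetical. The records are synthetic, and the minimum cell size of 10 is an assumed disclosure threshold.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical disclosure threshold; real values are set by each TRE's policy.
MIN_CELL_SIZE = 10


@dataclass(frozen=True)
class Record:
    """Hypothetical patient-level record that never leaves its TRE."""
    participant_id: str
    tumour_mutations: frozenset


class TRE:
    """Toy trusted research environment that only answers aggregate queries."""

    def __init__(self, name: str, records: list):
        self.name = name
        self._records = records  # patient-level data stays inside this object

    def count_cohort(self, criterion: Callable[[Record], bool]) -> int:
        # Only the number of matching records is returned, not the records.
        return sum(1 for r in self._records if criterion(r))


def safe_haven_combine(tres: list, criterion: Callable[[Record], bool]) -> dict:
    # Gather per-TRE aggregate counts and add a federated total.
    counts = {tre.name: tre.count_cohort(criterion) for tre in tres}
    counts["total"] = sum(counts.values())
    return counts


def airlock_release(counts: dict) -> dict:
    """Release only figures that pass the (hypothetical) small-cell rule."""
    site_counts = {k: v for k, v in counts.items() if k != "total"}
    if all(v >= MIN_CELL_SIZE for v in site_counts.values()):
        return dict(counts)  # every cell is large enough to release
    if counts["total"] >= MIN_CELL_SIZE:
        # Withhold the per-TRE breakdown: a small site count could otherwise
        # be recovered by subtracting the other sites' counts from the total.
        return {"total": counts["total"]}
    return {}  # nothing is safe to release


def has_kras(record: Record) -> bool:
    # Example cohort criterion: tumours carrying a KRAS mutation.
    return "KRAS" in record.tumour_mutations


# Synthetic data for two hypothetical TREs.
site_a = TRE("tre_a", [Record(f"a{i}", frozenset({"KRAS"})) for i in range(12)])
site_b = TRE(
    "tre_b",
    [Record(f"b{i}", frozenset({"KRAS"} if i < 4 else {"TP53"})) for i in range(20)],
)

combined = safe_haven_combine([site_a, site_b], has_kras)
released = airlock_release(combined)
print(released)  # {'total': 16}
```

In this toy run, the second TRE contributes only 4 matching records, which is below the assumed threshold. The sketch therefore releases the combined total but not the per-site breakdown. Suppressing only the small cell would not be enough, because with two sites it could be recovered from the total by subtraction.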
Engaging the public
We aimed to fully embed patient and public involvement representatives at all levels of project governance from the outset. Our patient partner sat on the project board and technical meetings, and we held three sessions with public contributors over the eight months. These sessions focused on shaping thinking and governance structures and developing accessible messaging about federation for a public audience. Building on this learning, we co-wrote a Frequently Asked Questions document for the project.
“I have been delighted to be involved with this exciting project as a patient partner since the beginning. This has enabled me to better understand the potential that federation brings, both in terms of opportunities for researchers using health data and for patients and the public in terms of ensuring the safety of their data. Many patients and members of the public have worked in collaboration with the Sprint team to shape this project, both to the stage it is at now and with how it plans to move forward in the future. Involving us at this early stage will undoubtedly benefit both researchers and the wider public so that we can ensure the safe and fair use of health data to maximise improved outcomes for all.”
Rosanna Fennessy, Patient Partner
Impact of the project
This project has demonstrated a federation capability that, once optimised and rolled out, has the potential to remove the geographical, logistical, and financial barriers associated with moving exceptionally large datasets. It would substantially reduce the time researchers currently need to analyse integrated cohorts. Federation also helps Data Custodians to maximise the security of their datasets, since the data remain within the bounds and security firewalls of the TRE at all times.
For genomics research, the ability to work across multiple datasets gives access to much larger and more diverse data. Once optimised, federation will speed up both initial discoveries and the validation of results. This will shorten the time it takes for translational research to make a difference to patients’ lives.