Next-Gen Catalysts
The Data Matryoshka: Progressive Synthetic OMOP Data Layers for Secure Health Research Collaboration Across UK TREs
The Data Matryoshka project is creating a layered pathway that links synthetic data available externally to real NHS data held internally, accelerating research while maintaining public trust.
Health data access is often governed through rigid models that limit exploratory research, even where lower-risk alternatives could be used safely. This slows innovation and prevents researchers from testing methods or assessing feasibility before applying for access to real patient data.
The Data Matryoshka introduces a layered approach to synthetic data generation, progressing from openly available synthetic data to controlled access to real datasets. Each layer supports different stages of the research lifecycle while maintaining clear, proportionate governance. This project targets the OMOP (Observational Medical Outcomes Partnership) common data model to maximise data-sharing opportunities.
Building on existing NHS deployments, the project will develop reusable workflows, governance frameworks, and training materials. These outputs will help organisations adopt transparent, scalable approaches to data access that balance innovation with accountability.
Public involvement and engagement is embedded through collaboration with the UCLH NHS Data Trust Committee. Patients and community representatives will contribute to governance design, review demonstrator datasets, and shape accessible materials explaining benefits, risks, and safeguards. Oversight structures will extend beyond the project’s lifetime.
By the end of the project, The Data Matryoshka will:
- Deliver layered synthetic-to-real data workflows using OMOP
- Produce transparent governance frameworks aligned with NHS Data Trust principles
- Develop public-facing and professional training materials
- Establish sustainable community oversight mechanisms
Project information
Lead organisation: University College London
Principal investigator: Steve Harris
Project duration: 12 months
Project partners: Alan Turing Institute, Synthetic Data UK, OneLondon, UCLH NHS Foundation Trust, Great Ormond Street Hospital (GOSH)
Funding provided: £308,854
Primary contact email: steve.harris@ucl.ac.uk