DARE UK Community groups

Community Group Outputs

Explore all community group outputs, including key resources, reports, and best practice guidelines shaping the future of sensitive data research.


In this presentation, Jo Lam (co-chair of ITALO) introduces a scalable framework for the automated evaluation and benchmarking of data linkage equality within NHSE’s newly proposed linkage pipeline.

This is his first milestone presentation, given in his third week with the NHS England PhD Internship Program. Jo works within the NHS England Data Linkage Hub, led by Giulia Mantovani (co-chair of ITALO).

As the NHS moves toward a single, authoritative patient record, automated data linkage plays a central role. But without robust evaluation, this process risks embedding systemic biases, especially for marginalised populations. This presentation introduces a transparent, automated framework to benchmark data linkage equality—ensuring linked data are not only accurate, but also fair and trustworthy.

The framework is structured around three interlinked work packages:

  1. Pre-Linkage Profiling:
    Systematic analysis of input data quality—completeness, identifier structure—to assess linkability and detect early bias risks. Outputs include data enrichment reports for owners and linkers to support traceability and audit.
  2. In-Linkage Model Diagnostics:
    Automated tools to evaluate model bias before thresholds are applied. This includes match weight analysis, blocking rule assessment, and interactive bias diagnostics. These tools make strategy selection transparent and allow deviation from a “one-size-fits-all” pipeline when needed.
  3. Post-Linkage Equity Reporting:
    After linkage, sample bias and subgroup-specific error patterns (e.g., by ethnicity or deprivation) are quantified. Outputs include equity dashboards, analyst guidance, and feedback loops to data owners to inform future improvements.
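
As a rough illustration of the pre-linkage profiling step, the sketch below computes identifier completeness per subgroup. It is not the framework's actual tooling: the function name `completeness_by_group`, the field names, and the records are all hypothetical and invented for this example.

```python
def completeness_by_group(records, group_key, fields):
    """Return {group: {field: share of records with a non-empty value}}."""
    totals = {}   # number of records seen per group
    filled = {}   # number of non-empty values per group and field
    for rec in records:
        group = rec.get(group_key) or "unknown"
        totals[group] = totals.get(group, 0) + 1
        per_field = filled.setdefault(group, {f: 0 for f in fields})
        for f in fields:
            value = rec.get(f)
            # Treat None and blank strings as missing.
            if value is not None and str(value).strip() != "":
                per_field[f] += 1
    return {
        g: {f: filled[g][f] / totals[g] for f in fields}
        for g in totals
    }


# Hypothetical, synthetic records (not real patient data).
records = [
    {"ethnicity": "A", "nhs_number": "9000000001", "postcode": "AB1 2CD"},
    {"ethnicity": "A", "nhs_number": "9000000002", "postcode": ""},
    {"ethnicity": "B", "nhs_number": None, "postcode": "EF3 4GH"},
    {"ethnicity": "B", "nhs_number": None, "postcode": None},
]

profile = completeness_by_group(records, "ethnicity", ["nhs_number", "postcode"])
for group, shares in sorted(profile.items()):
    print(group, shares)
```

In this toy input, group B has no populated `nhs_number` values at all, which is the kind of early bias signal a profiling report could flag before any linkage is run.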

Core assumptions challenged:

  • That all individuals are represented in central systems (e.g., NHS Spine)
  • That input data are equally complete across groups
  • That linked entities reflect ground truth

Key considerations:

  • Trade-offs between precision and recall often mask subgroup disparities.
  • The pipeline must accommodate diverse user needs—data owners, linkers, analysts—while remaining explainable and auditable.
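
The first consideration can be made concrete with a small sketch, assuming a synthetic testbed in which the true match status of every candidate pair is known. The group labels and counts below are invented for illustration, and the helper names `precision_recall` and `by_subgroup` are hypothetical rather than part of the presented framework.

```python
def precision_recall(pairs):
    """pairs: iterable of (predicted_link, true_match) booleans."""
    pairs = list(pairs)
    tp = sum(1 for pred, true in pairs if pred and true)
    fp = sum(1 for pred, true in pairs if pred and not true)
    fn = sum(1 for pred, true in pairs if not pred and true)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


def by_subgroup(rows):
    """rows: iterable of (group, predicted_link, true_match)."""
    groups = {}
    for group, pred, true in rows:
        groups.setdefault(group, []).append((pred, true))
    return {g: precision_recall(p) for g, p in groups.items()}


# Invented counts: the larger group dominates the overall figures.
rows = (
    [("majority", True, True)] * 90     # correct links
    + [("majority", False, True)] * 5   # missed matches
    + [("majority", True, False)] * 5   # false links
    + [("minority", True, True)] * 5    # correct links
    + [("minority", False, True)] * 5   # missed matches
)

overall = precision_recall((pred, true) for _, pred, true in rows)
per_group = by_subgroup(rows)

print("overall  precision=%.3f recall=%.3f" % overall)
for group, (p, r) in sorted(per_group.items()):
    print("%-8s precision=%.3f recall=%.3f" % (group, p, r))
```

With these made-up numbers, overall recall is about 0.90 while recall for the smaller group is 0.50, so a headline metric alone would hide the disparity.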

Tools and progress:

  • Synthetic testbeds and reproducible dashboards support open-access benchmarking.
  • Developed in alignment with international best practices (UK, Canada, Australia).
  • Actively integrated with NHS England’s federated linkage infrastructure and informed by the ITALO expert network.

This work delivers practical, automated tooling to detect, explain, and mitigate linkage bias—helping ensure the NHS’s data future is not only powerful, but also inclusive.

