August 1, 2023

Are synthetic health data ‘personal data’?

In this article, Dr Colin Mitchell and Dr Elizabeth Redrup Hill, colleagues at PHG Foundation (a University of Cambridge charity), discuss their report on the concept and status of synthetic health data under data protection law, providing recommendations for developers, researchers, regulators, and policymakers to balance privacy protection and research potential.

Synthetic data (artificial data that closely mimic the properties and relationships of real data) are not a new concept, but technological advances have generated great optimism about their potential for health research and innovation. However, where synthetic health data are generated from real patient data, developers and regulators have questioned the extent to which they remain ‘personal data’, governed by data protection law.

Our new report, Are synthetic health data ‘personal data’?, was independently commissioned by the MHRA to assess the status of synthetic health data under data protection law. We evaluated the current legal framework (the UK and EU GDPR), regulatory guidance and the latest legal commentary to determine whether, or in what circumstances, synthetic health data might be considered ‘personal data’.

We found that regulators and legal commentators currently approach synthetic data with caution. The default is to presume that synthetic data generated from real patient data are ‘personal data’ unless it can be shown that the risk of identification has been reduced to remote levels. While this safeguards privacy, it may limit research, testing and, ultimately, translation into patient care.

As a consequence, we make three main recommendations for synthetic data developers, researchers, regulators and policymakers:

  1. synthetic data developers and users should continue to follow best practice in relation to data protection impact assessments and anonymisation in assessing the identifiability and other data protection risks arising from processing.
  2. synthetic data developers, researchers, regulators and policymakers should seek to achieve greater clarity, and reach consensus on:
    • appropriate standards and approaches to assessing identifiability of specific synthetic data generation methods, utilising quantitative metrics as far as possible;
    • whether the default for regulating certain forms of synthetic data and synthetic data generation should change from presumptively ‘personal data’ to a more proportionate approach that allows for some synthetic data to be classified as non-personal data based on an assessment of risk by data controllers.
  3. as synthetic data generation and other forms of AI-driven processing for health purposes gain pace, regulators and policymakers should prioritise determining what form of regulation is appropriate for this sector and how it fits within the overall regulatory framework.

The full report is available to download here.

Please note that this report is intended to provide general information and understanding of the legal framework. It should not be considered legal advice, nor used as a substitute for seeking qualified legal advice.

Source: Are synthetic health data ‘personal data’?