UK Biobank health data continues leaking on Alibaba despite removal efforts

Intimate health data from 500,000 UK Biobank volunteers has been exposed without their consent, violating their privacy and raising re-identification risks despite claims that the records were de-identified.
"New listings will emerge. There have been additional listings posted since the government were made aware of the issue last week."
The science minister acknowledged that, despite removal efforts, more of the leaked health data keeps appearing on Alibaba.

Half a million people who gave their most intimate biological information to science in an act of civic trust have found that trust tested in the most public of ways — their health records surfacing for sale on a Chinese e-commerce platform, again and again, despite official efforts to stop it. The breach of UK Biobank's de-identified volunteer data is not merely a cybersecurity failure; it is a rupture in the covenant between institutions and the people who believe in them. What makes this moment philosophically significant is not only that the data leaked, but that the protections meant to render it anonymous have been shown, by researchers, to be a kind of comfortable fiction. The question now is whether the systems built to hold human vulnerability can be rebuilt before the cost of that fiction falls entirely on those who had no say in it.

  • New listings of the stolen health records keep appearing on Alibaba even as government teams work with Chinese authorities to remove them — the breach is not a moment but a tide.
  • Researchers at Oxford's Internet Institute have already demonstrated that so-called de-identified records can be traced back to real individuals using only a birth date and a single medical procedure, collapsing the reassurances officials offered.
  • Three Chinese hospitals are believed to be behind the original postings, and while officials say no one purchased the data before removal, at least thirty separate breaches have struck UK Biobank in the past month alone.
  • A dataset of 96,000 volunteers accidentally uploaded by a Yale graduate student remains accessible online despite removal requests, drawing sharp parliamentary criticism that UK Biobank has been 'complacent' with the people who trusted it.
  • All access to UK Biobank data has been frozen, and Science Minister Patrick Vallance has called for a secure data environment to be built and implemented. The 500,000 volunteers are now watching closely to see whether that promise holds.

Half a million people who volunteered their health information to UK Biobank for medical research discovered last week that their data had been listed for sale on Alibaba. What followed was worse than the initial breach: new listings kept appearing even as government teams worked to remove them.

Science Minister Patrick Vallance appeared before the House of Lords to deliver news that officials had hoped would not be needed. The data, technically stripped of names and addresses, continued surfacing on the Chinese platform. Vallance acknowledged the uncomfortable truth plainly: more listings would come. The volunteers had contributed their genetic information, medical histories, and test results in good faith, trusting that the data would serve legitimate research into heart disease, cancer, and dementia. That trust had not been kept.

Officials initially described the risk as low because the records lacked names and precise birth dates. But researchers at Oxford's Internet Institute had already dismantled that reassurance. Using only a volunteer's date of birth and a single operation record, they successfully re-identified an individual from a separate leaked UK Biobank dataset. De-identification, it turned out, was more theoretical than real. Vallance eventually acknowledged what researchers had already shown: in large datasets, triangulation toward identification is increasingly possible.

The breach surfaced through an anonymous whistleblower who alerted technology minister Ian Murray. Three Chinese hospitals appear to have been responsible for the postings. Officials believe the data was removed before anyone purchased it — a small relief in an otherwise serious situation. All access to UK Biobank data has since been frozen.

The Alibaba listings are only part of a larger pattern. In the past month, UK Biobank has faced at least thirty separate data breaches. A detailed dataset covering 96,000 volunteers, accidentally uploaded by a Yale graduate student, remains online despite removal requests. Chi Onwurah, chair of the Commons science committee, was direct: UK Biobank had been complacent with people who shared their most personal information and deserved better.

Vallance closed his statement with a call for a secure data environment — an admission that the current system has failed the very people whose generosity made the research possible.

Half a million people who volunteered their health information to UK Biobank for medical research woke up last week to discover their data had been listed for sale on Alibaba. The breach itself was alarming enough. What came next was worse: more listings kept appearing.

Patrick Vallance, the government's science minister, stood before the House of Lords this week to deliver news that officials had hoped would not need delivering. Even as government teams worked with Chinese authorities to take down the original postings, new ones were going live. The data—stripped of names and addresses, technically "de-identified"—kept surfacing on the Chinese e-commerce platform. Vallance acknowledged what everyone in the room already understood: this would probably happen again. "New listings will emerge," he said flatly. "There have been additional listings posted since the government were made aware of the issue last week."

The volunteers who contributed their genetic information, their medical histories, their test results, had done so in good faith. They believed their data would be locked away, used only by legitimate researchers to unlock secrets about heart disease, cancer, dementia, and Covid-19. UK Biobank is genuinely unique—no other country has assembled such a comprehensive, detailed health dataset from half a million people. The research it enables matters. But the institution that holds this trust had failed to keep it.

The records themselves do not contain names or precise birth dates, which is why officials initially described the risk as low. Vallance repeated this assurance in his statement. But researchers at Oxford's Internet Institute had already proven the claim hollow. Using nothing more than a volunteer's date of birth and information about a single operation they'd had, they successfully re-identified an individual from a different leaked UK Biobank dataset. The de-identification, in other words, was theoretical. In practice, with enough data points, a person could be found. "It is increasingly possible to triangulate in large datasets and get close to identification," Vallance acknowledged, finally naming the thing everyone feared.
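The triangulation Vallance describes can be illustrated with a toy sketch. It is not the Oxford researchers' method, and every record, field name, and value below is invented for illustration. It assumes a dataset that, like the separate leaked dataset mentioned above, includes a date of birth, and it counts how many records become unique once two ordinary attributes are combined.

```python
from collections import Counter

# Entirely invented toy records: no real UK Biobank fields or values.
# None of them carries a name, only attributes an outsider might already know.
records = [
    {"id": "r1", "birth_date": "1961-03-14", "procedure": "knee replacement"},
    {"id": "r2", "birth_date": "1961-03-14", "procedure": "cataract surgery"},
    {"id": "r3", "birth_date": "1958-11-02", "procedure": "knee replacement"},
    {"id": "r4", "birth_date": "1958-11-02", "procedure": "knee replacement"},
    {"id": "r5", "birth_date": "1970-07-30", "procedure": "appendectomy"},
]


def unique_matches(rows, keys):
    """Return ids of records whose combination of `keys` appears exactly once."""
    def combo(row):
        return tuple(row[k] for k in keys)

    counts = Counter(combo(row) for row in rows)
    return [row["id"] for row in rows if counts[combo(row)] == 1]


def link(rows, known):
    """Return ids of records that match everything an outsider already knows."""
    return [row["id"] for row in rows
            if all(row[k] == v for k, v in known.items())]


# Birth date alone singles out one record; adding the procedure singles out three.
by_date = unique_matches(records, ["birth_date"])
singled_out = unique_matches(records, ["birth_date", "procedure"])

# Someone who knows a neighbour's birthday and operation finds a single record.
hit = link(records, {"birth_date": "1961-03-14", "procedure": "knee replacement"})
```

In this toy set, each extra attribute shrinks the crowd a record can hide in, which is why removing names alone does not guarantee anonymity.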

The breach itself came to light through an anonymous whistleblower who alerted the technology minister Ian Murray last Thursday. Three Chinese hospitals—Second Xiangya, China-Japan Union, and Beijing Chaoyang—appear to have been behind the postings. Officials believe no one actually purchased the data before it was removed, a small mercy in an otherwise grim situation. All access to UK Biobank data has been frozen while the organization tries to understand what happened and how to prevent it from happening again.

But the Alibaba listings are only part of the problem. In the past month alone, UK Biobank has had to respond to at least thirty separate data breaches. Some of that information—a detailed dataset on 96,000 volunteers, accidentally uploaded by a Yale graduate student—remains online despite requests for removal. Chi Onwurah, who chairs the Commons science, innovation and technology committee, did not mince words: "I'm astounded that that data is still available online. UK Biobank have been complacent about the half a million British people who have shared their most intimate and personal data with them and who deserve better than this."

Vallance ended his statement by calling for a "secure data environment" to be built and implemented, a recognition that the current system has failed. The volunteers who made this research possible—who trusted an institution with their genetic code, their diagnoses, their vulnerabilities—are now waiting to see whether that promise means anything at all.

"New listings will emerge – there have been additional listings posted since the government were made aware of the issue last week – and we continue to work with the Chinese government to remove them quickly."
— Patrick Vallance, science minister
"I'm astounded that that data is still available online. UK Biobank have been complacent about the half a million British people who have shared their most intimate and personal data with them and who deserve better than this."
— Chi Onwurah, chair of Commons science, innovation and technology committee

The Hearth Conversation: another angle on the story

Inventor

Why does it matter that the data is de-identified if people can still be found?

Model

Because de-identification creates a false sense of security. It lets institutions claim they've protected privacy while actually just making identification slightly harder. Once someone knows your birth date and that you had, say, a knee surgery, they can cross-reference that against public records and find you. The de-identification is a legal shield, not a real one.

Inventor

How did this data end up on Alibaba in the first place?

Model

That's still being investigated, but three Chinese hospitals appear to have posted it. Whether they stole it, received it from someone inside UK Biobank, or got it through some other route—that's not yet clear. What's clear is that once it was there, it kept reappearing even after removal.

Inventor

The government says they're working with Chinese officials. How much leverage do they actually have?

Model

Not much, probably. You can ask another government to take things down, and they might comply once or twice. But if the data is already copied and redistributed, you're fighting a losing battle. The real problem is that the data shouldn't have been vulnerable in the first place.

Inventor

What do the 500,000 volunteers do now?

Model

They wait. They've been told their data was de-identified, which is technically true but not reassuring. They've been told the risk of re-identification is low, which researchers have already disproven. They've been told a secure system is coming. But their data is already out there, and they have no way to take it back.

Inventor

Is UK Biobank going to survive this?

Model

Probably, because the research it enables is genuinely valuable and there's no alternative. But trust is harder to rebuild than systems. The next person asked to volunteer their health data will remember this. They'll think twice.
