Date: 2025-09-10

The first part of this post can be found here.

I concluded my last post by observing that my initial interactions with the research systems IT Manager and clinical research academic of a particular Oxbridge university suggested that they were playing fast and loose with your private health information, and posited that this is a common theme seen over and over again with health records in the control of academics. Today I will explain…

Researchers love to write papers telling us that using patient records for post-clinical or secondary use research may breach patient privacy and confidentiality, that ethics committees weaken trust and do little to adequately protect individual patients’ values and interests, and that such research can be unethical, may expose patients to potential criminality such as doxing or identity theft, or risks damaging the doctor-patient relationship (for example: here, here and here). Yet we see researchers abusing your and your children’s private health data, and even specimens and test results, all over the world. Examples range from exploiting disagreements and loopholes in the law to store and later access Newborn Screening heel prick test cards without consent in Australia and New Zealand, to using incomplete coded versions of our health records to make broad and potentially misleading claims about the existence or extent of patients’ alleged opioid abuse, to allowing third party organisations to simply and unlawfully remove copies of patients’ health records for ingestion into overseas and potentially unregulated aggregate data collections.

If you ever wanted to understand why I believe health law, ethics, morality, and simple human decency are needed when individuals handle or have access to our medical records, and why I believe most of the highly political and profit-driven people in charge of that clinical data right now are little better than criminals, then I am more than pleased to give you this small primer.

Most people believe that privacy laws protect their health records. However, regulations like HIPAA in the United States and the UK’s Data Protection Act 2018, Health and Social Care Act 2012, NHS Act 2006 and Human Rights Act 1998, which all purport, or perhaps that should be pretend, to protect the privacy and confidentiality of our medical records and associated health data, actually do more to enable the sharing of that data than to prevent it. You see, what they do isn’t so much lock away our private data to keep it out of the hands of people we might not want rifling through it. Rather, they ostensibly dictate how that patient data is supposed to be looked after and processed once collected. Instead of granting you a right to absolute privacy, they define and prescribe a myriad of exceptions that enable patient information to be shared and, where these are insufficient, create pliant loopholes like ‘the public interest’ that can be, and have been, bent to serve almost any cause célèbre. They have enabled an incredibly large and profitable business selling your private health data. It is sold directly to everyone from defence contractors to the governments of other countries (£330-500m), international pharmaceutical companies (up to £330k per company), AI providers like Google DeepMind and a multitude of academic and quasi-academic research projects (between £6-25k per use). It is sold indirectly through companies like IQVIA, who are disingenuously presented as a ‘privacy enhancing contractor to the NHS’ but whose primary business model is to acquire your health data by any means and either repackage it for sale to other health data aggregators, or use it to create larger datasets of fake, synthetic health records (sometimes incorrectly described as Digital Twins - you will see in the slide below why they are NEVER twins) that they can tweak and manipulate to suit any research question and sell to academics.

Side note: Leaving aside that governments, healthcare providers and companies like IQVIA are making money from your personal health data, many academics use IQVIA’s synthetic patient data in research that has been used to make incredible (meaning: not credible) ‘new’ discoveries and inform decisions that affect your healthcare without declaring or sometimes even recognising its synthetic nature. If you want to understand synthetic health data, look no further than the slide below from my digital health technology course. Synthetic patients made from the health records of real patients are nothing more than mix-and-match collections of demographics, symptoms, treatments and outcomes and anything you might discover that appears ‘new’ is nothing more than an accident created in the way aspects of real patients are matched up by the engine that generates the synthetic patient. Clinical decisions affecting real patients should not be made on the basis of these synthetic health datasets.

In several small ways your ability to object to the sharing and sale of your health data has been gradually whittled away: first by changing opt-in rules to opt-out with the flick of a politician’s pen, making all health records shareable and saleable until you object, and later by removing from the legislation some of the grounds on which you could object. For example, the Health and Social Care Act 2012 simply removed parts of Section 10 of the Data Protection Act 1998, eliminating both the patient’s ability to object on specific grounds, including that releasing or selling the record might cause significant harm or distress to the patient or another person, and the requirement that patients be informed their data was going to be shared or sold, which had given them the opportunity to object. And most people still don’t understand that the Data Protection Act 2018 never actually gave patients the right to object to the sharing or sale of their personal data in the first place. This became blatantly obvious during Covid, when all pretence that the Data Protection Act, GDPR or even the Common Law Duty of Confidentiality acted to protect your privacy evaporated entirely so that private companies and researchers could be given broad access to the medical records of anyone whose data had any mention of covid or coronavirus-related health concerns. This even included people who had previously opted out of sharing their medical records for any purpose.

Many of these records ended up in the hands of researchers from UCL, King’s College London, Cambridge and Oxford. Aside from incredible publications like those of the ONS and Queen Mary University of London that claimed BAME people were more likely to suffer from severe covid, and the matching of records with Tim Spector’s ZOE app data to implausibly proclaim that almost any symptom common to a broad range of regular ailments was a covid symptom, we have no idea what was done with most of these health records and by whom.

You would be mistaken if you think this is just happening in the UK or United States. This is happening everywhere. For example, Australians are expressly told that their My Health Record is never sold to third parties for commercial purposes, yet the framework that governs secondary use of the records actually allows the sale of health datasets for limited commercial use as long as the use could have a research or public health purpose - so selling a significant subset of health data to the app developer that customised Australia’s COVIDSafe Track and Trace App was a commercial sale that was allowed on the specious belief that it would have (but didn’t) a public health benefit. Nobody I approached seems to know for certain who at that Australian software company, the Singaporean company that designed the original underlying app code, or the webhost the government contracted to host and support the app, had access to that data or even where the data is today. It seems that the only people accessing COVIDSafe’s supposedly protected, encrypted and confidential personally identifiable health data that anyone cared about were the police, who defiantly refused to agree to stop accessing it even after they had been caught.

The biggest loophole that enables sharing and sale of your personal health data for secondary use purposes is something referred to under several, sometimes misapplied and easily misunderstood, terms - the most common being deidentification and anonymisation. If it’s anonymised, to our NHS and others, it becomes fair game.

Deidentification and anonymisation are often described as the removal or transformation of personally identifying information in the record - for example, deleting your name, removing your house number and street and leaving only the first four characters of your UK postcode, and removing your date of birth and replacing it with your age in years. While several deidentification documents (and even Google’s AI) will tell you that deidentification would remove information about rare medical conditions you might have, this almost never happens. Why? Because removing that data might affect how we interpret the rest of your medical record. For example, if we removed the information regarding a Systemic Lupus Erythematosus (SLE or lupus) diagnosis for a 24-year-old female, we might be left asking why she has so many GP and rheumatology specialist appointments when she doesn’t appear to have a rheumatic condition, or why she is taking an antimalarial, a strong anti-inflammatory and expensive biologic drugs when she has never been to a malaria-prone country, has never been diagnosed with malaria, and doesn’t have an inflammatory or autoimmune disease diagnosis. Often, it is the rare medical conditions that provide informative context for the rest of the medical record: left lower lobectomy for lung cancer that might explain ongoing shortness of breath, scarring at the surgical site and a history of chemo- and radio-therapy; enucleation of the eye after a penetrating injury that might explain ongoing migraines and the need for maxillofacial prosthetics consults; orchidectomy of the right testis that might explain gynecomastia, hormone injections for low testosterone and what might look like an inguinal hernia scar. Remove that diagnostic context and the rest of the record can become entirely worthless.

So, if the health service wants to profit from your data... I guarantee you they don’t remove it.
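To make that concrete, here is a minimal Python sketch of the kind of deidentification described above. It is an illustration only, not any health service's actual pipeline: the record layout, the field names and the `deidentify` function are all my own assumptions, and the patient is invented. Notice what goes and what stays.

```python
from datetime import date


def deidentify(record: dict, today: date) -> dict:
    """Apply the transformations described in the text to one record.

    Hypothetical field names; real systems will differ.
    """
    dob = record["date_of_birth"]
    # Replace date of birth with age in whole years.
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    age = today.year - dob.year - (0 if had_birthday else 1)
    return {
        # Name, house number and street are dropped entirely.
        # Only the first four characters of the postcode are kept.
        "postcode_prefix": record["postcode"][:4],
        "age_years": age,
        "sex": record["sex"],
        # The diagnoses are kept, because the rest of the record
        # makes little sense without them.
        "diagnoses": list(record["diagnoses"]),
    }


# An invented patient, not a real person.
example = {
    "name": "Jane Example",
    "address": "1 Example Street",
    "postcode": "AB12 3CD",
    "date_of_birth": date(2001, 3, 14),
    "sex": "F",
    "diagnoses": ["M32.9"],  # ICD-10: systemic lupus erythematosus, unspecified
}

deidentified = deidentify(example, today=date(2025, 9, 10))
print(deidentified)
```

The output still carries a partial postcode, an age, a sex and a rare diagnosis, which is exactly the combination the rest of this post is concerned with.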

However, it is often that data, and something as simple as a Google search, that can turn deidentified or anonymous health data into a re-identified health record. Healthcare providers and governments have been profiting from the sharing or sale of patients’ health records for over three decades. In 1997, while at Carnegie Mellon University, Professor Latanya Sweeney demonstrated using basic searches of electoral rolls, newspapers and the far less sophisticated Google of the day (BackRub at Stanford University’s URL google.stanford.edu) that re-identification of deidentified health records was a simple task. While a senior politician assured patients that their data had been anonymised and privacy was assured, Sweeney identified that politician’s health record from within tens of thousands of these anonymised records and had FedEx deliver a paper copy to his office. While Sweeney’s work led to significant change in HIPAA, that change, which was ostensibly to protect patients’ privacy, has been seen today to enable increased exploitation of patient data, political malfeasance and financial greed.

Having worked briefly inside one NHS Trust’s data team, I can tell you from experience that this type of research collective usually has significant funding from Pharma companies, the Wellcome Trust or the Bill and Melinda Gates Foundation, which it uses to ‘compensate’ the NHS Trust to identify and anonymise (read: purchase) fairly complete copies of the health records of patients that fit particular study criteria - which can be anything from highly specific (everyone who had a particular type of pacemaker implanted for cardiac irregularity between March 2012 and January 2014) to highly generic (everyone who had this type of X-Ray/CT/MRI on or after 2007).

Let us return to the example of the health data these academic chaps that emailed me for the job interview ‘test’ are in control of. For the sake of argument and to give us a concrete example, let’s say that it is limited to basic demographics and health histories of patients who had, choosing something relevant at random, a CT or MRI of their skull - what we colloquially might call a brain scan. I am not saying that this is what it is, I am only reaching for an example that I can use to show why you might not want academic institutions and IT wonks in charge of your identifiable or even deidentified NHS health records data. So, in our fanciful example, let’s say these chaps have every brain scan that was done across the NHS for the last decade - and while that might encapsulate several tens of thousands of patient records, re-identifying one, as Professor Sweeney demonstrated, can be as simple as reading the newspaper.

For example: let’s say you know from a newspaper article that in March 2016 a 19-year-old male died within 24 hours of admission to hospital with a traumatic brain injury that resulted from blunt force trauma involving self-administration of fall-risk-increasing drugs (FRIDs) that caused him to fall down a flight of stairs. If the medical record has already been coded you might simply be searching for the one record that has the right combination of dates, location, age and ICD-10 codes. Perhaps: S09.90XA - initial encounter for unspecified head injury, which may later be updated to or include S06 - intracranial injury; F16.xx - Mental and Behavioural Disorders due to use of or acute intoxication caused by hallucinogens; W19.xxxA - initial encounter for unspecified fall, which may later be updated to W10 - falls on or from stairs; and possibly R96.1 - death occurring less than 24 hours from onset of symptoms. And if the medical record is still clinicians’ free text, the search can be markedly easier still. Further, that patient would be easy to differentiate from, for example, this patient - who, while also 19 years of age with a traumatic intracranial injury, would have other descriptors or codes related to being a passenger in a motor vehicle accident on a different date in October 2024, with death occurring after a 48 hour admission to hospital.
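The search described above needs nothing more sophisticated than a filter. The following Python sketch runs it over four entirely made-up records. The record layout, the field names and the helpers `has_code` and `matches_report` are my own assumptions for illustration, the codes are abbreviated to category level where the detail doesn't matter, and nothing here reflects any real dataset.

```python
from datetime import date

# Entirely invented 'deidentified' records: no names, no addresses.
records = [
    {"id": "r001", "age": 19, "sex": "M", "admitted": date(2016, 3, 12),
     "codes": ["S09.90XA", "F16", "W19", "R96.1"]},
    # Same age and a head injury, but a transport accident in October 2024.
    {"id": "r002", "age": 19, "sex": "M", "admitted": date(2024, 10, 5),
     "codes": ["S06", "V43"]},
    # Same age, month and fall, but no hallucinogen-related code.
    {"id": "r003", "age": 19, "sex": "M", "admitted": date(2016, 3, 20),
     "codes": ["S06", "W19"]},
    # Same month and injury, different age and sex.
    {"id": "r004", "age": 54, "sex": "F", "admitted": date(2016, 3, 12),
     "codes": ["S09.90XA", "W10"]},
]


def has_code(record: dict, prefixes: tuple) -> bool:
    """True if any code on the record starts with any of the prefixes."""
    return any(code.startswith(p) for code in record["codes"] for p in prefixes)


def matches_report(record: dict) -> bool:
    """Match the facts a reader could take from the newspaper article."""
    admitted = record["admitted"]
    return (
        record["age"] == 19
        and record["sex"] == "M"
        and (admitted.year, admitted.month) == (2016, 3)
        and has_code(record, ("S06", "S09"))   # head or intracranial injury
        and has_code(record, ("F16",))         # hallucinogen-related
        and has_code(record, ("W10", "W19"))   # fall, possibly on stairs
    )


candidates = [r["id"] for r in records if matches_report(r)]
print(candidates)  # ['r001']
```

Only one record survives the filter, and each of the near misses is excluded by a single publicly reportable detail.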

Studies have shown that on average one-quarter of all deidentified or anonymised health records can be re-identified (range: 1-79%), and that uniqueness rates can be almost absolute (80-90%) when working with as few as three matchable data points. In other words, with today’s more information-dense datasets, powerful computers and AI, it may be possible to re-identify 8-9 out of every 10 patients from their anonymised health records where we have independently established only three data points about the sought individual.
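For readers unfamiliar with the term, a 'uniqueness rate' is simply the share of records whose combination of matchable data points appears only once in the dataset. The Python sketch below shows the arithmetic on ten invented rows. The three data points chosen (postcode prefix, age, sex), the rows and the `uniqueness_rate` function are my own assumptions, and the toy result is not meant to reproduce the figures from the studies.

```python
from collections import Counter

# Ten invented rows of (postcode prefix, age in years, sex).
rows = [
    ("AB12", 24, "F"),
    ("AB12", 24, "F"),
    ("AB12", 61, "M"),
    ("CD34", 19, "M"),
    ("CD34", 19, "F"),
    ("EF56", 45, "M"),
    ("EF56", 45, "M"),
    ("EF56", 45, "M"),
    ("GH78", 33, "F"),
    ("GH78", 70, "M"),
]


def uniqueness_rate(rows: list) -> float:
    """Share of rows whose combination of values occurs exactly once."""
    counts = Counter(rows)
    unique = sum(1 for row in rows if counts[row] == 1)
    return unique / len(rows)


# One data point (postcode prefix only) versus all three.
rate_one_point = uniqueness_rate([(postcode,) for postcode, _, _ in rows])
rate_three_points = uniqueness_rate(rows)

print(rate_one_point)     # 0.0
print(rate_three_points)  # 0.5
```

Even in this tiny example, nobody is unique on postcode prefix alone, yet half the rows become unique once age and sex are added. Real records carry far more detail per row than this.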

Oh... and what happens to academics like me who, even sometimes quietly, point these issues out? While regulators and politicians bury their heads in the sand and pretend it isn’t happening, or that it can be prevented with speeches and proposed legislation simply telling people not to do it, they either, as in my case, use censorship of publications and grant applications, prolonged academic probation or the expiration of short-term contracts to take away our livelihoods and make us unemployable elsewhere or, as in this Australian Melbourne University example, apply increasing pressure on our employers until we quit or are constructively dismissed.

These secondary use people accessing your personal health information may view it as anything from a mere research tool, a financial resource, or an amusing curiosity, through to a potential source of information that can be used against you. I don’t know whether the two people at the Oxbridge university, the research systems IT Manager and the clinical research academic, were simply so disorganised that last minute interviews and incredible demands for rushed, highly specific and detailed long-form answers to information-poor questionnaires were a one-off or, as I suspect based on almost two decades of experience in IT environments, an established part of their day-to-day management style. I also don’t know whether today’s breed of IT Managers, with their ‘let’s throw everything in the cloud and spin up an app around it with digital ID and 2FA’ approach that never truly assesses all of the security, privacy, ethical and moral considerations, should be allowed anywhere near our highly sensitive and confidential health data.

After all... Do you really want your or your family members’ health records in the hands of people who might be greedy, malicious, ethically challenged, capricious, clumsy or simply ignorant of the potential harm their accidental release, intentional sharing or profit-motivated sale might bring?

I know I don’t.