22.11.2018

Science, health and profit

Exaggerated claims for artificial intelligence are being used to further the government’s agenda of privatised healthcare, says James Linney

Artificial Intelligence (AI), we are told, is about to change forever the way diseases are diagnosed and treated. AI smartphone apps are now apparently sophisticated enough to more accurately diagnose disease than doctors - leading many to claim that they are the beginning of the end of the medical profession. As we will see, however, like many claims about AI, this is to enormously exaggerate our current technological capabilities - what actually lies behind most of the wild enthusiasm for AI in healthcare is the hyperbole of the tech companies trying desperately to be market leaders.

To be fair, there is something quite seductive in the idea of AI becoming the future of medical diagnosis - imagine a pristine, shiny, metallic surgery, with a futuristic robotic arm swirling around a patient who is suspended in some kind of liquid medium, while a super-sophisticated scanning laser produces in seconds a 100% accurate diagnosis of any disease. Granted, I have been watching too many sci-fi films, but this fantasy is all the more alluring because it is so far removed from the level of technology often on display in our general practitioner surgeries in the national health service.

The GP surgery where I work, for example, has computers that are nearly eight years old - I have my suspicions that the printers are at least a decade older. The most commonly used medical equipment, such as stethoscopes, otoscopes and ophthalmoscopes (for looking in ears and eyes), relies on technology that has not really changed in several decades. Embarrassingly, one of the most common ways I share urgent information with hospital colleagues is via a fax machine, for crying out loud - a technology that is really only a modest update on the telegraph. What is more, the NHS's recent attempts to modernise have not gone well. For example, the infamous £11 billion information technology upgrade, which was supposed to make GP and hospital computers compatible and allow patient clinic notes to be shared in real time, was abandoned in 2011, making it one of the biggest IT failures ever seen.

So, with its lack of 21st century IT and huge staff shortages,¹ you could be forgiven for thinking that new AI technologies might be exactly what the NHS needs. Amongst the newest of these are medical advice and diagnosis systems, delivered through smartphone apps, known as computerised diagnostic decision support (CDDS) programs or 'chatbots'. Essentially you have a WhatsApp-style conversation with a bot, which asks a series of questions about your symptoms. Using algorithms and access to digitised medical textbooks, the app comes up with a list of possible causes for your symptoms and uses this to advise on your next best course of action. That may range from taking simple over-the-counter painkillers, through booking an appointment with your GP (within a given number of days), to immediately calling 999. Not quite the futuristic robot clinics of my imagination, admittedly, but could they have the potential to reduce needless GP visits and cut waiting times?

Babylon Health is the leading provider of these new technologies in the UK and has a growing international presence. Its none-too-modest aim, according to its website, is to “put accessible and affordable healthcare in the hands of every person on earth”.² It provides private GP services by video link and has corporate contracts with Bupa, Sky and Boots. Its chief executive, Ali Parsa, was previously a director at Goldman Sachs and was one of the founders of the private health company, Circle. You might have heard of Circle, since it was the first private firm to win a contract to run an NHS hospital (Hinchingbrooke Hospital in Huntingdon) in 2011. But Circle abandoned this 10-year contract in 2015 and the failing hospital had to be taken into special measures, leaving the local trust to clean up the mess.

This does not seem to have deterred Ali Parsa though and he once again has his sights set on taking over a piece of the NHS. In 2017 Babylon won a contract to provide its AI smartphone app (rebranded as ‘GP at Hand’), in partnership with a Fulham-based GP practice, to NHS patients. This small practice quickly went from serving a few thousand patients to signing up over 30,000, the vast majority using the online service alone and never setting foot inside the surgery. Babylon has also recently won a contract in north London - meaning its AI chatbot now provides NHS 111 advice to more than one million people. And Babylon’s future looks even brighter thanks to Matt Hancock, the new health secretary. In September he was guest speaker at a Babylon PR event, where he said that its services should be available nationally.

‘Brilliant,’ I hear you say - instead of the two or three week wait to see a GP, we can now have nearly instant medical advice in the palm of our hand; and obviously such technology, which is already issuing advice to patients and has been endorsed by the department of health, must have been through all kinds of major, stringent tests and have volumes of studies proving its reliability … Er, not quite.

Scientific?

The question of the accuracy of Babylon’s chatbot was the subject of a recent BBC Horizon programme, entitled ‘Diagnosis on demand? The computer will see you now’.³ The programme documented the presentation of what Babylon claims is an “independent study” to prove the accuracy if its app, compared to doctors. This involved comparing the safe triage advice in relation to 100 medical test questions given by seven doctors to that of Babylon’s chatbot, plus the performance of the chatbot against doctors in sitting a series of clinical ‘vignettes’. Vignettes are role-play scenarios used in medical exams, where either a real patient or a trained actor presents symptoms to a doctor, who has to provide a diagnosis and treatment advice. All the questions and vignettes were taken from part of an exam GPs have to sit in order to become fully qualified.

According to Babylon, the results show that for diagnostic accuracy its AI system matched the real doctors (with both scoring 80% accuracy) and that its AI gave safer medical triage advice, compared to doctors (97% vs 93%). Ali Parsa claims that the results prove Babylon has demonstrated diagnostic ability that is “on par with human doctors”.

The BBC programme - in line with the way in which most media present scientific evidence - did not critically analyse the study in any way. Throughout the programme the presenter was well and truly embedded in the Babylon camp and allowed Ali Parsa to present the test as though it were a reliable piece of academic research - Babylon has even written up the results in a mock-up, clinical study-style way,⁴ further trying to give the impression that this is 'proper science'. Here we are told that the accuracy of the AI is "comparable to human doctors (in terms of precision and recall)":

In addition, we found that the triage advice recommended by the AI system was, on average, safer than that of human doctors, when compared to the ranges of acceptable triage provided by independent expert judges.

This test, however, is in no way a reliable way of assessing how safe the AI is at diagnosing and advising on disease. The 'research paper' has not been published in any journal, has never been peer-reviewed and does not abide by any of the usual clinical trial publishing criteria. For example, published clinical papers have to declare any conflict of interest the authors may have, which could result in bias, but this paper does not. Yet five of the six authors are current or past Babylon employees, while the sixth is its owner (Parsa)!

The methodology of the ‘study’ is also incredibly weak. For a start the vignettes were not presented to the AI chatbot by real patients or independent actors. Instead they were presented by doctors employed by Babylon and the data was then entered into the app by another Babylon employee. This allowed for the symptom data to be entered in a way that the chatbot would be easily able to understand. Secondly, the vignette cases were not chosen at random: Babylon chose the role-play cases, again allowing it to select those it knew the AI system could answer. Essentially what Babylon has done is pick its own homework questions and then mark them itself.

Additionally, the comparison between doctors and chatbot was weighted to favour Babylon: to score a point the chatbot needed only to have the correct diagnosis among its top three most likely differential diagnoses, while the doctor had to come up with one single diagnosis. Another significant limitation of the study is the small number of doctors involved: of the seven chosen by Babylon, one seemed to be a significantly poorer-performing outlier - if his scores are removed, the doctors' average beats the AI score. We also have no information about how Babylon chose the doctors, whether they have any conflict of interest or how experienced they are. As one GP reviewer of the study noted,

No statistical testing is done to check if the differences reported are likely due to chance variation. A statistically rigorous study would estimate the likely effect size and use that to determine the sample size needed to detect a difference between machine and human.⁵
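To put the reviewer's point about sample size in rough numbers, here is a minimal sketch of the textbook normal-approximation formula for comparing two proportions, fed with Babylon's own headline triage figures (97% for the AI, 93% for the doctors). It assumes two independent groups, a 5% two-sided significance level and 80% power; these are conventional choices of mine, not anything from Babylon's paper, and since the real test reused the same questions for both, a paired analysis would be more appropriate and would give a different exact figure.

```python
from math import ceil, sqrt
from statistics import NormalDist


def cases_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of cases per group needed to detect p1 vs p2.

    Normal-approximation sample-size formula for two independent
    proportions (an assumption: Babylon's test actually reused the
    same questions for AI and doctors).
    """
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2               # pooled proportion under "no difference"
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Babylon's reported "safe triage" rates: AI 97%, doctors 93%.
n = cases_per_group(0.97, 0.93)
print(f"Cases needed per group: {n}")
```

On these assumptions the answer comes out at several hundred cases per group (around 465), far more than the 100 test questions Babylon used.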

Despite all of Ali Parsa’s bold claims and the fanfare that this study received in the media, it is not really a piece of research at all. Instead it equates to a very well engineered PR stunt. We still have no idea how well Babylon’s AI app performs in the real world. Yet, despite this, the company continues to expand its NHS adventures and now runs five GP surgeries in London - their patient list has grown from around 4,700 to 36,555. Not surprisingly most of these patients are younger and fitter than the average GP attendee - 73% are aged 20-34.⁶ By cherry-picking such patients from other practices in the area, Babylon gets paid more, but has to pay out relatively little in caring for them. Its website specifically states that people who are frail or elderly or with complex physical or mental health problems, along with pregnant women, should not register with its app. The result is to create a two-tier GP surgery system.⁷

Not for profit

Let me be clear: I cite the Babylon case not to prove that there is no role for AI in healthcare. There are many exciting roles for it being developed in medical diagnosis - for example, in cancer diagnosis there are AI systems that have been shown to be able to spot melanoma skin cancers with as much accuracy as a dermatologist.⁸ Such use of AI to augment a doctor’s diagnoses could be very useful, but any technology employed should have to prove its safety and reliability. That means high-quality evidence and trials, presented in a transparent way.

However, the Babylon case does show that tech companies will always resist this process - partly because they want to keep their intellectual property a secret, but also in order to maximise their profits before other companies muscle in on the market. They need to get their product out fast, before their technology becomes outdated.

What is abundantly clear is that the rushed, dangerous way in which AI has been introduced into the NHS has not been done to help augment and improve the health service: it has been done to further catalyse the exposure of the NHS to private markets. The fantasy of the sci-fi-inspired, infallible AI diagnostic robot surgery that I imagined at the beginning of my article turns out to be just that: a fantasy - one that companies like Babylon aim to propagate. The reality of AI technology like 'GP at Hand' is that it is only as reliable and safe as the companies developing it. I am sure that Babylon has employed competent scientists working in AI health, some of whom may be very well-meaning, but that does not negate the company's own raison d'être: to make as much profit as possible in as short a time as possible.

We must resist all attempts to privatise the NHS, even ones masquerading as ‘innovation’. We must be wary of tech companies which may be prepared to put people’s health and lives at risk in the name of profit. But we must also demand an NHS that is adequately funded, so that it can be both fully staffed and have at its disposal technology of proven reliability that will help diagnose, treat and improve the care of its patients.

Notes

1. www.telegraph.co.uk/news/2018/11/15/nhs-staff-ortages-could-triple-decade-think-tanks-warn.

2. www.babylonhealth.com/about.

3. www.bbc.co.uk/programmes/b0bqjq0q.

4. https://arxiv.org/abs/1806.10698.

5. https://coiera.com/2018/06/29/paper-review-the-babylon-chatbot.

6. www.pulsetoday.co.uk/news/gp-topics/it/nhs-lifts-clinical-restrictions-on-patients-eligible-to-join-babylons-gp-at-hand/20037800.article.

7. https://support.gpathand.nhs.uk/hc/en-us/articles/115003670889-Can-anyone-register-.

8. www.theguardian.com/technology/2018/jun/10/artificial-intelligence-cancer-detectors-the-five.