Artificial intelligence in health still needs human intelligence

The COVID-19 crisis has shown that AI can deliver benefits, but it has also exposed its limits, which often stem from a lack of the right data.

Jillian Oderkirk coauthored this blog post.

The year is 2029 and artificial intelligence (AI) has dramatically changed health care.

Like many vehicles in 2029, ambulances are autonomous and programmed to use the most efficient routes. In transit, emergency responders access the patient’s digital medical records in the cloud, which include their entire genome.

Smart medical devices and wearable sensors systematically send anonymous data to the hospital in real time, feeding machine learning algorithms that support each clinical decision. Multidisciplinary teams co-design health plans for patients with chronic conditions that take into account individual physiologies and preferences. Patients are coached on how best to use apps and mobile devices to monitor their vital signs, while clinical teams receive alerts about deteriorating health or exposure to risk factors through automated remote consultation platforms. Alongside doctors and nurses, hospitals and clinics now employ teams of computer scientists.

This futuristic vision is based on three case studies from The Topol Review, an independent report on how to prepare the health workforce to deliver a digital future. Commissioned by the government of the United Kingdom, it was published in February 2019, more than a year ago. A few months into the worst pandemic in a century, are we any closer to the digital future described above? The answer is yes, but we are still very far from it.

In 2019, an average of 10 papers on AI and health were published every day, covering potential applications in virtually every aspect of health care planning and delivery.

Opportunities for AI in health grow, while expectations expose limits

Some applications of AI in health have generated new tools and methods, while others have improved existing processes. Early in the pandemic, researchers at the Clalit Research Institute, part of the largest health payer-provider network in Israel, developed models to identify patients at higher risk of dying from COVID-19 and alerted those patients by text message.

In South Africa, AI has been used to predict how likely patients are to adhere to HIV treatment, so that staff can follow up with those at risk of discontinuing it.

In another example, an AI application in the United Kingdom has cut the time medical professionals spend on clinical audits of patient records. When patients have unexpected outcomes, the application identifies recommendations for improvement, a task that takes humans 10 days but takes the application 6 seconds. What’s more, its recommendations are more accurate and more logically consistent.

Such examples abound, yet the use of AI in everyday health care practice remains extremely limited. While the COVID-19 pandemic is showing that AI can deliver benefits in a crisis, it is also a reality check on the hype surrounding AI, exposing the limits of existing applications in informing real-world decisions at speed and at scale.

The potential of AI is certainly rising. The volume of electronic health data is expanding, both within and beyond the health sector, creating opportunities for AI to make health care more effective, efficient and equitable. There is also significant potential to reduce unwarranted variation in care, avoidable medical errors, inefficiency and waste. But realising all this, and ultimately achieving the futuristic vision described above, requires a number of far less exciting steps, starting with managing expectations around the use of AI in health.

The importance of explainability and human oversight: a cautionary tale

As a field, AI is evolving very rapidly, with more sophisticated techniques and approaches emerging every day. Change is happening so fast that statements in reports, perhaps even in this blog post, become obsolete almost as soon as they are published. But while promising new ideas such as meta-learning keep appearing, most applications of AI in health stem from what is called artificial narrow intelligence, otherwise known as weak or applied AI. These systems use machine learning, from linear regressions to deep neural networks, to make predictions based on existing data.

For example, one such system might use past data from pneumonia patients to predict the risk of a new patient dying in the hospital, so that health workers can devise the best care plan well in advance. Such a system could take into account the underlying health conditions of past patients, such as asthma. A group of US researchers did just that and came to a counterintuitive conclusion: patients with asthma had a lower probability of dying than those without asthma.

On closer inspection, the researchers found that the hospital that supplied the data had a policy of placing pneumonia patients with asthma directly in intensive care. Receiving more aggressive care substantially lowered their risk of dying.

The researchers knew that the model’s predictions were odd, so they were able to prevent care providers from using the model to “wrongly” triage patients with asthma. That is because the researchers had a prior understanding of how asthma is related to mortality, and they used a machine learning method that made it possible for them to detect what was driving the results.
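The kind of check the researchers relied on, comparing what a model has learned against prior clinical understanding, can be sketched in a few lines. This is a minimal illustration under loud assumptions: the patient counts are invented and are not the study's data, the helper names (`Group`, `risk_ratio`, `flag_against_prior`) are hypothetical, and a crude risk ratio stands in for the more sophisticated interpretable model the researchers actually used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Group:
    """Counts for one group of patients (hypothetical numbers only)."""
    patients: int
    deaths: int

    @property
    def mortality(self) -> float:
        return self.deaths / self.patients


def risk_ratio(exposed: Group, unexposed: Group) -> float:
    """Mortality in the exposed group divided by mortality in the unexposed group."""
    return exposed.mortality / unexposed.mortality


def flag_against_prior(feature: str, rr: float, expected_direction: int) -> Optional[str]:
    """Return a warning when the observed association contradicts clinicians' prior.

    expected_direction is +1 if clinicians expect the feature to raise the risk
    of death and -1 if they expect it to lower it.
    """
    observed_direction = 1 if rr > 1 else (-1 if rr < 1 else 0)
    if observed_direction != expected_direction:
        return (f"{feature}: observed risk ratio {rr:.2f} contradicts the clinical prior; "
                "check for confounders such as care policies")
    return None


# Invented counts in which asthma patients appear to die less often because
# they were sent straight to intensive care (mimicking the hospital policy above).
asthma = Group(patients=200, deaths=6)        # 3% mortality
no_asthma = Group(patients=1800, deaths=108)  # 6% mortality
rr = risk_ratio(asthma, no_asthma)
print(flag_against_prior("asthma", rr, expected_direction=+1))
```

The flag only says that a result deserves scrutiny. It cannot tell whether the cause is a care policy, a data error or a genuine effect; that judgement still falls to people.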

Under different circumstances, the results could have been disastrous. What if the researchers had no prior understanding, or could not agree on one? What if they had used one of the many black-box machine learning methods in common use today, which are less interpretable? The AI system could have caused more deaths by recommending a lower level of care for patients with asthma.

How intelligent is today’s artificial intelligence?

The word “intelligence” in AI is a bit of a misnomer. Under the hood, most applications rely on a purely statistical approach, usually based on correlation rather than causation. The predictions these techniques produce can therefore only be as good as the data they are based on.

In the real world, health data are not always available or easily shared. This hinders the development of AI models that accurately represent diverse populations with potentially very complex conditions. Having appropriate data is critical because decisions based on models with skewed or incomplete data can put patients at risk.

Most readers will have heard the expression “garbage in, garbage out”, but who decides whether data are garbage? Humans do. Humans collect the data in the first place; they use codes, units and labels to organise and make sense of it; and they collate it into tables (datasets), choosing to include certain variables (asthma, for example) and not others (migraines, for example).

Simply collecting and formatting data can be problematic. Not all health care facilities, or departments within the same facility, will label the same data in the same way (e.g. milligram or mg), nor will all facilities collect the same data to begin with.
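Harmonising labels is one of these unglamorous steps. A minimal sketch, assuming a hand-maintained lookup table: the `UNIT_TO_MG` mapping and the `to_milligrams` helper are hypothetical, and a real system would more likely map to a standard vocabulary such as the Unified Code for Units of Measure (UCUM) than to a home-grown dictionary.

```python
# Hypothetical lookup: local unit label -> factor that converts the value to milligrams.
UNIT_TO_MG = {
    "mg": 1.0,
    "milligram": 1.0,
    "milligrams": 1.0,
    "g": 1000.0,
    "gram": 1000.0,
    "grams": 1000.0,
    "mcg": 0.001,
    "microgram": 0.001,
    "micrograms": 0.001,
}


def to_milligrams(value: float, unit_label: str) -> float:
    """Convert a dose recorded under any known label to milligrams.

    Unknown labels raise an error instead of being guessed, so that a person
    reviews them before the record enters a training dataset.
    """
    key = unit_label.strip().lower()
    if key not in UNIT_TO_MG:
        raise ValueError(f"Unrecognised unit label: {unit_label!r}")
    return value * UNIT_TO_MG[key]


# The same 500 mg dose as three hypothetical facilities might record it.
records = [(500, "mg"), (0.5, "g"), (500, " Milligram ")]
print([to_milligrams(v, u) for v, u in records])  # [500.0, 500.0, 500.0]
```

Refusing unknown labels, rather than silently dropping or guessing them, keeps a human in the loop for exactly the cases the lookup table was not designed for.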

In exceptional times, like the COVID-19 pandemic, data may simply not exist. Where they do exist, they may not adequately reflect people’s new behaviours. Even in ordinary times, many individuals are not represented in datasets because they face barriers to accessing care.

There are other reasons why data collected in one place and time might differ from data collected in another. For example, blood pressure measured from the same patient might differ when taken using different devices, at different times of the day or year. It can even vary when taken by different health workers. Now consider what this means in the context of a hopefully diverse and competitive market for wearable sensors and monitoring devices. Does one sensor work better than others?
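When the same patients are measured on two devices, a systematic difference between the devices can be estimated and removed, but only if the device used is recorded alongside each reading. The sketch below assumes the gap is a constant offset, which real devices need not follow (the error may vary with the blood pressure level, the patient or the operator); the readings and the `estimate_offset` and `harmonise` helpers are hypothetical.

```python
from statistics import mean


def estimate_offset(reference: list[float], other: list[float]) -> float:
    """Mean paired difference (other minus reference) in mmHg.

    Both lists must hold readings from the same patients, in the same order,
    taken on the reference device and on the other device.
    """
    if not reference or len(reference) != len(other):
        raise ValueError("Need non-empty, equal-length paired readings")
    return mean(o - r for r, o in zip(reference, other))


def harmonise(readings: list[float], offset: float) -> list[float]:
    """Shift readings from the other device onto the reference device's scale."""
    return [x - offset for x in readings]


# Hypothetical systolic readings (mmHg) for three patients measured on both devices.
device_a = [120.0, 130.0, 140.0]
device_b = [125.0, 134.0, 146.0]
offset = estimate_offset(device_a, device_b)
print(offset)                      # 5.0
print(harmonise([150.0], offset))  # [145.0]
```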

Data bias and scarcity

If datasets lack information on all these sources of variability, then models trying to predict blood pressure will be biased, potentially in ways that matter a great deal (think asthma and pneumonia). This is an old challenge in social science, not one specific to new AI techniques. Social scientists trying to attribute differences in outcomes to changes in policy have always struggled to capture data on every source of variability needed to rule out alternative explanations.

Because most AI models build on correlations, their predictions may not transfer to different populations or settings. Some AI models could exacerbate existing inequalities and biases, especially as the AI industry is highly imbalanced in terms of gender and race. And because health professionals are already overwhelmed by other digital tools, they may have little capacity to catch errors and may resist adopting new technologies.
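One simple way to see whether a model trained in one setting is being applied to a different population is to compare the distribution of each input between the two. The sketch below computes the standardised mean difference for a single variable. The ages are invented, the function name is hypothetical, and the 0.1 threshold is a common rule of thumb from the propensity-score literature, not a clinical standard. A small difference on every measured variable does not prove a model will transfer, since unmeasured factors can still differ.

```python
from math import sqrt
from statistics import mean, variance


def standardised_mean_difference(train: list[float], deploy: list[float]) -> float:
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(train) + variance(deploy)) / 2)
    if pooled_sd == 0:
        raise ValueError("Both samples are constant; the difference is undefined")
    return (mean(train) - mean(deploy)) / pooled_sd


# Hypothetical patient ages at the training hospital and at a new deployment site.
train_ages = [40.0, 50.0, 60.0]
deploy_ages = [60.0, 70.0, 80.0]
smd = standardised_mean_difference(train_ages, deploy_ages)
print(smd)  # -2.0
if abs(smd) > 0.1:  # rule-of-thumb threshold, an assumption
    print("Populations differ on age; re-validate the model before relying on it.")
```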

Data that are scarce by nature are also particularly problematic for some machine learning techniques. The methods behind recent advances in image and language processing require vast amounts of data. It is difficult to see how these methods could be used to facilitate precision medicine, that is, tailoring health care plans to subgroups of patients. It is even harder to imagine them informing patient-specific tailoring according to physiology and preferences. And, of course, individuals who are not represented in the data will not be offered tailored plans at all.
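Before tailoring anything, it helps to check who is in the data at all. A minimal sketch, assuming each record carries a subgroup label: the field name `age_band`, the list of expected groups and the minimum count are hypothetical choices that a clinical team would have to make. Groups that never appear are reported with a count of zero, which is exactly the case the paragraph above warns about.

```python
from collections import Counter
from typing import Callable, Iterable


def underrepresented(
    records: Iterable[dict],
    group_of: Callable[[dict], str],
    expected_groups: list[str],
    min_count: int,
) -> dict[str, int]:
    """Return expected subgroups with fewer than min_count records, including absent ones."""
    counts = Counter(group_of(r) for r in records)
    # Counter returns 0 for groups it has never seen, so absent groups are reported too.
    return {g: counts[g] for g in expected_groups if counts[g] < min_count}


# Hypothetical dataset: no patients under 40 and only two aged 65 or over.
records = [{"age_band": "40-64"}] * 5 + [{"age_band": "65+"}] * 2
gaps = underrepresented(
    records,
    group_of=lambda r: r["age_band"],
    expected_groups=["18-39", "40-64", "65+"],
    min_count=3,
)
print(gaps)  # {'18-39': 0, '65+': 2}
```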

Number of AI-related scientific publications in health, by country (1980-2019)

Without proper stewardship and risk mitigation, AI in health may not be trustworthy

As with most discussions around the broader digital transformation in health care, the values, policies and institutions that steer how people use technology matter more than the technology itself. There is a real need for responsible stewardship of trustworthy AI, ensuring respect for human rights and democratic values.

That is why the OECD and G20 AI Principles, and the accompanying recommendations, play a pivotal role. The futuristic vision outlined above is only worth achieving if it brings about more equitable, effective and efficient health systems. For that to happen, we need to co-create a set of basic conditions. The way forward requires an inclusive, informed and balanced discussion at the international level, involving the broadest possible range of stakeholders and actors within and beyond AI.

As demonstrated above, the most urgent priority is to create the conditions under which high-quality, representative, real-world data can be made available to AI researchers and developers. Those data must be handled in ways that protect the privacy and respect the autonomy of the individuals they represent. This requires strong health data governance, within and across countries, as detailed in the OECD Council Recommendation on Health Data Governance.

Next, government regulation and intervention should be proportional to risk. A bad algorithm that unnecessarily adds 10 minutes to your car ride has far less serious consequences than one that suggests less aggressive care when you have pneumonia and asthma. Given how hard it is to get risks right in advance, one promising approach is the use of regulatory sandboxes, which allow new models to be tested in live environments with appropriate safeguards and ring-fencing of wider systems.

Third, while there is so far little to suggest that AI will replace humans in health care, some AI techniques will very likely change human tasks fundamentally, and with them the skills and responsibilities of health workers. The way health workers are educated, trained and socialised will need to adapt. This will most likely include working more closely with computer scientists and data specialists.

Fourth, preparing health systems to manage the risks of AI while making the most of it requires long-term, strategic, coordinated and sustained investment. Public resources are and always will be scarce, but fiscal space must be found for these investments, and they must be made in ways that avoid worsening existing digital divides.

Finally, while the OECD and G20 AI Principles provide a unifying vision for the future of AI, that vision is only the beginning. Operationalising the principles consistently across countries will require political capital and leadership. The Global Partnership on AI is a great first step.

Is it possible that technological breakthroughs will eliminate the need to address some of the challenges discussed here? Perhaps, but if history is any guide, the fate of AI in health is down to our own – very human – intelligence.

Disclaimer: The opinions expressed and arguments employed herein are solely those of the authors and do not necessarily reflect the official views of the OECD or its member countries. The Organisation cannot be held responsible for possible violations of copyright resulting from the posting of any written material on this website/blog.
