One in five UK doctors use a generative artificial intelligence (GenAI) tool - such as OpenAI's ChatGPT or Google's Gemini - to assist with clinical practice. This is according to a recent survey of around 1,000 GPs.
Doctors reported using GenAI to generate documentation after appointments, help make clinical decisions and provide information to patients - such as comprehensible discharge summaries and treatment plans.
Considering the hype around artificial intelligence coupled with the challenges health systems are facing, it's no surprise doctors and policymakers alike see AI as key to modernising and transforming our health services.
But GenAI is a recent innovation that fundamentally challenges how we think about patient safety. There's still much we need to know about GenAI before it can be used safely in everyday clinical practice.
The problems with GenAI
Traditionally, AI applications have been developed to perform a very specific task. For example, deep learning neural networks have been used for classification in imaging and diagnostics. Such systems prove effective in analysing mammograms to aid in breast cancer screening.
But GenAI is not trained to perform a narrowly defined task. These technologies are based on so-called foundation models, which have generic capabilities. This means they can generate text, pixels, audio or even a combination of these.
These capabilities are then fine-tuned for different applications - such as answering user queries, producing code or creating images. The possibilities for interacting with this type of AI appear to be limited only by the user's imagination.
Crucially, because the technology has not been developed for use in a specific context or to be used for a specific purpose, we don't actually know how doctors can use it safely. This is just one reason why GenAI isn't suited for widespread use in healthcare just yet.
Another problem in using GenAI in healthcare is the well documented phenomenon of "hallucinations" - outputs that are nonsensical or untruthful relative to the input that has been provided.

Hallucinations have been studied in the context of having GenAI create summaries of text. One study found various GenAI tools produced outputs that made incorrect links based on what was said in the text, or included information that wasn't even referred to in the text.
Hallucinations occur because GenAI works on the principle of likelihood - such as predicting which word will follow in a given context - rather than being based on "understanding" in a human sense. This means GenAI-produced outputs are plausible rather than necessarily truthful.
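The idea of choosing words by likelihood can be illustrated with a toy sketch. The snippet below is a deliberately simplified bigram model (vastly cruder than a real large language model, and purely illustrative - the tiny corpus and word frequencies are invented): it always picks the statistically most common next word, so it produces a fluent sentence regardless of whether that sentence is true for any particular patient.

```python
from collections import Counter, defaultdict

# Toy corpus: the only "knowledge" this model has (invented for illustration).
corpus = (
    "the patient reported mild headaches . "
    "the patient reported mild nausea . "
    "the patient reported severe chest pain ."
).split()

# Count which word follows which - a bigram model. The principle mirrors
# larger models: continuations are chosen by likelihood, not by truth.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def most_likely_next(word):
    """Return the most frequent continuation seen in the corpus."""
    return follows[word].most_common(1)[0][0]

# Generate a "summary" by repeatedly taking the most likely next word.
word, output = "the", ["the"]
for _ in range(4):
    word = most_likely_next(word)
    output.append(word)

# The model says "mild" simply because "mild" was more frequent than
# "severe" - even if this patient's symptoms were severe.
print(" ".join(output))
```

Because "mild" outnumbers "severe" in the training text, the generated sentence reports mild symptoms - plausible, fluent and potentially wrong, which is the hallucination problem in miniature.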
This plausibility is another reason it's too soon to safely use GenAI in routine medical practice.
Imagine a GenAI tool that listens in on a patient's consultation and then produces an electronic summary note. On one hand, this frees up the GP or nurse to better engage with their patient. But on the other hand, the GenAI could potentially produce notes based on what it thinks may be plausible.
For instance, the GenAI summary might change the frequency or severity of the patient's symptoms, add symptoms the patient never complained about or include information the patient or doctor never mentioned.
Doctors and nurses would need to proofread any AI-generated notes with an eagle eye and rely on an excellent memory to distinguish the factual information from the plausible - but made-up - information.
This might be fine in a traditional family doctor setting, where the GP knows the patient well enough to identify inaccuracies. But in our fragmented health system, where patients are often seen by different healthcare workers, any inaccuracies in the patient's notes could pose significant risks to their health - including delays, improper treatment and misdiagnosis.
The risks associated with hallucinations are significant. But it's worth noting researchers and developers are currently working on reducing the likelihood of hallucinations.
Patient safety
Another reason it's too soon to use GenAI in healthcare is that patient safety depends on how the AI interacts with a particular context and setting: how the technology works with people, how it fits with rules and pressures, and the culture and priorities within a larger health system. Such a systems perspective would determine whether the use of GenAI is safe.
But because GenAI isn't designed for a specific use, this means it's adaptable and can be used in ways we can't fully predict. On top of this, developers are regularly updating their technology, adding new generic capabilities that alter the behaviour of the GenAI application.
Furthermore, harm could occur even if the technology appears to work safely and as intended - again, depending on context of use.
For example, introducing GenAI conversational agents for triaging could affect different patients' willingness to engage with the healthcare system. Patients with lower digital literacy, people whose first language isn't English and non-verbal patients may find GenAI difficult to use. So while the technology may "work" in principle, this could still contribute to harm if the technology wasn't working equally for all users.
The point here is that such risks with GenAI are much harder to anticipate upfront through traditional safety analysis approaches. These are concerned with understanding how a failure in the technology might cause harm in specific contexts.

Healthcare could benefit tremendously from the adoption of GenAI and other AI tools.
But before these technologies can be used in healthcare more broadly, safety assurance and regulation will need to become more responsive to developments in where and how these technologies are used.
It's also necessary for developers of GenAI tools and regulators to work with the communities using these technologies to develop tools that can be used regularly and safely in clinical practice.