Genetic test results can be hard for people without a background in genetics to understand and interpret. Investigators at Baylor College of Medicine's Human Genome Sequencing Center are studying whether an artificial intelligence (AI) assistant could help answer questions about these results for patients and physicians. They developed a generative AI assistant grounded in a knowledge base comprising the latest Clinical Pharmacogenetics Implementation Consortium (CPIC) data for statins and tested its accuracy against OpenAI's ChatGPT 3.5. The findings are published in JAMIA, the Journal of the American Medical Informatics Association.
"We created a chatbot that can provide guidance on general pharmacogenomic testing, dosage implications and the side effects of therapeutics and address patient concerns. We see this tool as a superpowered assistant that can increase accessibility and help both physicians and patients answer questions about genetic test results," said first author Mullai Murugan, director of software engineering and programming at the Human Genome Sequencing Center at Baylor.
The study focused on pharmacogenomic testing for statins, which indicates whether a person is genetically predisposed to respond better or worse to the different statin medications used to treat high cholesterol. To interpret these results, the Baylor researchers developed their own AI assistant. Despite the popularity of ChatGPT, the team knew that chatbot had a major limitation that would impact the accuracy of its responses.
"The training cutoff date for ChatGPT 3.5 is January 2022, so that system won't have access to any guidelines published after that date. It happens that the key publication on statin pharmacogenomics was published in May 2022," Murugan said.
The Baylor AI assistant uses Retrieval-Augmented Generation (RAG), drawing on a knowledge base of CPIC data and publications that includes the most recent guidelines.
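In outline, a RAG pipeline retrieves the guideline passages most relevant to a question and supplies them to a language model as grounding context, which is how answers can reflect material published after the model's training cutoff. The sketch below illustrates that general pattern only; the passage text, the keyword-overlap retrieval, and the generate() stub are simplified assumptions for illustration, not the Baylor system's actual retriever or model.

```python
# Minimal sketch of the Retrieval-Augmented Generation (RAG) pattern described above.
# The passages, keyword scoring, and generate() stub are illustrative assumptions;
# a production system would use embedding-based retrieval and a real LLM call.
from collections import Counter

# Toy knowledge base standing in for CPIC statin guideline passages.
KNOWLEDGE_BASE = [
    "CPIC 2022 guideline: SLCO1B1 decreased-function genotypes are associated "
    "with higher simvastatin exposure and myopathy risk.",
    "CPIC 2022 guideline: for SLCO1B1 poor-function phenotypes, consider an "
    "alternative statin or a lower dose.",
    "General note: statins lower LDL cholesterol; response varies by genotype.",
]

def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words representation (illustrative retrieval only)."""
    return Counter(word.strip(".,:;").lower() for word in text.split())

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k passages with the largest word overlap with the question."""
    q_tokens = tokenize(question)
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda passage: sum((tokenize(passage) & q_tokens).values()),
        reverse=True,
    )
    return scored[:k]

def generate(question: str, context: list[str]) -> str:
    """Stub for the LLM call: here we only assemble the grounded prompt."""
    prompt = "Answer using ONLY the context below.\n\n"
    prompt += "\n".join(f"- {passage}" for passage in context)
    prompt += f"\n\nQuestion: {question}\nAnswer:"
    return prompt  # a real system would send this prompt to a language model

if __name__ == "__main__":
    question = "What does my SLCO1B1 result mean for simvastatin dosing?"
    print(generate(question, retrieve(question)))
```

The key design point is that the retrieved passages, not the model's frozen training data, supply the up-to-date guideline content at answer time.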
The team compared their AI assistant to ChatGPT 3.5 by giving both chatbots a set of questions designed to reflect typical inquiries from patients and healthcare providers. A panel of four experts in pharmacogenomics and cardiology judged the responses from both chatbots based on accuracy, relevancy, risk management, language clarity and other factors.
The Baylor AI assistant scored higher in accuracy and relevancy, and the largest gaps between the two chatbots were seen in questions from healthcare providers. In that category, Baylor's chatbot scored 85% in accuracy and 81% in relevancy, compared to ChatGPT's 58% in accuracy and 62% in relevancy. Both chatbots scored similarly in language clarity.
Despite these promising initial results, the researchers stress that the technology is not ready for clinical use. The model still struggles to recognize some biomedical terms that don't use typical words and characters. In addition, while the model draws on pharmacogenomic data, it lacks training in the language genetic counselors typically use to explain results. Lastly, the researchers emphasize the need to address ethical, regulatory and safety concerns before the tool can be used in a clinical setting.
"We are working to fine tune the chatbot to better respond to certain questions, and we want to get feedback from real patients," Murugan said. "Based on this study, it is very clear that there is a lot of potential here."
"This study underscores generative AI's potential for transforming healthcare provider support and patient accessibility to complex pharmacogenomic information," said senior author Dr. Richard Gibbs, director of the Human Genome Sequencing Center and Wofford Cain Chair and Professor of Molecular and Human Genetics at Baylor. "With further development, these tools could augment healthcare expertise, provider productivity and the promise of equitable precision medicine."
Other authors of this work are Bo Yuan, Eric Venner, Christie M. Ballantyne, Katherine M. Robinson, James C. Coons, Liwen Wang and Philip E. Empey. They are affiliated with one or more of the following institutions: Baylor College of Medicine, University of Pittsburgh and UPMC Presbyterian-Shadyside Hospital.
This work was partially funded by the National Institutes of Health's All of Us Research Program.