Artificial-intelligence tools such as ChatGPT have been touted for their promise to alleviate clinician workload by triaging patients, taking medical histories, and even providing preliminary diagnoses. These tools, known as large language models, are already being used by patients to make sense of their symptoms and medical test results.
But while these AI models perform impressively on standardized medical tests, how well do they fare in situations that more closely mimic the real world?
Not that great, according to the findings of a new study led by researchers at Harvard Medical School and Stanford University.