NPU Team Explores Multimodal Models for Medical Data

KeAi Communications Co., Ltd.

In recent years, the advancement of multimodal large language models (MLLMs) has increasingly demonstrated their potential in medical data mining. However, the diverse and heterogeneous nature of medical images and radiology reports poses significant challenges to the universality of data mining methods.

To address these challenges, a team led by Dr. Xin Zhang from the Institute of Medical Research, Northwestern Polytechnical University in Xi'an, China, systematically evaluated the performance of Gemini and GPT-series models across various medical tasks.

"Our study encompasses 14 diverse medical datasets, spanning dermatology, radiology, dentistry, ophthalmology and endoscopy image categories, as well as radiology report datasets," shares Zhang. "The tasks evaluated include disease classification, lesion segmentation, anatomical localization, disease diagnosis and report generation."

The results reveal that the Gemini series excels in report generation and lesion detection, while the GPT series demonstrates strengths in lesion segmentation and anatomical localization.

"The study highlights the promise of these multimodal models in alleviating the burden on clinicians and fostering the integration of AI into clinical practice, potentially mitigating healthcare resource constraints," adds Zhang. "Nonetheless, further optimization and rigorous validation are required before clinical deployment."

The team published their findings in the KeAi journal Meta-Radiology.

By establishing benchmarks for the performance of multimodal AI systems, the team's efforts provide a foundation for the continued development and application of such technologies, as well as future research on the multimodal integration of medical imaging and textual analysis.
