Comprehensive Evaluation Of Large Language Models In Mining Gene Relations And Pathway Knowledge

Higher Education Press

Understanding complex biological pathways, including gene-gene interactions and gene regulatory networks, is crucial for exploring disease mechanisms and advancing drug development. However, manual literature curation of these pathways cannot keep pace with the exponential growth of discoveries. Large language models (LLMs), trained on extensive text corpora, contain rich biological information and can be queried much like a biological knowledge graph to support pathway curation.

Recently, Quantitative Biology published a study titled "A Comprehensive Evaluation of Large Language Models in Mining Gene Relations and Pathway Knowledge." The study assesses 21 LLMs, including both API-based and open-source models, on their ability to retrieve biological knowledge. The evaluation covers two tasks: predicting gene regulatory relations (activation, inhibition, and phosphorylation) and identifying the gene components of pathways. It uses the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database as the ground truth, as illustrated in Figure 1.

The results reveal a marked disparity in model performance, with API-based models outperforming their open-source counterparts. GPT-4 and Claude-Pro emerged as top performers in predicting gene regulatory relations, achieving higher precision and recall than the other models. The findings suggest that while LLMs are informative for gene network analysis and pathway mapping, their effectiveness varies, so models must be selected carefully. The study underscores the importance of choosing appropriate computational tools for specific tasks in biological research. It also serves as a case study in using LLMs as knowledge graphs for data mining more generally.
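To make the precision and recall comparison concrete, the following is a minimal sketch of how a model's predicted gene relations could be scored against a ground-truth set. It is not the study's actual evaluation code. The `Relation` type, the `precision_recall` function, and the placeholder gene names are all hypothetical, and the toy data is invented for illustration rather than taken from KEGG.

```python
from typing import Iterable, NamedTuple, Tuple


class Relation(NamedTuple):
    """A directed gene relation (hypothetical structure for illustration)."""
    source: str
    target: str
    kind: str  # "activation", "inhibition", or "phosphorylation"


def precision_recall(
    predicted: Iterable[Relation], truth: Iterable[Relation]
) -> Tuple[float, float]:
    """Score predicted relations against ground truth by exact match."""
    predicted_set, truth_set = set(predicted), set(truth)
    true_positives = len(predicted_set & truth_set)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(truth_set) if truth_set else 0.0
    return precision, recall


# Toy ground truth with placeholder gene names (not real KEGG entries).
truth = {
    Relation("GENE_A", "GENE_B", "activation"),
    Relation("GENE_B", "GENE_C", "inhibition"),
    Relation("GENE_C", "GENE_D", "phosphorylation"),
    Relation("GENE_A", "GENE_D", "activation"),
}

# Toy model output: two exact matches and one wrong relation type.
predicted = {
    Relation("GENE_A", "GENE_B", "activation"),
    Relation("GENE_B", "GENE_C", "activation"),
    Relation("GENE_C", "GENE_D", "phosphorylation"),
}

precision, recall = precision_recall(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this sketch a prediction counts as correct only if the gene pair, the direction, and the relation type all match, so a model that names the right pair but the wrong relation type is penalized in both precision and recall.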
