AI Model Predicts Cancer Genes With Graph Learning

Chinese Academy of Sciences

The World Health Organization reports a steady rise in the number of cancer patients worldwide, making the disease a major health threat. Preventing and treating cancer has become a global priority, and identifying cancer driver genes is essential for understanding how the disease develops and for advancing personalized therapies. However, current methods struggle with generalizability and interpretability, which limits their effectiveness across different cancer types and populations.

To address this issue, a research team from the Xinjiang Institute of Physics and Chemistry of the Chinese Academy of Sciences (CAS), in collaboration with other experts, proposed a graph machine learning model called TREE, built on the Transformer framework. With this architecture, TREE identifies the most influential omics data type and also detects the most representative network paths involved in regulating genes that drive cancer formation and progression. The work was published in Nature Biomedical Engineering.

The researchers found that training TREE on subgraphs sampled from local structures enables efficient node-level representation learning while significantly reducing computational resource requirements. Unlike traditional Transformer architectures, TREE incorporates graph structural information from biological networks into its input. It also integrates position embeddings derived from node degree information with multi-omics features of nodes. Moreover, TREE employs a co-attention mechanism, where global structural encodings of nodes, learned from network distance, guide the calculation of attention weights. This design enhances the model's ability to capture complex relationships within biological systems.
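The ideas in this paragraph can be illustrated with a small, self-contained sketch. It is not the published TREE implementation: the toy network, the gene labels g1 to g5, the feature values, the function names, and the choice to subtract a hop-distance penalty from the attention scores are all assumptions made for illustration. The sketch samples a local subgraph around one gene, appends node degree to each gene's feature vector as a crude positional signal, and computes softmax attention weights that favor genes closer together in the network.

```python
import math
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def sample_subgraph(adj, center, max_hops):
    """Nodes within `max_hops` of `center` (a simple local sample)."""
    dist = bfs_distances(adj, center)
    return sorted(n for n, d in dist.items() if d <= max_hops)

def attention_weights(adj, nodes, features, distance_penalty=1.0):
    """Softmax attention among `nodes`, biased by hop distance.

    Assumption: the structural bias is a linear penalty on hop distance.
    The real model learns its structural encodings.
    """
    weights = {}
    for i in nodes:
        dist = bfs_distances(adj, i)
        # Node degree is appended as a stand-in for a positional embedding.
        q = features[i] + [float(len(adj[i]))]
        scores = []
        for j in nodes:
            k = features[j] + [float(len(adj[j]))]
            dot = sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))
            # Unreachable nodes get the largest possible penalty.
            scores.append(dot - distance_penalty * dist.get(j, len(adj)))
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights[i] = {j: e / total for j, e in zip(nodes, exps)}
    return weights

# Hypothetical toy network and made-up two-value "omics" features.
adj = {
    "g1": ["g2", "g3"],
    "g2": ["g1", "g4"],
    "g3": ["g1"],
    "g4": ["g2", "g5"],
    "g5": ["g4"],
}
features = {
    "g1": [0.2, 0.1],
    "g2": [0.5, 0.3],
    "g3": [0.1, 0.4],
    "g4": [0.3, 0.2],
    "g5": [0.6, 0.1],
}

nodes = sample_subgraph(adj, "g1", max_hops=1)
weights = attention_weights(adj, nodes, features)
for gene in nodes:
    print(gene, {k: round(v, 3) for k, v in weights[gene].items()})
```

Because attention is computed only among the sampled nodes, the cost depends on the size of the subgraph rather than on the size of the whole network, which is the efficiency argument made above.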

By incorporating multi-omics data from genes and other biological molecules, along with structural information from both homogeneous and heterogeneous biological networks, the model significantly improves prediction accuracy for cancer driver genes. This advancement enables more precise identification of genes closely associated with cancer progression, which is essential for developing personalized treatment strategies.

Moreover, the model's strengths in integrating multi-omics data and analyzing complex networks make it applicable to other diseases and disciplines. This research exemplifies the integration of artificial intelligence with biomedical engineering and offers new approaches to the challenges posed by cancer.

This work was supported by the National Natural Science Foundation of China, the Xinjiang Uygur Autonomous Region, and CAS.
