Compounds and proteins are the two most fundamental entities in drug discovery, and modeling their interactions is central to the field. Although no universal computational method currently exists to predict and explain all compound-protein interactions (CPIs), researchers can contribute to a comprehensive CPI map by drawing on diverse biological data from different perspectives.
Recent advances in high-throughput transcriptomic screening have opened new avenues for drug discovery. Perturbation transcriptomics, which captures cellular transcriptomic responses to perturbations, links key entities (compounds) with omics data. It provides a direct readout of how compounds affect biological subjects (single cells, cell lines, patients), offering a fresh perspective for deconvoluting CPIs.
In a study published in Cell Genomics, a research team led by ZHENG Mingyue, ZHANG Sulin, and LI Xutong from the Shanghai Institute of Materia Medica (SIMM) of the Chinese Academy of Sciences developed an artificial intelligence (AI) tool called PertKGE, which deconvolutes CPIs from perturbation transcriptomics using knowledge graph embedding.
PertKGE is built upon existing biomedical knowledge graphs but employs a novel strategy. The key innovation lies in constructing a biologically meaningful knowledge graph that breaks down genes into DNAs, messenger RNAs (mRNAs), long non-coding RNAs (lncRNAs), microRNAs (miRNAs), transcription factors (TFs), RNA-binding proteins (RBPs), and other protein-coding genes.
This strategy allows PertKGE to capture fine-grained interactions between genes, simulating post-transcriptional and post-translational regulatory events in biological systems. PertKGE then uses the knowledge graph embedding method DistMult to project all entities into a semantically rich latent space, enabling the deconvolution of CPIs from perturbation transcriptomics.
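To make the embedding step concrete, here is a minimal sketch of the standard DistMult scoring function, in which a (head, relation, tail) triple is scored by the sum of the element-wise product of the three embedding vectors. It is not PertKGE's implementation: the entity names, the "binds" relation, the embedding dimension, and the randomly initialized (untrained) vectors are all assumptions made for illustration.

```python
import numpy as np

def distmult_score(head, relation, tail):
    """DistMult score of a triple: sum_i head_i * relation_i * tail_i."""
    return float(np.sum(head * relation * tail))

# Hypothetical entities and relation; a real model would learn these
# vectors from the knowledge graph rather than sample them at random.
rng = np.random.default_rng(0)
dim = 8
entity_emb = {
    name: rng.normal(size=dim)
    for name in ["compound_A", "protein_X", "protein_Y"]
}
relation_emb = {"binds": rng.normal(size=dim)}

def rank_targets(compound, candidates, relation="binds"):
    """Order candidate proteins by DistMult score, highest first."""
    scores = {
        c: distmult_score(
            entity_emb[compound], relation_emb[relation], entity_emb[c]
        )
        for c in candidates
    }
    return sorted(scores, key=scores.get, reverse=True)

# Worked example with small vectors: 1*3*5 + 2*4*6 = 63
print(distmult_score(np.array([1.0, 2.0]),
                     np.array([3.0, 4.0]),
                     np.array([5.0, 6.0])))  # 63.0
print(rank_targets("compound_A", ["protein_X", "protein_Y"]))
```

Because the score is a plain product of the three vectors, it is symmetric in head and tail, so swapping the compound and the protein leaves the score unchanged.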
The researchers extensively benchmarked PertKGE against existing approaches under two critical "cold-start" settings: inferring binding targets for new compounds and conducting virtual ligand screening for new targets. PertKGE outperformed all the traditional and deep learning approaches compared. The team also demonstrated that incorporating multi-level regulatory events plays a pivotal role in mitigating representational biases.
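The article does not describe how the cold-start evaluation sets were built. As a rough sketch of the general idea, a cold-start split holds out whole entities rather than individual pairs, so that every compound (or target) in the test set is absent from training. The function name, the pair format, and the example data below are hypothetical.

```python
import random

def cold_start_split(pairs, key_index, test_fraction=0.2, seed=0):
    """Split (compound, protein) pairs so held-out keys never appear in training.

    key_index=0 holds out compounds (new-compound setting);
    key_index=1 holds out proteins (new-target setting).
    """
    keys = sorted({pair[key_index] for pair in pairs})
    random.Random(seed).shuffle(keys)
    n_test = max(1, int(len(keys) * test_fraction))
    test_keys = set(keys[:n_test])
    train = [p for p in pairs if p[key_index] not in test_keys]
    test = [p for p in pairs if p[key_index] in test_keys]
    return train, test

# Hypothetical interaction pairs, for illustration only.
pairs = [
    ("cpd1", "protA"), ("cpd1", "protB"),
    ("cpd2", "protA"), ("cpd3", "protC"),
    ("cpd4", "protB"), ("cpd5", "protC"),
]

train, test = cold_start_split(pairs, key_index=0)
# No compound in the test set was seen during training.
assert {c for c, _ in train}.isdisjoint({c for c, _ in test})
```

Passing key_index=1 instead gives the new-target setting, where the held-out proteins have no training interactions.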
Notably, combining PertKGE with phenotype-based and target-based drug discovery led to two significant findings. The first was the identification of ectonucleotide pyrophosphatase/phosphodiesterase-1 (ENPP1) as a target responsible for the unique anti-tumor immunotherapy effect of tankyrase inhibitor K-756. The second was the discovery of five novel hits targeting the emerging cancer therapeutic target, aldehyde dehydrogenase 1B1.
These findings strongly suggest that PertKGE can help pharmacologists accelerate drug discovery. Looking ahead, PertKGE is expected to integrate more regulatory events, further enhancing its predictive performance and extending its application to the analysis of other perturbation omics data.