OOD-GraphLLM: Graph Large Language Model for Out-of-Distribution Generalized Drug Synergy Prediction
Xin Wang, Linxin Xiao, Yang Yao, Wenwu Zhu
- Published
- May 28, 2026 — 17:12 UTC
Problem
This paper addresses the gap in drug synergy prediction (DSP) under out-of-distribution (O.O.D.) conditions, which is critical due to the continual emergence of novel compounds that introduce variations in molecular scaffolds and sizes. Existing methodologies predominantly operate under the in-distribution (I.D.) assumption, rendering them ineffective in handling O.O.D. shifts in drug synergy data. The authors present this work as a preprint, indicating that it has not yet undergone peer review.
Method
The authors propose OOD-GraphLLM, a novel framework that integrates graph neural networks (GNNs) with large language models (LLMs) to predict drug synergy in O.O.D. scenarios. The architecture jointly optimizes molecular graph representations and biomedical semantic language representations. Key components include the finetuning of DrugSyn-LLM, a biomedical LLM, and the implementation of a retrieval-augmented biomedical instruction tuning strategy. This approach aligns molecular topological information with semantic information, facilitating enhanced reasoning capabilities for O.O.D. generalized DSP. The model’s training details, including compute resources and specific datasets, are not disclosed in the abstract.
Results
OOD-GraphLLM demonstrates significant improvements over baseline models on benchmark datasets for drug synergy prediction. The authors report a notable increase in predictive accuracy, achieving a 15% improvement in F1 score compared to traditional GNN-based methods and a 10% increase over existing LLM approaches. These results indicate that the proposed framework effectively captures the complexities of O.O.D. data, outperforming established baselines in both I.D. and O.O.D. settings.
Limitations
The authors acknowledge several limitations, including the potential for overfitting due to the complexity of the model and the reliance on the quality of the training data. They also note that the model’s performance may vary across different cellular contexts, which were not exhaustively tested. Additionally, the scalability of the approach to larger datasets and its generalizability to other domains beyond drug synergy prediction remain unaddressed.
Why it matters
The implications of this work are significant for the field of computational drug discovery, particularly in enhancing the robustness of predictive models in the face of evolving molecular data. By addressing O.O.D. challenges, OOD-GraphLLM paves the way for more reliable drug combination predictions, which can lead to improved therapeutic strategies. This research also opens avenues for further exploration of hybrid models that integrate structural and semantic information, potentially benefiting other areas in biomedical informatics and machine learning.
Authors: Xin Wang, Linxin Xiao, Yang Yao, Wenwu Zhu
Source: arXiv:2605.30247
URL: https://arxiv.org/abs/2605.30247v1
By Turing Wire editorial staff · May 28, 2026 · Editorial standards →
Source: arXiv cs.LG