
BioCOMPASS: Integrating Biomarkers into Transformer-Based Immunotherapy Response Prediction

arXiv: 2604.00739

PAPER UNDER REVIEW

Title: Aligning biomarkers with transformer representations to improve immunotherapy response prediction

Intro

I read BioCOMPASS with interest because it tries to solve a very practical problem I see all the time: models for immunotherapy response collapse when moved to new cohorts. The paper builds on COMPASS, a transformer-based architecture for molecular data, and proposes integrating biomarkers and treatment information by adding loss terms that nudge the model's internal representations to agree with known biomarkers and pathways. That is a sensible, pragmatic idea. It is also exactly the kind of incremental step we need, not a headline grab.

Technical summary

The core idea in BioCOMPASS is to keep the transformer model and its self-supervised pretraining, but to add auxiliary loss components that align intermediate representations with external biological and clinical signals, rather than appending those signals directly to the input. The authors report two main components that helped generalisation in their experiments: a treatment gating loss and a pathway consistency loss. Treatment gating appears to condition or modulate representations on treatment information, so that the model separates treatment-specific effects from underlying biology. Pathway consistency penalises representations that break expected relationships among genes belonging to the same biological pathway.
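The paper's exact formulations are not reproduced here, so to make the idea concrete, here is a minimal numpy sketch of what these two auxiliary terms could look like. The function names, the centroid-based pathway penalty, and the sigmoid gate are my assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def pathway_consistency_loss(gene_emb, pathways):
    """Penalise spread of gene representations within each pathway.

    gene_emb: (n_genes, d) intermediate representations, one row per gene.
    pathways: list of integer index arrays, one per curated pathway.
    """
    loss = 0.0
    for idx in pathways:
        members = gene_emb[idx]              # (k, d) pathway members
        centroid = members.mean(axis=0)      # pathway centroid
        loss += ((members - centroid) ** 2).mean()
    return loss / len(pathways)

def treatment_gate(h, t_emb):
    """Modulate a patient representation h with a treatment-derived gate."""
    gate = 1.0 / (1.0 + np.exp(-t_emb))      # elementwise sigmoid
    return gate * h

# The training objective would then be something like
#   total = response_loss + lambda_pw * pathway_consistency_loss(...)
# where lambda_pw is one more hyperparameter to tune.
```

The point of the sketch is that the biomarker and pathway knowledge enters only through the loss, so nothing here needs to be available at inference time.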

They evaluate generalisation with leave-one-cohort-out, leave-one-cancer-type-out, and leave-one-treatment-out strategies, which is the right direction: many prior papers report high accuracy, but only on held-out samples from the same cohorts. According to the paper, these auxiliary losses improved performance over the base transformer and over threshold-based biomarkers across all three cross-validation schemes.
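All three schemes are instances of the same leave-one-group-out pattern, which can be sketched in a few lines (the helper name is mine; scikit-learn's `LeaveOneGroupOut` does the same job):

```python
import numpy as np

def leave_one_group_out(groups):
    """Yield (held_out_label, train_idx, test_idx) for each unique group.

    Passing cohort labels, cancer types, or treatment classes as `groups`
    gives leave-one-cohort-out, leave-one-cancer-type-out, and
    leave-one-treatment-out splits from the same helper.
    """
    groups = np.asarray(groups)
    for g in np.unique(groups):
        yield g, np.flatnonzero(groups != g), np.flatnonzero(groups == g)

# Toy usage: three cohorts give three external-style evaluation rounds,
# each testing on a cohort the model never saw during training.
cohorts = ["A", "A", "B", "C", "C", "C"]
for held_out, train_idx, test_idx in leave_one_group_out(cohorts):
    assert all(cohorts[i] == held_out for i in test_idx)
```

The strictness comes from the grouping variable: held-out samples from a seen cohort test interpolation, while a held-out cohort tests the distribution shift that actually breaks these models.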

My take

I like the direction. For clinical problems like immunotherapy response, the data are small, heterogeneous and full of batch effects. Forcing model representations to respect orthogonal, domain-derived signals is a reasonable way to make the model care about biology instead of overfitting to cohort-specific noise. The decision to use loss alignment rather than feeding biomarkers directly is pragmatic. It avoids some forms of missingness and lets the model learn representations that are biomarker-aware without relying on biomarker availability at inference time.

That said, several things matter before calling this approach a practical solution.

First, the quality and availability of biomarkers. The paper assumes curated biomarkers and pathway maps that are correct and available for the cohorts used. In real clinics, biomarker measurement is noisy, often incomplete, and varies by assay and lab. If the auxiliary loss targets are themselves biased by measurement platform or by cohort-specific thresholds, the model can end up learning those biases instead of biology. In other words, you can regularise toward the wrong target.

Second, the danger of implicit shortcuts. When you add losses tied to known biomarkers, you reduce some forms of overfitting, but you can introduce others. For example, if treatment gating is implemented using coarse treatment categories, that gating could simply allow the model to memorise cohort-treatment combinations rather than disentangling biology from therapy effects. The paper shows improvements under leave-one-treatment-out, which is encouraging, but I want to see analyses where the gating is forced to generalise to completely new treatment subclasses and across sequencing platforms.

Third, small data and hyperparameters remain a headache. Auxiliary losses add hyperparameters and optimisation complexity. With limited samples, tuning loss weights can produce cherry-picked settings that look good in cross-validation but fail in prospective use. The authors acknowledge this implicitly, but the paper would be stronger with ablations showing sensitivity to loss weights and to noise in the biomarker labels.

Fourth, clinical utility is more than AUC. For immunotherapy, decision costs are asymmetrical. A model that modestly improves AUC but increases false negatives could harm patients. The paper does the right type of external-split evaluation, but it does not present calibration, decision curve analysis, or expected clinical utility metrics. Those are important next steps before anyone considers deployment.
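To be concrete about what "decision curve analysis" would add: the standard net-benefit metric weighs true positives against false positives at the risk threshold a clinician would actually act on. A minimal sketch (function name mine, formula standard):

```python
import numpy as np

def net_benefit(y_true, y_prob, pt):
    """Net benefit of treating patients whose predicted risk >= pt.

    Standard decision-curve formula: TP/N - (FP/N) * pt / (1 - pt).
    The pt/(1-pt) factor encodes how many false positives a clinician
    would tolerate per true positive at that risk threshold.
    """
    y_true = np.asarray(y_true)
    treat = np.asarray(y_prob) >= pt
    n = len(y_true)
    tp = np.sum(treat & (y_true == 1))
    fp = np.sum(treat & (y_true == 0))
    return tp / n - (fp / n) * pt / (1.0 - pt)
```

Plotting this across thresholds, against "treat all" and "treat none" baselines, tells you whether a modest AUC gain actually translates into better decisions, which is exactly the question an AUC table leaves open.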

Finally, provenance and interpretability. Encouraging representations to be pathway-consistent is appealing from an explainability point of view. But we need to see whether that alignment truly produces interpretable features or just constrains the latent space in ways that are hard to unpack. If I am a clinician or regulator, I want to know what the model is using when it predicts response.

Implications for practice

From my perspective building AI systems for clinical use, BioCOMPASS is useful as a design pattern more than a finished product. The idea of injecting curated clinical knowledge via loss terms rather than raw inputs is practical and could be applied wherever biomarkers are noisy or inconsistently available. It may reduce brittle cohort-specific overfitting and make models transfer better across hospitals.

But before clinical translation, several practical steps are needed. Prospective validation or at least genuinely external, multi-institutional validation is essential. Sensitivity analyses that perturb biomarker labels, pathway definitions and treatment encodings will show how brittle the gains are. Calibration and decision-analytic metrics must be reported to judge clinical impact. Finally, operational considerations such as what biomarkers are actually available at a typical treatment center, turnaround time, and costs should guide whether this approach is useful in practice.

I appreciate that the authors emphasise careful curation and complementary clinical information as a future direction. That humility is appropriate. BioCOMPASS does not solve the core scarcity and heterogeneity of clinical cohorts, but it offers a disciplined way to bring domain knowledge into modern architectures. If you are building models for oncology clinics, consider this as one more tool in the toolkit: promising, practical, and worth testing carefully, not a plug-and-play fix.