On the Recoverability of Causal Relations from Bulk Gene Expression Data
Mirrored from arXiv — Machine Learning for archival readability. Support the source by reading on the original site.
arXiv:2606.00568v1 Announce Type: new
Abstract: Bulk gene expression profiling, which aggregates pooled RNA across cells within a biological sample, remains important in the single-cell era because it is typically less noisy, more sensitive, and more cost-effective than single-cell assays. Accordingly, a growing body of computational methods seeks to recover causal relations among genes from bulk expression data. However, aggregation is a lossy, non-invertible coarsening of the underlying cellular system, and it remains unclear whether, and under what conditions, causal relations are recoverable from aggregated bulk gene expression data. To address this question, we formalize recoverability under aggregation through two notions of consistency: functional-form consistency and conditional-independence consistency. We then derive necessary and sufficient conditions for recoverability, showing that these properties are preserved only under linear aggregations (e.g., sum or mean) coupled with affine structural equations. To assess the practical plausibility of these conditions, we analyze four bulk and four single-cell gene expression datasets and find that the estimated pairwise regulatory functions among genes deviate from linearity in both data types, providing limited empirical support for the linearity assumptions required for recoverability. Together, these results caution against recovering causal relations from aggregated bulk expression data without strong additional assumptions.
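A toy numerical check can make the linearity condition concrete. This sketch assumes that a bulk measurement is simply the mean of per-cell values and that each cell follows a noise-free structural equation Y = f(X). It is not the paper's formalization or proof of functional-form consistency. The function names (`affine`, `quadratic`, `bulk_gap`) and the simulated cells are hypothetical. For an affine f, averaging commutes with f, so the bulk pair obeys the same equation as the individual cells. For f(x) = x^2, the bulk value mean(Y) exceeds f(mean(X)) by the within-sample variance of X, which bulk data cannot observe.

```python
import random
from statistics import fmean, pvariance


def affine(x, a=2.0, b=1.0):
    # Hypothetical affine structural equation: Y = a*X + b (noise omitted
    # so that only the effect of aggregation is visible).
    return a * x + b


def quadratic(x):
    # Hypothetical nonlinear structural equation: Y = X^2.
    return x * x


def bulk_gap(f, cell_xs):
    """Return mean_i f(X_i) - f(mean_i X_i).

    The first term is the bulk (mean-aggregated) Y; the second is what the
    cell-level equation f would predict from the bulk X. A zero gap means
    the bulk pair still follows the same functional form as the cells.
    """
    cell_ys = [f(x) for x in cell_xs]
    return fmean(cell_ys) - f(fmean(cell_xs))


rng = random.Random(0)
# Hypothetical sample: 500 cells whose expression of X varies around 3.0.
cell_xs = [rng.gauss(3.0, 1.0) for _ in range(500)]

affine_gap = bulk_gap(affine, cell_xs)
quad_gap = bulk_gap(quadratic, cell_xs)

print(f"affine gap:    {affine_gap:.2e}")  # ~0, up to floating-point error
print(f"quadratic gap: {quad_gap:.3f}")    # matches var(X) printed below
print(f"var(X):        {pvariance(cell_xs):.3f}")
```

Because the quadratic gap equals the within-sample variance of X, two samples with the same bulk X but different cellular heterogeneity would yield different bulk Y. In this toy setting, therefore, no single function of bulk X reproduces the cell-level relation.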