Simpson's Paradox in Behavioral Curves: How Aggregation Distorts Parametric Models of User Dynamics
Mirrored from arXiv — Machine Learning for archival readability. Support the source by reading on the original site.
arXiv:2605.11017v1 Announce Type: new
Abstract: Behavioral curve modeling -- fitting parametric functions to engagement-versus-exposure data -- is standard practice in recommendation, advertising, and clinical dosing. We show that aggregation introduces a systematic distortion: Simpson's paradox in behavioral curves. On Goodreads (3.3M users, 9 genres), individual users peak at n* approximately 11 exposures while the aggregate peaks at n* approximately 34 -- a 3x gap driven by survival bias. Amazon Electronics (18M reviews) shows a 5.3x distortion. MovieLens-25M (D approximately 1) serves as a negative control, confirming that survival bias -- not aggregation per se -- is the operative mechanism. The distortion is robust to category granularity, engagement operationalization, and classifier calibration. We develop Synthetic Null Calibration to address a 32% false positive rate in per-user classification. Our findings apply wherever individual behavioral parameters are estimated from aggregate curves under differential attrition.
More from arXiv — Machine Learning
-
Interpretable EEG Microstate Discovery via Variational Deep Embedding: A Systematic Architecture Search with Multi-Quadrant Evaluation
May 13
-
QuIDE: Mastering the Quantized Intelligence Trade-off via Active Optimization
May 13
-
Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models
May 13
-
Rotation-Preserving Supervised Fine-Tuning
May 13
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.