Fast Exact Nearest-Neighbor Learning for High-Frequency Financial Time Series
Mirrored from arXiv — Machine Learning for archival readability. Support the source by reading on the original site.
Computer Science > Machine Learning
Title:Fast Exact Nearest-Neighbor Learning for High-Frequency Financial Time Series
Abstract:AI efficiency at scale is becoming critical in finance as market data volumes surge across equities, ETFs, FX, options, and high-frequency trading streams. This growth creates a core challenge for mature financial AI systems: models must learn from larger historical corpora while still meeting real-time latency constraints in trading, risk management, and derivative pricing. We use exact nearest-neighbor learning for high-frequency financial time series as a concrete case study to show that Mojo-based financial AI can address this challenge. We introduce a Mojo SIMD k-d tree with variance-based splitting, contiguous flat-buffer storage, and compile-time vectorized distance computation. We also provide a runtime result showing that, under standard pruning and implementation-cost assumptions, the Mojo SIMD k-d tree asymptotically dominates Mojo SIMD brute force and scikit-learn's k-d tree in the fixed-stock, large-$n$, moderate-dimensional regime. Empirically, across eight financial datasets on x86 and ARM64 with up to 277K training samples, the method achieves 17.5--21.6$\times$ speedup over scikit-learn's k-d tree on x86 and 28.1--43.5$\times$ over scikit-learn brute force on ARM64 equity/ETF datasets, while preserving exact outputs. Beyond nearest-neighbor inference, Mojo's compiled execution enables an Extra Trees-based implied-volatility pricing model to train on $10\times$ more options data, reducing put-IV RMSE by 8.0\%. These results position Mojo as a scalable, production-ready stack for financial AI and a promising foundation for efficient AI in other data-intensive fields. \keywords{Financial AI \and AI Efficiency \and Mojo \and SIMD \and K-D Trees \and KNN \and High-Frequency Trading \and Financial Time Series \and Scaling}
| Comments: | 15 pages 5 figures; |
| Subjects: | Machine Learning (cs.LG); Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2606.10219 [cs.LG] |
| (or arXiv:2606.10219v1 [cs.LG] for this version) | |
| https://doi.org/10.48550/arXiv.2606.10219
arXiv-issued DOI via DataCite (pending registration)
|
Access Paper:
- View PDF
- HTML (experimental)
- TeX Source
References & Citations
Bibliographic and Citation Tools
Code, Data and Media Associated with this Article
Demos
Recommenders and Search Tools
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.
More from arXiv — Machine Learning
-
Restless bandits with imperfect binary feedback: PCL-indexability analysis and computation
Jun 11
-
Few-Shot Resampling for Scalable Statistically-Sound Data Mining
Jun 11
-
Physics-informed generative AI for semiconductor manufacturing: Enforcing hard physical constraints in generative models by construction
Jun 11
-
Mechanical Field Networks: Structured Neural Dynamics for Multivariate Systems
Jun 11
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.