r/MachineLearning

500 articles archived · Visit source ↗ · RSS

r/MachineLearning community 1mo ago

Making LLMs tell you how confident they really are through probe-targeted fine-tuning. [R]

Just wanted to share my research regarding probe-targeted fine-tuning (LoRA) for verbal confidence calibration. If you probe the hidden states of an instruct-tuned LLM, it can tell correct from incorrect answers at 0.76–0.88 AUROC. But when you ask it directly it tends to…

16
r/MachineLearning community 1mo ago

Social Simulation with LLMs - Fidelity in Applications (CFP @ COLM'26) [R]

🌟 Announcing the 2nd Workshop on Social Simulation with LLMs (Social Sim'26) @ COLM 📣 Welcoming Submissions! Submission here:. 🗓️ Deadline: June 23, 2026 (AoE) This year's theme is "Fidelity in Applications", moving beyond compelling demos toward evaluation, robustness,…

11
r/MachineLearning community 1mo ago

I built a knowledge graph + policy engine for AI agents, explainable reasoning [D]

Hey, I've been building VeritasReason, an open-source Python framework that adds a structured reasoning and provenance layer on top of LLMs and AI agents. The problem it solves: AI agents today make decisions but record nothing. When something breaks in prod, you have zero…

38
r/MachineLearning community 1mo ago

Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems [R]

Are agents aging after deployment? https://arxiv.org/abs/2605.26302 On a new longitudinal deployment benchmark, switching the Claude Code CLI agent from Sonnet 4.6 to Opus 4.7 dropped PyTest pass rate by ~15%. This (to me) is a counterintuitive-enough result to pay attention…

6
r/MachineLearning community 1mo ago

Wall-OSS-0.5: 4B VLA with open training code and zero-shot real-robot evaluation [D]

Wall-OSS-0.5 is a new 4B VLA release from X Square Robot, built on a 3B VLM backbone with action experts in a Mixture-of-Transformers layout. What caught my eye is that the report evaluates the pretrained checkpoint on real robots before task-specific fine tuning, instead of…

25
r/MachineLearning community 1mo ago

Kept context-switching between arxiv, OpenReview, GitHub, and HuggingFace for every paper, so I built this. Chrome extension + website with everything inline, plus citation graph + SPECTER2 neighbors. 3M papers, free, feedback welcome [P]

Spent the last few months building a deeper context layer over arxiv. Each paper gets a Tomesphere page with a TLDR + key findings (LLM-curated), OpenReview reviews where the venue is public, linked GitHub repos, HuggingFace models, conference videos, the citation graph in both…

15
r/MachineLearning community 1mo ago

Built a richer reading layer for arxiv (Chrome extension + web): OpenReview reviews, GitHub/HuggingFace links, citation graph, SPECTER2 neighbors, TLDRs. 3M papers, free, looking for feedback [P]

Spent the last few months building a deeper context layer over arxiv. Each paper gets a Tomesphere page with a TLDR + key findings (LLM-curated), OpenReview reviews where the venue is public, linked GitHub repos, HuggingFace models, conference videos, the citation graph in both…

10
r/MachineLearning community 1mo ago

A new dataset with more than 100M high-quality, curated images, with captions and metadata! [P]

Hello everyone. The new dataset is named MONET; it is Apache 2.0-licensed and available on HF: https://huggingface.co/datasets/jasperai/monet MONET is an open, Apache 2.0-licensed image–text dataset. It was built from 2.9 billion images and refined to 104.9 million high-quality samples. We are…

5
r/MachineLearning community 1mo ago

ACM MM 2026 review discussion [D]

The AC email says the rebuttal runs from the 28th to the 4th. The June 4th date on the website is the deadline. So I created this post for the discussion. I know it's a MM conference and less about ML, but I think many people here are still submitting there.

32
r/MachineLearning community 1mo ago

Training GPT-like model on non-language series [R]

I am responsible for a research project that is supposed to train a GPT-like model (Transformer decoder) in 100M, 250M, and 500M parameter variants. Training dataset: 750M tokens; vocabulary of ~15k to ~100k tokens (depends on tokenizer settings); ~3% of the…

29
r/MachineLearning community 1mo ago

Diffusion models for sketch-guided trajectory simulation [R]

Blog post: https://wezteoh.github.io/posts/diffusion-for-sketch-guided-trajectory-simulation/ During NBA games, coaches often sketch attacking plays on a whiteboard and mentally simulate how teammates and defenders might react. In this project, I explored using diffusion models…

30
r/MachineLearning community 1mo ago

STEM PhDs transitioning to MLE/Data [R]

I'm hoping for some advice from any former PhDs outside of machine learning. If you made it into machine learning engineering and/or data science, what was the key for you? Any tips for this job market? It seems like non-computer-science PhDs are especially in trouble at the…

38
r/MachineLearning community 1mo ago

BEAM 100K memory benchmark: CSM vs Hindsight local artifact comparison [R]

I’m looking for feedback on a local agent-memory benchmark comparison, especially from people who care about evaluation methodology. I built an open-source R&D memory system called Context Swarm Memory…

31
r/MachineLearning community 1mo ago

Cross-Platform Fused MoE Dispatch in Triton: Portable Expert Routing Without CUDA [R]

New preprint. A Mixture-of-Experts inference kernel (TritonMoE) written entirely in OpenAI Triton, targeting portability across NVIDIA and AMD without vendor-specific code. Highlights: A fused gate+up GEMM computes both SwiGLU projections from shared tile loads, eliminating 35%…

38
r/MachineLearning community 1mo ago

UK GDPR Small Business Q&A — 5,000 synthetic pairs with article-level citations [D]

Dataset for fine-tuning compliance assistants. Each pair includes: - A practical SME-facing question ("Can I use pre-ticked consent boxes?") - An answer with specific UK GDPR article references, ICO guidance by name, and actionable steps - Source metadata: which GDPR concepts…

23
r/MachineLearning community 1mo ago

Should I attend ICML as a junior? [D]

I am a junior in college, and have two accepted workshop papers at ICML 2026. Some background: I had an accepted workshop paper last year at ICLR, but couldn't attend due to a rejected visa, which led to all the more disappointment. So this year I was VERY eager to attend, and…

4
r/MachineLearning community 1mo ago

I used the NEAT algorithm to teach an AI how to control a worm in my in-development game! It uses evolution to improve. [P]

Each brain is unique, and from the best generations that I save, a worm can pick random brain files to use, letting each worm be completely unique and feel alive. This is for Bonk Universe. submitted by /u/Lanse012

38
r/MachineLearning community 1mo ago

"Unified Neural Scaling Laws" paper release [R]

https://x.com/ethanCaballero/status/2059686905105563907 submitted by /u/Glittering_Author_81

13
r/MachineLearning community 1mo ago

What 1000+ Harness Experiments Taught Me About Self-Improving Agents [R]

I recently wanted to see whether an AI agent could self-improve a harness to solve terminal bench tasks. It’s possible for an AI agent to propose a meaningful one-time change to the harness, but after experimenting with this for a couple of weeks, I think the continuous…

35
r/MachineLearning community 1mo ago

AI-generated CUDA kernels silently break training and inference [R]

Last month NVIDIA released SOL-ExecBench, a new benchmark of 235 production CUDA kernels lifted from DeepSeek, Qwen, Gemma, and Kimi. We took several top-ranked AI-generated submissions and tried using them in production workloads. Many of them broke, sometimes in surprising…

14
r/MachineLearning community 1mo ago

Best Text to Text Translation Model? [D]

I'm working on a project that translates any language into English. So far, I've tried NMT models like NLLB, MADLAD, and SeamlessM4T v2. The main issue is that they struggle with proper nouns such as names, places, dates, and organizations. I also tried LLMs like Gemma 4, Qwen…

22
r/MachineLearning community 1mo ago

Physics-Informed Neural Networks for damped harmonic oscillator and Burgers' Equation (with extrapolation analysis) [P]

I built a PINN implementation in Python to solve two problems as part of a physics exam project: the damped harmonic oscillator (2nd-order ODE) and the 1D viscous Burgers' equation (nonlinear PDE). Both forward and inverse problems (to estimate unknown equation parameters from…

37
r/MachineLearning community 1mo ago

noisekit - CLI for generating realistic degraded speech datasets for ASR benchmarking [P]

If you've ever tried to pick an STT vendor for a phone-based voice agent or call center product, you've probably hit this wall: you have plenty of real production audio, but it's unlabeled, so you can't compute WER on it. And the annotated public datasets (FLEURS, CommonVoice,…

31
r/MachineLearning community 1mo ago

EMA-Gated Temporal Sequence Compression in Vision Transformers [P]

Vision Transformers waste 90% of their compute recalculating stationary asphalt. NeuroFlow tracks semantic surprise in embedding space, physically eliminating background tokens before the encoder. NeuroFlow is a dynamic routing framework for Vision Transformer video inference.…

34
r/MachineLearning community 1mo ago

Cross-species RSA: same learning rules (BP, PC, STDP, FA) tested against both human fMRI and macaque electrophysiology [P]

Follow-up to my earlier post on learning rules vs. human fMRI. Same five conditions (BP, FA, PC, STDP, untrained), same model weights, now evaluated against macaque V1/V2 (FreemanZiemba2013, single-unit) and macaque V4/IT (MajajHong2015, multi-electrode). Main findings: Early…

23
r/MachineLearning community 1mo ago

Profiling PyTorch training without accidentally stalling the GPU [D]

Profiling PyTorch training has an interesting measurement problem: the more you measure, the more you can change the behavior of the run itself. A simple example is torch.cuda.synchronize(). It gives cleaner timing boundaries, but it also inserts synchronization points into an…

13
r/MachineLearning community 1mo ago

A Tiny Open-Source Self-Driving AI That Runs on a Phone [P]

https://www.youtube.com/watch?v=rr_uS4bf0B4&feature=youtu.be I trained a 7MB open-source L4 self-driving AI that learns navigation, lane following, and drift…

11
r/MachineLearning community 1mo ago

What to use for Sign Language Recognition [R]

Hi everyone, I'm finishing up the proposal for my undergraduate computer science thesis on sign language recognition, specifically Filipino Sign Language, and I want to ask which architecture would be best for my methodology. Right now I'm considering MediaPipe Holistic +…

32
r/MachineLearning community 1mo ago

GNN Model For Fraud Detection Isn't Performing Well [R]

We're writing a research paper on an explainable GNN model for fraud detection, and as a first step we're creating a basic Graph Neural Network for that. We're using the most famous dataset available on this topic, i.e., the IEEE CIS Fraud Detection Dataset, and implemented all necessary…

7
r/MachineLearning community 1mo ago

Is IEEE Workshop on Machine Learning for Signal Processing Reputable? [D]

I randomly came across this conference/workshop: IEEE Workshop on Machine Learning for Signal Processing. Is this a reputable conference and is it worthwhile to submit here vs. a workshop at an A* like ICML, NeurIPS, etc.? (I know these deadlines have passed, I have a paper…

37
r/MachineLearning community 1mo ago

Trouble exploring AI/ML, idk where to begin [D]

As the title says. Context: I am a sophomore in computer science. I have prior knowledge of maths (especially the topics relevant to ML) and am good enough with NumPy and pandas. I don't really know where to start. On the internet, every second guy is trying to make me earn 100k/year in 3 months…

24
r/MachineLearning community 1mo ago

Augmented Equivariant Mesh Networks for Anatomical Mesh Segmentation (ICML 2026 Workshops) [R]

Paper: https://arxiv.org/abs/2605.08172 Workshops: AI for Science & Structured Data for Health at ICML 2026 Abstract: Anatomical mesh segmentation requires models that operate directly on irregular surface geometry while remaining robust to arbitrary patient pose and mesh…

20
r/MachineLearning community 1mo ago

Tomesphere, 3M paper pages with TLDRs, peer reviews, code, and a SPECTER2 similarity graph [P]

Built a richer paper page for 3 million arxiv and OpenAlex papers. Free, no signup, no paywall. tomesphere.com Each page has a Gemini-generated TLDR, peer reviews scraped from OpenReview with reviewer scores and decisions, GitHub repos, HuggingFace models and datasets,…

31
r/MachineLearning community 1mo ago

Verbosity is not faithfulness: an architectural argument that reasoning models cannot perform faithful inference [D]

Essay argues that reasoning models cannot perform faithful inference because their reasoning trace and final answer come from the same operation. Engages with Lanham/Turpin/Mirzadeh in empirical critique, and with HRM, TRM, GRAM, AlphaProof, and Kona/Aleph as the contrasting…

38
r/MachineLearning community 1mo ago

Have a couple of technical questions for my LLM router. [P]

I am a CS undergrad and I think token economics is the next big problem for companies. I am building an LLM router specifically for code and codebases. The routing is not actually done by a heavily fine-tuned LLM (existing solutions already do this). Using a bit of a different…

11
r/MachineLearning community 1mo ago

Added a Chrome Dino-style game to my research tool's pipeline wait screen driven by real SSE events [P]

Slightly unhinged engineering decision but it works. My tool (ScholarScout) has a 2-3 minute pipeline: fetch papers from 8 databases → analyze trends → generate ideas. During that time, the user sees a pixel art owl running through a parallax forest. The fun part: it's not fake…

10
r/MachineLearning community 1mo ago

Dlib or PyTorch for a CNN? [D]

I’m currently studying ML, more specifically convolutional neural networks (CNNs) for finding patterns in images. Right now, I’m trying to develop a model that can solve the “Where’s Waldo?” challenge. However, I currently have a question: what would be the best option for…

31
r/MachineLearning community 1mo ago

Built a portable GPU ISA after reading too many architecture manuals [P]

I’ve been reading GPU architecture docs in my free time. NVIDIA PTX, AMD ISA reference guides, Intel Xe, reverse-engineered Apple GPU stuff. Over 5,000 pages across 16 microarchitectures. After a while you notice all four vendors are doing the same 11 things with different…

5
r/MachineLearning community 1mo ago

Where do you go for serious AI research discussion online? [D]

Looking for communities where people actually dig into ML/AI research, not hype, not "look what I built with an LLM API," but discussions about papers, training dynamics, debugging real models, infra problems, that kind of thing. I'm specifically interested in places where you…

15
r/MachineLearning community 1mo ago

Already 11 000 submissions for EMNLP? [D]

Is this normal? I searched it up and last year it was only 8000. submitted by /u/NightCR_

24
r/MachineLearning community 1mo ago

Aiki my local Wikipedia Retrieval-Augmented Generation system [R]

Hey, I built Aiki, a lightweight tool that lets you chat with Wikipedia locally. What it does: - Downloads and chunks Wikipedia articles (you can choose articles by name, with the option of also downloading similar topics) - Uses a custom TF-IDF + cosine…

23
r/MachineLearning community 1mo ago

The famous METR AI time horizons graph contains numerous severe errors [D]

Nathan Witkin, a research writer at NYU Stern’s Tech and Society Lab, writes damningly about the famous METR AI time horizons graph in the Substack publication Transformer: It is impossible to draw meaningful conclusions from METR’s Long Tasks benchmark — in particular once one…

16
r/MachineLearning community 1mo ago

DCGAN inference on a microcontroller: 12.6M parameters, 512KB SRAM, 26-second generation, pure C [P]

Just thought I'd share, I ran a DCGAN on a dual core RISC-V microcontroller, the CH32H417 generating 64x64 cat faces. This is a new RISC-V MCU, so no TFLite, no CMSIS NN and no external memory. It's a pure C inference engine, bit-identical to PyTorch reference outputs. The model…

11
r/MachineLearning community 1mo ago

Is the AI inference platform space really that saturated now? [D]

I’m thinking of expanding an on-device inference SDK into a full-blown AI inference platform, and I'm seeing more and more inference platforms popping up. Been talking with a VC from Seattle/NY. Is this space really that saturated? submitted by /u/kampak212

35
r/MachineLearning community 1mo ago

Reconstructing the agent methodology: Decoupling decision-making and execution - open source [P]

I’ve been thinking about a problem in current agent systems: Most agents are becoming very good at execution, but the decision layer before execution is still unclear. Coding agents, research agents, tool loops, sandboxes, workflows, and harnesses are all improving quickly. Once…

38
r/MachineLearning community 1mo ago

Delta Attention Residuals [R]

We're excited to release Delta Attention Residuals, a drop-in upgrade to residual connections that learns which past layers to route from, without the routing collapse that breaks prior cross-layer attention at scale. 🚀 Attention Residuals route over…

9
r/MachineLearning community 1mo ago

Anyone heard from ICML about Oral decisions yet? [D]

hi all, my paper received a spotlight from ICML. they told us that we would receive decisions as to whether our paper would get an oral by the end of the month with the implication that we wouldn’t receive a notification if we didn’t get it; I was just wondering if anyone has…

30
r/MachineLearning community 1mo ago

I’m building an open-source decision layer above AI agents [P]

Hi everyone, I’m Jia, the creator of Spice. I’ve been working on an open-source project called Spice. The simplest way to describe it is: Spice is a decision layer above agents. Most agent systems today are very focused on execution. They are getting better at doing tasks after…

30
r/MachineLearning community 1mo ago

Call for Papers - Workshop on Efficient Reasoning at COLM 2026 [R]

🌟 Announcing the 2nd Workshop on Efficient Reasoning (ER) at @colm2026 — Oct 9! 📣 We welcome submissions! Submit your work here: https://openreview.net/group?id=colmweb.org/COLM/2026/Workshop/Efficient_Reasoning 🗓️ Deadline: July 12, 2026 (AoE) 🔗 Website:…

11
r/MachineLearning community 1mo ago

Best architecture for seamless Bilingual TTS? (Azure / English + Korean) [D]

Hi guys, I'm building a language learning app (React Native/Expo frontend, Python backend) and I’ve hit a frustrating wall with Text-to-Speech. I need the app to read sentences that mix English instructions and Korean examples (e.g., "To say hello, we use the phrase 안녕하세요.").…

20
