Tag

Developer Tool

500 articles archived under #developer-tool · RSS

Hugging Face Daily Papers research 1mo ago

ECHO: Terminal Agents Learn World Models for Free

Abstract Environment cross-entropy hybrid objective combines policy-gradient loss with auxiliary environment observation prediction to provide dense supervision from terminal feedback, improving agent performance and self-improvement capabilities. AI-generated summary CLI agents…

23
r/LocalLLaMA community 1mo ago

Llamacpp server : How do the -np and -c flags interact?

I've been using LM Studio for a few months. I want to try Hermes agents with Qwen 3.6 MoE, so I'm switching to llama.cpp, and I don't fully understand how the server slots (-np) and the context size (-c) interact. The context for each parallel client appears to be equally distributed…

10
arXiv — Machine Learning research 1mo ago

Knowledge Graph Modulated Deep Learning for Limited-Sample Clinical Data Analysis

arXiv:2605.24162v1 Announce Type: new Abstract: Biological systems are governed by structured molecular interactions, where pathways, regulatory circuits, and functional gene relationships shape cellular behavior and disease progression. Much of this knowledge is naturally…

6
arXiv — Machine Learning research 1mo ago

PrivFusion: A Privacy-preserving Multi-Agent Framework for Harmonizing Distributed Datasets

arXiv:2605.24249v1 Announce Type: new Abstract: The growing availability of clinical data has increased the use of machine learning, yet centralized data aggregation is often infeasible for sensitive health information. Federated Learning (FL) offers a distributed alternative,…

19
arXiv — Machine Learning research 1mo ago

Optimizing Digital Therapeutic Interventions: Online Learning under Endogenous Adherence

arXiv:2605.24261v1 Announce Type: new Abstract: A critical challenge facing clinicians managing chronic disease interventions is sustaining long-run patient health given limited information and resources. Digital therapeutics (DTs) provide a cost-effective way to manage…

31
arXiv — Machine Learning research 1mo ago

Lake Detection and Water Quality Estimation in Sentinel-2 Data

arXiv:2605.24515v1 Announce Type: new Abstract: With climate change and increasing human pressure on natural landscapes, inland water resources are becoming progressively scarcer, more vulnerable, and more difficult to manage sustainably. Reliable and automated methods for…

25
arXiv — Machine Learning research 1mo ago

ECHO: Terminal Agents Learn World Models for Free

arXiv:2605.24517v1 Announce Type: new Abstract: CLI agents are the closest thing language models have to an embodied setting: the model emits commands, the terminal executes them, and the returned stream -- stdout, errors, files, logs, and traces -- records the consequences. We…

25
arXiv — Machine Learning research 1mo ago

Hardware-Aware Federated Learning for Speech Emotion Recognition

arXiv:2605.24712v1 Announce Type: new Abstract: Federated learning (FL) enables privacy-preserving collaborative training across distributed edge devices, but real deployments involve heterogeneous clients with different processing power, memory capacity, and communication…

16
arXiv — NLP / Computation & Language research 1mo ago

A Multi-Probe Audit of Clinical-Interview Depression Detection Benchmarks

arXiv:2605.23977v1 Announce Type: new Abstract: This paper audits benchmark evaluation in clinical-interview depression detection through four complementary probes across DAIC/E-DAIC, CMDC, ANDROIDS, MODMA, and PDCH. First, we re-evaluate E-DAIC under strict subject-disjoint…

27
arXiv — NLP / Computation & Language research 1mo ago

When Reasoning Hurts: Source-Aware Evaluation of Frontier LLMs for Clinical SOAP Note Generation

arXiv:2605.24902v1 Announce Type: new Abstract: Reasoning-enabled LLMs perform strongly on medical reasoning benchmarks, but it remains unclear whether these gains transfer to structured clinical documentation; we investigate this question using SOAP note generation from…

13
arXiv — NLP / Computation & Language research 1mo ago

Overview of the PsyDefDetect Shared Task at BioNLP 2026: Detecting Levels of Psychological Defense Mechanisms in Supportive Conversations

arXiv:2605.24907v1 Announce Type: new Abstract: We present an overview of PsyDefDetect, the shared task on detecting levels of psychological defense mechanisms in emotional support dialogues, co-located with BioNLP@ACL 2026. Grounded in the clinically validated Defense Mechanism…

20
arXiv — NLP / Computation & Language research 1mo ago

TRACE: A taxonomy-grounded synthetic dataset for teaching-program generation and session interpretation in Applied Behavior Analysis

arXiv:2605.25038v1 Announce Type: new Abstract: Applied Behavior Analysis (ABA) is a clinical discipline whose documentation, teaching programs and multi-session behavioral logs, is formulaic and high-volume, yet real session data is HIPAA-protected and bound by professional…

28
arXiv — NLP / Computation & Language research 1mo ago

Evidence-Linked Radiology Reporting: A Human-Supervised Reference Architecture for Structured Imaging Intelligence

arXiv:2605.25120v1 Announce Type: new Abstract: Radiology reports remain the primary mechanism by which imaging findings are communicated to clinical teams. However, much of the structured information behind these reports, including measurements, image evidence, prior…

8
Hugging Face Daily Papers research 1mo ago

Geometry-Aware Image Flow Matching

Abstract Geometry-aware generative models leveraging spherical manifolds and optimal transport techniques outperform traditional Euclidean approaches for natural image synthesis. AI-generated summary Recent advances in generative models highlight the power of geometry-aware…

29
Simon Willison community 1mo ago

Notes on Pope Leo XIV's encyclical on AI

Dropped this morning by the Vatican: Magnifica Humanitas of His Holiness Pope Leo XIV on Safeguarding the Human Person in the Time of Artificial Intelligence. This is a very interesting document. It's some of the clearest writing I've seen on the ethics of integrating AI into…

12
r/LocalLLaMA community 1mo ago

AI content detector based on Qwen 0.8b fine-tuned on Pangram dataset

I've fine-tuned Qwen 3.5 0.8B on the dataset provided by Pangram with their EditLens paper. It's available via a Chrome extension; you can just click selected text and it's going to give you the probability distribution of how likely it is AI-generated. It takes under 1s on my…

36
r/MachineLearning community 1mo ago

Is AI inference platform really that saturated now? [D]

I’m thinking of expanding an on-device inference SDK into a full-blown AI inference platform, and I’m seeing more and more inference platforms popping up. Been talking with a VC from Seattle/NY. Is this space really that saturated? submitted by /u/kampak212

35
TechCrunch — AI news-outlet 1mo ago

What ClickUp’s mass layoff tells us about the future of work

The nine-year-old startup is replacing hundreds of employees with thousands of AI agents.

18
TechCrunch — AI news-outlet 1mo ago

The pope’s AI encyclical isn’t really about AI

Pope Leo XIV's first encyclical uses AI as a lens to diagnose older problems: concentrated power, eroding democracy, and a tech elite that shapes the world to its own advantage.

34
Hacker News — AI on Front Page community 1mo ago

Magnifica Humanitas (Encyclical Letter)

Article URL: https://www.vatican.va/content/leo-xiv/en/encyclicals/documents/20260515-magnifica-humanitas.html · Comments URL: https://news.ycombinator.com/item?id=48265206 · Points: 229 · Comments: 63

36
r/LocalLLaMA community 1mo ago

We added W8A8 activation quantization to MLX — prefill went from 2.84s to 2.52s on M5 Pro

Hey, I work on inference tooling at Mininglamp AI. We needed faster prefill for a 4B VLM running on Apple Silicon. Problem was MLX only does weight-only quant — activations stay FP16 the whole way through. So we wrote Cider, a small SDK that adds W8A8 activation quant on top of…

21
arXiv — Machine Learning research 1mo ago

MedExpMem: Adapting Experience Memory for Differential Diagnosis

arXiv:2605.22872v1 Announce Type: new Abstract: Experienced physicians develop diagnostic expertise through clinical practice, acquiring not only disease knowledge but also the ability to differentiate confusable conditions. Current medical vision-language models (VLMs) lack…

24
arXiv — Machine Learning research 1mo ago

FederatedRSF : Federated Random Survival Forests for Partially Overlapping Medical Data

arXiv:2605.22954v1 Announce Type: new Abstract: Multi-center survival prediction can improve robustness and generalizability, yet privacy regulations and institutional governance often prevent pooling patient-level clinical and genomic data across institutions. In practice,…

28
arXiv — Machine Learning research 1mo ago

Class-Dependent Hybrid Data Augmentation for Multiclass Migraine Classification under Severe Class Imbalance

arXiv:2605.23453v1 Announce Type: new Abstract: We conducted a reproducibility-oriented re-evaluation of prior migraine classification studies, correcting for data leakage and metric bias. We then introduced (i) a clinically motivated aggregation of two hemiplegic subtypes…

23
arXiv — NLP / Computation & Language research 1mo ago

When Symptoms Are Not Enough: Evidence-Weighting Patterns in Large Language Model Psychiatric Screening

arXiv:2605.23148v1 Announce Type: new Abstract: As demand for mental health care outpaces clinician-delivered assessment, scalable screening tools are increasingly needed. Large language models (LLMs) may identify psychiatric risk from patient narratives, but their reliability…

9
arXiv — NLP / Computation & Language research 1mo ago

ClimateChat-300K: A Multi-Modal Facebook Dataset for Understanding Diverse Perspectives in Climate Communication

arXiv:2605.23326v1 Announce Type: new Abstract: We present ClimateChat-300K, a large-scale dataset of 299,329 public Facebook posts about climate change collected between May 2020 and May 2024 through the CrowdTangle platform. The dataset contains 41 metadata features including…

5
arXiv — NLP / Computation & Language research 1mo ago

The Deterministic Horizon: Impossibility Results as Design Specifications for Trustworthy AI Systems

arXiv:2605.23024v1 Announce Type: cross Abstract: Large language models now write software, draft legal documents, and produce clinical notes, yet fundamental limits, from Turing and Arrow to the No Free Lunch theorems, shape what computation can do. This thesis turns such…

22
arXiv — NLP / Computation & Language research 1mo ago

What Does the Server See? Understanding Privacy Leakage from Large Language Models in Split Inference

arXiv:2605.23158v1 Announce Type: cross Abstract: The deployment of large language models (LLMs) on resource-constrained devices remains challenging, spurring interest in split inference, where models are partitioned between client and server to reduce computational burden and…

6
r/LocalLLaMA community 1mo ago

OCR, granite-docling-258m vs granite-docling-2stage-258m: has anyone actually noticed any improvements?

IBM's granite-docling-2stage-258m: Granite Docling 2stage builds upon Granite Docling but introduces a key modification: it builds a dynamic prompt that precomputes layout objects found within a page, making it more robust on out of distribution…

19
r/LocalLLaMA community 1mo ago

Have we passed the peak of inflated expectations?

I noticed the number of people in this sub going down a bit and checked out some Google Trends. Any idea what's causing this sharp decline? submitted by /u/fairydreaming

18
r/MachineLearning community 1mo ago

Custom image encoder [P]

Hello, I would like to know whether building my own image encoder would be a good idea instead of using models like CLIP, SigLIP/SigLIP2, or DINO. My use case is video frame classification. My pipeline is the following: the client sends me a video stream, sampled at 1 frame per…

5
arXiv — Machine Learning research 1mo ago

HealthCraft: A Reinforcement Learning Safety Environment for Emergency Medicine

arXiv:2605.21496v1 Announce Type: new Abstract: Frontier language models are being deployed into clinical workflows faster than the infrastructure to evaluate them safely. Static medical-QA benchmarks miss the failure modes that matter in emergency medicine: trajectory-level…

4
arXiv — Machine Learning research 1mo ago

Calibration, Uncertainty Communication, and Deployment Readiness in CKD Risk Prediction: A Framework Evaluation Study

arXiv:2605.21566v1 Announce Type: new Abstract: Machine learning models for chronic kidney disease (CKD) risk prediction often post strong discrimination scores on internal test sets. Calibration and uncertainty quantification get far less attention, leaving clinicians without…

9
arXiv — Machine Learning research 1mo ago

ChronoMedicalWorld: A Medical World Model for Learning Patient Trajectories from Longitudinal Care Data

arXiv:2605.21963v1 Announce Type: new Abstract: Long-horizon clinical simulation -- predicting how a patient's physiology evolves over years under specified interventions -- is central to chronic-disease care, yet existing electronic health record (EHR) models are predominantly…

19
arXiv — Machine Learning research 1mo ago

Beyond Euclidean Proximity: Repairing Latent World Models with Horizon-Matched Trajectory Reachability Metrics

arXiv:2605.22164v1 Announce Type: new Abstract: Latent world models can contain the state needed for control, yet their terminal-cost interface can expose the planner to the wrong decision-relevant information. In common latent MPC, candidate sequences are ranked by Euclidean…

20
arXiv — Machine Learning research 1mo ago

Decomposing Ensemble Spread in Lorenz '96 With Learned Stochastic Parameterizations

arXiv:2605.22242v1 Announce Type: new Abstract: Weather and climate forecasts are inherently uncertain due to chaotic dynamics, imperfect initial conditions, and incomplete representation of the underlying physical processes. Operational ensemble forecasts aim to represent these…

30
arXiv — Machine Learning research 1mo ago

Explainable AI for Data-Driven Design of High-Dimensional Predictive Studies

arXiv:2605.22243v1 Announce Type: new Abstract: Predictive modelling is important for health data analysis and data-driven clinical decision-making. However, predictive studies are challenging to design optimally by hand when tens or even hundreds of features require selection,…

19
arXiv — Machine Learning research 1mo ago

No Epoch Like the Present: Robust Climate Emulation Requires Out-of-Distribution Generalisation

arXiv:2605.22248v1 Announce Type: new Abstract: Climate emulation is an out-of-distribution (OOD) projection task. This is precisely the challenge where modern Machine Learning (ML) methods are most prone to failure. Consequently, while current ML emulators trained on present…

38
arXiv — Machine Learning research 1mo ago

Detecting Atypical Clients in Federated Learning via Representation-Level Divergence

arXiv:2605.22266v1 Announce Type: new Abstract: Federated learning enables collaborative training across distributed clients with heterogeneous data, but such heterogeneity often leads to unstable updates and degraded global performance. Moreover, in practical deployments,…

29
arXiv — NLP / Computation & Language research 1mo ago

When Cases Get Rare: A Retrieval Benchmark for Off-Guideline Clinical Question Answering

arXiv:2605.21807v1 Announce Type: new Abstract: Across medical specialties, clinical practice is anchored in evidence-based guidelines that codify best studied diagnostic and treatment pathways. These pathways routinely fall short for the long tail of real-world care not covered…

34
arXiv — NLP / Computation & Language research 1mo ago

ChronoMedKG: A Temporally-Grounded Biomedical Knowledge Graph and Benchmark for Clinical Reasoning

arXiv:2605.22734v1 Announce Type: new Abstract: Biomedical knowledge graphs (KGs) treat disease associations as static facts, but temporal information is crucial for clinical reasoning, e.g., a symptom diagnostic of one disease at age 3 may imply a different disease at age 13.…

32
arXiv — NLP / Computation & Language research 1mo ago

The Double Dilemma in Multi-Task Radiology Report Generation: A Gradient Dynamics Analysis and Solution

arXiv:2605.22635v1 Announce Type: cross Abstract: While multi-task learning based automatic radiology report generation (RRG) is widely adopted to ensure clinical consistency, most focus on architectural designs yet remain limited to coarse linear scalarization strategies. These…

12
Hugging Face Daily Papers research 1mo ago

Training Large Language Models to Predict Clinical Events

Abstract Longitudinal clinical notes are converted into temporal prediction examples using Foresight Learning, enabling improved clinical prediction through LoRA adaptation that enhances calibration and reduces uncertainty compared to base models. AI-generated summary…

34
r/LocalLLaMA community 1mo ago

Gmail tie-ins

Hey folks. I’m looking to set up a way to give a local LLM access to the Google Cloud SDK for Gmail functions. The goal is to have an LLM check a spreadsheet once daily and, based on criteria, send an email that will be structured exactly the same way each time, simply as a…

14
llama.cpp releases dev-tools 1mo ago

b9276

server: expose prompt token counts in /slots endpoint (#23454). Add n_prompt_tokens, n_prompt_tokens_processed, and n_prompt_tokens_cache to the /slots JSON response. These fields are already tracked internally but were not exposed, making it impossible for clients to monitor…

15
The Information — AI news-outlet 1mo ago

Workday Stock Jumps 10% After Company Reveals AI Agent Gains

Workday shares climbed more than 10% in after-hours trading on Thursday after the HR application maker said the number of customers using its AI agents in the three months ended April 30 roughly doubled from the previous quarter to more than 4,000. Gerrit Kazmaier, the company’s…

38
OpenAI Python SDK releases dev-tools 1mo ago

v2.38.0

2.38.0 (2026-05-21). Full Changelog: v2.37.0...v2.38.0. Features: api: api update (33d1d01); api: manual updates (a21700a); api: update OpenAPI spec or Stainless config (00265c5). Chores: api: docs updates (ee10152); check release PR custom code sync (2638779); remove release…

26
r/LocalLLaMA community 1mo ago

Qwen3.6 35Ba3 has changed my workflows and even how I use my computer

My workflow has changed basically to ask Codex to do certain tasks and then document how to do them (including errors it found on its way) into a skill. I feed that skill to pi, and suddenly my qwen3.6 gets that hard stuff done: - devops on a VPS - using docling to create epubs…

33
Google DeepMind official-blog 1mo ago

We’re launching the Google DeepMind Accelerator program in Asia Pacific to tackle environmental risks

The Asia-Pacific region is a global engine for economic growth, but it's also highly vulnerable to climate change. While green technologies are gaining momentum, a recent report shows they aren’t scaling fast enough to keep up with the region’s rising environmental risks. To…

22
r/LocalLLaMA community 1mo ago

LlamaStation v0.9 — llama.cpp GUI for Windows with multi-backend support, TurboQuant, MTP and more

I've been building this for the past few months as a side project — started because I didn't want to run llama.cpp from the command line every time I wanted to try a model. I just wanted something that worked with a click. Fair warning: I'm not a developer. This is 100% vibe…

33
