Tag

Developer Tool

500 articles archived under #developer-tool

arXiv — Machine Learning research 29m ago

NIVA: A Multimodal Foundation Model for Actionable Earth System Intelligence

arXiv:2606.28546v1 Announce Type: new Abstract: Recent advances in AI-driven weather and climate modeling have improved forecast skill while reducing computational cost. However, existing data-driven approaches are limited in their ability to model coupled Earth system dynamics,…

9
arXiv — Machine Learning research 29m ago

GLACIER: Rethinking Mass Spectrum Prediction as an Object Detection Problem

arXiv:2606.29161v1 Announce Type: new Abstract: Predicting tandem mass spectra (MS/MS) from molecular structures represents a central task in analytical chemistry with direct relevance to clinical metabolomics, systems biology, and adjacent disciplines. In this work, we revisit…

13
arXiv — Machine Learning research 29m ago

SP-CACW: Convergence-Aware Client Weighting for Selfish Personalized Learning

arXiv:2606.29322v1 Announce Type: new Abstract: Collaborative learning is sustainable only when it benefits each participant. Standard federated learning optimizes a global average objective, which can underperform for clients whose data distributions differ substantially from…

35
arXiv — NLP / Computation & Language research 29m ago

A French OSCE Dialogue Dataset and Controllable Virtual Patient System for Clinical Training

arXiv:2606.28526v1 Announce Type: new Abstract: The clinical and communication skills of medical students are commonly assessed through Objective Structured Clinical Examinations (OSCEs), which consist of brief scenario-driven simulations of doctor-patient interactions. However,…

36
arXiv — NLP / Computation & Language research 29m ago

The strength of clinical evidence is recoverable from language model representations but not from their stated grades

arXiv:2606.29034v1 Announce Type: new Abstract: Large language models (LLMs) increasingly summarize clinical evidence, where a claim's weight depends on how strongly it is supported. Yet these models convey confidence poorly, and properties they never state, such as truth, are…

17
arXiv — NLP / Computation & Language research 29m ago

TriageRA-CCF: Source-Side Clinical Confidence and Coverage Signals for Adaptive Rank Budgeting in Medical LLMs

arXiv:2606.29375v1 Announce Type: new Abstract: Medical large language models are commonly adapted with a fixed low-rank budget, even though medical questions differ substantially in confidence, clinical coverage, and cross-domain difficulty. We study adaptive rank budgeting for…

15
arXiv — NLP / Computation & Language research 29m ago

How much of an LLM-generated clinical corpus is actually new? A production-scale measurement of content redundancy for provenance classification

arXiv:2606.29605v1 Announce Type: new Abstract: Clinical machine learning increasingly relies on training corpora generated by large language models (LLMs) rather than annotated by clinicians, and such corpora are described and reused largely on the basis of their reported…

12
arXiv — NLP / Computation & Language research 29m ago

Clinical Reasoning Graphs: Structured Evaluation of LLM Diagnostic Reasoning Reveals Competence Without Consistency

arXiv:2606.29876v1 Announce Type: new Abstract: Modern large language models (LLMs) reach 60-70% diagnostic accuracy on complex clinical case benchmarks, but accuracy alone cannot distinguish stable clinically-grounded reasoning from pattern matching. We introduce clinical…

10
r/LocalLLaMA community 6h ago

Krea-2-Turbo Image Model - Easy to be fully uncensored, but it can also EDIT Images!

I've been super impressed with Krea-2-Turbo. It can generate high quality images in ~3 seconds. The quality is quite good compared to other local AI image gen models. Now, I don't want to make you watch or click a YouTube video, so I'll just give these clear instructions on how…

5
r/LocalLLaMA community 14h ago

Anyone else end up building a web access layer for local AI agents?

I've been running local models for most of my experiments, and I kept running into the same issue. The model lives locally, but everything it needs to interact with doesn't. Every new agent ended up with another GitHub client, another Reddit integration, another documentation…

10
Vercel — AI dev-tools 20h ago

Query Speed Insights from the Vercel CLI

You can now query Speed Insights datapoints directly through the Vercel CLI. Using the vercel metrics command, you can pull core Web Vitals (LCP, INP, CLS) and other page performance metrics (FCP, TTFB) based on client-side measurements from real user traffic. By providing a…

9
arXiv — Machine Learning research 1d ago

FoggyTrust: Robust Federated Learning with Hierarchical Trust Networks

arXiv:2606.27622v1 Announce Type: new Abstract: Byzantine-robust federated learning seeks to protect distributed model training from malicious or corrupted clients without requiring access to their private data. FLTrust addresses this challenge by introducing a trusted…

33
arXiv — Machine Learning research 1d ago

OperatorSHAP: Fast and Accurate Shapley Value Estimation for Neural Operators

arXiv:2606.28065v1 Announce Type: new Abstract: Understanding model predictions is essential for physical applications, where outputs often inform safety-critical decisions, such as structural load assessment, weather warnings, and clinical diagnosis. Shapley values satisfy many…

20
arXiv — Machine Learning research 1d ago

CPAgents: Agentic Composite Phenotype Generation for Cardiac Disease Association

arXiv:2606.28179v1 Announce Type: new Abstract: Identifying robust associations between cardiac imaging phenotypes and clinical diseases is fundamental to population-scale cardiovascular research and reliable risk stratification. However, current phenome-wide association studies…

13
arXiv — NLP / Computation & Language research 1d ago

From Black-Box to Clinical Insight: A Multi-Stage Explainable Framework for Speech-Based Cognitive Impairment Detection

arXiv:2606.27973v1 Announce Type: new Abstract: Speech-based cognitive impairment detection offers a noninvasive, accessible alternative to costly biomarker assays, yet transformer-based models remain clinically uninterpretable. We propose a multi-stage explainability framework…

23
arXiv — NLP / Computation & Language research 1d ago

The Signal-Coverage Matrix: Stratifying Type and Semantic Errors in Statement Autoformalization

arXiv:2606.28013v1 Announce Type: new Abstract: Headline type-correctness (TC%) of LLM autoformalization has climbed from ~53% to ~76% in two years, yet this scalar conceals which errors each method resolves. We propose a signal-coverage matrix that crosses the Lean…

23
arXiv — NLP / Computation & Language research 1d ago

Aloe-Vision: Robust Vision-Language Models for Healthcare

arXiv:2606.27500v1 Announce Type: cross Abstract: Large Vision-Language Models (LVLMs) specialized in healthcare are emerging as a promising research direction due to their potential impact in clinical and biomedical applications. However, progress is constrained by the scarcity…

28
Vercel — AI dev-tools 1d ago

xAI Grok audio models now available on Vercel AI Gateway

xAI's audio models are now live on AI Gateway. Realtime voice, text to speech, and speech to text are all available through the AI SDK with the same routing, observability, and spend controls as your other models. These capabilities are available on the AI SDK 7 release.…

11
r/MachineLearning community 2d ago

I silently break training codes or configs so I made pybench [P]

It is like pytest but for statistical tests: it ensures no regression of your metrics at a statistical level. It manages tedious things such as seeds, past benchmark results, ... Simple CLI working like pytest but with a benchmarks/ directory instead of tests/: pybench # 1st…

38
r/LocalLLaMA community 3d ago

Hello there! (again) i ported my kokoro enhancements so you can use them in your projects.

i made a web based and python based version of the enhancements i made to kokoro's controls. both are, of course, fully client side. if you have hardware acceleration turned on in your browser, kokoro runs on webgpu at about 40ms per generation. it's really fast. note: the…

36
Vercel — AI dev-tools 3d ago

Query Web Analytics from the Vercel CLI

You can now query Web Analytics datapoints directly through the Vercel CLI. Using the vercel metrics command, you can pull page views, visitors, and custom events for your Vercel projects to analyze traffic, compare trends, and answer questions about site performance. By…

14
r/MachineLearning community 3d ago

Made a free tool that automatically cuts the best clips from long videos — thought this community might find it useful [P]

I edit a lot of long-form content and got tired of scrubbing through hour-long recordings to find the good moments. So I built something to do it. You give it a video file (or a YouTube link), it figures out which parts are actually worth watching, and exports short clips in…

22
arXiv — Machine Learning research 4d ago

Beyond Feedforward Networks: Reentry Neural Systems as the Fundamental Basis of Subjecthood and Intrinsic Safety of Next-Generation AGI

arXiv:2606.26406v1 Announce Type: new Abstract: We propose a complete architectural blueprint for safe artificial general intelligence based on a closed reentry loop (D I cycle). In contrast to feedforward networks, which are directed acyclic graphs (C=0, S=0) incapable of…

37
arXiv — NLP / Computation & Language research 4d ago

Context Recycling for Long-Horizon LLM Inference

arXiv:2606.26105v1 Announce Type: new Abstract: Large language models (LLMs) exhibit strong capabilities in short-context reasoning but degrade in performance over long conversational horizons due to context window limitations and inefficient token usage. We introduce…

27
arXiv — Machine Learning research 4d ago

Dot-Flik: A Scalable Edge AI Architecture for Distributed Insect Monitoring

arXiv:2606.26121v1 Announce Type: cross Abstract: Global insect population declines necessitate scalable, continuous monitoring systems, yet existing vision-based solutions remain constrained by high hardware costs, energy demands, and reliance on centralized processing or cloud…

11
arXiv — NLP / Computation & Language research 4d ago

Comparing BERT Sentence-Pair Classification and Few-Shot LLM Prompting for Detecting Threat and Solution Framing in German Climate News

arXiv:2606.26489v1 Announce Type: new Abstract: News media play a central role in shaping public perceptions of climate change, and whether coverage emphasizes threats or solutions has measurable effects on audience engagement and policy support. Automated detection of these…

23
arXiv — NLP / Computation & Language research 4d ago

SamaVaani: Auditing and Debiasing Multilingual Clinical ASR for Indian Languages

arXiv:2606.26901v1 Announce Type: new Abstract: Automatic Speech Recognition (ASR) is increasingly used to document clinical encounters, yet its reliability in multilingual and demographically diverse Indian healthcare contexts remains largely unknown. In this study, we first…

6
arXiv — NLP / Computation & Language research 4d ago

From Clicks to Intent: Cross-Platform Session Embeddings with LLM-Distilled Taxonomy for Financial Services Recommendations

arXiv:2606.26277v1 Announce Type: cross Abstract: Sequential user behavior modeling is widely adopted in industrial recommender systems; however, significant gaps remain in financial services, where pre-login web interactions and authenticated in-app experiences differ…

24
arXiv — NLP / Computation & Language research 4d ago

Somatic in the East, Psychological in the West?: Investigating Clinically-Grounded Cross-Cultural Depression Symptom Expression in LLMs

arXiv:2508.03247v2 Announce Type: replace Abstract: Prior clinical psychology research shows that Western individuals with depression tend to report psychological symptoms, while Eastern individuals report somatic ones. We test whether Large Language Models (LLMs), which are…

5
Hugging Face Daily Papers research 4d ago

GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents

Abstract Computer-use agents can execute software tasks through either graphical interfaces or programmatic command interfaces, but existing evaluations confound interaction modality with differences in tasks, initial states, verifiers, and permitted actions. We introduce a…

7
r/LocalLLaMA community 4d ago

Good YouTube channels for local LLM news and development?

Sometimes I'd prefer chilling on the couch and learning instead of reading. I've searched on YouTube and most seem like clickbait and slop. Thanks

5
r/LocalLLaMA community 4d ago

Which model for technical documentation?

Looking to create high level / low level designs (software), based on existing templates/examples, cross reference code, use mcp to download confluence/jira data - also plug into agentic ‘coding’ frameworks opencode. I mostly use opus 3.6 with Kiro-cli, but I want my data…

32
Hacker News — AI on Front Page community 4d ago

Show HN: OpenKnowledge – open source AI-first alternative to Obsidian/Notion

Hi HN, Nick here. We’re launching OpenKnowledge (https://openknowledge.ai/), a “what you see is what you get” markdown editor that has direct integrations with Claude, Codex, and other agents. Available as a macOS app or Web UI+CLI. Fully free/local and OSS. We built this…

20
Vercel — AI dev-tools 4d ago

AI SDK 7

AI SDK, with over 16 million weekly downloads, is the TypeScript SDK for building AI applications, features, frameworks, and agents across any model provider. It's the same layer eve, Vercel's open-source agent framework, is built on. AI SDK 7 adds production depth for agent…

15
r/LocalLLaMA community 4d ago

Worse quality with MTP - Qwen 3.6, Gemma 4

Hi. I am self-hosting Qwen 3.6 27B Q8_K_XL with Llama.cpp on 4x5070ti. (All 4 cards are on single x16 slot bifurcated to 4x4 with risers). I've been testing it on several work repos with Opencode CLI and in like 8/10 situations the output of non-MTP model is far superior to the…

8
Vercel — AI dev-tools 4d ago

AI SDK 7 is now available

AI SDK 7 is a major release for building production agents in TypeScript. The SDK has grown from model calls and chat primitives into a broader agent platform for developing, running, integrating, and observing agents across text, audio, realtime, image, and video. Every major…

8
arXiv — Machine Learning research 5d ago

Enhancing Clinician Decision-Making via Uncertainty-Aware Multi-Expert Fusion for Stroke Rehabilitation

arXiv:2606.24960v1 Announce Type: new Abstract: Tailoring stroke rehabilitation requires assessing how movements are organized, not merely if they succeed. Currently, this assessment is a rate-limiting bottleneck. Instruments like the Action Research Arm Test (ARAT) compress…

20
arXiv — Machine Learning research 5d ago

Communicability-Inspired Positional Encoding (CIPE)

arXiv:2606.25293v1 Announce Type: new Abstract: Positional encodings (PEs) are essential for Transformers. Yet designing effective PEs for non-Euclidean graphs remains challenging. Such encodings should ideally induce an Attention-Compatible Geometry for self-attention: not…

15
arXiv — Machine Learning research 5d ago

Interpretable Concept-Guided Polynomial Tabular Kolmogorov-Arnold Network for EEG-Based Mild Cognitive Impairment Detection

arXiv:2606.25434v1 Announce Type: new Abstract: Early and scalable detection of mild cognitive impairment (MCI) remains an unresolved clinical challenge. Existing EEG-based screening approaches are constrained by handcrafted feature pipelines that discard neurophysiologically…

10
arXiv — NLP / Computation & Language research 5d ago

Cliff Tokens: Identifying Single-Token Failure Triggers in LLM Mathematical Reasoning

arXiv:2606.25524v1 Announce Type: cross Abstract: Large language models (LLMs) reach high accuracy in mathematical reasoning, but individual traces on the same problem diverge; some arrive at the correct answer while others fail. Prior work analyzes failure at the step, chunk,…

32
arXiv — NLP / Computation & Language research 5d ago

Uncertainty Quantification for Computer-Use Agents: A Benchmark across Vision-Language Models and GUI Grounding Datasets

arXiv:2606.25760v1 Announce Type: cross Abstract: Computer-use agents turn vision-language model (VLM) predictions into executable GUI clicks, so reliable uncertainty estimates are essential for rejection, calibration, miss-severity ranking, and spatial safety regions. Yet…

14
arXiv — NLP / Computation & Language research 5d ago

Paid Voices vs. Public Feeds: Interpretable Cross-Platform Theme-Based Analysis of Climate Discourse

arXiv:2601.13317v2 Announce Type: replace Abstract: Climate discourse online shapes public understanding of climate change and informs political and policy debate, yet it unfolds across structurally different environments: paid advertising platforms host targeted,…

9
Vercel — AI dev-tools 5d ago

Deep Agents and OpenCode are now available in the AI SDK Harness

The AI SDK Harness lets you run established coding-agent runtimes through one unified interface, so you can switch runtimes without changing your application code. Today we're adding two new adapters, Deep Agents and OpenCode, both running inside a Vercel Sandbox. Deep Agents…

27
Vercel — AI dev-tools 5d ago

Vercel Flags no longer requires SDK Keys for Vercel deployments

New projects using Vercel Flags no longer need to configure SDK Keys or the FLAGS environment variable when evaluating flags inside a Vercel deployment. At runtime, the Vercel adapter automatically receives a short-lived OIDC token, so authentication is handled for you with zero…

5
Anthropic SDK (Python) releases dev-tools 5d ago

v0.112.0

0.112.0 (2026-06-24). Full Changelog: v0.111.0...v0.112.0. Features: client: add support for system.message streaming events (2450d59). Bug Fixes: memory tool: create parent directories with the correct permissions (#135) (f2fc2a9). Chores: api: add support for new refusal…

21
arXiv — Machine Learning research 6d ago

Reconstructing GRACE Terrestrial Water Storage with Spatio-Temporal Graph Neural Networks: An Application to South America

arXiv:2606.23833v1 Announce Type: new Abstract: Terrestrial water storage (TWS) integrates snow, soil moisture, surface water, and groundwater and is a key indicator of how climate variability and human activity reshape the global water cycle. The GRACE and GRACE-FO satellite…

21
arXiv — Machine Learning research 6d ago

Federated Survival Analysis in Healthcare: A Multi-Model Evaluation on Cross-Institutional Heterogeneous Breast Cancer Data

arXiv:2606.23871v1 Announce Type: new Abstract: Survival analysis is central to clinical decision-making, yet reliable time-to-event models require large, diverse cohorts that are rarely available at a single institution, while privacy regulations restrict the centralization of…

28
arXiv — Machine Learning research 6d ago

GRACE: Gated Refinement for Accurate Causal Edge Discovery in High-Dimensional Time Series

arXiv:2606.23880v1 Announce Type: new Abstract: From climate teleconnections to gene regulation, modern time-series datasets encompass tens or hundreds of interacting variables, making causal discovery increasingly challenging. Constraint-based methods offer statistical rigor…

30
arXiv — Machine Learning research 6d ago

KLip-PPO: A per-sample KL perspective on PPO-Clip

arXiv:2606.23932v1 Announce Type: new Abstract: Proximal Policy Optimization (PPO) is the standard policy-gradient algorithm for on-policy reinforcement learning. The literature presents it in two forms, a clipped surrogate that bounds the importance ratio between successive…

8
arXiv — Machine Learning research 6d ago

Cyclic Denoising Reveals Ultrastable Memories in Diffusion Models

arXiv:2606.24000v1 Announce Type: new Abstract: We introduce cyclic denoising -- repeated forward and reverse diffusion at controlled noise amplitudes -- as an extraction attack for image diffusion models. Inspired by random organization in disordered solids, cyclic denoising…

17
