Tag

Developer Tool

500 articles archived under #developer-tool · RSS

arXiv — Machine Learning research 1mo ago

UB-SMoE: Universally Balanced Sparse Mixture-of-Experts for Resource-adaptive Federated Fine-tuning of Foundation Models

arXiv:2605.16690v1 Announce Type: new Abstract: Heterogeneous LoRA-rank methods address system heterogeneity in federated fine-tuning of foundation models by assigning client-specific ranks based on computational capabilities. However, these methods achieve only marginal…

arXiv — NLP / Computation & Language research 1mo ago

Artificial Intolerance: Stigmatizing Language in Clinical Documentation Skews Large Language Model Decision-Making

arXiv:2605.17228v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly deployed in high-stakes domains such as clinical decision support and medical documentation. However, the robustness of these models against subtle linguistic variations, specifically…

arXiv — NLP / Computation & Language research 1mo ago

Transitivity Meets Cyclicity: Explicit Preference Decomposition for Dynamic Large Language Model Alignment

arXiv:2605.17342v1 Announce Type: new Abstract: Standard RLHF relies on transitive scalar rewards, failing to capture the cyclic nature of human preferences. While some approaches like the General Preference Model (GPM) address this, we identify a theoretical limitation: their…

arXiv — NLP / Computation & Language research 1mo ago

Bridging the Version Gap: Multi-version Training Improves ICD Code Prediction, Especially for Rare Codes

arXiv:2605.17755v1 Announce Type: new Abstract: Clinical coding maps clinical documentation to standardized medical codes, an essential yet time-consuming administrative task that could benefit from automation. Current models on ICD coding are typically optimized for codes from…

arXiv — NLP / Computation & Language research 1mo ago

Systematic Evaluation of the Quality of Synthetic Clinical Notes Rephrased by LLMs at Million-Note Scale

arXiv:2605.17775v1 Announce Type: new Abstract: Large language models (LLMs) can generate or synthesize clinical text for a wide range of applications, from improving clinical documentation to augmenting clinical text analytics. Yet evaluations typically focus on a narrow aspect…

r/LocalLLaMA community 1mo ago

Favorite Agentic Coding Harness

So far, I’ve tried Codex CLI, Claude Code, Gemini CLI, OpenCode, and recently, Pi with local models. Pi is the leanest of them all, with just four tools: read, write, edit, and bash. Its system prompt is under 2K tokens, and it's perfect for local models. I've been trying…

Hacker News — AI on Front Page community 1mo ago

Pope Leo XIV’s first encyclical Magnifica humanitas to be published May 25

Article URL: https://www.vaticannews.va/en/pope/news/2026-05/pope-leo-xiv-first-encyclical-magnifica-humanitas.html · Comments URL: https://news.ycombinator.com/item?id=48187201 · Points: 255 · Comments: 176

Hacker News — AI on Front Page community 1mo ago

Click (2016)

Article URL: https://clickclickclick.click/ · Comments URL: https://news.ycombinator.com/item?id=48187054 · Points: 237 · Comments: 57

llama.cpp releases dev-tools 1mo ago

b9216

ui: Refactor models store, MCP service, and gate logs behind VITE_DEBUG (#23236). refactor: Scope console logs to DEV + VITE_DEBUG env vars; refactor: skip MCP proxy probe when no server requires it; refactor: suppress expected disconnect errors during MCP client shutdown…

GitHub Blog — AI & ML official-blog 1mo ago

Take your local GitHub sessions anywhere

Kick off work in VS Code or the CLI, finish it from your phone. Remote control for GitHub Copilot sessions is now generally available on github.com and GitHub Mobile.

Hugging Face Daily Papers research 1mo ago

Sparse Autoencoders enable Robust and Interpretable Fine-tuning of CLIP models

AI-generated summary: SAE-FT enables robust fine-tuning of vision-language models by regularizing visual representations through sparse autoencoder constraints, maintaining performance while improving robustness against distribution shifts. Abstract: Large-scale pre-trained…

arXiv — Machine Learning research 1mo ago

MuteBench: Modality Unavailability Tolerance Evaluation for Incomplete Multimodal Fusion

arXiv:2605.15235v1 Announce Type: new Abstract: Multimodal physiological data powers clinical AI systems from intensive care units to wearable devices, but sensors routinely fail in practice. Two failure modes are common: modality missing, where an entire channel is absent, and…

arXiv — Machine Learning research 1mo ago

Logical Grammar Induction via Graph Kolmogorov Complexity: A Neuro-Symbolic Framework for Self-Healing Clinical Data Integrity

arXiv:2605.15242v1 Announce Type: new Abstract: The reliability of Healthcare Information Systems (HIS) is frequently compromised by human-induced data entry errors, which existing statistical anomaly detection methods fail to distinguish from legitimate clinical extremes. This…

arXiv — Machine Learning research 1mo ago

PACER: Acyclic Causal Discovery from Large-Scale Interventional Data

arXiv:2605.15353v1 Announce Type: new Abstract: Inferring the structure of directed acyclic graphs (DAGs) from data is a central challenge in causal discovery, particularly in modern high-dimensional settings where large-scale interventional data are increasingly available.…

arXiv — Machine Learning research 1mo ago

GOMA: Toward Structure-Driven Multimodal Alignment from a Graph Signal Smoothing Perspective

arXiv:2605.15723v1 Announce Type: new Abstract: Multimodal alignment is commonly learned from isolated image-text pairs via CLIP-style dual encoders, leaving the relational context among entities largely unused. Multimodal attributed graphs (MAGs), where nodes carry multimodal…

arXiv — NLP / Computation & Language research 1mo ago

Retrieval-Augmented Large Language Models for Schema-Constrained Clinical Information Extraction

arXiv:2605.15467v1 Announce Type: new Abstract: Conversational nurse-patient transcripts contain actionable observations, but converting these transcripts into structured representations at scale remains challenging. Documentation burden is substantial, with prior studies…

arXiv — NLP / Computation & Language research 1mo ago

MHGraphBench: Knowledge Graph-Grounded Benchmarking of Mental Health Knowledge in Large Language Models

arXiv:2605.15589v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used in the mental health domain, yet it remains unclear how well they capture related biomedical knowledge and how reliably they apply it to clinically salient structured judgments.…

arXiv — NLP / Computation & Language research 1mo ago

Few-Shot Large Language Models for Actionable Triage Categorization of Online Patient Inquiries

arXiv:2605.15680v1 Announce Type: new Abstract: Online patient inquiries are often informal, incomplete, and written before professional assessment, yet they must still be routed to an appropriate level of clinical follow-up. We study this as a four-class actionable triage task…

arXiv — NLP / Computation & Language research 1mo ago

Can Large Language Models Imitate Human Speech for Clinical Assessment? LLM-Driven Data Augmentation for Cognitive Score Prediction

arXiv:2605.16077v1 Announce Type: new Abstract: Accurate assessment of cognitive decline from spontaneous speech remains challenging due to limited dataset size and class imbalance. In this work, we propose a large language model (LLM)-driven data augmentation framework to…

arXiv — NLP / Computation & Language research 1mo ago

Fully Open Meditron: An Auditable Pipeline for Clinical LLMs

arXiv:2605.16215v1 Announce Type: cross Abstract: Clinical decision support systems (CDSS) require scrutable, auditable pipelines that enable rigorous, reproducible validation. Yet current LLM-based CDSS remain largely opaque. Most "open" models are open-weight only, releasing…

arXiv — NLP / Computation & Language research 1mo ago

When Importance Sampling Misallocates Credit: Asymmetric Ratios for Outcome-Supervised RL

arXiv:2510.06062v2 Announce Type: replace Abstract: Reinforcement learning (RL) has shown great promise in large language model (LLM) post-training, which typically relies on token-level clipping to maintain stability during optimization. Despite the empirical success of…

r/LocalLLaMA community 1mo ago

Made a simple template manager and GUI for llama.cpp so I don't have to keep memorizing CLI flags.

Introducing Hexllama. Hey, I’ve always found llama-server to be more than enough for testing out local models, mostly because it guarantees you always have the absolute latest llama.cpp features and architecture support. But keeping track of different CLI commands, context sizes,…

llama.cpp releases dev-tools 1mo ago

b9193

server : honor --embd-normalize CLI arg (#23125). The --embd-normalize flag was registered only for the embedding and debug examples, so llama-server rejected it and the /embedding handler used a hard-coded default of 2 (L2). Add LLAMA_EXAMPLE_SERVER to the flag's example set…

Hacker News — AI on Front Page community 1mo ago

Fecal transplants for autism deliver success in clinical trials

Article URL: https://refractor.io/adhd-autism/fecal-transplants-for-autism-delivers-success-in-clinical-trials/ · Comments URL: https://news.ycombinator.com/item?id=48158494 · Points: 213 · Comments: 157

r/LocalLLaMA community 1mo ago

Qwen3.6-35B-A3B and 9B are officially on the public Terminal-Bench 2.0 leaderboard!

little-coder × Qwen3.6-35B-A3B hit 24.6% (±3.2) and now lands above Gemini 2.5 Pro on Gemini CLI (19.6%) and Qwen3-Coder-480B on Terminus 2 (23.9%). I didn’t expect the scaffold-model gap from…

OpenAI Python SDK releases dev-tools 1mo ago

v2.37.0

2.37.0 (2026-05-13) · Full Changelog: v2.36.0...v2.37.0 · Features: api: add service_tier parameter to responses compact method (625827c); internal/types: support eagerly validating pydantic iterators (7e527bc). Remove unnecessary client_id when using workload identity provider for…

TechCrunch — AI news-outlet 1mo ago

The OpenAI trial wraps up, and the Musk founder machine keeps spinning

The Musk v. Altman trial came to a close this week, and the final arguments kept circling back to one question: can we trust the people in charge of AI? All of this is playing out as SpaceX charges toward what could be one of the largest IPOs in American history,…

r/LocalLLaMA community 1mo ago

I built a self-hosted open-source MCP server that gives any local LLM real financial data — SEC filings, 13F, insider & congressional trades, short data, FRED

One thing missing when running local models as agents: real, current data. So I built Equibles — a self-hosted MCP server that scrapes and serves public U.S. financial data and exposes it as MCP tools, so any MCP-capable client (Claude Code/Desktop, Cursor, or your own…

r/MachineLearning community 1mo ago

[D] Position paper: using hallucination as a construction instrument to distill task-specific cognitive kernels from frontier models

Background: I am a software developer, not an ML researcher. This started from a practical question — why do AI coding tools send proprietary client code to remote servers when the task only requires Swift? Following that question produced this framework. The core proposal…

Hacker News — AI on Front Page community 1mo ago

A 0-click exploit chain for the Pixel 10

Article URL: https://projectzero.google/2026/05/pixel-10-exploit.html · Comments URL: https://news.ycombinator.com/item?id=48148460 · Points: 203 · Comments: 85

llama.cpp releases dev-tools 1mo ago

b9161

Support for Codex CLI by skipping unsupported Responses tools (#23041). Warn on skipped Responses tools and preserve gpt-oss apply_patch rejection. Revert gpt-oss apply_patch special handling. macOS/iOS: macOS Apple…

arXiv — Machine Learning research 1mo ago

Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders

arXiv:2605.13930v1 Announce Type: new Abstract: EEG foundation models achieve state-of-the-art clinical performance, yet the internal computations driving their predictions remain opaque: a barrier to clinical trust. We apply TopK Sparse Autoencoders (SAEs) across three…

arXiv — Machine Learning research 1mo ago

Reinforcement Learning for Tool-Calling Agents in Fast Healthcare Interoperability Resources (FHIR)

arXiv:2605.14126v1 Announce Type: new Abstract: Fast Healthcare Interoperability Resources (FHIR) is the dominant standard for interoperable exchange of healthcare data. In FHIR, electronic health records form a directed graph of resources. Answering clinically meaningful…

arXiv — Machine Learning research 1mo ago

DT-Transformer: A Foundation Model for Disease Trajectory Prediction on a Real-world Health System

arXiv:2605.14227v1 Announce Type: new Abstract: Accurate disease trajectory prediction is critical for early intervention, resource allocation, and improving long-term outcomes. While electronic health records (EHRs) provide a rich longitudinal view of patient health in clinical…

arXiv — Machine Learning research 1mo ago

MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification

arXiv:2605.14289v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) models scale capacity by combining specialized experts, but most existing approaches assume centralized access to training data. In practice, data are distributed across clients and cannot be shared due to…

arXiv — Machine Learning research 1mo ago

AIM-DDI: A Model-Agnostic Multimodal Integration Module for Drug-Drug Interaction Prediction

arXiv:2605.14327v1 Announce Type: new Abstract: Drug-drug interaction (DDI) prediction is a critical task in computational biomedicine, as adverse interactions between co-administered drugs can cause severe side effects and clinical risks. A key challenge is unseen-drug…

arXiv — Machine Learning research 1mo ago

RxEval: A Prescription-Level Benchmark for Evaluating LLM Medication Recommendation

arXiv:2605.14543v1 Announce Type: new Abstract: Inpatient medication recommendation requires clinicians to repeatedly select specific medications, doses, and routes as a patient's condition evolves. Existing benchmarks formulate this task as admission-level prediction over…

arXiv — NLP / Computation & Language research 1mo ago

Mitigating Data Scarcity in Psychological Defense Classification with Context-Aware Synthetic Augmentation

arXiv:2605.14380v1 Announce Type: new Abstract: Psychological defense mechanisms (PDMs) are unconscious cognitive processes that modulate how individuals perceive and respond to emotional distress. Automatically classifying PDMs from text is clinically valuable but severely…

arXiv — NLP / Computation & Language research 1mo ago

COTCAgent: Preventive Consultation via Probabilistic Chain-of-Thought Completion

arXiv:2605.15016v1 Announce Type: new Abstract: As large language models empower healthcare, intelligent clinical decision support has developed rapidly. Longitudinal electronic health records (EHR) provide essential temporal evidence for accurate clinical diagnosis and…

arXiv — NLP / Computation & Language research 1mo ago

Text Knows What, Tables Know When: Clinical Timeline Reconstruction via Retrieval-Augmented Multimodal Alignment

arXiv:2605.15168v1 Announce Type: new Abstract: Reconstructing precise clinical timelines is essential for modeling patient trajectories and forecasting risk in complex, heterogeneous conditions like sepsis. While unstructured clinical narratives offer semantically rich and…

arXiv — NLP / Computation & Language research 1mo ago

A Benchmark for Early-stage Parkinson's Disease Detection from Speech

arXiv:2605.14066v1 Announce Type: cross Abstract: Early-stage Parkinson's disease (EarlyPD) detection from speech is clinically meaningful yet underexplored, and published results are hard to compare because studies differ in datasets, languages, tasks, evaluation protocols, and…

Hugging Face Daily Papers research 1mo ago

WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation

AI-generated summary: WildClawBench evaluates language and vision-language models on realistic long-horizon tasks using actual CLI environments with real tools instead of synthetic sandboxes. Abstract: Large language and vision-language models increasingly power agents that act on…

Vercel — AI dev-tools 1mo ago

Use native curl syntax with Vercel CLI

You can now use native curl syntax with the Vercel CLI. The vercel curl command accepts full URLs, bare hostnames, and the --url flag, and uses your Vercel auth to bypass Deployment Protection. If you've linked a project, you can also pass just a path: Update to the latest…

Vercel — AI dev-tools 1mo ago

Trace any Vercel request from the CLI

You can now generate Session Traces through the Vercel CLI. Use the new vercel curl --trace command to generate an OpenTelemetry trace of a request to the specified endpoint from the terminal. Use the new vercel traces get command to fetch the generated trace by request ID. Available on all…

Vercel — AI dev-tools 1mo ago

Introducing Vercel Drop

Vercel Drop lets you deploy a file or folder by dragging it into your browser. You don't need Git, the Vercel CLI, or any local setup. Drop a project onto vercel.com/drop, pick a team and project name, and select Deploy. Vercel will create a new project, upload your files,…

Hugging Face Daily Papers research 1mo ago

MC-RFM: Geometry-Aware Few-Shot Adaptation via Mixed-Curvature Riemannian Flow Matching

AI-generated summary: A novel Riemannian flow-matching framework for few-shot adaptation that models feature displacement on a mixed-curvature manifold combining hyperbolic and Euclidean spaces, outperforming existing methods across multiple benchmarks. Abstract: …

Latent.Space news-outlet 1mo ago

AI-Native Healthcare: 100M Doctor Visits, 10–20 Hours Saved, Prior Auth in Minutes — Janie Lee & Chai Asawa, Abridge

How Abridge is quietly turning the patient and clinician conversation into the operating system of healthcare

r/LocalLLaMA community 1mo ago

A VERY lightweight open web-search tool for smaller local LLMs

Hey everyone, Been playing around with local agent setups lately, mostly Cline/Roo with smaller models, and web search kept annoying me. Not because it doesn’t work, but because it usually throws way too much random page text into the context. Small models really don’t handle…

Hugging Face Daily Papers research 1mo ago

RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation

AI-generated summary: RealICU benchmark evaluates large language models for ICU decision support using hindsight-annotated patient trajectories, revealing limitations in clinical recommendation accuracy and early interpretation bias. Abstract: Intensive care units (ICU) generate…

r/LocalLLaMA community 1mo ago

Computer-use MCP that can control multiple machines (integrates with Claude, Cursor, Codex, or your custom harness)

Hey everyone, We built opendesk: it lets AI agents control your desktop through a computer-use MCP server that can integrate with your custom workflow. Today we shipped something a bit wild: Your AI can now see, click, type, and navigate on a completely different computer, over your WiFi.…
