Tag: Developer Tool. 500 articles archived under #developer-tool
arXiv — Machine Learning research 1mo ago UB-SMoE: Universally Balanced Sparse Mixture-of-Experts for Resource-adaptive Federated Fine-tuning of Foundation Models arXiv:2605.16690v1 Announce Type: new Abstract: Heterogeneous LoRA-rank methods address system heterogeneity in federated fine-tuning of foundation models by assigning client-specific ranks based on computational capabilities. However, these methods achieve only marginal… 32 arXiv — NLP / Computation & Language research 1mo ago Artificial Intolerance: Stigmatizing Language in Clinical Documentation Skews Large Language Model Decision-Making arXiv:2605.17228v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly deployed in high-stakes domains such as clinical decision support and medical documentation. However, the robustness of these models against subtle linguistic variations, specifically… 19 arXiv — NLP / Computation & Language research 1mo ago Transitivity Meets Cyclicity: Explicit Preference Decomposition for Dynamic Large Language Model Alignment arXiv:2605.17342v1 Announce Type: new Abstract: Standard RLHF relies on transitive scalar rewards, failing to capture the cyclic nature of human preferences. While some approaches like the General Preference Model (GPM) address this, we identify a theoretical limitation: their… 11 arXiv — NLP / Computation & Language research 1mo ago Bridging the Version Gap: Multi-version Training Improves ICD Code Prediction, Especially for Rare Codes arXiv:2605.17755v1 Announce Type: new Abstract: Clinical coding maps clinical documentation to standardized medical codes, an essential yet time-consuming administrative task that could benefit from automation.
Current models on ICD coding are typically optimized for codes from… 4 arXiv — NLP / Computation & Language research 1mo ago Systematic Evaluation of the Quality of Synthetic Clinical Notes Rephrased by LLMs at Million-Note Scale arXiv:2605.17775v1 Announce Type: new Abstract: Large language models (LLMs) can generate or synthesize clinical text for a wide range of applications, from improving clinical documentation to augmenting clinical text analytics. Yet evaluations typically focus on a narrow aspect… 8 r/LocalLLaMA community 1mo ago favorite Agentic Coding Harness So far, I’ve tried Codex CLI, Claude Code, Gemini CLI, OpenCode, and recently, Pi with local models. Pi is the leanest of them all, with just four tools: read, write, edit, and bash. Its system prompt is only under 2K tokens, and it's perfect for local models. I've been trying… 29 Hacker News — AI on Front Page community 1mo ago Pope Leo XIV’s first encyclical Magnifica humanitas to be published May 25 Article URL: https://www.vaticannews.va/en/pope/news/2026-05/pope-leo-xiv-first-encyclical-magnifica-humanitas.html Comments URL: https://news.ycombinator.com/item?id=48187201 Points: 255 # Comments: 176 17 Hacker News — AI on Front Page community 1mo ago Click (2016) Article URL: https://clickclickclick.click/ Comments URL: https://news.ycombinator.com/item?id=48187054 Points: 237 # Comments: 57 35 llama.cpp releases dev-tools 1mo ago b9216 ui: Refactor models store, MCP service, and gate logs behind VITE_DEBUG ( #23236 ) refactor: Scope console logs to DEV + VITE_DEBUG env vars refactor: skip MCP proxy probe when no server requires it refactor: suppress expected disconnect errors during MCP client shutdown… 33 GitHub Blog — AI & ML official-blog 1mo ago Take your local GitHub sessions anywhere Kick off work in VS Code or the CLI, finish it from your phone. Remote control for GitHub Copilot sessions is now generally available on github.com and GitHub Mobile. 
32 Hugging Face Daily Papers research 1mo ago Sparse Autoencoders enable Robust and Interpretable Fine-tuning of CLIP models Abstract SAE-FT enables robust fine-tuning of vision-language models by regularizing visual representations through sparse autoencoder constraints, maintaining performance while improving robustness against distribution shifts. AI-generated summary Large-scale pre-trained… 34 arXiv — Machine Learning research 1mo ago MuteBench: Modality Unavailability Tolerance Evaluation for Incomplete Multimodal Fusion arXiv:2605.15235v1 Announce Type: new Abstract: Multimodal physiological data powers clinical AI systems from intensive care units to wearable devices, but sensors routinely fail in practice. Two failure modes are common: modality missing, where an entire channel is absent, and… 15 arXiv — Machine Learning research 1mo ago Logical Grammar Induction via Graph Kolmogorov Complexity: A Neuro-Symbolic Framework for Self-Healing Clinical Data Integrity arXiv:2605.15242v1 Announce Type: new Abstract: The reliability of Healthcare Information Systems (HIS) is frequently compromised by human-induced data entry errors, which existing statistical anomaly detection methods fail to distinguish from legitimate clinical extremes.
This… 34 arXiv — Machine Learning research 1mo ago PACER: Acyclic Causal Discovery from Large-Scale Interventional Data arXiv:2605.15353v1 Announce Type: new Abstract: Inferring the structure of directed acyclic graphs (DAGs) from data is a central challenge in causal discovery, particularly in modern high-dimensional settings where large-scale interventional data are increasingly available.… 10 arXiv — Machine Learning research 1mo ago GOMA: Toward Structure-Driven Multimodal Alignment from a Graph Signal Smoothing Perspective arXiv:2605.15723v1 Announce Type: new Abstract: Multimodal alignment is commonly learned from isolated image-text pairs via CLIP-style dual encoders, leaving the relational context among entities largely unused. Multimodal attributed graphs (MAGs), where nodes carry multimodal… 37 arXiv — NLP / Computation & Language research 1mo ago Retrieval-Augmented Large Language Models for Schema-Constrained Clinical Information Extraction arXiv:2605.15467v1 Announce Type: new Abstract: Conversational nurse-patient transcripts contain actionable observations, but converting these transcripts into structured representations at scale remains challenging. 
Documentation burden is substantial, with prior studies… 31 arXiv — NLP / Computation & Language research 1mo ago MHGraphBench: Knowledge Graph-Grounded Benchmarking of Mental Health Knowledge in Large Language Models arXiv:2605.15589v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used in the mental health domain, yet it remains unclear how well they capture related biomedical knowledge and how reliably they apply it to clinically salient structured judgments.… 23 arXiv — NLP / Computation & Language research 1mo ago Few-Shot Large Language Models for Actionable Triage Categorization of Online Patient Inquiries arXiv:2605.15680v1 Announce Type: new Abstract: Online patient inquiries are often informal, incomplete, and written before professional assessment, yet they must still be routed to an appropriate level of clinical follow-up. We study this as a four-class actionable triage task… 18 arXiv — NLP / Computation & Language research 1mo ago Can Large Language Models Imitate Human Speech for Clinical Assessment? LLM-Driven Data Augmentation for Cognitive Score Prediction arXiv:2605.16077v1 Announce Type: new Abstract: Accurate assessment of cognitive decline from spontaneous speech remains challenging due to limited dataset size and class imbalance. In this work, we propose a large language model (LLM)-driven data augmentation framework to… 38 arXiv — NLP / Computation & Language research 1mo ago Fully Open Meditron: An Auditable Pipeline for Clinical LLMs arXiv:2605.16215v1 Announce Type: cross Abstract: Clinical decision support systems (CDSS) require scrutable, auditable pipelines that enable rigorous, reproducible validation. Yet current LLM-based CDSS remain largely opaque. 
Most "open" models are open-weight only, releasing… 9 arXiv — NLP / Computation & Language research 1mo ago When Importance Sampling Misallocates Credit: Asymmetric Ratios for Outcome-Supervised RL arXiv:2510.06062v2 Announce Type: replace Abstract: Reinforcement learning (RL) has shown great promise in large language models (LLMs) post-training, which typically rely on token-level clipping to maintain stability during optimization. Despite the empirical success of… 29 r/LocalLLaMA community 1mo ago Made a simple template manager and GUI for llama.cpp so I don't have to keep memorizing CLI flags. Introducing Hexllama Hey, I’ve always found llama-server to be more than enough for testing out local models, mostly because it guarantees you always have the absolute latest llama.cpp features and architecture support. But keeping track of different CLI commands, context sizes,… 19 llama.cpp releases dev-tools 1mo ago b9193 server : honor --embd-normalize CLI arg ( #23125 ) The --embd-normalize flag was registered only for the embedding and debug examples, so llama-server rejected it and the /embedding handler used a hard-coded default of 2 (L2). Add LLAMA_EXAMPLE_SERVER to the flag's example set… 7 Hacker News — AI on Front Page community 1mo ago Fecal transplants for autism deliver success in clinical trials Article URL: https://refractor.io/adhd-autism/fecal-transplants-for-autism-delivers-success-in-clinical-trials/ Comments URL: https://news.ycombinator.com/item?id=48158494 Points: 213 # Comments: 157 16 r/LocalLLaMA community 1mo ago Qwen3.6-35B-A3B and 9B are officially on the public Terminal-Bench 2.0 leaderboard! Qwen3.6-35B-A3B and 9B are officially on the public Terminal-Bench 2.0 leaderboard! little-coder × Qwen3.6-35B-A3B hit 24.6% (±3.2), and now land above Gemini 2.5 Pro on Gemini CLI (19.6%) and Qwen3-Coder-480B on Terminus 2 (23.9%). 
I didn’t expect the scaffold-model gap from… 13 OpenAI Python SDK releases dev-tools 1mo ago v2.37.0 2.37.0 (2026-05-13) Full Changelog: v2.36.0...v2.37.0 Features api: add service_tier parameter to responses compact method ( 625827c ) internal/types: support eagerly validating pydantic iterators ( 7e527bc ) Remove unnecessary client_id when using workload identity provider for… 15 TechCrunch — AI news-outlet 1mo ago The OpenAI trial wraps up, and the Musk founder machine keeps spinning The Musk v. Altman trial came to a close this week, and the final arguments kept circling back to one question: can we trust the people in charge of AI? All of this is playing out as SpaceX charges toward what could be one of the largest IPOs in American history,… 30 r/LocalLLaMA community 1mo ago I built a self-hosted open-source MCP server that gives any local LLM real financial data — SEC filings, 13F, insider & congressional trades, short data, FRED One thing missing when running local models as agents: real, current data. So I built Equibles — a self-hosted MCP server that scrapes and serves public U.S. financial data and exposes it as MCP tools, so any MCP-capable client (Claude Code/Desktop, Cursor, or your own… 30 r/MachineLearning community 1mo ago [D] Position paper: using hallucination as a construction instrument to distill task-specific cognitive kernels from frontier models [D] Background: I am a software developer, not an ML researcher. This started from a practical question — why do AI coding tools send proprietary client code to remote servers when the task only requires Swift? Following that question produced this framework. 
The core proposal… 8 Hacker News — AI on Front Page community 1mo ago A 0-click exploit chain for the Pixel 10 Article URL: https://projectzero.google/2026/05/pixel-10-exploit.html Comments URL: https://news.ycombinator.com/item?id=48148460 Points: 203 # Comments: 85 36 llama.cpp releases dev-tools 1mo ago b9161 Support for Codex CLI by skipping unsupported Responses tools ( #23041 ) Support for Codex CLI by skipping unsupported Responses tools Warn on skipped Responses tools and preserve gpt-oss apply_patch rejection Revert gpt-oss apply_patch special handling macOS/iOS: macOS Apple… 29 arXiv — Machine Learning research 1mo ago Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders arXiv:2605.13930v1 Announce Type: new Abstract: EEG foundation models achieve state-of-the-art clinical performance, yet the internal computations driving their predictions remain opaque: a barrier to clinical trust. We apply TopK Sparse Autoencoders (SAEs) across three… 9 arXiv — Machine Learning research 1mo ago Reinforcement Learning for Tool-Calling Agents in Fast Healthcare Interoperability Resources (FHIR) arXiv:2605.14126v1 Announce Type: new Abstract: Fast Healthcare Interoperability Resources (FHIR) is the dominant standard for interoperable exchange of healthcare data. In FHIR, electronic health records form a directed graph of resources. Answering clinically meaningful… 36 arXiv — Machine Learning research 1mo ago DT-Transformer: A Foundation Model for Disease Trajectory Prediction on a Real-world Health System arXiv:2605.14227v1 Announce Type: new Abstract: Accurate disease trajectory prediction is critical for early intervention, resource allocation, and improving long-term outcomes. 
While electronic health records (EHRs) provide a rich longitudinal view of patient health in clinical… 14 arXiv — Machine Learning research 1mo ago MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification arXiv:2605.14289v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) models scale capacity by combining specialized experts, but most existing approaches assume centralized access to training data. In practice, data are distributed across clients and cannot be shared due to… 36 arXiv — Machine Learning research 1mo ago AIM-DDI: A Model-Agnostic Multimodal Integration Module for Drug-Drug Interaction Prediction arXiv:2605.14327v1 Announce Type: new Abstract: Drug-drug interaction (DDI) prediction is a critical task in computational biomedicine, as adverse interactions between co-administered drugs can cause severe side effects and clinical risks. A key challenge is unseen-drug… 5 arXiv — Machine Learning research 1mo ago RxEval: A Prescription-Level Benchmark for Evaluating LLM Medication Recommendation arXiv:2605.14543v1 Announce Type: new Abstract: Inpatient medication recommendation requires clinicians to repeatedly select specific medications, doses, and routes as a patient's condition evolves. Existing benchmarks formulate this task as admission-level prediction over… 25 arXiv — NLP / Computation & Language research 1mo ago Mitigating Data Scarcity in Psychological Defense Classification with Context-Aware Synthetic Augmentation arXiv:2605.14380v1 Announce Type: new Abstract: Psychological defense mechanisms (PDMs) are unconscious cognitive processes that modulate how individuals perceive and respond to emotional distress. 
Automatically classifying PDMs from text is clinically valuable but severely… 14 arXiv — NLP / Computation & Language research 1mo ago COTCAgent: Preventive Consultation via Probabilistic Chain-of-Thought Completion arXiv:2605.15016v1 Announce Type: new Abstract: As large language models empower healthcare, intelligent clinical decision support has developed rapidly. Longitudinal electronic health records (EHR) provide essential temporal evidence for accurate clinical diagnosis and… 25 arXiv — NLP / Computation & Language research 1mo ago Text Knows What, Tables Know When: Clinical Timeline Reconstruction via Retrieval-Augmented Multimodal Alignment arXiv:2605.15168v1 Announce Type: new Abstract: Reconstructing precise clinical timelines is essential for modeling patient trajectories and forecasting risk in complex, heterogeneous conditions like sepsis. While unstructured clinical narratives offer semantically rich and… 24 arXiv — NLP / Computation & Language research 1mo ago A Benchmark for Early-stage Parkinson's Disease Detection from Speech arXiv:2605.14066v1 Announce Type: cross Abstract: Early-stage Parkinson's disease (EarlyPD) detection from speech is clinically meaningful yet underexplored, and published results are hard to compare because studies differ in datasets, languages, tasks, evaluation protocols, and… 21 Hugging Face Daily Papers research 1mo ago WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation Abstract WildClawBench evaluates language and vision-language models on realistic long-horizon tasks using actual CLI environments with real tools instead of synthetic sandboxes. AI-generated summary Large language and vision-language models increasingly power agents that act on… 8 Vercel — AI dev-tools 1mo ago Use native curl syntax with Vercel CLI You can now use native curl syntax with the Vercel CLI. 
The vercel curl command accepts full URLs, bare hostnames, and the --url flag, and uses your Vercel auth to bypass Deployment Protection . If you've linked a project, you can also pass just a path: Update to the latest… 37 Vercel — AI dev-tools 1mo ago Trace any Vercel request from the CLI You can now generate Session Traces through the Vercel CLI. Use the new vercel curl --trace command to generate an OpenTelemetry trace to the specified endpoint from the terminal. Use the new vercel traces get command to fetch the generated trace by request ID. Available on all… 38 Vercel — AI dev-tools 1mo ago Introducing Vercel Drop Vercel Drop lets you deploy a file or folder by dragging it into your browser. You don't need Git, the Vercel CLI , or any local setup. Drop a project onto vercel.com/drop , pick a team and project name, and select Deploy . Vercel will create a new project, upload your files,… 22 Hugging Face Daily Papers research 1mo ago MC-RFM: Geometry-Aware Few-Shot Adaptation via Mixed-Curvature Riemannian Flow Matching Abstract A novel Riemannian flow-matching framework for few-shot adaptation that models feature displacement on a mixed-curvature manifold combining hyperbolic and Euclidean spaces, outperforming existing methods across multiple benchmarks. AI-generated summary… 22 Latent.Space news-outlet 1mo ago AI-Native Healthcare: 100M Doctor Visits, 10–20 Hours Saved, Prior Auth in Minutes — Janie Lee & Chai Asawa, Abridge How Abridge is quietly turning the patient and clinician conversation into the operating system of healthcare 10 r/LocalLLaMA community 1mo ago A VERY lightweight open web-search tool for smaller local LLMs Hey everyone, Been playing around with local agent setups lately, mostly Cline/Roo with smaller models, and web search kept annoying me. Not because it doesn’t work, but because it usually throws way too much random page text into the context. 
small models really don’t handle… 29 Hugging Face Daily Papers research 1mo ago RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation Abstract RealICU benchmark evaluates large language models for ICU decision support using hindsight-annotated patient trajectories, revealing limitations in clinical recommendation accuracy and early interpretation bias. AI-generated summary Intensive care units (ICU) generate… 32 r/LocalLLaMA community 1mo ago Computer-use MCP that can control multiple machines (Integrate with claude, Cursor, Codex or your custom harness) Hey everyone, We built opendesk: it lets AI agents control your desktop using computer use MCP that can integrate with your custom workflow. Today we shipped something a bit wild: Your AI can now see, click, type, and navigate on a completely different computer, over your WiFi.… 20