Tag: Developer Tool
500 articles archived under #developer-tool

arXiv — Machine Learning research 12d ago A Cross-Model VLM-Judge Protocol for Single-Image 3D Mesh Quality (and Why Cheap Proxies Fall Short) arXiv:2606.18451v1 Announce Type: new Abstract: Single-image-to-3D generators are improving quickly, but there is no agreed, human-free way to tell whether one generated mesh is better than another. Practitioners commonly rely on cheap automatic proxies (render-space CLIP… 32

arXiv — Machine Learning research 12d ago Beyond AHI: An Interpretable Causal-Discovery-Guided Framework for Sleep Recovery in Connected Health arXiv:2606.18506v1 Announce Type: new Abstract: Objective sleep assessment relies on polysomnography (PSG), yet clinical impact is often better reflected in patient-reported outcomes (PROs) such as sleepiness and fatigue. Existing summary indices, including the Apnea-Hypopnea… 34

arXiv — Machine Learning research 12d ago PSyGenTAB: A Privacy-Preserving Framework for Synthetic Clinical Tabular Data Generation via Constrained Optimization arXiv:2606.18518v1 Announce Type: new Abstract: The development of medical AI is constrained by limited access to high-quality clinical data due to institutional silos and strict privacy regulations such as HIPAA and GDPR. Synthetic data generation offers a potential solution,… 4

arXiv — NLP / Computation & Language research 12d ago Fair Cognitive Impairment Detection Through Unlearning arXiv:2606.18571v1 Announce Type: cross Abstract: Mild Cognitive Impairment (MCI) is a medical condition characterized by a noticeable decline in memory, language, or thinking abilities. MCI detection from spontaneous speech is promising for scalable screening.
However, learned… 33

arXiv — Machine Learning research 12d ago ChronoSurv: A Clinical Pathway-Guided Graph Framework for Multimodal Survival Analysis arXiv:2606.19140v1 Announce Type: new Abstract: Accurate survival prediction is essential for personalized treatment planning in head and neck cancer, yet remains challenging due to the heterogeneous and high-dimensional nature of multimodal clinical data. While deep survival… 32

arXiv — NLP / Computation & Language research 12d ago Possible or Definite? A Benchmark for Evaluating Diagnostic Uncertainty Preservation in Clinical Text arXiv:2606.18471v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used for clinical text tasks such as summarization and revision. While most studies evaluate the fluency and coherence of LLM-generated text, whether LLMs correctly preserve diagnostic… 11

arXiv — NLP / Computation & Language research 12d ago Are LLMs Ready to Assist Physicians? PhysAssistBench for Interactive Doctor-Patient-EHR Assistance arXiv:2606.18613v1 Announce Type: new Abstract: The most plausible near-term role of medical LLMs is to assist rather than replace physicians, yet current evaluations often test isolated capabilities: clinical knowledge, EHR system interaction, or patient communication.… 7

arXiv — NLP / Computation & Language research 12d ago Beyond Scalar Scores: Exploring LLM-based Metrics for Clinical Significance Evaluation in Radiology Reports arXiv:2606.18797v1 Announce Type: new Abstract: Reliable evaluation of generated radiology reports requires strict clinical accuracy, as omitted critical findings or mischaracterized radiographic observations can directly affect patient care.
Existing metrics obscure this… 23

arXiv — NLP / Computation & Language research 12d ago Language Models as Interfaces, Not Oracles: A Hybrid LLM-ML System for Pediatric Appendicitis arXiv:2606.19183v1 Announce Type: new Abstract: Large language models (LLMs) can make clinical decision support more accessible by interpreting free-text documentation, but their direct use as diagnostic engines is limited by sensitivity to prompts, information order, and… 33

llama.cpp releases dev-tools 12d ago b9688 server: (router) add model management API (#23976) wip server: (router) add SSE realtime updates API nits wip add download API add download api update docs add delete endpoint fix std::terminate fix crash fix 2 add tests nits macOS/iOS: macOS Apple Silicon (arm64) macOS Apple… 17

arXiv — Machine Learning research 13d ago Informative Missingness to Generate Irregular Clinical Time Series arXiv:2606.17106v1 Announce Type: new Abstract: Laboratory tests in electronic health records are collected irregularly, and the absence of a test order can be as informative as the measurement itself. Such missingness reflects clinicians' decisions and patient physiology,… 8

arXiv — Machine Learning research 13d ago SpatioTemporal Causal Network Diagnostics for Geographic Tipping Point Early Warning arXiv:2606.17553v1 Announce Type: new Abstract: Geographic tipping points in ecosystems, climate subsystems, or ice sheets pose severe challenges for localized early warning. Classical spatial indicators such as Moran's I summarize global spatial structure, but they struggle… 25

arXiv — Machine Learning research 13d ago Multiple cyclicity and Wavelet Decomposition with Channel Correlation for Long-term Time Series Forecasting arXiv:2606.17996v1 Announce Type: new Abstract: Cyclicity and trend are important components of time series data and many studies based on cyclicity and trend have achieved good results in long-term time series forecasting.
However, we believe that current work neglects the… 37

arXiv — Machine Learning research 13d ago RadSEM: A Finding-by-Finding Metric for Clinical Consistency in Radiology Reports arXiv:2606.17062v1 Announce Type: cross Abstract: Radiology report evaluation must distinguish clinical compatibility from surface similarity, because negation, laterality, or normal-abnormal polarity can reverse a finding. We propose RadSEM (Radiology Sentence-Level Evaluation… 11

arXiv — Machine Learning research 13d ago KFTD: Koopman-Fourier Time-Differentiable Network for Continuous Ocean Spatiotemporal Forecasting arXiv:2606.17070v1 Announce Type: cross Abstract: Accurate oceanic forecasting is critical for climate monitoring and disaster early warning. However, ocean spatiotemporal forecasting encounters the double challenges of modeling complex dynamical systems and ensuring… 10

arXiv — NLP / Computation & Language research 13d ago AIPatient Arena: EHR-grounded evaluation of large language models in end-to-end clinical consultation workflows arXiv:2606.17474v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly considered for use in clinical consultation tasks, yet most medical evaluations remain static, single-turn, or narrowly outcome-based, limiting their ability to reflect the sequential,… 17

arXiv — NLP / Computation & Language research 13d ago The Slop Paradox: How Synthetic Standardization Erodes Clinical Uncertainty and Cross-Modal Alignment in AI-Rewritten Radiology Reports arXiv:2606.17791v1 Announce Type: new Abstract: AI-assisted clinical documentation tools increasingly summarize, standardize, and reformat radiology reports using large language models (LLMs). We present a controlled measurement of the resulting information degradation.
Using… 24

arXiv — NLP / Computation & Language research 13d ago When Multiple Scripts Matter: Evaluating ASR in Clinical Settings arXiv:2606.17826v1 Announce Type: new Abstract: Automatic speech recognition (ASR) in non-English clinical settings is challenged by multiscript variability, where the same term may appear in multiple valid orthographic forms. Conventional string-matching evaluation metrics… 20

arXiv — NLP / Computation & Language research 13d ago RubricsTree: Scalable and Evolving Open-Ended Evaluation of Personal Health Agents across Health Memory and Medical Skills arXiv:2606.18203v1 Announce Type: new Abstract: The LLM-empowered personal health agents with user health (sensor) metrics have offered a promising pathway to alleviate global disparities in healthcare access. However, large-scale clinical deployment remains constrained by an… 28

arXiv — NLP / Computation & Language research 13d ago SpeechDx: A Multi-Task Benchmark for Clinical Speech AI arXiv:2606.17339v1 Announce Type: cross Abstract: Speech offers a uniquely informative window into health by simultaneously engaging neurological, motor, respiratory, and vocal systems. Current clinical speech AI methods have largely progressed through isolated… 15

arXiv — NLP / Computation & Language research 13d ago Beyond Native Success: Auditing Deployment-Interface Exposure of CLIP Backdoors arXiv:2606.17815v1 Announce Type: cross Abstract: Contrastive Language-Image Pre-training models are widely reused across downstream interfaces, including feature extraction, retrieval, reranking, and selection.
Existing CLIP backdoor studies, however, usually validate attacks on a… 11

arXiv — NLP / Computation & Language research 13d ago Reading between the Lines: Leveraging Large Language Models for Global Dementia and Depression Assessment from Clinical Interviews arXiv:2606.18019v1 Announce Type: cross Abstract: Dementia and depression are the most prevalent neuropsychiatric disorders in geriatric populations, and their overlapping symptoms pose major challenges for differential diagnosis. In this study, we investigate open-weights Large… 30

arXiv — NLP / Computation & Language research 13d ago ProvenanceGuard: Source-Aware Factuality Verification for MCP-Based LLM Agents arXiv:2606.18037v1 Announce Type: cross Abstract: Tool-using LLM agents increasingly use the Model Context Protocol (MCP) to answer from heterogeneous evidence sources, including search, APIs, databases, clinical records, and formulary tools. Standard factuality metrics usually… 27

arXiv — NLP / Computation & Language research 13d ago MedicalAgentsBench for Complex Medical Reasoning: Comparing Internalized Reasoning Models versus Externalized Agent-based Frameworks arXiv:2503.07459v3 Announce Type: replace Abstract: Complex medical reasoning requires integrating heterogeneous clinical evidence across multiple inference steps.
Large language models (LLMs) now approach this through two routes: internalized reasoning and externalized agent… 19

Simon Willison community 13d ago <click-to-play> — a still that plays Tool: <click-to-play> — a still that plays A progressive enhancement Web Component that turns this markup: <click-to-play> <a href="URL to GIF"> <img src="URL to first frame" alt="..."> </a> </click-to-play> Into a still frame with a click to play button which loads the GIF on… 34

Hugging Face Daily Papers research 13d ago TRIAGE: Dialectical Reasoning for Explainable Risk Prediction on Irregularly Sampled Medical Time Series with LLMs Abstract A framework called TRIAGE is proposed to improve clinical early warning systems by training large language models to generate dialectical reasoning for continuous risk scoring with better calibration and interpretability. Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 29

Vercel — AI dev-tools 13d ago CLI deployment limits removed We've removed CLI-specific deployment limits, making it easier to deploy from local machine and external CI/CD pipelines with instant feedback. Teams and AI agents can now deploy at the pace their workflows demand. Learn more about limits in the Documentation. 5

Ars Technica — AI news-outlet 13d ago Anthropic "pauses" token-based billing for its Claude Agent SDK Move originally planned for Monday would have heavily increased power users' costs. 21

r/LocalLLaMA community 13d ago GLM-5.2 is the first open-weights model to cross 80% on Terminal-Bench and beats every other open model available From Source: GLM-5.2 is the first open-weights model to cross 80% on Terminal-Bench, and beats every other open model available. It also beats Gemini, making it a frontier-level model for a fraction of the cost. Open weights is back. This model is a game changer.
Source: Cline… 14

NVIDIA Developer Blog official-blog 13d ago Build On-Device AI Companions with the NVIDIA ACE Game Agent SDK and Unreal Engine 5 Plugins NVIDIA RTX technologies are deeply integrated into Unreal Engine 5 through the NVIDIA RTX Branch of Unreal Engine and the NVIDIA DLSS Unreal Engine plugin. This... 23

MIT Technology Review — AI news-outlet 13d ago Want to get a data center online quickly? Give it some flex. At the end of a tense and scoreless first half of a soccer match between the English men’s team and rival Germany, millions of Brits let out a collective sigh and did what they so often do in moments of stress: They made tea. That wave of electric kettles clicking on, however,… 26

Hugging Face Daily Papers research 14d ago PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions Abstract PhoneHarness presents a mixed-action benchmark and execution framework for evaluating phone-use agents on verifiable mobile workflows, demonstrating superior performance over existing approaches through deterministic action routing and auditable execution traces.… 13

arXiv — Machine Learning research 14d ago Semantic Reasoning in Medicine: The Role of Knowledge Graphs Across Five Key Domains arXiv:2606.15155v1 Announce Type: new Abstract: Knowledge graphs (KGs) have emerged as a promising solution for integrating and reasoning over complex biomedical and clinical data in healthcare. By representing structured relationships among entities such as diseases, drugs,… 17

arXiv — Machine Learning research 14d ago RECTOR: Masked Region-Channel-Temporal Modeling for Affective and Cognitive Representation Learning arXiv:2606.15278v1 Announce Type: new Abstract: Affective and cognitive disorders manifest as distributed, time-varying brain network dynamics across regions, channels, and time, challenging robust representation learning from EEG/sEEG for clinical diagnosis.
We propose RECTOR… 34

arXiv — Machine Learning research 14d ago Beyond Classification: A Cough Regression Benchmark for Respiratory Acoustic Foundation Models arXiv:2606.15436v1 Announce Type: new Abstract: Respiratory acoustic foundation models (FMs) excel at cough classification, yet their ability to predict continuous health quantities from cough audio remains largely unexplored, despite the clinical value of passive age, BMI, and… 28

arXiv — Machine Learning research 14d ago Z-Plane Neural Networks: Bounded Geometric Activation Replaces ReLU and LayerNorm arXiv:2606.15669v1 Announce Type: new Abstract: Modern deep neural networks rely on Euclidean scalar activations (e.g., ReLU) and global normalization techniques (e.g., LayerNorm) to prevent gradient instability in deep architectures. However, these mechanisms inherently cause… 23

arXiv — Machine Learning research 14d ago When Generator Replay Degrades: Projected Rehearsal Orchestration for Heterogeneous Federated Class-Incremental Learning arXiv:2606.15695v1 Announce Type: new Abstract: Federated class-incremental learning (FCIL) becomes substantially harder when clients observe different label subsets, progress through tasks at different stages, and provide uneven supervision for the same semantic concepts.… 26

arXiv — NLP / Computation & Language research 14d ago PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions arXiv:2606.14832v1 Announce Type: new Abstract: Phone agents are increasingly expected to complete real mobile workflows rather than merely predict the next screen action. However, much of the current mobile-agent literature still evaluates agents primarily as GUI controllers… 36

arXiv — NLP / Computation & Language research 14d ago ReportQA: QA-Based Radiology Report Evaluation arXiv:2606.15037v1 Announce Type: new Abstract: Radiology report evaluation is essential for advancing automated report generation.
Natural language generation metrics have limited clinical relevance. Clinical efficacy (CE) metrics evaluate important medical findings, but focus… 38

arXiv — NLP / Computation & Language research 14d ago EHRNote-ChatQA: A Benchmark for Evidence-Grounded Multi-Turn Clinical Question Answering over Longitudinal Discharge Summaries arXiv:2606.15735v1 Announce Type: new Abstract: Discharge summaries are crucial clinical documents containing the context of a patient's overall hospital stay, and are routinely reviewed by medical experts for patient readmission, ongoing care, and diagnostic decision-making.… 26

arXiv — NLP / Computation & Language research 14d ago Interactor: Agentic RL oriented Iterative Creation for Ad Description Generation in Sponsored Search arXiv:2606.15911v1 Announce Type: new Abstract: This paper focuses on automatically generating informative ad descriptions in sponsored search. Unlike ad titles which are usually optimized to attract user click feedbacks, ad descriptions have a longer text span and possess the… 8

Vercel — AI dev-tools 14d ago Workflow SDK now supports inflight cancellation The Workflow SDK 5 beta now supports the standard AbortController and AbortSignal APIs across workflow and step boundaries. Create a controller inside a workflow, pass its signal into one or more steps, and cancel in-flight operations using the same API fetch already uses. That… 24

Vercel — AI dev-tools 14d ago Workflow SDK now supports TanStack Start Workflow SDK now supports TanStack Start applications on Vercel. TanStack Start is built on Vite and Nitro, so the existing workflow/vite plugin works directly. Add it to vite.config.ts alongside tanstackStart().
From there, write workflow and step functions in standard… 27

Hacker News — AI on Front Page community 14d ago Ten years of ClickHouse in open source Article URL: https://clickhouse.com/blog/open-source-10 Comments URL: https://news.ycombinator.com/item?id=48546890 Points: 225 # Comments: 65 9

GitHub Blog — AI & ML official-blog 14d ago GitHub Copilot CLI for Beginners: Overview of common slash commands GitHub Copilot CLI for Beginners: Learn how to use slash commands to control your terminal AI agent. 26

r/LocalLLaMA community 14d ago Maybe dumb question, but how do you serve multiple users with the full context length? After experimenting with llama.cpp, I'm wondering a thing. Let's say we have an LLM with a context size of 128k. Now let's say we want to have up to 8 parallel users, and we want to provide each client with the full context capabilities. With llama.cpp, how does that work? AFAIK it… 20

Anthropic SDK (Python) releases dev-tools 14d ago v0.109.2 0.109.2 (2026-06-15) Full Changelog: v0.109.1...v0.109.2 Chores api: remove retired models from API and SDKs (d4bcfcc) 8

Hacker News — AI on Front Page community 15d ago Apple Foundation Models Article URL: https://platform.claude.com/docs/en/cli-sdks-libraries/libraries/apple-foundation-models Comments URL: https://news.ycombinator.com/item?id=48536776 Points: 305 # Comments: 133 29

arXiv — Machine Learning research 15d ago FedSPC: Shared Parameter Correction for Personalized Federated Learning arXiv:2606.13748v1 Announce Type: new Abstract: Personalized federated learning (PFL) is one of the important approaches in federated learning for addressing statistical heterogeneity while enabling client-specific adaptation.
Many PFL methods split the model into shared and… 28

arXiv — Machine Learning research 15d ago Attention-Based Estimation of the Individual Treatment Benefit Probability under Dose Variation arXiv:2606.13821v1 Announce Type: new Abstract: Estimating the probability that a treatment outperforms a control for an individual patient, called the Individual Probability of Treatment Benefit (IPTB), offers a clinically intuitive alternative to population-average metrics.… 36

Page 3 of 10 · 500 articles