Tag

Developer Tool

500 articles archived under #developer-tool · RSS

arXiv — Machine Learning research 12d ago

A Cross-Model VLM-Judge Protocol for Single-Image 3D Mesh Quality (and Why Cheap Proxies Fall Short)

arXiv:2606.18451v1 Announce Type: new Abstract: Single-image-to-3D generators are improving quickly, but there is no agreed, human-free way to tell whether one generated mesh is better than another. Practitioners commonly rely on cheap automatic proxies (render-space CLIP…

32
arXiv — Machine Learning research 12d ago

Beyond AHI: An Interpretable Causal-Discovery-Guided Framework for Sleep Recovery in Connected Health

arXiv:2606.18506v1 Announce Type: new Abstract: Objective sleep assessment relies on polysomnography (PSG), yet clinical impact is often better reflected in patient-reported outcomes (PROs) such as sleepiness and fatigue. Existing summary indices, including the Apnea-Hypopnea…

34
arXiv — Machine Learning research 12d ago

PSyGenTAB: A Privacy-Preserving Framework for Synthetic Clinical Tabular Data Generation via Constrained Optimization

arXiv:2606.18518v1 Announce Type: new Abstract: The development of medical AI is constrained by limited access to high-quality clinical data due to institutional silos and strict privacy regulations such as HIPAA and GDPR. Synthetic data generation offers a potential solution,…

4
arXiv — NLP / Computation & Language research 12d ago

Fair Cognitive Impairment Detection Through Unlearning

arXiv:2606.18571v1 Announce Type: cross Abstract: Mild Cognitive Impairment (MCI) is a medical condition characterized by a noticeable decline in memory, language, or thinking abilities. MCI detection from spontaneous speech is promising for scalable screening. However, learned…

33
arXiv — Machine Learning research 12d ago

ChronoSurv: A Clinical Pathway-Guided Graph Framework for Multimodal Survival Analysis

arXiv:2606.19140v1 Announce Type: new Abstract: Accurate survival prediction is essential for personalized treatment planning in head and neck cancer, yet remains challenging due to the heterogeneous and high-dimensional nature of multimodal clinical data. While deep survival…

32
arXiv — NLP / Computation & Language research 12d ago

Possible or Definite? A Benchmark for Evaluating Diagnostic Uncertainty Preservation in Clinical Text

arXiv:2606.18471v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used for clinical text tasks such as summarization and revision. While most studies evaluate the fluency and coherence of LLM-generated text, whether LLMs correctly preserve diagnostic…

11
arXiv — NLP / Computation & Language research 12d ago

Are LLMs Ready to Assist Physicians? PhysAssistBench for Interactive Doctor-Patient-EHR Assistance

arXiv:2606.18613v1 Announce Type: new Abstract: The most plausible near-term role of medical LLMs is to assist rather than replace physicians, yet current evaluations often test isolated capabilities: clinical knowledge, EHR system interaction, or patient communication.…

7
arXiv — NLP / Computation & Language research 12d ago

Beyond Scalar Scores: Exploring LLM-based Metrics for Clinical Significance Evaluation in Radiology Reports

arXiv:2606.18797v1 Announce Type: new Abstract: Reliable evaluation of generated radiology reports requires strict clinical accuracy, as omitted critical findings or mischaracterized radiographic observations can directly affect patient care. Existing metrics obscure this…

23
arXiv — NLP / Computation & Language research 12d ago

Language Models as Interfaces, Not Oracles: A Hybrid LLM-ML System for Pediatric Appendicitis

arXiv:2606.19183v1 Announce Type: new Abstract: Large language models (LLMs) can make clinical decision support more accessible by interpreting free-text documentation, but their direct use as diagnostic engines is limited by sensitivity to prompts, information order, and…

33
llama.cpp releases dev-tools 12d ago

b9688

server: (router) add model management API (#23976): wip; server: (router) add SSE realtime updates API; nits; wip; add download API; add download api; update docs; add delete endpoint; fix std::terminate; fix crash; fix 2; add tests; nits. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple…

17
arXiv — Machine Learning research 13d ago

Informative Missingness to Generate Irregular Clinical Time Series

arXiv:2606.17106v1 Announce Type: new Abstract: Laboratory tests in electronic health records are collected irregularly, and the absence of a test order can be as informative as the measurement itself. Such missingness reflects clinicians' decisions and patient physiology,…

8
arXiv — Machine Learning research 13d ago

SpatioTemporal Causal Network Diagnostics for Geographic Tipping Point Early Warning

arXiv:2606.17553v1 Announce Type: new Abstract: Geographic tipping points in ecosystems, climate subsystems, or ice sheets pose severe challenges for localized early warning. Classical spatial indicators such as Moran's I summarize global spatial structure, but they struggle…

25
arXiv — Machine Learning research 13d ago

Multiple cyclicity and Wavelet Decomposition with Channel Correlation for Long-term Time Series Forecasting

arXiv:2606.17996v1 Announce Type: new Abstract: Cyclicity and trend are important components of time series data and many studies based on cyclicity and trend have achieved good results in long-term time series forecasting. However, we believe that current work neglects the…

37
arXiv — Machine Learning research 13d ago

RadSEM: A Finding-by-Finding Metric for Clinical Consistency in Radiology Reports

arXiv:2606.17062v1 Announce Type: cross Abstract: Radiology report evaluation must distinguish clinical compatibility from surface similarity, because negation, laterality, or normal-abnormal polarity can reverse a finding. We propose RadSEM (Radiology Sentence-Level Evaluation…

11
arXiv — Machine Learning research 13d ago

KFTD: Koopman-Fourier Time-Differentiable Network for Continuous Ocean Spatiotemporal Forecasting

arXiv:2606.17070v1 Announce Type: cross Abstract: Accurate oceanic forecasting is critical for climate monitoring and disaster early warning. However, ocean spatiotemporal forecasting encounters the double challenges of modeling complex dynamical systems and ensuring…

10
arXiv — NLP / Computation & Language research 13d ago

AIPatient Arena: EHR-grounded evaluation of large language models in end-to-end clinical consultation workflows

arXiv:2606.17474v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly considered for use in clinical consultation tasks, yet most medical evaluations remain static, single-turn, or narrowly outcome-based, limiting their ability to reflect the sequential,…

17
arXiv — NLP / Computation & Language research 13d ago

The Slop Paradox: How Synthetic Standardization Erodes Clinical Uncertainty and Cross-Modal Alignment in AI-Rewritten Radiology Reports

arXiv:2606.17791v1 Announce Type: new Abstract: AI-assisted clinical documentation tools increasingly summarize, standardize, and reformat radiology reports using large language models (LLMs). We present a controlled measurement of the resulting information degradation. Using…

24
arXiv — NLP / Computation & Language research 13d ago

When Multiple Scripts Matter: Evaluating ASR in Clinical Settings

arXiv:2606.17826v1 Announce Type: new Abstract: Automatic speech recognition (ASR) in non-English clinical settings is challenged by multiscript variability, where the same term may appear in multiple valid orthographic forms. Conventional string-matching evaluation metrics…

20
arXiv — NLP / Computation & Language research 13d ago

RubricsTree: Scalable and Evolving Open-Ended Evaluation of Personal Health Agents across Health Memory and Medical Skills

arXiv:2606.18203v1 Announce Type: new Abstract: The LLM-empowered personal health agents with user health (sensor) metrics have offered a promising pathway to alleviate global disparities in healthcare access. However, large-scale clinical deployment remains constrained by an…

28
arXiv — NLP / Computation & Language research 13d ago

SpeechDx: A Multi-Task Benchmark for Clinical Speech AI

arXiv:2606.17339v1 Announce Type: cross Abstract: Speech offers a uniquely informative window into health by simultaneously engaging neurological, motor, respiratory, and vocal systems. Current clinical speech AI methods have largely progressed through isolated…

15
arXiv — NLP / Computation & Language research 13d ago

Beyond Native Success: Auditing Deployment-Interface Exposure of CLIP Backdoors

arXiv:2606.17815v1 Announce Type: cross Abstract: Contrastive Language-Image Pre-training models are widely reused across downstream interfaces, including feature extraction, retrieval, reranking, and selection. Existing CLIP backdoor studies, however, usually validate attacks on a…

11
arXiv — NLP / Computation & Language research 13d ago

Reading between the Lines: Leveraging Large Language Models for Global Dementia and Depression Assessment from Clinical Interviews

arXiv:2606.18019v1 Announce Type: cross Abstract: Dementia and depression are the most prevalent neuropsychiatric disorders in geriatric populations, and their overlapping symptoms pose major challenges for differential diagnosis. In this study, we investigate open-weights Large…

30
arXiv — NLP / Computation & Language research 13d ago

ProvenanceGuard: Source-Aware Factuality Verification for MCP-Based LLM Agents

arXiv:2606.18037v1 Announce Type: cross Abstract: Tool-using LLM agents increasingly use the Model Context Protocol (MCP) to answer from heterogeneous evidence sources, including search, APIs, databases, clinical records, and formulary tools. Standard factuality metrics usually…

27
arXiv — NLP / Computation & Language research 13d ago

MedicalAgentsBench for Complex Medical Reasoning: Comparing Internalized Reasoning Models versus Externalized Agent-based Frameworks

arXiv:2503.07459v3 Announce Type: replace Abstract: Complex medical reasoning requires integrating heterogeneous clinical evidence across multiple inference steps. Large language models (LLMs) now approach this through two routes: internalized reasoning and externalized agent…

19
Simon Willison community 13d ago

<click-to-play> — a still that plays

Tool: <click-to-play> — a still that plays. A progressive enhancement Web Component that turns this markup: <click-to-play> <a href="URL to GIF"> <img src="URL to first frame" alt="..."> </a> </click-to-play> into a still frame with a click-to-play button which loads the GIF on…

34
Hugging Face Daily Papers research 13d ago

TRIAGE: Dialectical Reasoning for Explainable Risk Prediction on Irregularly Sampled Medical Time Series with LLMs

Abstract: A framework called TRIAGE is proposed to improve clinical early warning systems by training large language models to generate dialectical reasoning for continuous risk scoring with better calibration and interpretability. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

29
Vercel — AI dev-tools 13d ago

CLI deployment limits removed

We've removed CLI-specific deployment limits, making it easier to deploy from a local machine and external CI/CD pipelines with instant feedback. Teams and AI agents can now deploy at the pace their workflows demand. Learn more about limits in the documentation.

5
Ars Technica — AI news-outlet 13d ago

Anthropic "pauses" token-based billing for its Claude Agent SDK

Move originally planned for Monday would have heavily increased power users' costs.

21
r/LocalLLaMA community 13d ago

GLM-5.2 is the first open-weights model to cross 80% on Terminal-Bench and beats every other open model available

From Source: GLM-5.2 is the first open-weights model to cross 80% on Terminal-Bench, and beats every other open model available. It also beats Gemini, making it a frontier-level model for a fraction of the cost. Open weights is back. This model is a game changer. Source: Cline…

14
NVIDIA Developer Blog official-blog 13d ago

Build On-Device AI Companions with the NVIDIA ACE Game Agent SDK and Unreal Engine 5 Plugins

NVIDIA RTX technologies are deeply integrated into Unreal Engine 5 through the NVIDIA RTX Branch of Unreal Engine and the NVIDIA DLSS Unreal Engine plugin. This...

23
MIT Technology Review — AI news-outlet 13d ago

Want to get a data center online quickly? Give it some flex.

At the end of a tense and scoreless first half of a soccer match between the English men’s team and rival Germany, millions of Brits let out a collective sigh and did what they so often do in moments of stress: They made tea. That wave of electric kettles clicking on, however,…

26
Hugging Face Daily Papers research 14d ago

PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions

Abstract: PhoneHarness presents a mixed-action benchmark and execution framework for evaluating phone-use agents on verifiable mobile workflows, demonstrating superior performance over existing approaches through deterministic action routing and auditable execution traces.…

13
arXiv — Machine Learning research 14d ago

Semantic Reasoning in Medicine: The Role of Knowledge Graphs Across Five Key Domains

arXiv:2606.15155v1 Announce Type: new Abstract: Knowledge graphs (KGs) have emerged as a promising solution for integrating and reasoning over complex biomedical and clinical data in healthcare. By representing structured relationships among entities such as diseases, drugs,…

17
arXiv — Machine Learning research 14d ago

RECTOR: Masked Region-Channel-Temporal Modeling for Affective and Cognitive Representation Learning

arXiv:2606.15278v1 Announce Type: new Abstract: Affective and cognitive disorders manifest as distributed, time-varying brain network dynamics across regions, channels, and time, challenging robust representation learning from EEG/sEEG for clinical diagnosis. We propose RECTOR…

34
arXiv — Machine Learning research 14d ago

Beyond Classification: A Cough Regression Benchmark for Respiratory Acoustic Foundation Models

arXiv:2606.15436v1 Announce Type: new Abstract: Respiratory acoustic foundation models (FMs) excel at cough classification, yet their ability to predict continuous health quantities from cough audio remains largely unexplored, despite the clinical value of passive age, BMI, and…

28
arXiv — Machine Learning research 14d ago

Z-Plane Neural Networks: Bounded Geometric Activation Replaces ReLU and LayerNorm

arXiv:2606.15669v1 Announce Type: new Abstract: Modern deep neural networks rely on Euclidean scalar activations (e.g., ReLU) and global normalization techniques (e.g., LayerNorm) to prevent gradient instability in deep architectures. However, these mechanisms inherently cause…

23
arXiv — Machine Learning research 14d ago

When Generator Replay Degrades: Projected Rehearsal Orchestration for Heterogeneous Federated Class-Incremental Learning

arXiv:2606.15695v1 Announce Type: new Abstract: Federated class-incremental learning (FCIL) becomes substantially harder when clients observe different label subsets, progress through tasks at different stages, and provide uneven supervision for the same semantic concepts.…

26
arXiv — NLP / Computation & Language research 14d ago

PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions

arXiv:2606.14832v1 Announce Type: new Abstract: Phone agents are increasingly expected to complete real mobile workflows rather than merely predict the next screen action. However, much of the current mobile-agent literature still evaluates agents primarily as GUI controllers…

36
arXiv — NLP / Computation & Language research 14d ago

ReportQA: QA-Based Radiology Report Evaluation

arXiv:2606.15037v1 Announce Type: new Abstract: Radiology report evaluation is essential for advancing automated report generation. Natural language generation metrics have limited clinical relevance. Clinical efficacy (CE) metrics evaluate important medical findings, but focus…

38
arXiv — NLP / Computation & Language research 14d ago

EHRNote-ChatQA: A Benchmark for Evidence-Grounded Multi-Turn Clinical Question Answering over Longitudinal Discharge Summaries

arXiv:2606.15735v1 Announce Type: new Abstract: Discharge summaries are crucial clinical documents containing the context of a patient's overall hospital stay, and are routinely reviewed by medical experts for patient readmission, ongoing care, and diagnostic decision-making.…

26
arXiv — NLP / Computation & Language research 14d ago

Interactor: Agentic RL oriented Iterative Creation for Ad Description Generation in Sponsored Search

arXiv:2606.15911v1 Announce Type: new Abstract: This paper focuses on automatically generating informative ad descriptions in sponsored search. Unlike ad titles, which are usually optimized to attract user click feedback, ad descriptions have a longer text span and possess the…

8
Vercel — AI dev-tools 14d ago

Workflow SDK now supports inflight cancellation

The Workflow SDK 5 beta now supports the standard AbortController and AbortSignal APIs across workflow and step boundaries. Create a controller inside a workflow, pass its signal into one or more steps, and cancel in-flight operations using the same API fetch already uses. That…

24
Vercel — AI dev-tools 14d ago

Workflow SDK now supports TanStack Start

Workflow SDK now supports TanStack Start applications on Vercel. TanStack Start is built on Vite and Nitro, so the existing workflow/vite plugin works directly. Add it to vite.config.ts alongside tanstackStart(). From there, write workflow and step functions in standard…

27
Hacker News — AI on Front Page community 14d ago

Ten years of ClickHouse in open source

Article URL: https://clickhouse.com/blog/open-source-10 Comments URL: https://news.ycombinator.com/item?id=48546890 Points: 225 # Comments: 65

9
GitHub Blog — AI & ML official-blog 14d ago

GitHub Copilot CLI for Beginners: Overview of common slash commands

GitHub Copilot CLI for Beginners: Learn how to use slash commands to control your terminal AI agent.

26
r/LocalLLaMA community 14d ago

Maybe dumb question, but how do you serve multiple users with the full context length?

After experimenting with llama.cpp, I'm wondering about something. Let's say we have an LLM with a context size of 128k. Now let's say we want to have up to 8 parallel users, and we want to provide each client with the full context capabilities. With llama.cpp, how does that work? AFAIK it…

20
Anthropic SDK (Python) releases dev-tools 14d ago

v0.109.2

0.109.2 (2026-06-15). Full Changelog: v0.109.1...v0.109.2. Chores: api: remove retired models from API and SDKs (d4bcfcc)

8
Hacker News — AI on Front Page community 15d ago

Apple Foundation Models

Article URL: https://platform.claude.com/docs/en/cli-sdks-libraries/libraries/apple-foundation-models Comments URL: https://news.ycombinator.com/item?id=48536776 Points: 305 # Comments: 133

29
arXiv — Machine Learning research 15d ago

FedSPC: Shared Parameter Correction for Personalized Federated Learning

arXiv:2606.13748v1 Announce Type: new Abstract: Personalized federated learning (PFL) is one of the important approaches in federated learning for addressing statistical heterogeneity while enabling client-specific adaptation. Many PFL methods split the model into shared and…

28
arXiv — Machine Learning research 15d ago

Attention-Based Estimation of the Individual Treatment Benefit Probability under Dose Variation

arXiv:2606.13821v1 Announce Type: new Abstract: Estimating the probability that a treatment outperforms a control for an individual patient, called the Individual Probability of Treatment Benefit (IPTB), offers a clinically intuitive alternative to population-average metrics.…

36
