Tag

Developer Tool

500 articles archived under #developer-tool · RSS

arXiv — NLP / Computation & Language research 25d ago

Measuring the sensitivity of LLM-based structured extraction to prompt, model, and schema choices in clinical discharge summaries

arXiv:2606.05970v1 Announce Type: new Abstract: Large language models are increasingly used for structured extraction from clinical free-text notes, but the sensitivity of their output to upstream configuration choices is less understood than their accuracy on fixed benchmarks.…

23
Hacker News — AI on Front Page community 25d ago

Open Code Review – An AI-powered code review CLI tool

Article URL: https://github.com/alibaba/open-code-review Comments URL: https://news.ycombinator.com/item?id=48406358 Points: 233 # Comments: 66

32
Hugging Face Daily Papers research 25d ago

Evaluating Large Language Models in Dynamic Clinical Decision-Making with Standardized Patient Cases

Abstract MedSP1000 introduces an interactive benchmark derived from standardized patients to evaluate clinical agents' dynamic performance across encounters, revealing limitations of current large language models in medical applications. Generated by…

18
llama.cpp releases dev-tools 25d ago

b9503

fix(mtmd): handle Gemma 4 audio projector embedding size (#24091): remove projection_dim from clip_n_mmproj_embd. Co-authored-by: Xuan Son Nguyen. macOS/iOS: macOS Apple Silicon (arm64), macOS Apple Silicon (arm64,…

28
arXiv — Machine Learning research 26d ago

Early Detection of Alzheimer's Disease Using Explainable Machine Learning on Clinical Biomarkers: A Multi-Class Classification Study Using the Alzheimer's Disease Neuroimaging Initiative (ADNI) Dataset

arXiv:2606.03995v1 Announce Type: new Abstract: Background: Alzheimer's disease (AD) affects over 55 million people worldwide. Accurate, interpretable detection of normal cognition (NC), mild cognitive impairment (MCI), and AD from routine clinical assessments remains a critical…

14
arXiv — Machine Learning research 26d ago

KODA: Contrastive Representation Comparison and Alignment for Vision-Language Foundation Models

arXiv:2606.04180v1 Announce Type: new Abstract: Vision-language foundation models such as CLIP and SigLIP provide widely used representations for multimodal learning systems. While these models are typically compared through downstream performance, such evaluations often do not…

8
arXiv — NLP / Computation & Language research 26d ago

When Clients Stop Following: A Cognitive Conceptualization Diagram-driven Framework for Strategic Counseling

arXiv:2606.04389v1 Announce Type: new Abstract: Large Language Models (LLMs) show promise in psychological counseling, yet existing benchmarks rely heavily on highly cooperative simulated clients. We observe a critical counselor-following phenomenon: these clients often rapidly…

14
arXiv — NLP / Computation & Language research 26d ago

Evaluating Large Language Models in Dynamic Clinical Decision-Making with Standardized Patient Cases

arXiv:2606.05112v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly proposed as clinical agents, yet static, single-turn benchmarks cannot capture how a model dynamically delivers care across an encounter: gathering information, planning treatment, and…

32
Hugging Face official-blog 26d ago

Designing the hf CLI as an agent-optimized way to work with the Hub

Published June 4, 2026, by Célina Hanouti and Lucain Pouget. hf is the official command-line entrypoint to the Hugging Face Hub. Anything you can do on the Hub…

12
Ollama releases dev-tools 26d ago

v0.30.4-rc1: llama-server: fix gemma4 patch wiring (#16477)

This will fix the "clip.cpp:4399: Unknown projector type" crash.

4
Ollama releases dev-tools 26d ago

v0.30.4: llama-server: fix gemma4 patch wiring (#16477)

This will fix the "clip.cpp:4399: Unknown projector type" crash.

38
r/LocalLLaMA community 26d ago

How to use audio and vision modalities in llama.cpp?

How to use audio and vision modalities in llama.cpp with Gemma4 12B it? I’m on release b9494, but when I run llama-cli it shows “modalities: text” only, and crashes if I try to add an image. Submitted by /u/No-Leave-4512.

20
Hugging Face Daily Papers research 26d ago

KletterMix: Climbing Toward High-Quality German Pretraining Data

Abstract A high-quality German-language corpus for language model pretraining is introduced through careful translation of an English corpus while preserving document structure and metadata, demonstrating improved downstream performance in German-language tasks. Generated by…

28
Hacker News — AI on Front Page community 26d ago

Mouseless – keyboard-driven control of macOS/Linux/Windows

Article URL: https://mouseless.click Comments URL: https://news.ycombinator.com/item?id=48383667 Points: 223 # Comments: 107

38
arXiv — Machine Learning research 27d ago

Auditable Climate Risk Intelligence from Fragmented ESG Data: Deterministic Orchestration and Imbalance-Aware Learning for Scope 1-3 Validation

arXiv:2606.02604v1 Announce Type: new Abstract: ESG and climate risk data remain fragmented across heterogeneous Scope 1, Scope 2, and Scope 3 reporting environments, while conventional validation pipelines lack provenance aware auditability, hidden drift detection, and…

9
arXiv — Machine Learning research 27d ago

DECA: Decentralizing Block-Wise Adam for Efficient LLM Full-Parameter Fine-Tuning on Non-IID Data

arXiv:2606.03209v1 Announce Type: new Abstract: Fine-tuning large language models (LLMs) in privacy-sensitive and resource-constrained environments remains challenging. Since training data are often distributed across multiple clients, decentralized fine-tuning offers a natural…

15
arXiv — Machine Learning research 27d ago

Learning Temporal Causal Structure via Smooth Differentiable Optimization

arXiv:2606.03227v1 Announce Type: new Abstract: Causal discovery with instantaneous effects in multivariate time series is challenging, as the instantaneous structure must be acyclic. Prior methods enforce this by either separating instantaneous and lagged estimation into…

4
arXiv — Machine Learning research 27d ago

Multi-Modal Graph Neural Network with Transformer-Guided Adaptive Diffusion for Preclinical Alzheimer Classification

arXiv:2606.03322v1 Announce Type: new Abstract: The graphical representation of the brain offers critical insights into diagnosing and prognosing neurodegenerative disease via relationships between regions of interest (ROIs). Despite recent emergence of various Graph Neural…

14
arXiv — NLP / Computation & Language research 27d ago

AI Rater Discrimination Depends on Scoring Protocol in Complex Clinical Decision-Making

arXiv:2606.03198v1 Announce Type: new Abstract: Clinical AI evaluation increasingly delegates scoring to large language models (LLMs) acting as AI raters, yet their scoring behavior across evaluation conditions has not been quantitatively characterized. We address this gap…

17
arXiv — NLP / Computation & Language research 27d ago

The Word and the Way: Strategies for Domain-Specific BERT Pre-Training in German Medical NLP

arXiv:2606.03250v1 Announce Type: new Abstract: Digital healthcare generates vast amounts of clinical text that can support AI-assisted applications, yet German biomedical language models remain limited by older architectures or restricted training data. We present ChristBERT…

33
arXiv — NLP / Computation & Language research 27d ago

SagaQA: A Multi-hop Reasoning Benchmark for Long-form Narrative Understanding in TV Series

arXiv:2606.03301v1 Announce Type: new Abstract: We introduce SagaQA, a long-form video benchmark for multi-hop reasoning over full-length TV series. Existing video reasoning benchmarks often emphasize local understanding of adjacent frames or clips. SagaQA addresses this gap by…

33
arXiv — NLP / Computation & Language research 27d ago

Selective Token-Level Cryptographic Redaction for Privacy-Preserving Clinical Deployment of Large Language Models

arXiv:2606.03399v1 Announce Type: new Abstract: While large language models (LLMs) are increasingly used for clinical applications, many existing pipelines require sending raw sensitive health information to remote servers for processing, which heightens the risk of privacy…

4
arXiv — NLP / Computation & Language research 27d ago

Does Language Shift Break Medical Vision-Language Models? Indonesian Radiology Visual Question Answering Case Study

arXiv:2606.03693v1 Announce Type: new Abstract: Medical Vision-Language Models (VLMs) are typically evaluated on English radiology visual question answering benchmarks, leaving their robustness under non-English clinical language largely unexplored. We introduce IndoRad-VQA, an…

10
arXiv — NLP / Computation & Language research 27d ago

KletterMix: Climbing Toward High-Quality German Pretraining Data

arXiv:2606.03773v1 Announce Type: new Abstract: High-quality pretraining data is a central ingredient in modern language models, but German-language resources remain far less developed than their English counterparts: they are often smaller, less carefully curated, weakly…

19
Hacker News — AI on Front Page community 27d ago

MAI-Code-1-Flash

https://microsoft.ai/models/mai-code-1-flash/ https://microsoft.ai/pdf/MAI-Code-1-Flash-Model-Card.PDF Launching seven new MAI models: https://microsoft.ai/news/building-a-hillclimbing-machine-la... Comments URL: https://news.ycombinator.com/item?id=48374466 Points: 228 #…

36
Hacker News — AI on Front Page community 27d ago

1-Click GitHub Token Stealing via a VSCode Bug

Article URL: https://blog.ammaraskar.com/github-token-stealing/ Comments URL: https://news.ycombinator.com/item?id=48371562 Points: 220 # Comments: 30

4
Vercel — AI dev-tools 28d ago

Edit Git settings for all projects in a repo

Monorepos that deploy many projects can now configure all of their projects' Git settings more conveniently. Previously, if you wanted to consistently configure each project's settings for commit status, repository_dispatch events, etc., you had to click through to every…

16
Hugging Face Daily Papers research 28d ago

Multi-Agent Computer Use

Abstract Multi-agent computer use systems outperform single-agent approaches on complex tasks by enabling parallel execution and dynamic task decomposition through directed acyclic graphs. AI-generated summary Computer use agents (CUAs) today are primarily deployed as single…

18
arXiv — Machine Learning research 28d ago

PE-means: Improved Differentially Private $k$-means Clustering through Private Evolution

arXiv:2606.00342v1 Announce Type: new Abstract: We study the problem of differentially private (DP) $k$-means clustering in Euclidean space. Previous solutions rely on summing the private data directly, which induces a sensitivity proportional to the domain. We introduce…

17
arXiv — Machine Learning research 28d ago

Canonicalized Stable-List Replay for Private Federated Continual Learning over Language-Model Embeddings

arXiv:2606.00426v1 Announce Type: new Abstract: Federated continual learning (FCL) lets distributed clients adapt language-model heads to evolving NLP tasks without sharing raw text. Under user-level differential privacy (DP), replay-based continual learning faces a structural…

17
arXiv — NLP / Computation & Language research 28d ago

A Multi-Domain Red Teaming Framework for Safety, Robustness, and Fairness Evaluation of Medical Large Language Models

arXiv:2606.00027v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly deployed across healthcare, yet existing benchmarks fail to capture model behavior under adversarial or ethically complex conditions common in clinical practice. We developed a…

37
arXiv — NLP / Computation & Language research 28d ago

LLMs for Cardiovascular Risk Prediction from Structured Clinical Data

arXiv:2606.00031v1 Announce Type: new Abstract: Coronary artery disease (CAD) remains one of the leading causes of death globally, highlighting the need for reliable predictive systems to support early diagnosis and risk assessment. While traditional machine learning models…

14
arXiv — NLP / Computation & Language research 28d ago

LinguIUTics at PsyDefDetect: Iterative Imbalance-Aware Fine-tuning of Qwen3-8B for Psychological Defense Mechanism Classification

arXiv:2606.00647v1 Announce Type: new Abstract: Detecting psychological defense mechanisms in conversational text remains a challenging clinical NLP problem. For the PsyDefDetect 2026 shared task (nine-class utterance classification evaluated via macro F1), our team LinguIUTics…

5
arXiv — NLP / Computation & Language research 28d ago

Med-HEAL: Analyzing and Mitigating Hallucinations in Medical LLMs with Hallucination-Aware In-Context Learning

arXiv:2606.01301v1 Announce Type: new Abstract: Hallucinations in medical large language models (LLMs) pose serious risks for clinical decision support, particularly when models must reason over complex electronic health records (EHRs). However, existing benchmarks often lack a…

8
r/MachineLearning community 28d ago

MeshFlow: production-safe multi-agent orchestration — SHA-256 audit chain, HIPAA/SOX/GDPR built in, 70-85% token cost reduction [Open Source][D]

79% of enterprises have adopted AI agents. Only 11% run them in production. We've spent the past year building agent systems for banks, clinical operations teams, and engineering orgs. The problem isn't that agents don't work — they work fine. The problem is that every framework…

12
Vercel — AI dev-tools 28d ago

Build Chat SDK web UIs in Vue or Svelte

The Chat SDK web adapter now has first-class support for Vue and Svelte, joining the existing React integration. Because the adapter speaks the AI SDK UI message stream protocol, the same server route works. Each framework ships its own useChat, built on the matching AI SDK…

16
Vercel — AI dev-tools 28d ago

Build custom Slack runtimes

Chat SDK now ships the Slack adapter's primitives as standalone imports for apps that already handle their own routing, state, or workflow execution. Use only what you need: request verification and payload parsing (@chat-adapter/slack/webhook), Markdown formatting (…

20
OpenAI Python SDK releases dev-tools 28d ago

v2.40.0

2.40.0 (2026-06-01). Full Changelog: v2.39.0...v2.40.0. Features: api: Add Amazon Bedrock Responses support. Bug Fixes: api: allow setting bedrock api keys on the client directly (4d5bfde).

19
Vercel — AI dev-tools 28d ago

Chat SDK adds Velt support

Chat SDK now supports Velt with the new vendor-official adapter. Build bots that read and reply within Velt comment threads, right where your team already works: documents, text editors, and canvases. Tag the bot, and it will answer in the same thread, grounding its reply with…

24
Vercel — AI dev-tools 28d ago

Chat SDK adds AgentPhone support

Chat SDK now supports AgentPhone with the new vendor-official adapter. Give your bot its own phone number so it can handle voice calls and text messages using the same handlers you already write. When a call ends, the transcript is delivered as a message, allowing your bot to…

14
Hacker News — AI on Front Page community 28d ago

NPM packages from RedHat have been compromised

Article URL: https://github.com/RedHatInsights/javascript-clients/issues/492 Comments URL: https://news.ycombinator.com/item?id=48356625 Points: 327 # Comments: 151

37
r/LocalLLaMA community 28d ago

MTP is nice and all, but what about PP speeds?

I don't know about the rest of you, but with my setup, as soon as I enable MTP, PP performance and GPU usage drop significantly for some reason. It's not so much a memory issue for me as it is declining performance. My setup is: 2x Radeon VII 16GB on ROCm, 1x RTX 3080 8GB Max…

28
Hugging Face Daily Papers research 28d ago

One Click per Cell Type Suffices: Training-free Group Interaction for Cell Instance Segmentation

Abstract Group Prompting enables efficient cell instance segmentation by leveraging per-type prompting through a training-free framework that uses multi-scale encoder features and recursive prompt expansion. AI-generated summary Cell instance segmentation models trained on…

32
Hugging Face Daily Papers research 28d ago

How can embedding models bind concepts?

Abstract Vision-language models like CLIP struggle with concept binding despite recognizing individual concepts, but controlled transformer models can learn low-complexity binding functions that generalize better through multiplicative interactions. AI-generated summary Humans…

11
r/LocalLLaMA community 29d ago

Just found a 1-click RCE in pewdiepie's Odysseus Chat

PR being submitted to help the project as we speak. Sound on for extra lols. Submitted by /u/theonejvo.

7
Vercel — AI dev-tools 29d ago

Qwen 3.7 Plus now available on AI Gateway

Qwen 3.7 Plus from Alibaba is now available on Vercel AI Gateway. The model unifies vision and language into a single agent foundation, with capabilities spanning GUI and CLI operation, coding and productivity workflows with full-modality input, and visual agent tasks including…

26
arXiv — Machine Learning research 29d ago

Gait2Hip-60: A Unified Deep Learning Benchmark for Predicting Hip Muscle Forces and Joint Moments from Multi-Cadence Gait Kinematics

arXiv:2605.30374v1 Announce Type: new Abstract: Estimating hip muscle forces and joint moments during gait typically relies on musculoskeletal simulation, which is informative but time-consuming and difficult to apply in clinical settings. This study developed a deep learning…

10
arXiv — Machine Learning research 29d ago

Counterfactual Evaluation Reveals Hidden Capability Profiles in Clinical LLMs and Agents

arXiv:2605.30590v1 Announce Type: new Abstract: Two clinical AI systems can score nearly identically on coverage-based rubrics yet behave radically differently when their patient inputs change: one updates its recommendations to match the new clinical signal, while the other…

23
arXiv — NLP / Computation & Language research 29d ago

Generalistic or Specific Embeddings, Which is Better? An Empirical Study on Search for Clinical Coding in Non-English Languages

arXiv:2605.30529v1 Announce Type: new Abstract: Sentence-embedding models for semantic search are overwhelmingly developed and evaluated on English corpora. When applied to clinical retrieval in other languages -- particularly retrieval of ICD-10-CM / CIE-10 codes -- recall…

26
arXiv — NLP / Computation & Language research 29d ago

Same Patient, Different Words, Different Diagnosis? Evaluating Semantic Stability in Clinical LLMs

arXiv:2605.30646v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly used in clinical applications. However, their behavior remains highly sensitive to subtle linguistic variations, such as rephrasing or syntactic variation. This sensitivity poses risks…

27
