Tag

Gpu

500 articles archived under #gpu · RSS

r/LocalLLaMA community 26d ago

This day in LLM history….105 years ago today, Qwen 3.6 27b was released open source. /s

Unfortunately, the steam-powered GPUs of the era were incapable of anything higher than a 4K context limit.   submitted by   /u/Porespellar [link]   [comments]

33
r/MachineLearning community 26d ago

TorchDAE: Implicit DAE Solvers with Index Reduction and Adjoint Sensitivity [P]

Hello everyone, I've been working on a PyTorch library for solving Differential Algebraic Equations (DAEs) that supports vectorized execution and GPU acceleration. The library implements several algorithms that are not currently available in the Python ecosystem, including…

27
llama.cpp releases dev-tools 26d ago

b9489

cuda: reserve space for quantize kv-cache at startup ( #23907 ) cuda: reserve space for quantize kv-cache at startup address review comments remove forward decl Co-authored-by: Johannes Gäßler [email protected] remove assert in ggml-cuda.cu Co-authored-by: Johannes Gäßler…

25
r/LocalLLaMA community 27d ago

How much VRAM needed for Qwen 3.6 27B Q8 with 262K context?

trying to figure out my next GPU purchase. currently running IQ4XS and Q4 KV with 262K context and want to upgrade and run uncompressed KV and the model at Q8. anyone know how much VRAM is needed? would 48GB be enough?   submitted by   /u/My_Unbiased_Opinion [link]  …

24
Stratechery (Ben Thompson) community 27d ago

The Nvidia AI PC, Project Solara, Microsoft AI

The Nvidia AI PC feels like a relic of another AI era; Microsoft's vision for devices at Build was much more compelling.

5
arXiv — Machine Learning research 27d ago

MOSAIC: Efficient Mixture-of-Agent Scheduling via Adaptive Aggregation and Inference Concurrency

arXiv:2606.03014v1 Announce Type: new Abstract: Mixture-of-Agents (MoA) systems improve reasoning accuracy by routing each query to multiple expert LLMs and aggregating their outputs. Efficiently executing this workload on limited GPU resources has bottlenecks. Skill-based…

22
arXiv — Machine Learning research 27d ago

Will Accurate Fields Mislead Photonic Design? FromGlobal Accuracy to Port Readout

arXiv:2606.03038v1 Announce Type: new Abstract: Neural field surrogates can accelerate photonic design loops, but a surrogate that looks accurate in global field error can still mis-rank candidate devices when the final decision depends on localized output-port readouts. This…

23
arXiv — NLP / Computation & Language research 27d ago

Greener Than Humans? Environmental Attitudes in Large Language Models

arXiv:2606.02741v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used in sustainability-related decision support, reporting, and public communication, yet little systematic evidence exists on the environmental attitudes embedded in their outputs.…

19
arXiv — NLP / Computation & Language research 27d ago

Chatbots Output Meaningful (but Problematic) Language

arXiv:2606.02973v1 Announce Type: new Abstract: Are utterances by AI chatbots meaningful? Concretely, if a user asks, say, Anthropic's agent Claude, "What is the capital of Spain?" and Claude answers, "Madrid is the capital of Spain," does that sentence have its ordinary meaning…

28
arXiv — NLP / Computation & Language research 27d ago

Predicting Inference-Time Scaling Gains from Labeled Validation-Set Output Statistics

arXiv:2606.02981v1 Announce Type: new Abstract: Best-of-$N$ inference scaling (drawing $N$ candidate answers from a language model and returning the one a reward model ranks highest) improves accuracy by an amount that varies across models, but predicting that amount in advance…

30
arXiv — NLP / Computation & Language research 27d ago

The Unsampled Truth: Psychometrics in SLMs Measure Prompt Artifacts, Not Psychological Constructs

arXiv:2606.03357v1 Announce Type: new Abstract: When prompting SLMs for psychometric assessments, researchers assume the outputs reflect semantic reasoning. We evaluate this premise across 13 open-weights models (0.6B to 14B parameters) using a prompt variation framework that…

18
arXiv — NLP / Computation & Language research 27d ago

Consistency Training Can Entrench Misalignment

arXiv:2606.03810v1 Announce Type: new Abstract: Consistency training encourages a model to produce similar outputs across related inputs or sampling procedures. Such methods are simple, scalable, and largely label-free, but their effects on model alignment remain poorly…

31
Hugging Face Daily Papers research 27d ago

NVIDIA OmniDreams: Real-Time Generative World Model for Closed-Loop Autonomous Vehicle Simulation

Abstract OmniDreams, a foundation generative world model trained from the Cosmos diffusion model, enables real-time action-conditioned video generation for autonomous driving policy evaluation in complex, unseen scenarios. Generated by Qwen/Qwen2.5-Coder-32B-Instruct As…

23
r/LocalLLaMA community 27d ago

Why do we benchmark quants on perplexity and prose but never on tool call validity?

The mixed precision quant discussion here lately, MoE aware stuff that keeps shared experts and the edge layers at higher precision is great, but it's almost all measured against perplexity and general output quality. What I never see is structured output. Tool call JSON,…

37
Hacker News — AI on Front Page community 27d ago

Use your Nvidia GPU's VRAM as swap space on Linux

Article URL: https://github.com/c0dejedi/nbd-vram Comments URL: https://news.ycombinator.com/item?id=48377404 Points: 225 # Comments: 65

18
r/LocalLLaMA community 27d ago

Weird issue with OpenCode and Qwen3.6

I’m using Qwen3.6-27B running on my server with llama-server for AI coding with OpenCode. Sometimes for some reason, the response stops when its reasoning like if it has finished outputting the full response. I have to type “continue” and it continues working like if nothing…

30
llama.cpp releases dev-tools 27d ago

b9483

hexagon: profiler output fix and script updates ( #24042 ) hex-ops: fix profiler output (ie remove the redundant NONEs) hex-prof: update profiling script to support tot.usec column macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED…

26
NVIDIA Developer Blog official-blog 27d ago

Build Personal AI Agents on Windows PCs with New Tools from Microsoft and NVIDIA

AI agents are changing how you interact with your PC. Creators, developers, and AI enthusiasts are already using these agents extensively to assist with...

17
r/LocalLLaMA community 27d ago

Would you consider getting an NVIDIA RTX Spark laptop?

If yes, why? If no, also say why. I’d consider one if it’s faster at local AI inference than my current hardware and can still handle gaming decently. The 128GB unified memory idea is pretty interesting, but I’m unsure about Windows on Arm and game compatibility. Would you buy…

27
The Information — AI news-outlet 27d ago

Microsoft Debuts New Nvidia-Powered “Dev Box” PC; OS for Wearable and Desktop “Agent First” Devices

Microsoft on Tuesday unveiled a new desktop PC powered by an Nvidia processor geared towards developers who want to run AI models locally. The new PC, called the Surface RTX Spark Dev Box, is powered by Nvidia’s new Arm-based chips announced earlier this week. Microsoft…

14
r/LocalLLaMA community 27d ago

Why don't we still have any games with AI agents used as NPC characters?

Do you remember this NVIDIA AI-NPC presentation from 3 years ago? https://www.youtube.com/watch?v=5R8xZb6J3r0 Where all of that? Why do we even try getting agents to do all the work if they still cannot be reliably used as a characters in the video games? Isn't it should be the…

11
r/LocalLLaMA community 27d ago

I Put a Datacenter GPU in My Gaming PC for £200

Hey there! I wrote a blogpost about my experience running local models on a V100 from a newbie perspective and got loads of views outside of reddit, so I thought I'd share it here too!   submitted by   /u/tymscar [link]   [comments]

33
r/LocalLLaMA community 27d ago

Benchmarks of 20 small LLMs on a 6GB RTX 4050

I'm looking for models that can run on my GPU and actually do something useful. I think that any small difference could be a "big" improvement, because they are all so small. So I went to the LM studio database and searched many variants from the same family, trying to select…

37
NVIDIA Developer Blog official-blog 27d ago

Deploy Self-Evolving Agents for Faster, More Secure Research with a Hermes Agent and NVIDIA NemoClaw

AI agents are a powerful tool for synthesizing data to accelerate research, summarize information, and help teams make decisions faster. But combining internal...

29
Hugging Face Daily Papers research 27d ago

Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs

Abstract Watermarking AI-generated text for detection fails when multiple models are used, as averaging outputs cancels perturbations and suppresses detection while improving quality and speed. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Watermarking embeds statistical…

4
r/LocalLLaMA community 27d ago

What are you using to preprocess pdfs before feeding them to a local model?

I have been running a local setup for document QA and the output quality varies a lot depending on what the pdf looks like when it hits the LLM. clean prose docs are fine but anything with tables or multi column layouts comes out garbled and the model just works with whatever…

37
Hugging Face Daily Papers research 27d ago

AI, Take the Wheel: What Drives Delegation and Trust in Human-Computer Cooperative Question Answering?

Abstract Human-AI collaboration in question-answering tasks reveals suboptimal reliance decisions where humans under-rely on correct AI suggestions and over-rely when AI misleads them, with confirmation bias contributing to reduced trust in conflicting AI outputs. Generated by…

25
Hugging Face Daily Papers research 28d ago

Can Predicted Dynamics Exist in the Physical World?

Abstract Physical admissibility validation for AI systems uses prediction-control interfaces with kinematic and dynamic conditions to filter invalid proposals while maintaining high performance. AI-generated summary Predictive Physical AI systems output state rollouts, action…

33
r/LocalLLaMA community 28d ago

nvidia-LocateAnything-3B detects sushi as sweet in the video demo

https://preview.redd.it/xc0l68bj7t4h1.png?width=616&format=png&auto=webp&s=48a8b14bc4ae95700cd4efa76772f4e71fb2d41a https://huggingface.co/nvidia/LocateAnything-3B funny how they left this in the demo atleast it's honest   submitted by   /u/chocofoxy [link]  …

7
r/LocalLLaMA community 28d ago

NVIDIA releases Cosmos 3 Omnimodal world modelson HF

https://huggingface.co/nvidia/Cosmos3-Super-Text2Image Nano: 16B Super: 64B Cosmos3 is a collection of Omnimodal world models capable of generating dynamic, high-quality video, image, audio, and action commands from combinations of text, image, video, and action trajectory…

7
r/LocalLLaMA community 28d ago

Moss tts 1.5 8b Examples. It is the currently best voice cloning model for English as of June 2026

Moss tts 1.5 8b is better than fish audio s2 pro and qwen 3 tts voice clone tts. You can easily get more better quality if you set up the duration of the voice in output you want and some temperature and other changes. This was just used on default setting. It can be improved…

20
arXiv — Machine Learning research 28d ago

BudgetDraft: Acceptance-Aware Multi-View Training for Sparse-KV Speculative Decoding

arXiv:2606.00144v1 Announce Type: new Abstract: Speculative decoding speeds up autoregressive decoding by using a drafter to propose multiple tokens that a verifier validates in parallel. In resource-constrained deployments, the drafter uses a sparse KV cache to limit peak GPU…

33
arXiv — Machine Learning research 28d ago

The Assistant as a Privileged Persona: A canonical reference in cross-persona self-recognition

arXiv:2606.00545v1 Announce Type: new Abstract: Post-trained language models can recognize their own outputs from a sentence or two out of context. In a companion paper \citep{jack2026twomodes} we showed they can also recognize when they are currently acting on-policy, through…

31
arXiv — Machine Learning research 28d ago

Same Payload, Different Channel: Measuring Trust Asymmetry in Tool-Using Language Models

arXiv:2606.00566v1 Announce Type: new Abstract: As language models take on agentic roles that span calling external APIs, reading tool outputs, and acting on instructions embedded in third-party content, their attack surface expands well beyond what users type. Whether a model…

14
arXiv — NLP / Computation & Language research 28d ago

Linguistics-Aware Non-Distortionary LLM Watermarking

arXiv:2606.00613v1 Announce Type: new Abstract: Watermarking should identify language-model output without degrading quality or limiting verification to the model provider. Multilingual deployment makes this harder because morphology, segmentation, and script change where…

37
arXiv — NLP / Computation & Language research 28d ago

EPIC: Efficient and Parallel Inference under CFG Constraints for Diffusion Language Models

arXiv:2606.00722v1 Announce Type: new Abstract: Controlling language model outputs is essential for ensuring structural validity, reliability, and downstream usability, and diffusion language models are no exception. Recent advances in diffusion language model decoding have…

19
arXiv — NLP / Computation & Language research 28d ago

HypothesisMed: Inference-Time Answer Fusion and Structured Hypothesis-Space Reporting for Biomedical Question Answering

arXiv:2606.00971v1 Announce Type: new Abstract: Biomedical question answering with large language models is commonly evaluated using answer accuracy, but answer accuracy alone does not indicate whether a model can produce parseable outputs, follow structured reliability…

23
arXiv — NLP / Computation & Language research 28d ago

A Finite-Calibration Regime Map for LLM Judge Panels

arXiv:2606.01034v1 Announce Type: new Abstract: We study when LLM judge panels should be calibrated with low-dimensional stackers versus joint output tables under finite human-label budgets. Low-dimensional stackers have small estimation cost but miss interactions, whereas…

24
arXiv — NLP / Computation & Language research 28d ago

When Is 0.1% Enough? Analyzing the Combined Effects of Dimensionality Reduction and Quantization on Text Embedding Compression

arXiv:2606.01074v1 Announce Type: new Abstract: Recent high-performing text embedding models often output high-dimensional real-valued vectors, resulting in substantial storage and computational costs. To address this issue, compression methods based on dimensionality reduction…

18
Latent.Space news-outlet 28d ago

[AINews] NVIDIA Cosmos 3, Nemotron 3 Ultra, and RTX Spark

Jensen scores a huge win.

27
Hugging Face Daily Papers research 28d ago

Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs

Abstract Automated systems for generating scientific figures face limitations in handling diverse figure types and conditions, prompting the development of multi-agent frameworks that generalize across different input scenarios and produce editable output formats. AI-generated…

29
NVIDIA Developer Blog official-blog 28d ago

Deploy Agentic-Ready AI at the Edge with Memory Efficiency in NVIDIA JetPack 7.2

As AI agents move from the digital world to the physical environment, they can readily use NVIDIA Jetson to accelerate real-world deployment with optimized...

24
NVIDIA Developer Blog official-blog 28d ago

Run Local AI Agents with Faster Models and Multi-Node Clustering on NVIDIA DGX Spark

The rise of autonomous, long-running AI agents has introduced a new class of compute demand, namely tasks that maintain large context windows, spawn concurrent...

16
TechCrunch — AI news-outlet 28d ago

Nvidia chases $200B CPU market with AI agent PCs from Microsoft, Dell, and HP

If Nvidia has cracked a way to bring AI agents easily, safely and usefully to the masses, it could — and should — be big.

38
r/LocalLLaMA community 28d ago

ICYM: llama.cpp b9455 --SM Tensor KV Cache Fix is MERGED

Them boys can cook, one big fix after another! If you're running --sm tensor on multi-gpu this is the KV cache quantization fix https://github.com/ggml-org/llama.cpp/releases/tag/b9455 JohannesGaessler commented 5 days ago This PR implements support for the combination of -sm…

11
llama.cpp releases dev-tools 28d ago

b9464

speculative : fix n_outputs_max and remove draft-simple auto-enable ( #23988 ) speculative : add common_speculative_n_max helper function Extract the speculative max-draft-size logic from server_n_outputs_max into a reusable common_speculative_n_max() function in…

7
r/LocalLLaMA community 28d ago

NVIDIA GB300 Grace Blackwell Ultra pricetags

https://www.scan.co.uk/shop/ai-and-robotics/workstations-ai/nvidia-dgx-station   submitted by   /u/X-N2O [link]   [comments]

5
llama.cpp releases dev-tools 28d ago

b9460

llama: limit max outputs of llama_context ( #23861 ) llama: save more VRAM by reserving n_outputs == n_seqs when possible add n_outputs_per_seq move n_outputs_max to server-context change ubatch to batch everywhere macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon…

15
r/LocalLLaMA community 28d ago

Computex 2026: Intel launches Crescent Island GPU with up to 480GB VRAM

https://www.neowin.net/news/computex-2026-intel-launches-crescent-island-gpu-with-up-to-480gb-vram/ Crescent Island is based on the company"s Arc Xe 3P architecture which lies inside current Panther Lake iGPs as well. This is Intel"s latest, most powerful card and it packs up to…

36
Ollama releases dev-tools 28d ago

v0.30.0-rc32: llama-server followups (#16353)

llama-server followups Misc fixes for #16031 Add back dropped ROCm build flag for multi-GPU support on windows Fix amdhip64_*.dll version detection for "latest" selection Fix embeddings API for consistent normalize behavior with prior versions ci: set up for automated llama.cpp…

19

This day in LLM history….105 years ago today, Qwen 3.6 27b was released open source. /s

TorchDAE: Implicit DAE Solvers with Index Reduction and Adjoint Sensitivity [P]

b9489

How much VRAM needed for Qwen 3.6 27B Q8 with 262K context?

The Nvidia AI PC, Project Solara, Microsoft AI

MOSAIC: Efficient Mixture-of-Agent Scheduling via Adaptive Aggregation and Inference Concurrency

Will Accurate Fields Mislead Photonic Design? FromGlobal Accuracy to Port Readout

Greener Than Humans? Environmental Attitudes in Large Language Models

Chatbots Output Meaningful (but Problematic) Language

Predicting Inference-Time Scaling Gains from Labeled Validation-Set Output Statistics

The Unsampled Truth: Psychometrics in SLMs Measure Prompt Artifacts, Not Psychological Constructs

Consistency Training Can Entrench Misalignment

NVIDIA OmniDreams: Real-Time Generative World Model for Closed-Loop Autonomous Vehicle Simulation

Why do we benchmark quants on perplexity and prose but never on tool call validity?

Use your Nvidia GPU's VRAM as swap space on Linux

Weird issue with OpenCode and Qwen3.6

b9483

Build Personal AI Agents on Windows PCs with New Tools from Microsoft and NVIDIA

Would you consider getting an NVIDIA RTX Spark laptop?

Microsoft Debuts New Nvidia-Powered “Dev Box” PC; OS for Wearable and Desktop “Agent First” Devices

Why don't we still have any games with AI agents used as NPC characters?

I Put a Datacenter GPU in My Gaming PC for £200

Benchmarks of 20 small LLMs on a 6GB RTX 4050

Deploy Self-Evolving Agents for Faster, More Secure Research with a Hermes Agent and NVIDIA NemoClaw

Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs

What are you using to preprocess pdfs before feeding them to a local model?

AI, Take the Wheel: What Drives Delegation and Trust in Human-Computer Cooperative Question Answering?

Can Predicted Dynamics Exist in the Physical World?

nvidia-LocateAnything-3B detects sushi as sweet in the video demo

NVIDIA releases Cosmos 3 Omnimodal world modelson HF

Moss tts 1.5 8b Examples. It is the currently best voice cloning model for English as of June 2026

BudgetDraft: Acceptance-Aware Multi-View Training for Sparse-KV Speculative Decoding

The Assistant as a Privileged Persona: A canonical reference in cross-persona self-recognition

Same Payload, Different Channel: Measuring Trust Asymmetry in Tool-Using Language Models

Linguistics-Aware Non-Distortionary LLM Watermarking

EPIC: Efficient and Parallel Inference under CFG Constraints for Diffusion Language Models

HypothesisMed: Inference-Time Answer Fusion and Structured Hypothesis-Space Reporting for Biomedical Question Answering

A Finite-Calibration Regime Map for LLM Judge Panels

When Is 0.1% Enough? Analyzing the Combined Effects of Dimensionality Reduction and Quantization on Text Embedding Compression

[AINews] NVIDIA Cosmos 3, Nemotron 3 Ultra, and RTX Spark

Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs

Deploy Agentic-Ready AI at the Edge with Memory Efficiency in NVIDIA JetPack 7.2

Run Local AI Agents with Faster Models and Multi-Node Clustering on NVIDIA DGX Spark

Nvidia chases $200B CPU market with AI agent PCs from Microsoft, Dell, and HP

ICYM: llama.cpp b9455 --SM Tensor KV Cache Fix is MERGED

b9464

NVIDIA GB300 Grace Blackwell Ultra pricetags

b9460

Computex 2026: Intel launches Crescent Island GPU with up to 480GB VRAM

v0.30.0-rc32: llama-server followups (#16353)