News / #gpu Tag Gpu 500 articles archived under #gpu · RSS Sign in to follow r/LocalLLaMA community 26d ago This day in LLM history….105 years ago today, Qwen 3.6 27b was released open source. /s Unfortunately, the steam-powered GPUs of the era were incapable of anything higher than a 4K context limit.   submitted by   /u/Porespellar [link]   [comments] 33 r/MachineLearning community 26d ago TorchDAE: Implicit DAE Solvers with Index Reduction and Adjoint Sensitivity [P] Hello everyone, I've been working on a PyTorch library for solving Differential Algebraic Equations (DAEs) that supports vectorized execution and GPU acceleration. The library implements several algorithms that are not currently available in the Python ecosystem, including… 27 llama.cpp releases dev-tools 26d ago b9489 cuda: reserve space for quantize kv-cache at startup ( #23907 ) cuda: reserve space for quantize kv-cache at startup address review comments remove forward decl Co-authored-by: Johannes Gäßler [email protected] remove assert in ggml-cuda.cu Co-authored-by: Johannes Gäßler… 25 r/LocalLLaMA community 27d ago How much VRAM needed for Qwen 3.6 27B Q8 with 262K context? trying to figure out my next GPU purchase. currently running IQ4XS and Q4 KV with 262K context and want to upgrade and run uncompressed KV and the model at Q8. anyone know how much VRAM is needed? would 48GB be enough?   submitted by   /u/My_Unbiased_Opinion [link]  … 24 Stratechery (Ben Thompson) community 27d ago The Nvidia AI PC, Project Solara, Microsoft AI The Nvidia AI PC feels like a relic of another AI era; Microsoft's vision for devices at Build was much more compelling. 5 arXiv — Machine Learning research 27d ago MOSAIC: Efficient Mixture-of-Agent Scheduling via Adaptive Aggregation and Inference Concurrency arXiv:2606.03014v1 Announce Type: new Abstract: Mixture-of-Agents (MoA) systems improve reasoning accuracy by routing each query to multiple expert LLMs and aggregating their outputs. Efficiently executing this workload on limited GPU resources has bottlenecks. Skill-based… 22 arXiv — Machine Learning research 27d ago Will Accurate Fields Mislead Photonic Design? FromGlobal Accuracy to Port Readout arXiv:2606.03038v1 Announce Type: new Abstract: Neural field surrogates can accelerate photonic design loops, but a surrogate that looks accurate in global field error can still mis-rank candidate devices when the final decision depends on localized output-port readouts. This… 23 arXiv — NLP / Computation & Language research 27d ago Greener Than Humans? Environmental Attitudes in Large Language Models arXiv:2606.02741v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used in sustainability-related decision support, reporting, and public communication, yet little systematic evidence exists on the environmental attitudes embedded in their outputs.… 19 arXiv — NLP / Computation & Language research 27d ago Chatbots Output Meaningful (but Problematic) Language arXiv:2606.02973v1 Announce Type: new Abstract: Are utterances by AI chatbots meaningful? Concretely, if a user asks, say, Anthropic's agent Claude, "What is the capital of Spain?" and Claude answers, "Madrid is the capital of Spain," does that sentence have its ordinary meaning… 28 arXiv — NLP / Computation & Language research 27d ago Predicting Inference-Time Scaling Gains from Labeled Validation-Set Output Statistics arXiv:2606.02981v1 Announce Type: new Abstract: Best-of-$N$ inference scaling (drawing $N$ candidate answers from a language model and returning the one a reward model ranks highest) improves accuracy by an amount that varies across models, but predicting that amount in advance… 30 arXiv — NLP / Computation & Language research 27d ago The Unsampled Truth: Psychometrics in SLMs Measure Prompt Artifacts, Not Psychological Constructs arXiv:2606.03357v1 Announce Type: new Abstract: When prompting SLMs for psychometric assessments, researchers assume the outputs reflect semantic reasoning. We evaluate this premise across 13 open-weights models (0.6B to 14B parameters) using a prompt variation framework that… 18 arXiv — NLP / Computation & Language research 27d ago Consistency Training Can Entrench Misalignment arXiv:2606.03810v1 Announce Type: new Abstract: Consistency training encourages a model to produce similar outputs across related inputs or sampling procedures. Such methods are simple, scalable, and largely label-free, but their effects on model alignment remain poorly… 31 Hugging Face Daily Papers research 27d ago NVIDIA OmniDreams: Real-Time Generative World Model for Closed-Loop Autonomous Vehicle Simulation Abstract OmniDreams, a foundation generative world model trained from the Cosmos diffusion model, enables real-time action-conditioned video generation for autonomous driving policy evaluation in complex, unseen scenarios. Generated by Qwen/Qwen2.5-Coder-32B-Instruct As… 23 r/LocalLLaMA community 27d ago Why do we benchmark quants on perplexity and prose but never on tool call validity? The mixed precision quant discussion here lately, MoE aware stuff that keeps shared experts and the edge layers at higher precision is great, but it's almost all measured against perplexity and general output quality. What I never see is structured output. Tool call JSON,… 37 Hacker News — AI on Front Page community 27d ago Use your Nvidia GPU's VRAM as swap space on Linux Article URL: https://github.com/c0dejedi/nbd-vram Comments URL: https://news.ycombinator.com/item?id=48377404 Points: 225 # Comments: 65 18 r/LocalLLaMA community 27d ago Weird issue with OpenCode and Qwen3.6 I’m using Qwen3.6-27B running on my server with llama-server for AI coding with OpenCode. Sometimes for some reason, the response stops when its reasoning like if it has finished outputting the full response. I have to type “continue” and it continues working like if nothing… 30 llama.cpp releases dev-tools 27d ago b9483 hexagon: profiler output fix and script updates ( #24042 ) hex-ops: fix profiler output (ie remove the redundant NONEs) hex-prof: update profiling script to support tot.usec column macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED… 26 NVIDIA Developer Blog official-blog 27d ago Build Personal AI Agents on Windows PCs with New Tools from Microsoft and NVIDIA AI agents are changing how you interact with your PC. Creators, developers, and AI enthusiasts are already using these agents extensively to assist with... 17 r/LocalLLaMA community 27d ago Would you consider getting an NVIDIA RTX Spark laptop? If yes, why? If no, also say why. I’d consider one if it’s faster at local AI inference than my current hardware and can still handle gaming decently. The 128GB unified memory idea is pretty interesting, but I’m unsure about Windows on Arm and game compatibility. Would you buy… 27 The Information — AI news-outlet 27d ago Microsoft Debuts New Nvidia-Powered “Dev Box” PC; OS for Wearable and Desktop “Agent First” Devices Microsoft on Tuesday unveiled a new desktop PC powered by an Nvidia processor geared towards developers who want to run AI models locally. The new PC, called the Surface RTX Spark Dev Box, is powered by Nvidia’s new Arm-based chips announced earlier this week. Microsoft… 14 r/LocalLLaMA community 27d ago Why don't we still have any games with AI agents used as NPC characters? Do you remember this NVIDIA AI-NPC presentation from 3 years ago? https://www.youtube.com/watch?v=5R8xZb6J3r0 Where all of that? Why do we even try getting agents to do all the work if they still cannot be reliably used as a characters in the video games? Isn't it should be the… 11 r/LocalLLaMA community 27d ago I Put a Datacenter GPU in My Gaming PC for £200 Hey there! I wrote a blogpost about my experience running local models on a V100 from a newbie perspective and got loads of views outside of reddit, so I thought I'd share it here too!   submitted by   /u/tymscar [link]   [comments] 33 r/LocalLLaMA community 27d ago Benchmarks of 20 small LLMs on a 6GB RTX 4050 I'm looking for models that can run on my GPU and actually do something useful. I think that any small difference could be a "big" improvement, because they are all so small. So I went to the LM studio database and searched many variants from the same family, trying to select… 37 NVIDIA Developer Blog official-blog 27d ago Deploy Self-Evolving Agents for Faster, More Secure Research with a Hermes Agent and NVIDIA NemoClaw AI agents are a powerful tool for synthesizing data to accelerate research, summarize information, and help teams make decisions faster. But combining internal... 29 Hugging Face Daily Papers research 27d ago Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs Abstract Watermarking AI-generated text for detection fails when multiple models are used, as averaging outputs cancels perturbations and suppresses detection while improving quality and speed. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Watermarking embeds statistical… 4 r/LocalLLaMA community 27d ago What are you using to preprocess pdfs before feeding them to a local model? I have been running a local setup for document QA and the output quality varies a lot depending on what the pdf looks like when it hits the LLM. clean prose docs are fine but anything with tables or multi column layouts comes out garbled and the model just works with whatever… 37 Hugging Face Daily Papers research 27d ago AI, Take the Wheel: What Drives Delegation and Trust in Human-Computer Cooperative Question Answering? Abstract Human-AI collaboration in question-answering tasks reveals suboptimal reliance decisions where humans under-rely on correct AI suggestions and over-rely when AI misleads them, with confirmation bias contributing to reduced trust in conflicting AI outputs. Generated by… 25 Hugging Face Daily Papers research 28d ago Can Predicted Dynamics Exist in the Physical World? Abstract Physical admissibility validation for AI systems uses prediction-control interfaces with kinematic and dynamic conditions to filter invalid proposals while maintaining high performance. AI-generated summary Predictive Physical AI systems output state rollouts, action… 33 r/LocalLLaMA community 28d ago nvidia-LocateAnything-3B detects sushi as sweet in the video demo https://preview.redd.it/xc0l68bj7t4h1.png?width=616&format=png&auto=webp&s=48a8b14bc4ae95700cd4efa76772f4e71fb2d41a https://huggingface.co/nvidia/LocateAnything-3B funny how they left this in the demo atleast it's honest   submitted by   /u/chocofoxy [link]  … 7 r/LocalLLaMA community 28d ago NVIDIA releases Cosmos 3 Omnimodal world modelson HF https://huggingface.co/nvidia/Cosmos3-Super-Text2Image Nano: 16B Super: 64B Cosmos3 is a collection of Omnimodal world models capable of generating dynamic, high-quality video, image, audio, and action commands from combinations of text, image, video, and action trajectory… 7 r/LocalLLaMA community 28d ago Moss tts 1.5 8b Examples. It is the currently best voice cloning model for English as of June 2026 Moss tts 1.5 8b is better than fish audio s2 pro and qwen 3 tts voice clone tts. You can easily get more better quality if you set up the duration of the voice in output you want and some temperature and other changes. This was just used on default setting. It can be improved… 20 arXiv — Machine Learning research 28d ago BudgetDraft: Acceptance-Aware Multi-View Training for Sparse-KV Speculative Decoding arXiv:2606.00144v1 Announce Type: new Abstract: Speculative decoding speeds up autoregressive decoding by using a drafter to propose multiple tokens that a verifier validates in parallel. In resource-constrained deployments, the drafter uses a sparse KV cache to limit peak GPU… 33 arXiv — Machine Learning research 28d ago The Assistant as a Privileged Persona: A canonical reference in cross-persona self-recognition arXiv:2606.00545v1 Announce Type: new Abstract: Post-trained language models can recognize their own outputs from a sentence or two out of context. In a companion paper \citep{jack2026twomodes} we showed they can also recognize when they are currently acting on-policy, through… 31 arXiv — Machine Learning research 28d ago Same Payload, Different Channel: Measuring Trust Asymmetry in Tool-Using Language Models arXiv:2606.00566v1 Announce Type: new Abstract: As language models take on agentic roles that span calling external APIs, reading tool outputs, and acting on instructions embedded in third-party content, their attack surface expands well beyond what users type. Whether a model… 14 arXiv — NLP / Computation & Language research 28d ago Linguistics-Aware Non-Distortionary LLM Watermarking arXiv:2606.00613v1 Announce Type: new Abstract: Watermarking should identify language-model output without degrading quality or limiting verification to the model provider. Multilingual deployment makes this harder because morphology, segmentation, and script change where… 37 arXiv — NLP / Computation & Language research 28d ago EPIC: Efficient and Parallel Inference under CFG Constraints for Diffusion Language Models arXiv:2606.00722v1 Announce Type: new Abstract: Controlling language model outputs is essential for ensuring structural validity, reliability, and downstream usability, and diffusion language models are no exception. Recent advances in diffusion language model decoding have… 19 arXiv — NLP / Computation & Language research 28d ago HypothesisMed: Inference-Time Answer Fusion and Structured Hypothesis-Space Reporting for Biomedical Question Answering arXiv:2606.00971v1 Announce Type: new Abstract: Biomedical question answering with large language models is commonly evaluated using answer accuracy, but answer accuracy alone does not indicate whether a model can produce parseable outputs, follow structured reliability… 23 arXiv — NLP / Computation & Language research 28d ago A Finite-Calibration Regime Map for LLM Judge Panels arXiv:2606.01034v1 Announce Type: new Abstract: We study when LLM judge panels should be calibrated with low-dimensional stackers versus joint output tables under finite human-label budgets. Low-dimensional stackers have small estimation cost but miss interactions, whereas… 24 arXiv — NLP / Computation & Language research 28d ago When Is 0.1% Enough? Analyzing the Combined Effects of Dimensionality Reduction and Quantization on Text Embedding Compression arXiv:2606.01074v1 Announce Type: new Abstract: Recent high-performing text embedding models often output high-dimensional real-valued vectors, resulting in substantial storage and computational costs. To address this issue, compression methods based on dimensionality reduction… 18 Latent.Space news-outlet 28d ago [AINews] NVIDIA Cosmos 3, Nemotron 3 Ultra, and RTX Spark Jensen scores a huge win. 27 Hugging Face Daily Papers research 28d ago Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs Abstract Automated systems for generating scientific figures face limitations in handling diverse figure types and conditions, prompting the development of multi-agent frameworks that generalize across different input scenarios and produce editable output formats. AI-generated… 29 NVIDIA Developer Blog official-blog 28d ago Deploy Agentic-Ready AI at the Edge with Memory Efficiency in NVIDIA JetPack 7.2 As AI agents move from the digital world to the physical environment, they can readily use NVIDIA Jetson to accelerate real-world deployment with optimized... 24 NVIDIA Developer Blog official-blog 28d ago Run Local AI Agents with Faster Models and Multi-Node Clustering on NVIDIA DGX Spark The rise of autonomous, long-running AI agents has introduced a new class of compute demand, namely tasks that maintain large context windows, spawn concurrent... 16 TechCrunch — AI news-outlet 28d ago Nvidia chases $200B CPU market with AI agent PCs from Microsoft, Dell, and HP If Nvidia has cracked a way to bring AI agents easily, safely and usefully to the masses, it could — and should — be big. 38 r/LocalLLaMA community 28d ago ICYM: llama.cpp b9455 --SM Tensor KV Cache Fix is MERGED Them boys can cook, one big fix after another! If you're running --sm tensor on multi-gpu this is the KV cache quantization fix https://github.com/ggml-org/llama.cpp/releases/tag/b9455 JohannesGaessler commented 5 days ago This PR implements support for the combination of -sm… 11 llama.cpp releases dev-tools 28d ago b9464 speculative : fix n_outputs_max and remove draft-simple auto-enable ( #23988 ) speculative : add common_speculative_n_max helper function Extract the speculative max-draft-size logic from server_n_outputs_max into a reusable common_speculative_n_max() function in… 7 r/LocalLLaMA community 28d ago NVIDIA GB300 Grace Blackwell Ultra pricetags https://www.scan.co.uk/shop/ai-and-robotics/workstations-ai/nvidia-dgx-station   submitted by   /u/X-N2O [link]   [comments] 5 llama.cpp releases dev-tools 28d ago b9460 llama: limit max outputs of llama_context ( #23861 ) llama: save more VRAM by reserving n_outputs == n_seqs when possible add n_outputs_per_seq move n_outputs_max to server-context change ubatch to batch everywhere macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon… 15 r/LocalLLaMA community 28d ago Computex 2026: Intel launches Crescent Island GPU with up to 480GB VRAM https://www.neowin.net/news/computex-2026-intel-launches-crescent-island-gpu-with-up-to-480gb-vram/ Crescent Island is based on the company"s Arc Xe 3P architecture which lies inside current Panther Lake iGPs as well. This is Intel"s latest, most powerful card and it packs up to… 36 Ollama releases dev-tools 28d ago v0.30.0-rc32: llama-server followups (#16353) llama-server followups Misc fixes for #16031 Add back dropped ROCm build flag for multi-GPU support on windows Fix amdhip64_*.dll version detection for "latest" selection Fix embeddings API for consistent normalize behavior with prior versions ci: set up for automated llama.cpp… 19 Page 9 of 10 · 500 articles ← Newer Older →