Tag

Hardware

274 articles archived under #hardware

arXiv — Machine Learning research 1h ago

scKDGM: KAN-guided Dynamic Graph Masked Learning for Single-Cell RNA-seq Clustering

arXiv:2606.28459v1 Announce Type: new Abstract: Single-cell RNA sequencing (scRNA-seq) clustering is essential for identifying cell types, but high dimensionality, sparsity, dropout, and technical noise hinder robust expression representation and cell graph construction.…

27
arXiv — Machine Learning research 1h ago

Improving Patient Subtyping on Longitudinal Data using Representations from Mamba-based Architecture

arXiv:2606.28623v1 Announce Type: new Abstract: Effective sub-typing (also known as grouping or clustering) of patients using their electronic health record (EHR) data can greatly inform precision medicine efforts. However, subtyping temporal EHR datasets is known to be…

37
arXiv — Machine Learning research 1h ago

Nonlinear mixture model motivated subspace clustering

arXiv:2606.29261v1 Announce Type: new Abstract: We derive the linear union-of-subspaces (UoS) model for subspace clustering (SC) from the nonlinear mixture model (NMM) used in blind source separation (BSS) to represent a D-dimensional observation vector as an unknown…

7
r/LocalLLaMA community 14h ago

Instead of decentralized training effort we should build the “One dataset”

There are many threads here calling for a united LLM training run of a new open model. Mainly, after the govt. stunt of banning commercial frontier models. And also due to the lack of small-medium open-weight model releases lately. I genuinely believe at some point we’ll have “SETI…

38
Import AI (Jack Clark) community 16h ago

Import AI 463: Self-improving robots; a 10k Chinese GPU cluster; and an elegiac essay for the human era

What eras bookend our interregnum?

36
TechCrunch — AI news-outlet 16h ago

Omen AI’s plan to optimize data centers is all wet

Omen AI raised a $31 million Series A to monitor chip coolant and stop bacterial outbreaks in data centers.

8
llama.cpp releases dev-tools 18h ago

b9840

DeepSeek V4 (#24162): convert: add dsv4 conversion; add basic setup; add llm_graph_input_dsv4; add save-load state; add sinkhorn eps - correction by @fairydreaming; add rope fix; cleanup dead code; fix bugs; support pro model: added by @fairydreaming; remove redundant V cache; Chat…

26
arXiv — Machine Learning research 1d ago

Dual-Learning based Penalized Multi-Align Clustering for Multi-View Incomplete and Disorderly Data

arXiv:2606.27984v1 Announce Type: new Abstract: Multimodal feature fusion can effectively capture complex patterns in real-world data by integrating complementary information from different modalities. However, in many applications, such as boiler combustion monitoring,…

18
arXiv — NLP / Computation & Language research 1d ago

Mechanism-Driven Monitors for Preemptive Detection of LLM Training Instability

arXiv:2606.28116v1 Announce Type: new Abstract: Frontier large language model training consumes massive accelerator fleets and long wall-clock computation, making stability failures costly when they occur. After a numerical or a hyperparameter fault has already destabilized the…

31
arXiv — NLP / Computation & Language research 1d ago

Cluster, Route, Escalate: Cascaded Framework for Cost-Aware LLM Serving

arXiv:2606.27457v1 Announce Type: cross Abstract: Efficient deployment of large language models (LLMs) in production forces a trade-off between accuracy and cost. Operators often default to a single model that is either expensive for easy queries or insufficient for hard ones.…

20
arXiv — NLP / Computation & Language research 1d ago

DG^VoiC: Speaker Clustering for Fraud Investigation under Real Call-Centre Conditions

arXiv:2606.28048v1 Announce Type: cross Abstract: Insurance fraud remains costly and operationally difficult, particularly in call-centre workflows where many customer interactions begin at FNOL. While recent fraud detection methods mainly rely on structured data, text, or…

19
r/LocalLLaMA community 1d ago

Success story with MiMo-V2.5-GGUF:UD-Q5_K_XL

I don't see many stories about this model, but after several attempts (after I finally finished reconfiguring my cluster) I did something useful with it: it wrote a built-in llama.cpp tool for executing C++ code and using the results. Here's an exercise that MiMo V2.5 gave me to…

27
Hacker News — AI on Front Page community 2d ago

AMD Strix Halo RDMA Cluster Setup Guide

Article URL: https://github.com/kyuz0/amd-strix-halo-vllm-toolboxes/blob/main/rdma_cluster/setup_guide.md Comments URL: https://news.ycombinator.com/item?id=48703258 Points: 207 # Comments: 61

22
TechCrunch — AI news-outlet 2d ago

SoftBank’s CEO isn’t the only one with questions about Elon Musk’s orbital data center hype

Not everyone is buying Elon Musk’s vision for orbital data centers.

19
r/MachineLearning community 2d ago

Kicking off GPU Mode [D]

Hey! I’m starting a series to document my work on GPU infrastructure, LLMs, and CV. Stop #1 is up: a brief look at why GPUs are the center of the industry, the CPU/GPU divide, and why nvidia-smi is the first place you check when things break. We’ll move past the basics quickly…

27
r/MachineLearning community 3d ago

Roast my 3-year roadmap: Pivoting from Python/BaaS to AI Infrastructure & Go (Graduating 2029) [D]

I'm a B.Tech student in India graduating in mid-2029. Currently, I know Python, SQL, Docker, basic prompt engineering, and I've built a few LLM apps using BaaS like Supabase/Firebase. I’m running all this on an Intel i5 13th Gen laptop with an RTX 5050 (8GB VRAM). The Pivot: I…

10
TechCrunch — AI news-outlet 3d ago

Why everyone from OpenAI to SpaceX is building their own chips (and turning up the heat on Nvidia)

Nvidia has dominated the AI chip market for years, but the era of total dependence might be ending. OpenAI just shared its plans to spice things up with Jalapeño, its custom inference chip built with Broadcom, joining Google, Apple, and SpaceX in a growing list…

35
r/LocalLLaMA community 3d ago

Why do people keep investing in Intel for AI?

If you get a good deal on some Xeons with a lot of memory bandwidth, or a cheap GPU for home inference, that's cool, no disrespect. But how in the hell are Wall Street types considering Intel part of the "AI picks and shovels" play? Who's buying Intel for their AI data centers?…

17
r/LocalLLaMA community 3d ago

8 Tesla T4 Cards, what should it do?

I have collected 8 Tesla T4 Datacenter Cards from a few retired VDI servers. I have one in a DEG1 and it works OK on its own. What should we do with the rest? Submitted by /u/imonlysmarterthanyou

7
r/MachineLearning community 3d ago

Live Continual Learning in Machine Learning [D]

My question on live continual learning use cases was removed by moderators here because they think I asked a basic-level question about live continual learning, which I thought was frontier-level research. But anyway. Is anyone interested in talking about continual learning…

30
TechCrunch — AI news-outlet 3d ago

OpenAI’s Jalapeño chip is Big Tech’s spiciest move away from Nvidia

Nvidia has dominated the AI chip market for years, but the era of total dependence might be ending. OpenAI just shared its plans to spice things up with Jalapeño, its custom inference chip built with Broadcom, joining Google, Apple, and SpaceX in a growing list…

25
r/LocalLLaMA community 3d ago

Ornith 1.0 - terminology and concepts explained (basic)

I made a quick guide for myself while wanting to try the new models, so I share it with you. It's pretty basic, but it may be useful for new people here. I also published the repo with the open code config and the commands: https://github.com/facuHannoch/AI_Workflows-Ornith-1.0…

34
arXiv — Machine Learning research 4d ago

SharQ: Bridging Activation Sparsity and FP4 Quantization for LLM Inference

arXiv:2606.26587v1 Announce Type: new Abstract: Low-bit floating-point formats and semi-structured sparsity are increasingly supported by modern accelerators, yet combining them for LLM activation compression remains challenging: activations contain input-dependent outliers that…

29
arXiv — NLP / Computation & Language research 4d ago

Axon: A Synthesizing Superoptimizer for Tensor Programs

arXiv:2606.26344v1 Announce Type: cross Abstract: Writing high performance kernels for AI accelerators requires deep expertise in tiling, instruction selection, data layout, and operator fusion, placing a significant burden on programmers. In this paper, we focus on tile based AI…

33
r/LocalLLaMA community 4d ago

When you don't have a data center GPU

Please don't tell me someone is going to (yet again) reply with the longest finetune-merge name in eternity... Submitted by /u/Iwaku_Real

4
ThursdAI news-outlet 4d ago

GLM 5.2 total victory: the week open source won and nobody panicked

From CoreWeave: A chill week, but a total Open Source victory for GLM 5.2 + Sakana Fugu, Krea Open Sources, OpenAI makes inference chips with Broadcom, Karpathy gets heat about the new Claude Tag...

35
arXiv — Machine Learning research 5d ago

Dense Supervision Is Not Enough: The Readout Blind Spot in Looped Language Models

arXiv:2606.24898v1 Announce Type: new Abstract: Looped language models turn hidden states into runtime state: each state is decoded for prediction and fed back into future computation. This creates a basic supervision question: which state variables does cross-entropy actually…

37
arXiv — NLP / Computation & Language research 5d ago

Is GraphRAG Needed? From Basic RAG to Graph-/Agentic Solutions with Context Optimization

arXiv:2606.25656v1 Announce Type: new Abstract: As advanced RAG variants like GraphRAG and Agentic RAG emerge, one leading question is when and how to use them. Here, we introduce a framework for different RAG scenarios evaluation and comparison on semi-structured knowledge…

21
arXiv — NLP / Computation & Language research 5d ago

Natural Ungrokking: Asymmetric Control of Which Rules Survive Pretraining

arXiv:2606.26050v1 Announce Type: cross Abstract: Midway through an ordinary pretraining run, a small language model learns the pronoun-gender rule: cued with a girl's name ("Sue cried because"), it resolves the next pronoun to she, generalizing to held-out probes (0.94 by step…

4
r/LocalLLaMA community 5d ago

Locked Dell quote for 6x RTX PRO 6000 Max-Q at $8,960 — expires tonight. What would you do?

Building an inference cluster to run GLM 5.2 locally. Got a Dell quote locked at $8,959.99/unit for 6x RTX PRO 6000 Blackwell Max-Q (300W). List price just jumped to $15,999 yesterday. Quote expires in ~3 hours and I can't swing all 6 right now. I have a second quote for 2 units…

36
r/LocalLLaMA community 5d ago

Any chance I could cluster my DGX Spark (128GB unified memory) and my AMD Ryzen AI Max 395 (128GB unified memory) together to run 1 model?

Hey all, So I have an Nvidia DGX Spark and an AMD Strix 395, both have 128GB of unified memory. The Spark has 200Gbit network and the AMD Strix has 5Gbit ethernet (but it has a pcie gen 4x4 slot). Is there any chance I can cluster the 2 together to run a larger model that can fit…

30
r/LocalLLaMA community 5d ago

OpenAI and Broadcom unveil LLM-optimized inference chip

https://openai.com/index/openai-broadcom-jalapeno-inference-chip/ Quoted from the start of the blog post: Early testing shows that the first-generation accelerator will deliver performance per watt substantially better than current state-of-the-art Built from the ground up for…

11
Hacker News — AI on Front Page community 5d ago

45°C cooling design cuts data center water use to near zero

Article URL: https://blogs.nvidia.com/blog/liquid-cooling-ai-factories/ Comments URL: https://news.ycombinator.com/item?id=48660178 Points: 206 # Comments: 157

22
OpenAI official-blog 5d ago

OpenAI and Broadcom unveil LLM-optimized inference chip

OpenAI and Broadcom introduce Jalapeño, a custom AI chip built for LLM inference to improve performance, efficiency, and scale across AI systems.

28
arXiv — NLP / Computation & Language research 6d ago

ModTGCN: Modularity-aware Graph Neural Networks for Text Classification

arXiv:2606.23694v1 Announce Type: new Abstract: Graph-based text classification models typically rely on local neighborhood aggregation and overlook global community structure, despite semantic document graphs exhibiting strong class-consistent clustering. Ignoring this can blur…

22
arXiv — NLP / Computation & Language research 6d ago

Layer-wise Probing of wav2vec 2.0 and Whisper for Consonant Cluster Reduction in African American English

arXiv:2606.23948v1 Announce Type: new Abstract: Self-supervised and supervised speech models are increasingly used to investigate which linguistic information their internal representations encode, and at what level of abstraction they encode it. One underexplored phenomenon is…

6
Ars Technica — AI news-outlet 6d ago

Oracle’s 21,000 layoffs help drive its debt-fueled AI investments

Oracle is spending billions on data center infrastructure to support AI.

20
r/LocalLLaMA community 6d ago

Is it possible to run a giant model like GLM5.2 on this cluster (4x servers with 512GB RAM + dual AMD Epyc)? 16 channel memory should hit 409GB/s per node.

Hey all, I have a piece of hardware lying around which is pretty fast from a traditional (non-GPU) server viewpoint. The hardware is the following: Dell C6525 Server with Quad Node (4x server blades) with the following: 2x AMD EPYC 7702 64-Core Processors, 8 memory channels per…

30
TechCrunch — AI news-outlet 7d ago

Nvidia wants to cut data center water use, but that’s not the same as fixing AI’s water problem

Nvidia announced a new cooling system that cuts water use inside the data center. But it does nothing to address AI's biggest water use — fossil fuel power plants.

5
TechCrunch — AI news-outlet 7d ago

SpaceX inks compute deal with Reflection AI, an open-source AI lab

Reflection AI will pay $150 million a month beginning July 1, 2026 through 2029 for immediate access to Nvidia's latest GB300 AI chips and supporting hardware across SpaceX's Colossus 2 data center near Memphis, Tennessee.

33
r/MachineLearning community 8d ago

Data-centric debugging for teams training neural nets [P]

We just did a big revamp of WeightsLab and wanted to share it here. If you’ve ever spent hours debugging a training run only to discover it was a data problem all along, this is for you. WeightsLab lets you pause training mid-run, inspect your live loss signals, and catch…

29
r/LocalLLaMA community 8d ago

What's your local "Haiku" replacement?

Seriously looking for a reliable and fast local Haiku replacement. Basically it should be able to summarize technical stuff, code documentation, architectural descriptions. Any suggestions? Edit: sorry, totally forgot that my local machine is an M4 Max 128GB. But at the same time…

6
r/LocalLLaMA community 9d ago

Deep Neural Network that can turn any Image into a Playable Game! BUT LOCALLY, NOT ON DATACENTER

Hi everyone!! I really wanted to share the research I've been working on. I wanted to build an NN that can simulate games, or at least start doing that. Most video generators are too large to run on consumer hardware in real time, so I designed a model that does this from…

14
r/LocalLLaMA community 10d ago

New Agentic Benchmark Out: Claude Fable and GLM 5.2 Top Their Cohorts

You can read about it here: https://artificialanalysis.ai/articles/aa-briefcase This is a solid benchmark from Artificial Analysis. It basically tests an LLM's ability to plan and execute tasks. And more importantly, it is a new benchmark that is not saturated, so no one can…

32
r/LocalLLaMA community 10d ago

EvoTensile: Evolutionary algorithms for AMD Tensile GEMM kernel tuning

There has been an effort to tune kernels in hipBLASLt so the most basic matmuls can run faster. It's known that on Strix Halo (gfx1151), GEMM with NN and TN input layouts (used in inference) are already well-tuned, while NT and TT layouts (used in training) are not yet tuned.…

8
arXiv — Machine Learning research 11d ago

Exploring the potential of AlphaEarth and TESSERA embeddings for Fine-scale Local Climate Zone Mapping: A case study across five cities in Switzerland

arXiv:2606.20034v1 Announce Type: new Abstract: Understanding urban spatial morphology is critical for climate modeling, risk assessment, and sustainable urban design, and Local Climate Zone (LCZ) mapping provides the basic framework for this. However, many cities still use…

10
arXiv — NLP / Computation & Language research 11d ago

Clusters are All You Need: Pre-Training the Tsetlin Machine with Semantic Clusters from Language Models for Interpretability

arXiv:2606.19815v1 Announce Type: new Abstract: Pre-trained language models such as BERT achieve strong text classification performance but lack transparency, limiting their use in high-stakes settings. The Tsetlin Machine (TM) offers fully interpretable, clause-based reasoning…

25
arXiv — NLP / Computation & Language research 11d ago

TransLaw: A Large-Scale Dataset and Multi-Agent Benchmark Simulating Professional Translation of Hong Kong Case Law

arXiv:2507.00875v3 Announce Type: replace Abstract: Translating Hong Kong Court Judgments from English to Traditional Chinese is mandated by Articles 8-9 of the Basic Law, yet remains constrained by a shortage of parallel resources and rigorous demands on legal terminology,…

38
arXiv — NLP / Computation & Language research 11d ago

ShoppingBench: A Real-World Intent-Grounded Shopping Benchmark for LLM-based Agents

arXiv:2508.04266v4 Announce Type: replace Abstract: Existing benchmarks in e-commerce primarily focus on basic user intents, such as finding or purchasing products. However, real-world users often pursue more complex goals, such as applying vouchers, managing budgets, and…

22
Hacker News — AI on Front Page community 11d ago

Show HN: Are You in the Weights?

With more traffic moving off-web and into LLMs, I got curious about what traces we leave "in the weights". My design partner and I built a site in the past few weeks that checks recognition across frontier and small models. It queries many of them in parallel, clusters the…

37
