Ahead of AI (Sebastian Raschka)

20 articles archived

Ahead of AI (Sebastian Raschka) research 25d ago

My Workflow for Understanding LLM Architectures

A learning-oriented workflow for understanding new open-weight model releases

14
Ahead of AI (Sebastian Raschka) research 1mo ago

Components of A Coding Agent

How coding agents use tools, memory, and repo context to make LLMs work better in practice

7
Ahead of AI (Sebastian Raschka) research 1mo ago

A Visual Guide to Attention Variants in Modern LLMs

From MHA and GQA to MLA, sparse attention, and hybrid architectures

38
Ahead of AI (Sebastian Raschka) research 2mo ago

A Dream of Spring for Open-Weight LLMs: 10 Architectures from Jan-Feb 2026

A Round Up And Comparison of 10 Open-Weight LLM Releases in Spring 2026

30
Ahead of AI (Sebastian Raschka) research 3mo ago

Categories of Inference-Time Scaling for Improved LLM Reasoning

And an Overview of Recent Inference-Scaling Papers

11
Ahead of AI (Sebastian Raschka) research 4mo ago

The State Of LLMs 2025: Progress, Problems, and Predictions

A 2025 review of large language models, from DeepSeek R1 and RLVR to inference-time scaling, benchmarks, architectures, and predictions for 2026.

33
Ahead of AI (Sebastian Raschka) research 4mo ago

LLM Research Papers: The 2025 List (July to December)

In June, I shared a bonus article containing my curated and bookmarked research paper lists with the paid subscribers who make this Substack possible.

25
Ahead of AI (Sebastian Raschka) research 5mo ago

From DeepSeek V3 to V3.2: Architecture, Sparse Attention, and RL Updates

Understanding How DeepSeek's Flagship Open-Weight Models Evolved

34
Ahead of AI (Sebastian Raschka) research 6mo ago

Beyond Standard LLMs

Linear Attention Hybrids, Text Diffusion, Code World Models, and Small Recursive Transformers

29
Ahead of AI (Sebastian Raschka) research 7mo ago

Understanding the 4 Main Approaches to LLM Evaluation (From Scratch)

Multiple-Choice Benchmarks, Verifiers, Leaderboards, and LLM Judges with Code Examples

29
Ahead of AI (Sebastian Raschka) research 8mo ago

Understanding and Implementing Qwen3 From Scratch

A Detailed Look at One of the Leading Open-Source LLMs

14
Ahead of AI (Sebastian Raschka) research 9mo ago

From GPT-2 to gpt-oss: Analyzing the Architectural Advances

And How They Stack Up Against Qwen3

15
Ahead of AI (Sebastian Raschka) research 9mo ago

The Big LLM Architecture Comparison

From DeepSeek-V3 to Kimi K2: A Look At Modern LLM Architecture Design

21
Ahead of AI (Sebastian Raschka) research 10mo ago

LLM Research Papers: The 2025 List (January to June)

A topic-organized collection of 200+ LLM research papers from 2025

37
Ahead of AI (Sebastian Raschka) research 11mo ago

Understanding and Coding the KV Cache in LLMs from Scratch

KV caches are one of the most critical techniques for efficient inference in LLMs in production.

5
Ahead of AI (Sebastian Raschka) research 12mo ago

Coding LLMs from the Ground Up: A Complete Course

Why build LLMs from scratch? It's probably the best and most efficient way to learn how LLMs really work. Plus, many readers have told me they had a lot of fun doing it.

30
Ahead of AI (Sebastian Raschka) research 12mo ago

The State of Reinforcement Learning for LLM Reasoning

Understanding GRPO and New Insights from Reasoning Model Papers

25
Ahead of AI (Sebastian Raschka) research 13mo ago

First Look at Reasoning From Scratch: Chapter 1

Welcome to the next stage of large language models (LLMs): reasoning. LLMs have transformed how we process and generate text, but their success has been largely driven by statistical pattern recognition. However, new advances in reasoning methodologies now enable LLMs to tackle…

25
Ahead of AI (Sebastian Raschka) research 14mo ago

The State of LLM Reasoning Model Inference

Inference-Time Compute Scaling Methods to Improve Reasoning Models

26
Ahead of AI (Sebastian Raschka) research 15mo ago

Understanding Reasoning LLMs

Methods and Strategies for Building and Refining Reasoning Models

26
