Hugging Face · November 25, 2025 · 45 min read

Diffusers welcomes FLUX-2

#image-gen

Mirrored from Hugging Face for archival readability. Support the source by reading on the original site.

Like Read original ↗

Back to Articles

Welcome FLUX.2 - BFL’s new open image generation model 🤗

Published November 25, 2025

Update on GitHub

Upvote

190

Aritra Roy Gosthipaty

ariG23498

Linoy Tsaban

linoyts

Apolinário from multimodal AI art

multimodalart

FLUX.2 is the recent series of image generation models from Black Forest Labs, preceded by the Flux.1 series. It is an entirely new model with a new architecture and pre-training done from scratch!

In this post, we discuss the key changes introduced in FLUX.2, performing inference with it under various setups, and LoRA fine-tuning.

🚨 FLUX.2 is not meant to be a drop-in replacement of FLUX.1, but a new image generation and editing model.

Table of contents

FLUX.2 introduction
Inference with Diffusers
Advanced Prompting
LoRA fine-tuning

FLUX.2: A Brief Introduction

FLUX.2 can be used for both image-guided and text-guided image generation. Furthermore, it can take multiple images as reference inputs, while producing the final output image. Below, we briefly discuss the key changes introduced in FLUX.2.

Text encoder

First, instead of two text encoders as in Flux.1, it uses a single text encoder — Mistral Small 3.1. Using a single text encoder greatly simplifies the process of computing prompt embeddings. The pipeline allows for a max_sequence_length of 512. Instead of using a single-layer output for the prompt embedding, FLUX.2 stacks outputs from intermediate layers, which have been known to be more beneficial.

DiT

FLUX.2 follows the same general multimodel diffusion transformer (MM-DiT) + parallel DiT architecture as Flux.1. As a refresher, MM-DiT blocks first process the image latents and conditioning text in separate streams, only joining the two together for the attention operation, and are thus referred to as “double-stream” blocks. The parallel blocks then operate on the concatenated image and text streams and can be regarded as “single-stream” blocks.

The key DiT changes from Flux.1 to FLUX.2 are as follows:

Time and guidance information (in the form of AdaLayerNorm-Zero modulation parameters) is shared across all double-stream and single-stream transformer blocks, respectively, rather than having individual modulation parameters for each block as in Flux.1.
None of the layers in the model use bias parameters. In particular, neither the attention nor feedforward (FF) sub-blocks of either transformer block use bias parameters in any of their layers.
In Flux.1, the single-stream transformer blocks fused the attention output projection with the FF output projection. FLUX.2 single-stream blocks also fuse the attention QKV projections with the FF input projection, creating a fully parallel transformer block:

Figure taken from the ViT-22B paper.

Note that compared to the ViT-22B block depicted above, FLUX.2 uses a SwiGLU-style MLP activation rather than a GELU activation (and also doesn’t use bias parameters).
A larger proportion of the transformer blocks in FLUX.2 are single-stream blocks (8 double-stream blocks to 48 single-stream blocks, compared to 19/38 for Flux.1). This also means that single-stream blocks make up a larger proportion of the DiT parameters: Flux.1[dev]-12B has ~54% of its total parameters in the double-stream blocks, whereas FLUX.2[dev]-32B has ~24% of its parameters in the double-stream blocks (and ~73% in the single-stream blocks).

Misc

A new Autoencoder aka AutoencoderKLFlux2
Better way to incorporate resolution-dependent timestep schedules

Inference With Diffusers

FLUX.2 uses a larger DiT and Mistral3 Small as its text encoder. When used together without any kind of offloading, the inference takes more than 80GB VRAM. In the following sections, we show how to perform inference with FLUX.2 in more accessible ways, under various system-level constraints.

Installation and Authentication

Before you try out the following code snippets, make sure you have installed diffusers from main and have run hf auth login.

pip uninstall diffusers -y && pip install git+https://github.com/huggingface/diffusers -U

Regular Inference

from diffusers import Flux2Pipeline
import torch

repo_id = "black-forest-labs/FLUX.2-dev"
pipe = Flux2Pipeline.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="dog dancing near the sun",
    num_inference_steps=50, # 28 is a good trade-off
    guidance_scale=4,
    height=1024,
    width=1024
).images[0]

The above code snippet was tested on an H100, and it isn’t sufficient to run inference on it without CPU offloading. With CPU offloading enabled, this setup takes ~62GB to run.

Users who have access to Hopper-series GPUs can take advantage of Flash Attention 3 to speed up inference:

from diffusers import Flux2Pipeline
import torch

repo_id = "black-forest-labs/FLUX.2-dev"
pipe = Flux2Pipeline.from_pretrained(path, torch_dtype=torch.bfloat16)
+ pipe.transformer.set_attention_backend("_flash_3_hub")
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="dog dancing near the sun",
    num_inference_steps=50,
    guidance_scale=2.5,
    height=1024,
    width=1024
).images[0]

You can check out the supported attention backends (we have many!) here.

Resource-constrained

Using 4-bit quantization

Using bitsandbytes, we can load the transformer and text encoder models in 4-bit, allowing owners of 24GB GPUs to use the model locally. You can run this snippet on a GPU with ~20 GB of free VRAM.

Unfold

import torch
from transformers import Mistral3ForConditionalGeneration

from diffusers import Flux2Pipeline, Flux2Transformer2DModel

repo_id = "diffusers/FLUX.2-dev-bnb-4bit"
device = "cuda:0"
torch_dtype = torch.bfloat16

transformer = Flux2Transformer2DModel.from_pretrained(
  repo_id, subfolder="transformer", torch_dtype=torch_dtype, device_map="cpu"
)
text_encoder = Mistral3ForConditionalGeneration.from_pretrained(
  repo_id, subfolder="text_encoder", dtype=torch_dtype, device_map="cpu"
)

pipe = Flux2Pipeline.from_pretrained(
  repo_id, transformer=transformer, text_encoder=text_encoder, torch_dtype=torch_dtype
)
pipe.enable_model_cpu_offload()

prompt = "Realistic macro photograph of a hermit crab using a soda can as its shell, partially emerging from the can, captured with sharp detail and natural colors, on a sunlit beach with soft shadows and a shallow depth of field, with blurred ocean waves in the background. The can has the text `BFL Diffusers` on it and it has a color gradient that start with #FF5733 at the top and transitions to #33FF57 at the bottom."

image = pipe(
  prompt=prompt,
  generator=torch.Generator(device=device).manual_seed(42),
  num_inference_steps=50, # 28 is a good trade-off
  guidance_scale=4,
).images[0]

image.save("flux2_t2i_nf4.png")

Notice that we're using a repository that contains the NF4-quantized versions of the FLUX.2 DiT and the Mistral text encoder.

Local + remote

Due to the modular design of a Diffusers pipeline, we can isolate modules and work with them in sequence. We decouple the text encoder and deploy it to an Inference Endpoint. This helps us with freeing up the VRAM usage for the DiT and VAE only.

⚠️ To use the remote text encoder, you need to have a valid token. If you are already authenticated, no further action is needed.

The example below uses a combination of local and remote inference. Additionally, we quantize the DiT with NF4 quantization through bitsandbytes.

You can run this snippet on a GPU with 18 GB of VRAM:

Unfold

from diffusers import Flux2Pipeline, Flux2Transformer2DModel
from diffusers import BitsAndBytesConfig as DiffBitsAndBytesConfig
from huggingface_hub import get_token
import requests
import torch
import io

def remote_text_encoder(prompts: str | list[str]):
  def _encode_single(prompt: str):
      response = requests.post(
          "https://remote-text-encoder-flux-2.huggingface.co/predict",
          json={"prompt": prompt},
          headers={
              "Authorization": f"Bearer {get_token()}",
              "Content-Type": "application/json"
          }
      )
      assert response.status_code == 200, f"{response.status_code=}"
      return torch.load(io.BytesIO(response.content))

  if isinstance(prompts, (list, tuple)):
      embeds = [_encode_single(p) for p in prompts]
      return torch.cat(embeds, dim=0)

  return _encode_single(prompts).to("cuda")

repo_id = "black-forest-labs/FLUX.2-dev"
quantized_dit_id = "diffusers/FLUX.2-dev-bnb-4bit"
dit = Flux2Transformer2DModel.from_pretrained(
  quantized_dit_id, subfolder="transformer", torch_dtype=torch_dtype, device_map="cpu"
)

pipe = Flux2Pipeline.from_pretrained(
  repo_id,
  text_encoder=None,
  transformer=dit,
  torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

print("Running remote text encoder ☁️")
prompt1 = "a photo of a forest with mist swirling around the tree trunks. The word 'FLUX.2' is painted over it in big, red brush strokes with visible texture"
prompt2 = "a photo of a dense forest with rain. The word 'FLUX.2' is painted over it in big, red brush strokes with visible texture"
prompt_embeds = remote_text_encoder([prompt1, prompt2])
print("Done ✅")

out = pipe(
  prompt_embeds=prompt_embeds,
  generator=torch.Generator(device="cuda").manual_seed(42),
  num_inference_steps=50, # 28 is a good trade-off
  guidance_scale=4,
  height=1024,
  width=1024,
)

for idx, image in enumerate(out.images):
  image.save(f"flux_out_{idx}.png")

For GPUs with even lower VRAM, we have group_offloading, which allows GPUs with as little as 8GB of free VRAM to use this model. However, you'll need 32GB of free RAM. Alternatively, if you're willing to sacrifice some speed, you can set low_cpu_mem_usage=True to reduce the RAM requirement to just 10GB.

Unfold

import io
import os

import requests
import torch

from diffusers import Flux2Pipeline, Flux2Transformer2DModel

repo_id = "diffusers/FLUX.2-dev-bnb-4bit"
torch_dtype = torch.bfloat16
device = "cuda"

def remote_text_encoder(prompts: str | list[str]):
  def _encode_single(prompt: str):
      response = requests.post(
          "https://remote-text-encoder-flux-2.huggingface.co/predict",
          json={"prompt": prompt},
          headers={"Authorization": f"Bearer {os.environ['HF_TOKEN']}", "Content-Type": "application/json"},
      )
      assert response.status_code == 200, f"{response.status_code=}"
      return torch.load(io.BytesIO(response.content))

  if isinstance(prompts, (list, tuple)):
      embeds = [_encode_single(p) for p in prompts]
      return torch.cat(embeds, dim=0)

  return _encode_single(prompts).to("cuda")

transformer = Flux2Transformer2DModel.from_pretrained(
  repo_id, subfolder="transformer", torch_dtype=torch_dtype, device_map="cpu"
)

pipe = Flux2Pipeline.from_pretrained(
  repo_id,
  text_encoder=None,
  transformer=transformer,
  torch_dtype=torch_dtype,
)
pipe.transformer.enable_group_offload(
  onload_device=device,
  offload_device="cpu",
  offload_type="leaf_level",
  use_stream=True,
  # low_cpu_mem_usage=True # uncomment for lower RAM usage
)
pipe.to(device)

prompt = "a photo of a forest with mist swirling around the tree trunks. The word 'FLUX.2' is painted over it in big, red brush strokes with visible texture"
prompt_embeds = remote_text_encoder(prompt)

image = pipe(
  prompt_embeds=prompt_embeds,
  generator=torch.Generator(device=device).manual_seed(42),
  num_inference_steps=50,
  guidance_scale=4,
  height=1024,
  width=1024,
).images[0]

You can check out other supported quantization backends here and other memory-saving techniques here.

To check how different quantizations affect an image, you can play with the playground below or access it as standlone in the FLUX.2 Quantization experiments Space

Multiple images as reference

FLUX.2 supports using multiple images as inputs, allowing you to use up to 10 images. However, keep in mind that each additional image will require more VRAM. You can reference the images by index (e.g., image 1, image 2) or by natural language (e.g., the kangaroo, the turtle). For optimal results, the best approach is to use a combination of both methods.

Unfold

import torch
from transformers import Mistral3ForConditionalGeneration

from diffusers import Flux2Pipeline, Flux2Transformer2DModel
from diffusers.utils import load_image

repo_id = "diffusers-internal-dev/new-model-image-final-weights"
device = "cuda:0"
torch_dtype = torch.bfloat16

pipe = Flux2Pipeline.from_pretrained(
  repo_id, torch_dtype=torch_dtype
)
pipe.enable_model_cpu_offload()

image_one = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/flux2_blog/kangaroo.png")
image_two = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/flux2_blog/turtle.png")

prompt = "the boxer kangaroo from image 1 and the martial artist turtle from image 2 are fighting in an epic battle scene at a beach of a tropical island, 35mm, depth of field, 50mm lens, f/3.5, cinematic lighting"

image = pipe(
  prompt=prompt,
  image=[image_one, image_two],
  generator=torch.Generator(device=device).manual_seed(42),
  num_inference_steps=50,
  guidance_scale=2.5,
  width=1024,
  height=768,
).images[0]

image.save(f"./flux2_t2i.png")

Multi-image input

Advanced Prompting

FLUX.2 supports advanced prompting techniques like structured JSON prompting, precise hex color control, and multi-reference image editing. Aside for the added control, this also allows for flexibility in changing specific attributes while maintaining others overall the same.
For example, let's start with this json as the base schema (taken from the official FLUX.2 prompting guide):

{
  "scene": "overall scene description",
  "subjects": [
    {
      "description": "detailed subject description",
      "position": "where in frame",
      "action": "what they're doing"
    }
  ],
  "style": "artistic style",
  "color_palette": ["#hex1", "#hex2", "#hex3"],
  "lighting": "lighting description",
  "mood": "emotional tone",
  "background": "background details",
  "composition": "framing and layout",
  "camera": {
    "angle": "camera angle",
    "lens": "lens type",
    "depth_of_field": "focus behavior"
  }
}

Building up on that, let's turn it into a prompt for a shot of a good old fashion walkman on a carpet (simply pass this prompt to your chosen diffusers inference example from above):

prompt = """
{
  "scene": "Professional studio product photography setup with soft-textured carpet surface",
  "subjects": [
    {
      "description": "Old silver Walkman placed on a carpet in the middle of an empty room",
      "pose": "Stationary, lying flat",
      "position": "Center foreground on carpeted surface",
      "color_palette": ["brushed silver", "dark gray accents"]
    }
  ],
  "style": "Ultra-realistic product photography with commercial quality",
  "color_palette": ["brushed silver", "neutral beige", "soft white highlights"],
  "lighting": "Three-point softbox setup creating soft, diffused highlights with no harsh shadows",
  "mood": "Clean, professional, minimalist",
  "background": "Soft-textured carpet surface with subtle studio backdrop suggesting an empty room",
  "composition": "rule of thirds",
  "camera": {
    "angle": "high angle",
    "distance": "medium shot",
    "focus": "Sharp focus on metallic Walkman textures and physical controls",
    "lens-mm": 85,
    "f-number": "f/5.6",
    "ISO": 200
  }
}

"""

Now, let's change the color of the carpet to a specific teal-blue shade (#367588) and add wired headphones plugged into the walkman:

prompt = """
{
  "scene": "Professional studio product photography setup with soft-textured carpet surface",
  "subjects": [
    {
      "description": "Old silver Walkman placed on a teal-blue carpet (#367588) in the middle of an empty room, with wired headphones plugged in",
      "pose": "Stationary, lying flat",
      "position": "Center foreground on carpeted surface",
      "color_palette": ["brushed silver", "dark gray accents", "#367588"]
    },
    {
      "description": "Wired headphones connected to the Walkman, cable loosely coiled on the carpet",
      "pose": "Stationary",
      "position": "Next to and partially in front of the Walkman on the carpet",
      "color_palette": ["dark gray", "soft black", "#367588"]
    }
  ],
  "style": "Ultra-realistic product photography with commercial quality",
  "color_palette": ["brushed silver", "#367588", "neutral beige", "soft white highlights"],
  "lighting": "Three-point softbox setup creating soft, diffused highlights with no harsh shadows",
  "mood": "Clean, professional, minimalist",
  "background": "Soft-textured teal-blue carpet surface (#367588) with subtle studio backdrop suggesting an empty room",
  "composition": "rule of thirds",
  "camera": {
    "angle": "high angle",
    "distance": "medium shot",
    "focus": "Sharp focus on metallic Walkman textures, wired headphones, and carpet fibers",
    "lens-mm": 85,
    "f-number": "f/5.6",
    "ISO": 200
  }
}
"""

The carpet color now matches the hex code provided, and the headphones have been with small changes to the overall scene.

Check out the official prompting guide for more examples and details.

LoRA fine-tuning

Being both a text-to-image and an image-to-image model, FLUX.2 makes the perfect fine-tuning candidate for many use-cases! However, as inference alone takes more than 80GB of VRAM, LoRA fine-tuning is even more challenging to run on consumer GPUs. To squeeze in as much memory saving as we can, we utilize some of the inference optimizations described above for training as well, together with shared memory saving techniques, to substantially reduce memory consumption. To train it, you can use either the diffusers code below or Ostris' AI Toolkit.

We provide both text-to-image and image-to-image training scripts, for the purpose of this blog will focus on a text-to-image training example.

Memory optimizations for fine-tuning

Many of these techniques complement each other and can be used together to reduce memory consumption further. However, some techniques may be mutually exclusive, so be sure to check before launching a training run.

Unfold to check details on the memory-saving techniques used:

Remote Text Encoder: to leverage the remote text encoding for training, simply pass --remote_text_encoder. Note that you must either be logged in to your Hugging Face account (hf auth login) OR pass a token with --hub_token.
CPU Offloading: by passing --offload the vae and text encoder to will be offloaded to CPU memory and only moved to GPU when needed.
Latent Caching: Pre-encode the training images with the vae, and then delete it to free up some memory. To enable latent_caching simply pass --cache_latents.
QLoRA: Low Precision Training with Quantization - using 8-bit or 4-bit quantization. You can use the following flags:
- FP8 training with torchao: enable FP8 training by passing --do_fp8_training. Since we are utilizing FP8 tensor cores, we need CUDA GPUs with compute capability at least 8.9 or greater. If you're looking for memory-efficient training on relatively older cards, we encourage you to check out other trainers like SimpleTuner, ai-toolkit, etc.
- NF4 training with bitsandbytes: Alternatively, you can use 8-bit or 4-bit quantization with bitsandbytes by passing:- --bnb_quantization_config_path with a corresponding path to a json file containing your config. see below for more details.
Gradient Checkpointing and Accumulation: --gradient accumulation refers to the number of updates steps to accumulate before performing a backward/update pass.by passing a value > 1 you can reduce the amount of backward/update passes and hence also memory reqs.* with --gradient checkpointing we can save memory by not storing all intermediate activations during the forward pass.Instead, only a subset of these activations (the checkpoints) are stored and the rest is recomputed as needed during the backward pass. Note that this comes at the expanse of a slower backward pass.
8-bit-Adam Optimizer: When training with AdamW(doesn't apply to prodigy) You can pass --use_8bit_adam to reduce the memory requirements of training. Make sure to install bitsandbytes if you want to do so.

Please make sure to check out the README for prerequisites before starting training.

For this example, we’ll use multimodalart/1920-raider-waite-tarot-public-domain dataset with the following configuration using FP8 training. Feel free to experiment more with the hyper-parameters and share your results 🤗

accelerate launch train_dreambooth_lora_flux2.py \
  --pretrained_model_name_or_path="black-forest-labs/FLUX.2-dev"  \
  --mixed_precision="bf16" \
  --gradient_checkpointing \
  --remote_text_encoder \
  --cache_latents \
  --caption_column="caption"\
  --do_fp8_training \
  --dataset_name="multimodalart/1920-raider-waite-tarot-public-domain" \
  --output_dir="tarot_card_Flux2_LoRA" \
  --instance_prompt="trcrd tarot card" \
  --resolution=1024 \
  --train_batch_size=2 \
  --guidance_scale=1 \
  --gradient_accumulation_steps=1 \
  --optimizer="adamW" \
  --use_8bit_adam\
  --learning_rate=1e-4 \
  --report_to="wandb" \
  --lr_scheduler="constant_with_warmup" \
  --lr_warmup_steps=200 \
  --checkpointing_steps=250\
  --max_train_steps=1000 \
  --rank=8\
  --validation_prompt="a trtcrd of a person on a computer, on the computer you see a meme being made with an ancient looking trollface, 'the shitposter' arcana, in the style of TOK a trtcrd, tarot style" \
  --validation_epochs=25 \
  --seed="0"\
  --push_to_hub

LoRA finetuning

The left image was generated using the pre-trained FLUX.2 model, and the right image was produced the LoRA.

In case your hardware isn’t compatible with FP8 training, you can use QLoRA with bitsandbytes. You first need to define a config.json file like so:

{
    "load_in_4bit": true,
    "bnb_4bit_quant_type": "nf4"
}

And then pass its path to --bnb_quantization_config_path:

accelerate launch train_dreambooth_lora_flux2.py \
  --pretrained_model_name_or_path="black-forest-labs/FLUX.2-dev"  \
  --mixed_precision="bf16" \
  --gradient_checkpointing \
  --remote_text_encoder \
  --cache_latents \
  --caption_column="caption"\
  **--bnb_quantization_config_path="config.json" \**
  --dataset_name="multimodalart/1920-raider-waite-tarot-public-domain" \
  --output_dir="tarot_card_Flux2_LoRA" \
  --instance_prompt="a tarot card" \
  --resolution=1024 \
  --train_batch_size=2 \
  --guidance_scale=1 \
  --gradient_accumulation_steps=1 \
  --optimizer="adamW" \
  --use_8bit_adam\
  --learning_rate=1e-4 \
  --report_to="wandb" \
  --lr_scheduler="constant_with_warmup" \
  --lr_warmup_steps=200 \
  --max_train_steps=1000 \
  --rank=8\
  --validation_prompt="a trtcrd of a person on a computer, on the computer you see a meme being made with an ancient looking trollface, 'the shitposter' arcana, in the style of TOK a trtcrd, tarot style" \
  --seed="0"

Resources

FLUX.2 announcement post
Diffusers documentation
FLUX.2 official demo
FLUX.2 on the Hub
FLUX.2 original codebase

Models mentioned in this article 1

Datasets mentioned in this article 1

Collections mentioned in this article 1

(LoRA) Fine-Tuning FLUX.1-dev on Consumer Hardware

105

June 19, 2025

guidediffusersquantization

Exploring Quantization Backends in Diffusers

May 21, 2025

Community

Amazing! thanks team. \n","updatedAt":"2025-11-25T16:35:43.703Z","author":{"_id":"633c3172ec4e4abf307b7dc6","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/633c3172ec4e4abf307b7dc6/PQxG5qMMqlQ-BtjMNUl32.png","fullname":"Charchit Sharma","name":"charchits7","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":11,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.9754365086555481},"editors":["charchits7"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/633c3172ec4e4abf307b7dc6/PQxG5qMMqlQ-BtjMNUl32.png"],"reactions":[{"reaction":"❤️","users":["sayakpaul","Graham-USMC","fogside","SekkSea","rkfg","ariG23498","chivier","ovuruska","Chelowek054","maximilianofir"],"count":10}],"isReport":false}},{"id":"69260c39763efbaa0e7259f7","author":{"_id":"5fd5e18a90b6dc4633f6d292","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/5fd5e18a90b6dc4633f6d292/gZXHW5dd9R86AV9LMZ--y.png","fullname":"Maziyar Panahi","name":"MaziyarPanahi","type":"user","isPro":true,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":4981,"isUserFollowing":false,"primaryOrg":{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/5fd5e18a90b6dc4633f6d292/QPiv8pt4JNxr0FdGnpFef.png","fullname":"OpenMed","name":"OpenMed","type":"org","isHf":false,"details":"Health x AI","plan":"team"}},"createdAt":"2025-11-25T20:06:17.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"amazing! great work! 👏\nis there a support for multi-gpus? (device_map=auto)","html":"amazing! great work! 👏 is there a support for multi-gpus? (device_map=auto)\n","updatedAt":"2025-11-25T20:06:17.858Z","author":{"_id":"5fd5e18a90b6dc4633f6d292","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/5fd5e18a90b6dc4633f6d292/gZXHW5dd9R86AV9LMZ--y.png","fullname":"Maziyar Panahi","name":"MaziyarPanahi","type":"user","isPro":true,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":4981,"isUserFollowing":false,"primaryOrg":{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/5fd5e18a90b6dc4633f6d292/QPiv8pt4JNxr0FdGnpFef.png","fullname":"OpenMed","name":"OpenMed","type":"org","isHf":false,"details":"Health x AI","plan":"team"}}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.8909691572189331},"editors":["MaziyarPanahi"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/5fd5e18a90b6dc4633f6d292/gZXHW5dd9R86AV9LMZ--y.png"],"reactions":[{"reaction":"👍","users":["noah-vandal"],"count":1}],"isReport":false},"replies":[{"id":"6926b8f3bf5c52ab85acc28a","author":{"_id":"5f7fbd813e94f16a85448745","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1649681653581-5f7fbd813e94f16a85448745.jpeg","fullname":"Sayak Paul","name":"sayakpaul","type":"user","isPro":true,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":913,"isUserFollowing":false,"primaryOrg":{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1583856921041-5dd96eb166059660ed1ee413.png","fullname":"Hugging Face","name":"huggingface","type":"org","isHf":true,"details":"The AI community building the future.","plan":"team"}},"createdAt":"2025-11-26T08:23:15.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"You should be able to incorporate that in different forms. Check this out:\nhttps://huggingface.co/docs/diffusers/main/en/training/distributed_inference","html":"You should be able to incorporate that in different forms. Check this out: <a href=\"https://huggingface.co/docs/diffusers/main/en/training/distributed_inference\">https://huggingface.co/docs/diffusers/main/en/training/distributed_inference</a>\n","updatedAt":"2025-11-26T08:23:15.787Z","author":{"_id":"5f7fbd813e94f16a85448745","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1649681653581-5f7fbd813e94f16a85448745.jpeg","fullname":"Sayak Paul","name":"sayakpaul","type":"user","isPro":true,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":913,"isUserFollowing":false,"primaryOrg":{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1583856921041-5dd96eb166059660ed1ee413.png","fullname":"Hugging Face","name":"huggingface","type":"org","isHf":true,"details":"The AI community building the future.","plan":"team"}}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.8594894409179688},"editors":["sayakpaul"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1649681653581-5f7fbd813e94f16a85448745.jpeg"],"reactions":[{"reaction":"🤗","users":["ariG23498","MaziyarPanahi"],"count":2}],"isReport":false,"parentCommentId":"69260c39763efbaa0e7259f7"}},{"id":"6929a61f44399d33a72bab63","author":{"_id":"5fd5e18a90b6dc4633f6d292","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/5fd5e18a90b6dc4633f6d292/gZXHW5dd9R86AV9LMZ--y.png","fullname":"Maziyar Panahi","name":"MaziyarPanahi","type":"user","isPro":true,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":4981,"isUserFollowing":false,"primaryOrg":{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/5fd5e18a90b6dc4633f6d292/QPiv8pt4JNxr0FdGnpFef.png","fullname":"OpenMed","name":"OpenMed","type":"org","isHf":false,"details":"Health x AI","plan":"team"}},"createdAt":"2025-11-28T13:39:43.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"> You should be able to incorporate that in different forms. Check this out:\n> https://huggingface.co/docs/diffusers/main/en/training/distributed_inference\nbeautiful! thank you, will try it today! \n","html":"<blockquote>\nYou should be able to incorporate that in different forms. Check this out: <a href=\"https://huggingface.co/docs/diffusers/main/en/training/distributed_inference\">https://huggingface.co/docs/diffusers/main/en/training/distributed_inference</a> beautiful! thank you, will try it today! \n</blockquote>\n","updatedAt":"2025-11-28T13:39:43.420Z","author":{"_id":"5fd5e18a90b6dc4633f6d292","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/5fd5e18a90b6dc4633f6d292/gZXHW5dd9R86AV9LMZ--y.png","fullname":"Maziyar Panahi","name":"MaziyarPanahi","type":"user","isPro":true,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":4981,"isUserFollowing":false,"primaryOrg":{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/5fd5e18a90b6dc4633f6d292/QPiv8pt4JNxr0FdGnpFef.png","fullname":"OpenMed","name":"OpenMed","type":"org","isHf":false,"details":"Health x AI","plan":"team"}}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.8715914487838745},"editors":["MaziyarPanahi"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/5fd5e18a90b6dc4633f6d292/gZXHW5dd9R86AV9LMZ--y.png"],"reactions":[],"isReport":false,"parentCommentId":"69260c39763efbaa0e7259f7"}}]},{"id":"6926ae9ce74a6113cd906800","author":{"_id":"6316d72a29411a6864ba18d6","avatarUrl":"/avatars/c167d71e34cb8c01f0db76a6d9dd7e38.svg","fullname":"muntedslunt","name":"muntedslunt","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":2,"isUserFollowing":false},"createdAt":"2025-11-26T07:39:08.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"lol... nope","html":"lol... nope\n","updatedAt":"2025-11-26T07:39:08.832Z","author":{"_id":"6316d72a29411a6864ba18d6","avatarUrl":"/avatars/c167d71e34cb8c01f0db76a6d9dd7e38.svg","fullname":"muntedslunt","name":"muntedslunt","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":2,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"fi","probability":0.4861249625682831},"editors":["muntedslunt"],"editorAvatarUrls":["/avatars/c167d71e34cb8c01f0db76a6d9dd7e38.svg"],"reactions":[{"reaction":"😔","users":["ariG23498"],"count":1}],"isReport":false},"replies":[{"id":"6926b9026fb402142b1885ea","author":{"_id":"5f7fbd813e94f16a85448745","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1649681653581-5f7fbd813e94f16a85448745.jpeg","fullname":"Sayak Paul","name":"sayakpaul","type":"user","isPro":true,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":913,"isUserFollowing":false,"primaryOrg":{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1583856921041-5dd96eb166059660ed1ee413.png","fullname":"Hugging Face","name":"huggingface","type":"org","isHf":true,"details":"The AI community building the future.","plan":"team"}},"createdAt":"2025-11-26T08:23:30.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"What's that supposed to mean?","html":"What's that supposed to mean?\n","updatedAt":"2025-11-26T08:23:30.260Z","author":{"_id":"5f7fbd813e94f16a85448745","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1649681653581-5f7fbd813e94f16a85448745.jpeg","fullname":"Sayak Paul","name":"sayakpaul","type":"user","isPro":true,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":913,"isUserFollowing":false,"primaryOrg":{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1583856921041-5dd96eb166059660ed1ee413.png","fullname":"Hugging Face","name":"huggingface","type":"org","isHf":true,"details":"The AI community building the future.","plan":"team"}}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.9989904761314392},"editors":["sayakpaul"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1649681653581-5f7fbd813e94f16a85448745.jpeg"],"reactions":[],"isReport":false,"parentCommentId":"6926ae9ce74a6113cd906800"}}]},{"id":"6926df57f265b7e9efc9e8e4","author":{"_id":"663ba7b476e6d5b98fd22645","avatarUrl":"/avatars/204817bb7e23e5e5eaf51bb80ae8c630.svg","fullname":"Nguyen Nhu Giap","name":"NhuGiap","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false},"createdAt":"2025-11-26T11:07:03.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"Hi, can you tell me a bit about your motivations behind omitting all bias parameters in network architecture? Thanks!","html":"Hi, can you tell me a bit about your motivations behind omitting all bias parameters in network architecture? Thanks!\n","updatedAt":"2025-11-26T11:07:03.829Z","author":{"_id":"663ba7b476e6d5b98fd22645","avatarUrl":"/avatars/204817bb7e23e5e5eaf51bb80ae8c630.svg","fullname":"Nguyen Nhu Giap","name":"NhuGiap","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.8452562093734741},"editors":["NhuGiap"],"editorAvatarUrls":["/avatars/204817bb7e23e5e5eaf51bb80ae8c630.svg"],"reactions":[],"isReport":false},"replies":[{"id":"6926e3c2f4cfb3d76eecff92","author":{"_id":"5f7fbd813e94f16a85448745","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1649681653581-5f7fbd813e94f16a85448745.jpeg","fullname":"Sayak Paul","name":"sayakpaul","type":"user","isPro":true,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":913,"isUserFollowing":false,"primaryOrg":{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1583856921041-5dd96eb166059660ed1ee413.png","fullname":"Hugging Face","name":"huggingface","type":"org","isHf":true,"details":"The AI community building the future.","plan":"team"}},"createdAt":"2025-11-26T11:25:54.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"That's a question for the Black Forest Labs team, not us.","html":"That's a question for the Black Forest Labs team, not us.\n","updatedAt":"2025-11-26T11:25:54.747Z","author":{"_id":"5f7fbd813e94f16a85448745","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1649681653581-5f7fbd813e94f16a85448745.jpeg","fullname":"Sayak Paul","name":"sayakpaul","type":"user","isPro":true,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":913,"isUserFollowing":false,"primaryOrg":{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1583856921041-5dd96eb166059660ed1ee413.png","fullname":"Hugging Face","name":"huggingface","type":"org","isHf":true,"details":"The AI community building the future.","plan":"team"}}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.9801974892616272},"editors":["sayakpaul"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1649681653581-5f7fbd813e94f16a85448745.jpeg"],"reactions":[{"reaction":"➕","users":["ariG23498","hroch"],"count":2}],"isReport":false,"parentCommentId":"6926df57f265b7e9efc9e8e4"}}]},{"id":"69270f150706eeebd5b81399","createdAt":"2025-11-26T14:30:45.000Z","type":"comment","data":{"edited":true,"hidden":true,"hiddenBy":"","latest":{"raw":"This comment has been hidden","html":"This comment has been hidden","updatedAt":"2025-12-26T23:00:07.845Z"},"numEdits":0,"editors":[],"editorAvatarUrls":[],"reactions":[]}},{"id":"692727cacfcedf38b072c769","author":{"_id":"68c00ad35db4872ee259c823","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/rwnTddduiJMd0c0QUXCb1.png","fullname":"Guilherme Sabino Vaz","name":"guilhermevaz","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false},"createdAt":"2025-11-26T16:16:10.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"Amazing work! Can you tell me when the depth-maps model will be released?\nHas anyone already tried giving a depth map as a normal image? How does the model behave?","html":"Amazing work! Can you tell me when the depth-maps model will be released? Has anyone already tried giving a depth map as a normal image? How does the model behave?\n","updatedAt":"2025-11-26T16:16:10.603Z","author":{"_id":"68c00ad35db4872ee259c823","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/rwnTddduiJMd0c0QUXCb1.png","fullname":"Guilherme Sabino Vaz","name":"guilhermevaz","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.9621202945709229},"editors":["guilhermevaz"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/rwnTddduiJMd0c0QUXCb1.png"],"reactions":[],"isReport":false}},{"id":"692746e2cfcedf38b072c77f","author":{"_id":"683d1665e41c42facedbcaf8","avatarUrl":"/avatars/3c1daed6469b74f67acf9606172bf974.svg","fullname":"Sven Killig","name":"sonic74","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false},"createdAt":"2025-11-26T18:28:50.000Z","type":"comment","data":{"edited":true,"hidden":false,"latest":{"raw":"https://github.com/huggingface/blog/blob/main/flux-2.md?plain=1#L283 causes\n```\n transformer_id, subfolder=\"transformer\", torch_dtype=torch_dtype, device_map=\"cpu\"\n ^^^^^^^^^^^^^^\nNameError: name 'transformer_id' is not defined\n```","html":"<a href=\"https://github.com/huggingface/blog/blob/main/flux-2.md?plain=1#L283\" rel=\"nofollow\">https://github.com/huggingface/blog/blob/main/flux-2.md?plain=1#L283</a> causes\n<pre><code> transformer_id, subfolder=\"transformer\", torch_dtype=torch_dtype, device_map=\"cpu\"\n ^^^^^^^^^^^^^^\nNameError: name 'transformer_id' is not defined\n</code></pre>\n","updatedAt":"2025-11-27T03:45:29.131Z","author":{"_id":"683d1665e41c42facedbcaf8","avatarUrl":"/avatars/3c1daed6469b74f67acf9606172bf974.svg","fullname":"Sven Killig","name":"sonic74","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false}},"numEdits":1,"identifiedLanguage":{"language":"en","probability":0.47359123826026917},"editors":["sonic74"],"editorAvatarUrls":["/avatars/3c1daed6469b74f67acf9606172bf974.svg"],"reactions":[],"isReport":false},"replies":[{"id":"6927c3288b9bc560603d668d","author":{"_id":"608aabf24955d2bfc3cd99c6","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/608aabf24955d2bfc3cd99c6/-YxmtpzEmf3NKOTktODRP.jpeg","fullname":"Aritra Roy Gosthipaty","name":"ariG23498","type":"user","isPro":true,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":645,"isUserFollowing":false,"primaryOrg":{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1583856921041-5dd96eb166059660ed1ee413.png","fullname":"Hugging Face","name":"huggingface","type":"org","isHf":true,"details":"The AI community building the future.","plan":"team"}},"createdAt":"2025-11-27T03:19:04.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"Probably an installation error?\n\n`pip install git+https://github.com/huggingface/diffusers -U` should help you with this.","html":"Probably an installation error?\n<code>pip install git+https://github.com/huggingface/diffusers -U</code> should help you with this.\n","updatedAt":"2025-11-27T03:19:04.126Z","author":{"_id":"608aabf24955d2bfc3cd99c6","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/608aabf24955d2bfc3cd99c6/-YxmtpzEmf3NKOTktODRP.jpeg","fullname":"Aritra Roy Gosthipaty","name":"ariG23498","type":"user","isPro":true,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":645,"isUserFollowing":false,"primaryOrg":{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1583856921041-5dd96eb166059660ed1ee413.png","fullname":"Hugging Face","name":"huggingface","type":"org","isHf":true,"details":"The AI community building the future.","plan":"team"}}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.8660337924957275},"editors":["ariG23498"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/608aabf24955d2bfc3cd99c6/-YxmtpzEmf3NKOTktODRP.jpeg"],"reactions":[],"isReport":false,"parentCommentId":"692746e2cfcedf38b072c77f"}},{"id":"6927c39309f9a13f42877020","author":{"_id":"683d1665e41c42facedbcaf8","avatarUrl":"/avatars/3c1daed6469b74f67acf9606172bf974.svg","fullname":"Sven Killig","name":"sonic74","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false},"createdAt":"2025-11-27T03:20:51.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"No, it's an undefined variable in line 30 of the `enable_group_offload` example.","html":"No, it's an undefined variable in line 30 of the <code>enable_group_offload</code> example.\n","updatedAt":"2025-11-27T03:20:51.421Z","author":{"_id":"683d1665e41c42facedbcaf8","avatarUrl":"/avatars/3c1daed6469b74f67acf9606172bf974.svg","fullname":"Sven Killig","name":"sonic74","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.6889880299568176},"editors":["sonic74"],"editorAvatarUrls":["/avatars/3c1daed6469b74f67acf9606172bf974.svg"],"reactions":[],"isReport":false,"parentCommentId":"692746e2cfcedf38b072c77f"}},{"id":"6927c47bf265b7e9efc9e90d","author":{"_id":"608aabf24955d2bfc3cd99c6","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/608aabf24955d2bfc3cd99c6/-YxmtpzEmf3NKOTktODRP.jpeg","fullname":"Aritra Roy Gosthipaty","name":"ariG23498","type":"user","isPro":true,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":645,"isUserFollowing":false,"primaryOrg":{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1583856921041-5dd96eb166059660ed1ee413.png","fullname":"Hugging Face","name":"huggingface","type":"org","isHf":true,"details":"The AI community building the future.","plan":"team"}},"createdAt":"2025-11-27T03:24:43.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"Could you share a colab notebook or a github gist with the code? I could look into it that way.","html":"Could you share a colab notebook or a github gist with the code? I could look into it that way.\n","updatedAt":"2025-11-27T03:24:43.358Z","author":{"_id":"608aabf24955d2bfc3cd99c6","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/608aabf24955d2bfc3cd99c6/-YxmtpzEmf3NKOTktODRP.jpeg","fullname":"Aritra Roy Gosthipaty","name":"ariG23498","type":"user","isPro":true,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":645,"isUserFollowing":false,"primaryOrg":{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1583856921041-5dd96eb166059660ed1ee413.png","fullname":"Hugging Face","name":"huggingface","type":"org","isHf":true,"details":"The AI community building the future.","plan":"team"}}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.9770315885543823},"editors":["ariG23498"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/608aabf24955d2bfc3cd99c6/-YxmtpzEmf3NKOTktODRP.jpeg"],"reactions":[],"isReport":false,"parentCommentId":"692746e2cfcedf38b072c77f"}},{"id":"6927c58dbf5c52ab85acc2a8","author":{"_id":"683d1665e41c42facedbcaf8","avatarUrl":"/avatars/3c1daed6469b74f67acf9606172bf974.svg","fullname":"Sven Killig","name":"sonic74","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false},"createdAt":"2025-11-27T03:29:17.000Z","type":"comment","data":{"edited":true,"hidden":false,"latest":{"raw":"It's just a copy of https://github.com/huggingface/blog/blob/main/flux-2.md?plain=1#L283 but here you go: https://gist.github.com/sonic74/423c03483fbc13e7fd99ac97bcec8ff8","html":"It's just a copy of <a href=\"https://github.com/huggingface/blog/blob/main/flux-2.md?plain=1#L283\" rel=\"nofollow\">https://github.com/huggingface/blog/blob/main/flux-2.md?plain=1#L283</a> but here you go: <a href=\"https://gist.github.com/sonic74/423c03483fbc13e7fd99ac97bcec8ff8\" rel=\"nofollow\">https://gist.github.com/sonic74/423c03483fbc13e7fd99ac97bcec8ff8</a>\n","updatedAt":"2025-11-27T03:43:12.764Z","author":{"_id":"683d1665e41c42facedbcaf8","avatarUrl":"/avatars/3c1daed6469b74f67acf9606172bf974.svg","fullname":"Sven Killig","name":"sonic74","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false}},"numEdits":3,"identifiedLanguage":{"language":"en","probability":0.6202142238616943},"editors":["sonic74"],"editorAvatarUrls":["/avatars/3c1daed6469b74f67acf9606172bf974.svg"],"reactions":[],"isReport":false,"parentCommentId":"692746e2cfcedf38b072c77f"}},{"id":"6927c7cd09ea436c473a25df","author":{"_id":"683d1665e41c42facedbcaf8","avatarUrl":"/avatars/3c1daed6469b74f67acf9606172bf974.svg","fullname":"Sven Killig","name":"sonic74","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false},"createdAt":"2025-11-27T03:38:53.000Z","type":"comment","data":{"edited":true,"hidden":true,"hiddenBy":"","latest":{"raw":"This comment has been hidden","html":"This comment has been hidden","updatedAt":"2025-11-27T03:42:09.048Z","author":{"_id":"683d1665e41c42facedbcaf8","avatarUrl":"/avatars/3c1daed6469b74f67acf9606172bf974.svg","fullname":"Sven Killig","name":"sonic74","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false}},"numEdits":0,"editors":[],"editorAvatarUrls":[],"reactions":[],"parentCommentId":"692746e2cfcedf38b072c77f"}},{"id":"6927cf079200762340f9130a","author":{"_id":"608aabf24955d2bfc3cd99c6","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/608aabf24955d2bfc3cd99c6/-YxmtpzEmf3NKOTktODRP.jpeg","fullname":"Aritra Roy Gosthipaty","name":"ariG23498","type":"user","isPro":true,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":645,"isUserFollowing":false,"primaryOrg":{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1583856921041-5dd96eb166059660ed1ee413.png","fullname":"Hugging Face","name":"huggingface","type":"org","isHf":true,"details":"The AI community building the future.","plan":"team"}},"createdAt":"2025-11-27T04:09:43.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"I think this would work: https://github.com/ariG23498/custom-inference-endpoint/blob/main/flux.2-with-remote-text-encoder.ipynb","html":"I think this would work: <a href=\"https://github.com/ariG23498/custom-inference-endpoint/blob/main/flux.2-with-remote-text-encoder.ipynb\" rel=\"nofollow\">https://github.com/ariG23498/custom-inference-endpoint/blob/main/flux.2-with-remote-text-encoder.ipynb</a>\n","updatedAt":"2025-11-27T04:09:43.090Z","author":{"_id":"608aabf24955d2bfc3cd99c6","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/608aabf24955d2bfc3cd99c6/-YxmtpzEmf3NKOTktODRP.jpeg","fullname":"Aritra Roy Gosthipaty","name":"ariG23498","type":"user","isPro":true,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":645,"isUserFollowing":false,"primaryOrg":{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1583856921041-5dd96eb166059660ed1ee413.png","fullname":"Hugging Face","name":"huggingface","type":"org","isHf":true,"details":"The AI community building the future.","plan":"team"}}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.8133991956710815},"editors":["ariG23498"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/608aabf24955d2bfc3cd99c6/-YxmtpzEmf3NKOTktODRP.jpeg"],"reactions":[],"isReport":false,"parentCommentId":"692746e2cfcedf38b072c77f"}},{"id":"6927d1cdf265b7e9efc9e912","author":{"_id":"683d1665e41c42facedbcaf8","avatarUrl":"/avatars/3c1daed6469b74f67acf9606172bf974.svg","fullname":"Sven Killig","name":"sonic74","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false},"createdAt":"2025-11-27T04:21:33.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"I hoped `enable_group_offload` would speed it up a bit more than\n```\n>python3 flux.2-with-remote-text-encoder.py\ntorch.__version__='2.9.1+cu130'\ndiffusers.__version__='0.36.0.dev0'\nUsing GPU: NVIDIA GeForce RTX 5070 Ti\nTotal VRAM: 15 GBs\nRunning remote text encoder ☁️\nDone ✅\n 2%|███▏ | 1/50 [02:29<2:02:25, 149.92s/it]\n```","html":"I hoped <code>enable_group_offload</code> would speed it up a bit more than\n<pre><code>>python3 flux.2-with-remote-text-encoder.py\ntorch.__version__='2.9.1+cu130'\ndiffusers.__version__='0.36.0.dev0'\nUsing GPU: NVIDIA GeForce RTX 5070 Ti\nTotal VRAM: 15 GBs\nRunning remote text encoder ☁️\nDone ✅\n 2%|███▏ | 1/50 [02:29<2:02:25, 149.92s/it]\n</code></pre>\n","updatedAt":"2025-11-27T04:21:33.019Z","author":{"_id":"683d1665e41c42facedbcaf8","avatarUrl":"/avatars/3c1daed6469b74f67acf9606172bf974.svg","fullname":"Sven Killig","name":"sonic74","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.5549377799034119},"editors":["sonic74"],"editorAvatarUrls":["/avatars/3c1daed6469b74f67acf9606172bf974.svg"],"reactions":[],"isReport":false,"parentCommentId":"692746e2cfcedf38b072c77f"}},{"id":"6927d2b6d00cb5a9bf99ed5c","author":{"_id":"608aabf24955d2bfc3cd99c6","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/608aabf24955d2bfc3cd99c6/-YxmtpzEmf3NKOTktODRP.jpeg","fullname":"Aritra Roy Gosthipaty","name":"ariG23498","type":"user","isPro":true,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":645,"isUserFollowing":false,"primaryOrg":{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1583856921041-5dd96eb166059660ed1ee413.png","fullname":"Hugging Face","name":"huggingface","type":"org","isHf":true,"details":"The AI community building the future.","plan":"team"}},"createdAt":"2025-11-27T04:25:26.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"I don't think group offload speeds things up. It is meant to be a trick to reduce memory usage. The trade offs here should be noted.","html":"I don't think group offload speeds things up. It is meant to be a trick to reduce memory usage. The trade offs here should be noted.\n","updatedAt":"2025-11-27T04:25:26.510Z","author":{"_id":"608aabf24955d2bfc3cd99c6","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/608aabf24955d2bfc3cd99c6/-YxmtpzEmf3NKOTktODRP.jpeg","fullname":"Aritra Roy Gosthipaty","name":"ariG23498","type":"user","isPro":true,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":645,"isUserFollowing":false,"primaryOrg":{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1583856921041-5dd96eb166059660ed1ee413.png","fullname":"Hugging Face","name":"huggingface","type":"org","isHf":true,"details":"The AI community building the future.","plan":"team"}}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.9854326844215393},"editors":["ariG23498"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/608aabf24955d2bfc3cd99c6/-YxmtpzEmf3NKOTktODRP.jpeg"],"reactions":[],"isReport":false,"parentCommentId":"692746e2cfcedf38b072c77f"}},{"id":"6927d3928b9bc560603d6692","author":{"_id":"683d1665e41c42facedbcaf8","avatarUrl":"/avatars/3c1daed6469b74f67acf9606172bf974.svg","fullname":"Sven Killig","name":"sonic74","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false},"createdAt":"2025-11-27T04:29:06.000Z","type":"comment","data":{"edited":true,"hidden":false,"latest":{"raw":"I wonder what ComfyUI uses. It inferences in about a minute - even with fp8 and a local text encoder.","html":"I wonder what ComfyUI uses. It inferences in about a minute - even with fp8 and a local text encoder.\n","updatedAt":"2025-11-27T04:32:11.650Z","author":{"_id":"683d1665e41c42facedbcaf8","avatarUrl":"/avatars/3c1daed6469b74f67acf9606172bf974.svg","fullname":"Sven Killig","name":"sonic74","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false}},"numEdits":4,"identifiedLanguage":{"language":"en","probability":0.8609533309936523},"editors":["sonic74"],"editorAvatarUrls":["/avatars/3c1daed6469b74f67acf9606172bf974.svg"],"reactions":[],"isReport":false,"parentCommentId":"692746e2cfcedf38b072c77f"}},{"id":"6927f49af265b7e9efc9e916","author":{"_id":"63df091910678851bb0cd0e0","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63df091910678851bb0cd0e0/FUXFt0C-rUFSppIAu5ZDN.png","fullname":"Alvaro Somoza","name":"OzzyGT","type":"user","isPro":false,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":241,"isUserFollowing":false},"createdAt":"2025-11-27T06:50:02.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"this is just a typo, change `transformer_id` for `repo_id`","html":"this is just a typo, change <code>transformer_id</code> for <code>repo_id</code>\n","updatedAt":"2025-11-27T06:50:02.273Z","author":{"_id":"63df091910678851bb0cd0e0","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63df091910678851bb0cd0e0/FUXFt0C-rUFSppIAu5ZDN.png","fullname":"Alvaro Somoza","name":"OzzyGT","type":"user","isPro":false,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":241,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.74115389585495},"editors":["OzzyGT"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/63df091910678851bb0cd0e0/FUXFt0C-rUFSppIAu5ZDN.png"],"reactions":[],"isReport":false,"parentCommentId":"692746e2cfcedf38b072c77f"}},{"id":"69285f771601303bb179ffd7","author":{"_id":"683d1665e41c42facedbcaf8","avatarUrl":"/avatars/3c1daed6469b74f67acf9606172bf974.svg","fullname":"Sven Killig","name":"sonic74","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false},"createdAt":"2025-11-27T14:25:59.000Z","type":"comment","data":{"edited":true,"hidden":false,"latest":{"raw":"I already did, but wasn't sure, because then I got\n```\ntorch.AcceleratorError: CUDA error: an illegal memory access was encountered\nSearch for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\n```","html":"I already did, but wasn't sure, because then I got\n<pre><code>torch.AcceleratorError: CUDA error: an illegal memory access was encountered\nSearch for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.\n</code></pre>\n","updatedAt":"2025-11-27T15:20:28.356Z","author":{"_id":"683d1665e41c42facedbcaf8","avatarUrl":"/avatars/3c1daed6469b74f67acf9606172bf974.svg","fullname":"Sven Killig","name":"sonic74","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false}},"numEdits":2,"identifiedLanguage":{"language":"en","probability":0.8433528542518616},"editors":["sonic74"],"editorAvatarUrls":["/avatars/3c1daed6469b74f67acf9606172bf974.svg"],"reactions":[],"isReport":false,"parentCommentId":"692746e2cfcedf38b072c77f"}},{"id":"6928717ca4a9a86eb355d157","author":{"_id":"63df091910678851bb0cd0e0","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63df091910678851bb0cd0e0/FUXFt0C-rUFSppIAu5ZDN.png","fullname":"Alvaro Somoza","name":"OzzyGT","type":"user","isPro":false,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":241,"isUserFollowing":false},"createdAt":"2025-11-27T15:42:52.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"do you have 32GB of free RAM? group offloading with cuda streams needs a lot of RAM but it's fast, anyway, here is really hard to give help, if you still have problems, please open an issue in the diffusers repo with the code and the error you're getting. ","html":"do you have 32GB of free RAM? group offloading with cuda streams needs a lot of RAM but it's fast, anyway, here is really hard to give help, if you still have problems, please open an issue in the diffusers repo with the code and the error you're getting. \n","updatedAt":"2025-11-27T15:42:52.158Z","author":{"_id":"63df091910678851bb0cd0e0","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63df091910678851bb0cd0e0/FUXFt0C-rUFSppIAu5ZDN.png","fullname":"Alvaro Somoza","name":"OzzyGT","type":"user","isPro":false,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":241,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.9722614288330078},"editors":["OzzyGT"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/63df091910678851bb0cd0e0/FUXFt0C-rUFSppIAu5ZDN.png"],"reactions":[],"isReport":false,"parentCommentId":"692746e2cfcedf38b072c77f"}},{"id":"6928737d38380707431b32f3","author":{"_id":"683d1665e41c42facedbcaf8","avatarUrl":"/avatars/3c1daed6469b74f67acf9606172bf974.svg","fullname":"Sven Killig","name":"sonic74","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false},"createdAt":"2025-11-27T15:51:25.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"I have 96 GB RAM, and indeed ComfyUI fills it up when inferencing flux2.\nb.t.w., [there's obvious spam](https://github.com/huggingface/blog/issues/3112) in the issue tracker since 2 months.","html":"I have 96 GB RAM, and indeed ComfyUI fills it up when inferencing flux2. b.t.w., <a href=\"https://github.com/huggingface/blog/issues/3112\" rel=\"nofollow\">there's obvious spam</a> in the issue tracker since 2 months.\n","updatedAt":"2025-11-27T15:51:25.988Z","author":{"_id":"683d1665e41c42facedbcaf8","avatarUrl":"/avatars/3c1daed6469b74f67acf9606172bf974.svg","fullname":"Sven Killig","name":"sonic74","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.8643487691879272},"editors":["sonic74"],"editorAvatarUrls":["/avatars/3c1daed6469b74f67acf9606172bf974.svg"],"reactions":[],"isReport":false,"parentCommentId":"692746e2cfcedf38b072c77f"}}]},{"id":"6927f2cacfcedf38b072c796","author":{"_id":"6304c907bad6ce7fc02764d4","avatarUrl":"/avatars/d93fae5d31c8f76e97d8bdfb3e2a0d5e.svg","fullname":"Junjie","name":"Adenialzz","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":2,"isUserFollowing":false},"createdAt":"2025-11-27T06:42:18.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"Hi, How can I deploy a text encoder privately?","html":"Hi, How can I deploy a text encoder privately?\n","updatedAt":"2025-11-27T06:42:18.203Z","author":{"_id":"6304c907bad6ce7fc02764d4","avatarUrl":"/avatars/d93fae5d31c8f76e97d8bdfb3e2a0d5e.svg","fullname":"Junjie","name":"Adenialzz","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":2,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.9077998399734497},"editors":["Adenialzz"],"editorAvatarUrls":["/avatars/d93fae5d31c8f76e97d8bdfb3e2a0d5e.svg"],"reactions":[{"reaction":"😎","users":["ariG23498"],"count":1},{"reaction":"🔥","users":["ariG23498"],"count":1},{"reaction":"🚀","users":["ariG23498"],"count":1}],"isReport":false},"replies":[{"id":"6927f929bf5c52ab85acc2ac","author":{"_id":"63df091910678851bb0cd0e0","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63df091910678851bb0cd0e0/FUXFt0C-rUFSppIAu5ZDN.png","fullname":"Alvaro Somoza","name":"OzzyGT","type":"user","isPro":false,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":241,"isUserFollowing":false},"createdAt":"2025-11-27T07:09:29.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"Hi, you can read this [nice repo](https://github.com/ariG23498/custom-inference-endpoint) with the process that @ariG23498 made","html":"Hi, you can read this <a href=\"https://github.com/ariG23498/custom-inference-endpoint\" rel=\"nofollow\">nice repo</a> with the process that <a href=\"/ariG23498\">@ariG23498</a> made\n","updatedAt":"2025-11-27T07:09:29.107Z","author":{"_id":"63df091910678851bb0cd0e0","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63df091910678851bb0cd0e0/FUXFt0C-rUFSppIAu5ZDN.png","fullname":"Alvaro Somoza","name":"OzzyGT","type":"user","isPro":false,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":241,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.8557339906692505},"editors":["OzzyGT"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/63df091910678851bb0cd0e0/FUXFt0C-rUFSppIAu5ZDN.png"],"reactions":[],"isReport":false,"parentCommentId":"6927f2cacfcedf38b072c796"}},{"id":"6927fa0e8dde7713575455a9","author":{"_id":"6304c907bad6ce7fc02764d4","avatarUrl":"/avatars/d93fae5d31c8f76e97d8bdfb3e2a0d5e.svg","fullname":"Junjie","name":"Adenialzz","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":2,"isUserFollowing":false},"createdAt":"2025-11-27T07:13:18.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"Thanks.","html":"Thanks.\n","updatedAt":"2025-11-27T07:13:18.404Z","author":{"_id":"6304c907bad6ce7fc02764d4","avatarUrl":"/avatars/d93fae5d31c8f76e97d8bdfb3e2a0d5e.svg","fullname":"Junjie","name":"Adenialzz","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":2,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.8000723719596863},"editors":["Adenialzz"],"editorAvatarUrls":["/avatars/d93fae5d31c8f76e97d8bdfb3e2a0d5e.svg"],"reactions":[],"isReport":false,"parentCommentId":"6927f2cacfcedf38b072c796"}},{"id":"6927fdc12769abebd2f50121","author":{"_id":"608aabf24955d2bfc3cd99c6","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/608aabf24955d2bfc3cd99c6/-YxmtpzEmf3NKOTktODRP.jpeg","fullname":"Aritra Roy Gosthipaty","name":"ariG23498","type":"user","isPro":true,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":645,"isUserFollowing":false,"primaryOrg":{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1583856921041-5dd96eb166059660ed1ee413.png","fullname":"Hugging Face","name":"huggingface","type":"org","isHf":true,"details":"The AI community building the future.","plan":"team"}},"createdAt":"2025-11-27T07:29:05.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"Also, let me know if you face any issues! You can write an issue directly in the GitHub repo itself. Would love to help 🤗","html":"Also, let me know if you face any issues! You can write an issue directly in the GitHub repo itself. Would love to help 🤗\n","updatedAt":"2025-11-27T07:29:05.732Z","author":{"_id":"608aabf24955d2bfc3cd99c6","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/608aabf24955d2bfc3cd99c6/-YxmtpzEmf3NKOTktODRP.jpeg","fullname":"Aritra Roy Gosthipaty","name":"ariG23498","type":"user","isPro":true,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":645,"isUserFollowing":false,"primaryOrg":{"avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1583856921041-5dd96eb166059660ed1ee413.png","fullname":"Hugging Face","name":"huggingface","type":"org","isHf":true,"details":"The AI community building the future.","plan":"team"}}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.848737359046936},"editors":["ariG23498"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/608aabf24955d2bfc3cd99c6/-YxmtpzEmf3NKOTktODRP.jpeg"],"reactions":[],"isReport":false,"parentCommentId":"6927f2cacfcedf38b072c796"}}]},{"id":"69dae29a51984581afba5e82","author":{"_id":"66c0e1af466dc6770ef31414","avatarUrl":"/avatars/8672f41ed26b0e7140c0117203b0ded5.svg","fullname":"dsa","name":"goodasdgood","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false},"createdAt":"2026-04-12T00:08:58.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"how run flux on two gpu\n\ncode????","html":"how run flux on two gpu\ncode????\n","updatedAt":"2026-04-12T00:08:58.216Z","author":{"_id":"66c0e1af466dc6770ef31414","avatarUrl":"/avatars/8672f41ed26b0e7140c0117203b0ded5.svg","fullname":"dsa","name":"goodasdgood","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7250254154205322},"editors":["goodasdgood"],"editorAvatarUrls":["/avatars/8672f41ed26b0e7140c0117203b0ded5.svg"],"reactions":[],"isReport":false}},{"id":"69dae72f51984581afba5e95","author":{"_id":"66c0e1af466dc6770ef31414","avatarUrl":"/avatars/8672f41ed26b0e7140c0117203b0ded5.svg","fullname":"dsa","name":"goodasdgood","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false},"createdAt":"2026-04-12T00:28:31.000Z","type":"comment","data":{"edited":true,"hidden":false,"latest":{"raw":"https://github.com/ayttop/xflux2gpu/blob/main/xxxxxx%20(1).ipynb\n\n\nhow run it on 2 gpu\n2*16gb\n\n\nhttps://huggingface.co/docs/diffusers/main/en/api/parallel\n\nhttps://huggingface.co/docs/diffusers/main/en/training/distributed_inference\n","html":"<a href=\"https://github.com/ayttop/xflux2gpu/blob/main/xxxxxx%20(1).ipynb\" rel=\"nofollow\">https://github.com/ayttop/xflux2gpu/blob/main/xxxxxx%20(1).ipynb</a>\nhow run it on 2 gpu 2*16gb\n<a href=\"https://huggingface.co/docs/diffusers/main/en/api/parallel\">https://huggingface.co/docs/diffusers/main/en/api/parallel</a>\n<a href=\"https://huggingface.co/docs/diffusers/main/en/training/distributed_inference\">https://huggingface.co/docs/diffusers/main/en/training/distributed_inference</a>\n","updatedAt":"2026-04-12T00:35:14.734Z","author":{"_id":"66c0e1af466dc6770ef31414","avatarUrl":"/avatars/8672f41ed26b0e7140c0117203b0ded5.svg","fullname":"dsa","name":"goodasdgood","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false}},"numEdits":1,"identifiedLanguage":{"language":"en","probability":0.3185204863548279},"editors":["goodasdgood"],"editorAvatarUrls":["/avatars/8672f41ed26b0e7140c0117203b0ded5.svg"],"reactions":[],"isReport":false}},{"id":"69daf8dd356d7881a860bd04","author":{"_id":"66c0e1af466dc6770ef31414","avatarUrl":"/avatars/8672f41ed26b0e7140c0117203b0ded5.svg","fullname":"dsa","name":"goodasdgood","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false},"createdAt":"2026-04-12T01:43:57.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"import torch\nfrom transformers import Mistral3ForConditionalGeneration\n\nfrom diffusers import Flux2Pipeline, Flux2Transformer2DModel\n\nrepo_id = \"diffusers/FLUX.2-dev-bnb-4bit\"\ndevice = \"cuda:0\"\ntorch_dtype = torch.bfloat16\n\ntransformer = Flux2Transformer2DModel.from_pretrained(\n repo_id, subfolder=\"transformer\", torch_dtype=torch_dtype, device_map=\"cpu\"\n)\ntext_encoder = Mistral3ForConditionalGeneration.from_pretrained(\n repo_id, subfolder=\"text_encoder\", dtype=torch_dtype, device_map=\"cpu\"\n)\n\npipe = Flux2Pipeline.from_pretrained(\n repo_id, transformer=transformer, text_encoder=text_encoder, torch_dtype=torch_dtype\n)\npipe.enable_model_cpu_offload()\n\nprompt = \"Realistic macro photograph of a hermit crab using a soda can as its shell, partially emerging from the can, captured with sharp detail and natural colors, on a sunlit beach with soft shadows and a shallow depth of field, with blurred ocean waves in the background. The can has the text `BFL Diffusers` on it and it has a color gradient that start with #FF5733 at the top and transitions to #33FF57 at the bottom.\"\n\nimage = pipe(\n prompt=prompt,\n generator=torch.Generator(device=device).manual_seed(42),\n num_inference_steps=50, # 28 is a good trade-off\n guidance_scale=4,\n).images[0]\n\nimage.save(\"flux2_t2i_nf4.png\")\n\n\n\n\n\n Flax classes are deprecated and will be removed in Diffusers v1.0.0. We recommend migrating to PyTorch classes or pinning your version of Diffusers.\nFlax classes are deprecated and will be removed in Diffusers v1.0.0. We recommend migrating to PyTorch classes or pinning your version of Diffusers.\n/usr/local/lib/python3.12/dist-packages/huggingface_hub/utils/_validators.py:206: UserWarning: The `local_dir_use_symlinks` argument is deprecated and ignored in `hf_hub_download`. Downloading to a local directory does not use symlinks anymore.\n warnings.warn(\nDownload complete: 0.00/0.00 [00:00<?, ?B/s]Fetching 2 files: 100% 2/2 [00:00<00:00, 98.80it/s]Loading checkpoint shards: 100% 2/2 [00:01<00:00, 2.03it/s]Download complete: 0.00/0.00 [00:00<?, ?B/s]Fetching 4 files: 100% 4/4 [00:00<00:00, 167.57it/s]Loading weights: 100% 585/585 [00:02<00:00, 310.91it/s, Materializing param=model.vision_tower.transformer.layers.23.ffn_norm.weight]The tied weights mapping and config for this model specifies to tie model.language_model.embed_tokens.weight to lm_head.weight, but both are present in the checkpoints, so we will NOT tie them. You should update the config with `tie_word_embeddings=False` to silence this warning\nLoading pipeline components...: 100% 5/5 [00:03<00:00, 1.14it/s]---------------------------------------------------------------------------\nOutOfMemoryError Traceback (most recent call last)\n/tmp/ipykernel_18947/863729753.py in <cell line: 0>()\n 22 prompt = \"Realistic macro photograph of a hermit crab using a soda can as its shell, partially emerging from the can, captured with sharp detail and natural colors, on a sunlit beach with soft shadows and a shallow depth of field, with blurred ocean waves in the background. The can has the text `BFL Diffusers` on it and it has a color gradient that start with #FF5733 at the top and transitions to #33FF57 at the bottom.\"\n 23 \n---> 24 image = pipe(\n 25 prompt=prompt,\n 26 generator=torch.Generator(device=device).manual_seed(42),\n\n36 frames/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py in decorate_context(*args, **kwargs)\n 122 # pyrefly: ignore [bad-context-manager]\n 123 with ctx_factory():\n--> 124 return func(*args, **kwargs)\n 125 \n 126 return decorate_context\n\n/usr/local/lib/python3.12/dist-packages/diffusers/pipelines/flux2/pipeline_flux2.py in __call__(self, image, prompt, height, width, num_inference_steps, sigmas, guidance_scale, num_images_per_prompt, generator, latents, prompt_embeds, output_type, return_dict, attention_kwargs, callback_on_step_end, callback_on_step_end_tensor_inputs, max_sequence_length, text_encoder_out_layers, caption_upsample_temperature)\n 869 prompt, images=image, temperature=caption_upsample_temperature, device=device\n 870 )\n--> 871 prompt_embeds, text_ids = self.encode_prompt(\n 872 prompt=prompt,\n 873 prompt_embeds=prompt_embeds,\n\n/usr/local/lib/python3.12/dist-packages/diffusers/pipelines/flux2/pipeline_flux2.py in encode_prompt(self, prompt, device, num_images_per_prompt, prompt_embeds, max_sequence_length, text_encoder_out_layers)\n 586 \n 587 if prompt_embeds is None:\n--> 588 prompt_embeds = self._get_mistral_3_small_prompt_embeds(\n 589 text_encoder=self.text_encoder,\n 590 tokenizer=self.tokenizer,\n\n/usr/local/lib/python3.12/dist-packages/diffusers/pipelines/flux2/pipeline_flux2.py in _get_mistral_3_small_prompt_embeds(text_encoder, tokenizer, prompt, dtype, device, max_sequence_length, system_message, hidden_states_layers)\n 337 \n 338 # Forward pass through the model\n--> 339 output = text_encoder(\n 340 input_ids=input_ids,\n 341 attention_mask=attention_mask,\n\n/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in _wrapped_call_impl(self, *args, **kwargs)\n 1774 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]\n 1775 else:\n-> 1776 return self._call_impl(*args, **kwargs)\n 1777 \n 1778 # torchrec tests the code consistency with the following code\n\n/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)\n 1785 or _global_backward_pre_hooks or _global_backward_hooks\n 1786 or _global_forward_hooks or _global_forward_pre_hooks):\n-> 1787 return forward_call(*args, **kwargs)\n 1788 \n 1789 result = None\n\n/usr/local/lib/python3.12/dist-packages/accelerate/hooks.py in new_forward(module, *args, **kwargs)\n 190 output = module._old_forward(*args, **kwargs)\n 191 else:\n--> 192 output = module._old_forward(*args, **kwargs)\n 193 return module._hf_hook.post_forward(module, output)\n 194 \n\n/usr/local/lib/python3.12/dist-packages/transformers/utils/generic.py in wrapper(self, *args, **kwargs)\n 1000 outputs = func(self, *args, **kwargs)\n 1001 else:\n-> 1002 outputs = func(self, *args, **kwargs)\n 1003 except TypeError as original_exception:\n 1004 # If we get a TypeError, it's possible that the model is not receiving the recordable kwargs correctly.\n\n/usr/local/lib/python3.12/dist-packages/transformers/models/mistral3/modeling_mistral3.py in forward(self, input_ids, pixel_values, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict, cache_position, logits_to_keep, image_sizes, **kwargs)\n 444 return_dict = return_dict if return_dict is not None else self.config.use_return_dict\n 445 \n--> 446 outputs = self.model(\n 447 input_ids=input_ids,\n 448 pixel_values=pixel_values,\n\n/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in _wrapped_call_impl(self, *args, **kwargs)\n 1774 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]\n 1775 else:\n-> 1776 return self._call_impl(*args, **kwargs)\n 1777 \n 1778 # torchrec tests the code consistency with the following code\n\n/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)\n 1785 or _global_backward_pre_hooks or _global_backward_hooks\n 1786 or _global_forward_hooks or _global_forward_pre_hooks):\n-> 1787 return forward_call(*args, **kwargs)\n 1788 \n 1789 result = None\n\n/usr/local/lib/python3.12/dist-packages/transformers/utils/generic.py in wrapper(self, *args, **kwargs)\n 1000 outputs = func(self, *args, **kwargs)\n 1001 else:\n-> 1002 outputs = func(self, *args, **kwargs)\n 1003 except TypeError as original_exception:\n 1004 # If we get a TypeError, it's possible that the model is not receiving the recordable kwargs correctly.\n\n/usr/local/lib/python3.12/dist-packages/transformers/models/mistral3/modeling_mistral3.py in forward(self, input_ids, pixel_values, attention_mask, position_ids, past_key_values, inputs_embeds, vision_feature_layer, use_cache, output_attentions, output_hidden_states, return_dict, cache_position, image_sizes, **kwargs)\n 323 inputs_embeds = inputs_embeds.masked_scatter(special_image_mask, image_features)\n 324 \n--> 325 outputs = self.language_model(\n 326 attention_mask=attention_mask,\n 327 position_ids=position_ids,\n\n/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in _wrapped_call_impl(self, *args, **kwargs)\n 1774 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]\n 1775 else:\n-> 1776 return self._call_impl(*args, **kwargs)\n 1777 \n 1778 # torchrec tests the code consistency with the following code\n\n/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)\n 1785 or _global_backward_pre_hooks or _global_backward_hooks\n 1786 or _global_forward_hooks or _global_forward_pre_hooks):\n-> 1787 return forward_call(*args, **kwargs)\n 1788 \n 1789 result = None\n\n/usr/local/lib/python3.12/dist-packages/transformers/utils/generic.py in wrapper(self, *args, **kwargs)\n 1000 outputs = func(self, *args, **kwargs)\n 1001 else:\n-> 1002 outputs = func(self, *args, **kwargs)\n 1003 except TypeError as original_exception:\n 1004 # If we get a TypeError, it's possible that the model is not receiving the recordable kwargs correctly.\n\n/usr/local/lib/python3.12/dist-packages/transformers/models/mistral/modeling_mistral.py in forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, use_cache, cache_position, **kwargs)\n 395 \n 396 for decoder_layer in self.layers[: self.config.num_hidden_layers]:\n--> 397 hidden_states = decoder_layer(\n 398 hidden_states,\n 399 attention_mask=causal_mask,\n\n/usr/local/lib/python3.12/dist-packages/transformers/modeling_layers.py in __call__(self, *args, **kwargs)\n 91 \n 92 return self._gradient_checkpointing_func(partial(super().__call__, **kwargs), *args)\n---> 93 return super().__call__(*args, **kwargs)\n 94 \n 95 \n\n/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in _wrapped_call_impl(self, *args, **kwargs)\n 1774 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]\n 1775 else:\n-> 1776 return self._call_impl(*args, **kwargs)\n 1777 \n 1778 # torchrec tests the code consistency with the following code\n\n/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)\n 1785 or _global_backward_pre_hooks or _global_backward_hooks\n 1786 or _global_forward_hooks or _global_forward_pre_hooks):\n-> 1787 return forward_call(*args, **kwargs)\n 1788 \n 1789 result = None\n\n/usr/local/lib/python3.12/dist-packages/transformers/utils/generic.py in wrapped_forward(*args, **kwargs)\n 953 if key == \"hidden_states\" and len(collected_outputs[key]) == 0:\n 954 collected_outputs[key] += (args[0],)\n--> 955 output = orig_forward(*args, **kwargs)\n 956 if not isinstance(output, tuple):\n 957 collected_outputs[key] += (output,)\n\n/usr/local/lib/python3.12/dist-packages/transformers/models/mistral/modeling_mistral.py in forward(self, hidden_states, attention_mask, position_ids, past_key_values, use_cache, cache_position, position_embeddings, **kwargs)\n 228 hidden_states = self.input_layernorm(hidden_states)\n 229 # Self Attention\n--> 230 hidden_states, _ = self.self_attn(\n 231 hidden_states=hidden_states,\n 232 attention_mask=attention_mask,\n\n/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in _wrapped_call_impl(self, *args, **kwargs)\n 1774 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]\n 1775 else:\n-> 1776 return self._call_impl(*args, **kwargs)\n 1777 \n 1778 # torchrec tests the code consistency with the following code\n\n/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)\n 1785 or _global_backward_pre_hooks or _global_backward_hooks\n 1786 or _global_forward_hooks or _global_forward_pre_hooks):\n-> 1787 return forward_call(*args, **kwargs)\n 1788 \n 1789 result = None\n\n/usr/local/lib/python3.12/dist-packages/transformers/models/mistral/modeling_mistral.py in forward(self, hidden_states, position_embeddings, attention_mask, past_key_values, cache_position, **kwargs)\n 151 hidden_shape = (*input_shape, -1, self.head_dim)\n 152 \n--> 153 query_states = self.q_proj(hidden_states).view(hidden_shape).transpose(1, 2)\n 154 key_states = self.k_proj(hidden_states).view(hidden_shape).transpose(1, 2)\n 155 value_states = self.v_proj(hidden_states).view(hidden_shape).transpose(1, 2)\n\n/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in _wrapped_call_impl(self, *args, **kwargs)\n 1774 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]\n 1775 else:\n-> 1776 return self._call_impl(*args, **kwargs)\n 1777 \n 1778 # torchrec tests the code consistency with the following code\n\n/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)\n 1785 or _global_backward_pre_hooks or _global_backward_hooks\n 1786 or _global_forward_hooks or _global_forward_pre_hooks):\n-> 1787 return forward_call(*args, **kwargs)\n 1788 \n 1789 result = None\n\n/usr/local/lib/python3.12/dist-packages/bitsandbytes/nn/modules.py in forward(self, x)\n 554 weight = self.weight if getattr(quant_state, \"packing_format_for_cpu\", False) else self.weight.t()\n 555 \n--> 556 return bnb.matmul_4bit(x, weight, bias=bias, quant_state=quant_state).to(inp_dtype)\n 557 \n 558 \n\n/usr/local/lib/python3.12/dist-packages/bitsandbytes/autograd/_functions.py in matmul_4bit(A, B, quant_state, out, bias)\n 399 return out\n 400 else:\n--> 401 return MatMul4Bit.apply(A, B, out, bias, quant_state)\n\n/usr/local/lib/python3.12/dist-packages/torch/autograd/function.py in apply(cls, *args, **kwargs)\n 581 # See NOTE: [functorch vjp and autograd interaction]\n 582 args = _functorch.utils.unwrap_dead_wrappers(args)\n--> 583 return super().apply(*args, **kwargs) # type: ignore[misc]\n 584 \n 585 if not is_setup_ctx_defined:\n\n/usr/local/lib/python3.12/dist-packages/bitsandbytes/autograd/_functions.py in forward(ctx, A, B, out, bias, quant_state)\n 313 # 1. Dequantize\n 314 # 2. MatmulnN\n--> 315 output = torch.nn.functional.linear(A, F.dequantize_4bit(B, quant_state).to(A.dtype).t(), bias)\n 316 \n 317 # 3. Save state\n\n/usr/local/lib/python3.12/dist-packages/bitsandbytes/functional.py in dequantize_4bit(A, quant_state, absmax, out, blocksize, quant_type)\n 1048 )\n 1049 else:\n-> 1050 out = torch.ops.bitsandbytes.dequantize_4bit.default(\n 1051 A,\n 1052 absmax,\n\n/usr/local/lib/python3.12/dist-packages/torch/_ops.py in __call__(self, *args, **kwargs)\n 817 # that are named \"self\". This way, all the aten ops can be called by kwargs.\n 818 def __call__(self, /, *args: _P.args, **kwargs: _P.kwargs) -> _T:\n--> 819 return self._op(*args, **kwargs)\n 820 \n 821 # Use positional-only argument to avoid naming collision with aten ops arguments\n\n/usr/local/lib/python3.12/dist-packages/torch/_compile.py in inner(*args, **kwargs)\n 52 fn.__dynamo_disable = disable_fn # type: ignore[attr-defined]\n 53 \n---> 54 return disable_fn(*args, **kwargs)\n 55 \n 56 return inner\n\n/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py in _fn(*args, **kwargs)\n 1179 ):\n 1180 return fn(*args, **kwargs)\n-> 1181 return fn(*args, **kwargs)\n 1182 finally:\n 1183 set_eval_frame(None)\n\n/usr/local/lib/python3.12/dist-packages/torch/library.py in func_no_dynamo(*args, **kwargs)\n 740 @torch._disable_dynamo\n 741 def func_no_dynamo(*args, **kwargs):\n--> 742 return func(*args, **kwargs)\n 743 \n 744 for key in keys:\n\n/usr/local/lib/python3.12/dist-packages/bitsandbytes/backends/cuda/ops.py in _(A, absmax, blocksize, quant_type, shape, dtype)\n 361 dtype: torch.dtype,\n 362 ) -> torch.Tensor:\n--> 363 out = torch.empty(shape, dtype=dtype, device=A.device)\n 364 _dequantize_4bit_impl(A, absmax, blocksize, quant_type, dtype, out=out)\n 365 return out\n\nOutOfMemoryError: CUDA out of memory. Tried to allocate 40.00 MiB. GPU 0 has a total capacity of 14.56 GiB of which 17.81 MiB is free. Including non-PyTorch memory, this process has 14.54 GiB memory in use. Of the allocated memory 14.40 GiB is allocated by PyTorch, and 15.19 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)","html":"import torch from transformers import Mistral3ForConditionalGeneration\nfrom diffusers import Flux2Pipeline, Flux2Transformer2DModel\nrepo_id = \"diffusers/FLUX.2-dev-bnb-4bit\" device = \"cuda:0\" torch_dtype = torch.bfloat16\ntransformer = Flux2Transformer2DModel.from_pretrained( repo_id, subfolder=\"transformer\", torch_dtype=torch_dtype, device_map=\"cpu\" ) text_encoder = Mistral3ForConditionalGeneration.from_pretrained( repo_id, subfolder=\"text_encoder\", dtype=torch_dtype, device_map=\"cpu\" )\npipe = Flux2Pipeline.from_pretrained( repo_id, transformer=transformer, text_encoder=text_encoder, torch_dtype=torch_dtype ) pipe.enable_model_cpu_offload()\nprompt = \"Realistic macro photograph of a hermit crab using a soda can as its shell, partially emerging from the can, captured with sharp detail and natural colors, on a sunlit beach with soft shadows and a shallow depth of field, with blurred ocean waves in the background. The can has the text <code>BFL Diffusers</code> on it and it has a color gradient that start with #FF5733 at the top and transitions to #33FF57 at the bottom.\"\nimage = pipe( prompt=prompt, generator=torch.Generator(device=device).manual_seed(42), num_inference_steps=50, # 28 is a good trade-off guidance_scale=4, ).images[0]\nimage.save(\"flux2_t2i_nf4.png\")\n Flax classes are deprecated and will be removed in Diffusers v1.0.0. We recommend migrating to PyTorch classes or pinning your version of Diffusers. Flax classes are deprecated and will be removed in Diffusers v1.0.0. We recommend migrating to PyTorch classes or pinning your version of Diffusers. /usr/local/lib/python3.12/dist-packages/huggingface_hub/utils/_validators.py:206: UserWarning: The <code>local_dir_use_symlinks</code> argument is deprecated and ignored in <code>hf_hub_download</code>. Downloading to a local directory does not use symlinks anymore. warnings.warn( Download complete: 0.00/0.00 [00:00<?, ?B/s]Fetching 2 files: 100% 2/2 [00:00<00:00, 98.80it/s]Loading checkpoint shards: 100% 2/2 [00:01<00:00, 2.03it/s]Download complete: 0.00/0.00 [00:00<?, ?B/s]Fetching 4 files: 100% 4/4 [00:00<00:00, 167.57it/s]Loading weights: 100% 585/585 [00:02<00:00, 310.91it/s, Materializing param=model.vision_tower.transformer.layers.23.ffn_norm.weight]The tied weights mapping and config for this model specifies to tie model.language_model.embed_tokens.weight to lm_head.weight, but both are present in the checkpoints, so we will NOT tie them. You should update the config with <code>tie_word_embeddings=False</code> to silence this warning Loading pipeline components...: 100% 5/5 [00:03<00:00, 1.14it/s]--------------------------------------------------------------------------- OutOfMemoryError Traceback (most recent call last) /tmp/ipykernel_18947/863729753.py in <cell line: 0>() 22 prompt = \"Realistic macro photograph of a hermit crab using a soda can as its shell, partially emerging from the can, captured with sharp detail and natural colors, on a sunlit beach with soft shadows and a shallow depth of field, with blurred ocean waves in the background. The can has the text <code>BFL Diffusers</code> on it and it has a color gradient that start with #FF5733 at the top and transitions to #33FF57 at the bottom.\" 23 ---> 24 image = pipe( 25 prompt=prompt, 26 generator=torch.Generator(device=device).manual_seed(42),\n36 frames/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py in decorate_context(*args, **kwargs) 122 # pyrefly: ignore [bad-context-manager] 123 with ctx_factory(): --> 124 return func(*args, **kwargs) 125 126 return decorate_context\n/usr/local/lib/python3.12/dist-packages/diffusers/pipelines/flux2/pipeline_flux2.py in call(self, image, prompt, height, width, num_inference_steps, sigmas, guidance_scale, num_images_per_prompt, generator, latents, prompt_embeds, output_type, return_dict, attention_kwargs, callback_on_step_end, callback_on_step_end_tensor_inputs, max_sequence_length, text_encoder_out_layers, caption_upsample_temperature) 869 prompt, images=image, temperature=caption_upsample_temperature, device=device 870 ) --> 871 prompt_embeds, text_ids = self.encode_prompt( 872 prompt=prompt, 873 prompt_embeds=prompt_embeds,\n/usr/local/lib/python3.12/dist-packages/diffusers/pipelines/flux2/pipeline_flux2.py in encode_prompt(self, prompt, device, num_images_per_prompt, prompt_embeds, max_sequence_length, text_encoder_out_layers) 586 587 if prompt_embeds is None: --> 588 prompt_embeds = self._get_mistral_3_small_prompt_embeds( 589 text_encoder=self.text_encoder, 590 tokenizer=self.tokenizer,\n/usr/local/lib/python3.12/dist-packages/diffusers/pipelines/flux2/pipeline_flux2.py in _get_mistral_3_small_prompt_embeds(text_encoder, tokenizer, prompt, dtype, device, max_sequence_length, system_message, hidden_states_layers) 337 338 # Forward pass through the model --> 339 output = text_encoder( 340 input_ids=input_ids, 341 attention_mask=attention_mask,\n/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in _wrapped_call_impl(self, *args, **kwargs) 1774 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc] 1775 else: -> 1776 return self._call_impl(*args, **kwargs) 1777 1778 # torchrec tests the code consistency with the following code\n/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs) 1785 or _global_backward_pre_hooks or _global_backward_hooks 1786 or _global_forward_hooks or _global_forward_pre_hooks): -> 1787 return forward_call(*args, **kwargs) 1788 1789 result = None\n/usr/local/lib/python3.12/dist-packages/accelerate/hooks.py in new_forward(module, *args, **kwargs) 190 output = module._old_forward(*args, **kwargs) 191 else: --> 192 output = module._old_forward(*args, **kwargs) 193 return module._hf_hook.post_forward(module, output) 194 \n/usr/local/lib/python3.12/dist-packages/transformers/utils/generic.py in wrapper(self, *args, **kwargs) 1000 outputs = func(self, *args, **kwargs) 1001 else: -> 1002 outputs = func(self, *args, **kwargs) 1003 except TypeError as original_exception: 1004 # If we get a TypeError, it's possible that the model is not receiving the recordable kwargs correctly.\n/usr/local/lib/python3.12/dist-packages/transformers/models/mistral3/modeling_mistral3.py in forward(self, input_ids, pixel_values, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict, cache_position, logits_to_keep, image_sizes, **kwargs) 444 return_dict = return_dict if return_dict is not None else self.config.use_return_dict 445 --> 446 outputs = self.model( 447 input_ids=input_ids, 448 pixel_values=pixel_values,\n/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in _wrapped_call_impl(self, *args, **kwargs) 1774 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc] 1775 else: -> 1776 return self._call_impl(*args, **kwargs) 1777 1778 # torchrec tests the code consistency with the following code\n/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs) 1785 or _global_backward_pre_hooks or _global_backward_hooks 1786 or _global_forward_hooks or _global_forward_pre_hooks): -> 1787 return forward_call(*args, **kwargs) 1788 1789 result = None\n/usr/local/lib/python3.12/dist-packages/transformers/utils/generic.py in wrapper(self, *args, **kwargs) 1000 outputs = func(self, *args, **kwargs) 1001 else: -> 1002 outputs = func(self, *args, **kwargs) 1003 except TypeError as original_exception: 1004 # If we get a TypeError, it's possible that the model is not receiving the recordable kwargs correctly.\n/usr/local/lib/python3.12/dist-packages/transformers/models/mistral3/modeling_mistral3.py in forward(self, input_ids, pixel_values, attention_mask, position_ids, past_key_values, inputs_embeds, vision_feature_layer, use_cache, output_attentions, output_hidden_states, return_dict, cache_position, image_sizes, **kwargs) 323 inputs_embeds = inputs_embeds.masked_scatter(special_image_mask, image_features) 324 --> 325 outputs = self.language_model( 326 attention_mask=attention_mask, 327 position_ids=position_ids,\n/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in _wrapped_call_impl(self, *args, **kwargs) 1774 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc] 1775 else: -> 1776 return self._call_impl(*args, **kwargs) 1777 1778 # torchrec tests the code consistency with the following code\n/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs) 1785 or _global_backward_pre_hooks or _global_backward_hooks 1786 or _global_forward_hooks or _global_forward_pre_hooks): -> 1787 return forward_call(*args, **kwargs) 1788 1789 result = None\n/usr/local/lib/python3.12/dist-packages/transformers/utils/generic.py in wrapper(self, *args, **kwargs) 1000 outputs = func(self, *args, **kwargs) 1001 else: -> 1002 outputs = func(self, *args, **kwargs) 1003 except TypeError as original_exception: 1004 # If we get a TypeError, it's possible that the model is not receiving the recordable kwargs correctly.\n/usr/local/lib/python3.12/dist-packages/transformers/models/mistral/modeling_mistral.py in forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, use_cache, cache_position, **kwargs) 395 396 for decoder_layer in self.layers[: self.config.num_hidden_layers]: --> 397 hidden_states = decoder_layer( 398 hidden_states, 399 attention_mask=causal_mask,\n/usr/local/lib/python3.12/dist-packages/transformers/modeling_layers.py in call(self, *args, **kwargs) 91 92 return self._gradient_checkpointing_func(partial(super().call, **kwargs), *args) ---> 93 return super().call(*args, **kwargs) 94 95 \n/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in _wrapped_call_impl(self, *args, **kwargs) 1774 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc] 1775 else: -> 1776 return self._call_impl(*args, **kwargs) 1777 1778 # torchrec tests the code consistency with the following code\n/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs) 1785 or _global_backward_pre_hooks or _global_backward_hooks 1786 or _global_forward_hooks or _global_forward_pre_hooks): -> 1787 return forward_call(*args, **kwargs) 1788 1789 result = None\n/usr/local/lib/python3.12/dist-packages/transformers/utils/generic.py in wrapped_forward(*args, **kwargs) 953 if key == \"hidden_states\" and len(collected_outputs[key]) == 0: 954 collected_outputs[key] += (args[0],) --> 955 output = orig_forward(*args, **kwargs) 956 if not isinstance(output, tuple): 957 collected_outputs[key] += (output,)\n/usr/local/lib/python3.12/dist-packages/transformers/models/mistral/modeling_mistral.py in forward(self, hidden_states, attention_mask, position_ids, past_key_values, use_cache, cache_position, position_embeddings, **kwargs) 228 hidden_states = self.input_layernorm(hidden_states) 229 # Self Attention --> 230 hidden_states, _ = self.self_attn( 231 hidden_states=hidden_states, 232 attention_mask=attention_mask,\n/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in _wrapped_call_impl(self, *args, **kwargs) 1774 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc] 1775 else: -> 1776 return self._call_impl(*args, **kwargs) 1777 1778 # torchrec tests the code consistency with the following code\n/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs) 1785 or _global_backward_pre_hooks or _global_backward_hooks 1786 or _global_forward_hooks or _global_forward_pre_hooks): -> 1787 return forward_call(*args, **kwargs) 1788 1789 result = None\n/usr/local/lib/python3.12/dist-packages/transformers/models/mistral/modeling_mistral.py in forward(self, hidden_states, position_embeddings, attention_mask, past_key_values, cache_position, **kwargs) 151 hidden_shape = (*input_shape, -1, self.head_dim) 152 --> 153 query_states = self.q_proj(hidden_states).view(hidden_shape).transpose(1, 2) 154 key_states = self.k_proj(hidden_states).view(hidden_shape).transpose(1, 2) 155 value_states = self.v_proj(hidden_states).view(hidden_shape).transpose(1, 2)\n/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in _wrapped_call_impl(self, *args, **kwargs) 1774 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc] 1775 else: -> 1776 return self._call_impl(*args, **kwargs) 1777 1778 # torchrec tests the code consistency with the following code\n/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs) 1785 or _global_backward_pre_hooks or _global_backward_hooks 1786 or _global_forward_hooks or _global_forward_pre_hooks): -> 1787 return forward_call(*args, **kwargs) 1788 1789 result = None\n/usr/local/lib/python3.12/dist-packages/bitsandbytes/nn/modules.py in forward(self, x) 554 weight = self.weight if getattr(quant_state, \"packing_format_for_cpu\", False) else self.weight.t() 555 --> 556 return bnb.matmul_4bit(x, weight, bias=bias, quant_state=quant_state).to(inp_dtype) 557 558 \n/usr/local/lib/python3.12/dist-packages/bitsandbytes/autograd/_functions.py in matmul_4bit(A, B, quant_state, out, bias) 399 return out 400 else: --> 401 return MatMul4Bit.apply(A, B, out, bias, quant_state)\n/usr/local/lib/python3.12/dist-packages/torch/autograd/function.py in apply(cls, *args, **kwargs) 581 # See NOTE: [functorch vjp and autograd interaction] 582 args = _functorch.utils.unwrap_dead_wrappers(args) --> 583 return super().apply(*args, **kwargs) # type: ignore[misc] 584 585 if not is_setup_ctx_defined:\n/usr/local/lib/python3.12/dist-packages/bitsandbytes/autograd/_functions.py in forward(ctx, A, B, out, bias, quant_state) 313 # 1. Dequantize 314 # 2. MatmulnN --> 315 output = torch.nn.functional.linear(A, F.dequantize_4bit(B, quant_state).to(A.dtype).t(), bias) 316 317 # 3. Save state\n/usr/local/lib/python3.12/dist-packages/bitsandbytes/functional.py in dequantize_4bit(A, quant_state, absmax, out, blocksize, quant_type) 1048 ) 1049 else: -> 1050 out = torch.ops.bitsandbytes.dequantize_4bit.default( 1051 A, 1052 absmax,\n/usr/local/lib/python3.12/dist-packages/torch/_ops.py in call(self, *args, **kwargs) 817 # that are named \"self\". This way, all the aten ops can be called by kwargs. 818 def call(self, /, *args: _P.args, **kwargs: _P.kwargs) -> _T: --> 819 return self._op(*args, **kwargs) 820 821 # Use positional-only argument to avoid naming collision with aten ops arguments\n/usr/local/lib/python3.12/dist-packages/torch/_compile.py in inner(*args, **kwargs) 52 fn.__dynamo_disable = disable_fn # type: ignore[attr-defined] 53 ---> 54 return disable_fn(*args, **kwargs) 55 56 return inner\n/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py in _fn(*args, **kwargs) 1179 ): 1180 return fn(*args, **kwargs) -> 1181 return fn(*args, **kwargs) 1182 finally: 1183 set_eval_frame(None)\n/usr/local/lib/python3.12/dist-packages/torch/library.py in func_no_dynamo(*args, **kwargs) 740 <a href=\"/torch\">@torch</a> ._disable_dynamo 741 def func_no_dynamo(*args, **kwargs): --> 742 return func(*args, **kwargs) 743 744 for key in keys:\n/usr/local/lib/python3.12/dist-packages/bitsandbytes/backends/cuda/ops.py in _(A, absmax, blocksize, quant_type, shape, dtype) 361 dtype: torch.dtype, 362 ) -> torch.Tensor: --> 363 out = torch.empty(shape, dtype=dtype, device=A.device) 364 _dequantize_4bit_impl(A, absmax, blocksize, quant_type, dtype, out=out) 365 return out\nOutOfMemoryError: CUDA out of memory. Tried to allocate 40.00 MiB. GPU 0 has a total capacity of 14.56 GiB of which 17.81 MiB is free. Including non-PyTorch memory, this process has 14.54 GiB memory in use. Of the allocated memory 14.40 GiB is allocated by PyTorch, and 15.19 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (<a href=\"https://pytorch.org/docs/stable/notes/cuda.html#environment-variables\" rel=\"nofollow\">https://pytorch.org/docs/stable/notes/cuda.html#environment-variables</a>)\n","updatedAt":"2026-04-12T01:43:57.338Z","author":{"_id":"66c0e1af466dc6770ef31414","avatarUrl":"/avatars/8672f41ed26b0e7140c0117203b0ded5.svg","fullname":"dsa","name":"goodasdgood","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.4948948323726654},"editors":["goodasdgood"],"editorAvatarUrls":["/avatars/8672f41ed26b0e7140c0117203b0ded5.svg"],"reactions":[],"isReport":false}},{"id":"69db010ea455b785a0514fef","author":{"_id":"66c0e1af466dc6770ef31414","avatarUrl":"/avatars/8672f41ed26b0e7140c0117203b0ded5.svg","fullname":"dsa","name":"goodasdgood","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false},"createdAt":"2026-04-12T02:18:54.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"https://github.com/huggingface/diffusers/blob/main/docs/source/en/using-diffusers/loading.md","html":"<a href=\"https://github.com/huggingface/diffusers/blob/main/docs/source/en/using-diffusers/loading.md\" rel=\"nofollow\">https://github.com/huggingface/diffusers/blob/main/docs/source/en/using-diffusers/loading.md</a>\n","updatedAt":"2026-04-12T02:18:54.570Z","author":{"_id":"66c0e1af466dc6770ef31414","avatarUrl":"/avatars/8672f41ed26b0e7140c0117203b0ded5.svg","fullname":"dsa","name":"goodasdgood","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.5550499558448792},"editors":["goodasdgood"],"editorAvatarUrls":["/avatars/8672f41ed26b0e7140c0117203b0ded5.svg"],"reactions":[],"isReport":false}}],"status":"open","isReport":false,"pinned":false,"locked":false,"collection":"community_blogs"},"contextAuthors":["YiYiXu","dg845","sayakpaul","OzzyGT","dn6","ariG23498","linoyts","multimodalart"],"primaryEmailConfirmed":false,"discussionRole":0,"acceptLanguages":["en"],"withThread":true,"cardDisplay":false,"repoDiscussionsLocked":false}">

charchits7

Nov 25, 2025

Amazing! thanks team.

MaziyarPanahi

Nov 25, 2025

amazing! great work! 👏
is there a support for multi-gpus? (device_map=auto)

sayakpaul

Article author Nov 26, 2025

You should be able to incorporate that in different forms. Check this out:
https://huggingface.co/docs/diffusers/main/en/training/distributed_inference

muntedslunt

Nov 26, 2025

lol... nope

sayakpaul

Article author Nov 26, 2025

What's that supposed to mean?

NhuGiap

Nov 26, 2025

Hi, can you tell me a bit about your motivations behind omitting all bias parameters in network architecture? Thanks!

sayakpaul

Article author Nov 26, 2025

That's a question for the Black Forest Labs team, not us.

deleted

Nov 26, 2025

This comment has been hidden

guilhermevaz

Nov 26, 2025

Amazing work! Can you tell me when the depth-maps model will be released?
Has anyone already tried giving a depth map as a normal image? How does the model behave?

sonic74

Nov 26, 2025

•

edited Nov 27, 2025

https://github.com/huggingface/blog/blob/main/flux-2.md?plain=1#L283 causes

    transformer_id, subfolder="transformer", torch_dtype=torch_dtype, device_map="cpu"
    ^^^^^^^^^^^^^^
NameError: name 'transformer_id' is not defined

ariG23498

Article author Nov 27, 2025

Probably an installation error?

pip install git+https://github.com/huggingface/diffusers -U should help you with this.

Adenialzz

Nov 27, 2025

Hi, How can I deploy a text encoder privately?

OzzyGT

Article author Nov 27, 2025

Hi, you can read this nice repo with the process that @ariG23498 made

goodasdgood

Apr 12

how run flux on two gpu

code????

goodasdgood

Apr 12

•

edited Apr 12

https://github.com/ayttop/xflux2gpu/blob/main/xxxxxx%20(1).ipynb

how run it on 2 gpu
2*16gb

https://huggingface.co/docs/diffusers/main/en/api/parallel

https://huggingface.co/docs/diffusers/main/en/training/distributed_inference

goodasdgood

Apr 12

import torch
from transformers import Mistral3ForConditionalGeneration

from diffusers import Flux2Pipeline, Flux2Transformer2DModel

repo_id = "diffusers/FLUX.2-dev-bnb-4bit"
device = "cuda:0"
torch_dtype = torch.bfloat16

transformer = Flux2Transformer2DModel.from_pretrained(
repo_id, subfolder="transformer", torch_dtype=torch_dtype, device_map="cpu"
)
text_encoder = Mistral3ForConditionalGeneration.from_pretrained(
repo_id, subfolder="text_encoder", dtype=torch_dtype, device_map="cpu"
)

pipe = Flux2Pipeline.from_pretrained(
repo_id, transformer=transformer, text_encoder=text_encoder, torch_dtype=torch_dtype
)
pipe.enable_model_cpu_offload()

prompt = "Realistic macro photograph of a hermit crab using a soda can as its shell, partially emerging from the can, captured with sharp detail and natural colors, on a sunlit beach with soft shadows and a shallow depth of field, with blurred ocean waves in the background. The can has the text BFL Diffusers on it and it has a color gradient that start with #FF5733 at the top and transitions to #33FF57 at the bottom."

image = pipe(
prompt=prompt,
generator=torch.Generator(device=device).manual_seed(42),
num_inference_steps=50, # 28 is a good trade-off
guidance_scale=4,
).images[0]

image.save("flux2_t2i_nf4.png")

Flax classes are deprecated and will be removed in Diffusers v1.0.0. We recommend migrating to PyTorch classes or pinning your version of Diffusers.
Flax classes are deprecated and will be removed in Diffusers v1.0.0. We recommend migrating to PyTorch classes or pinning your version of Diffusers.
/usr/local/lib/python3.12/dist-packages/huggingface_hub/utils/_validators.py:206: UserWarning: The local_dir_use_symlinks argument is deprecated and ignored in hf_hub_download. Downloading to a local directory does not use symlinks anymore.
warnings.warn(
Download complete: 0.00/0.00 [00:00<?, ?B/s]Fetching 2 files: 100% 2/2 [00:00<00:00, 98.80it/s]Loading checkpoint shards: 100% 2/2 [00:01<00:00, 2.03it/s]Download complete: 0.00/0.00 [00:00<?, ?B/s]Fetching 4 files: 100% 4/4 [00:00<00:00, 167.57it/s]Loading weights: 100% 585/585 [00:02<00:00, 310.91it/s, Materializing param=model.vision_tower.transformer.layers.23.ffn_norm.weight]The tied weights mapping and config for this model specifies to tie model.language_model.embed_tokens.weight to lm_head.weight, but both are present in the checkpoints, so we will NOT tie them. You should update the config with tie_word_embeddings=False to silence this warning
Loading pipeline components...: 100% 5/5 [00:03<00:00, 1.14it/s]---------------------------------------------------------------------------
OutOfMemoryError Traceback (most recent call last)
/tmp/ipykernel_18947/863729753.py in <cell line: 0>()
22 prompt = "Realistic macro photograph of a hermit crab using a soda can as its shell, partially emerging from the can, captured with sharp detail and natural colors, on a sunlit beach with soft shadows and a shallow depth of field, with blurred ocean waves in the background. The can has the text BFL Diffusers on it and it has a color gradient that start with #FF5733 at the top and transitions to #33FF57 at the bottom."
23
---> 24 image = pipe(
25 prompt=prompt,
26 generator=torch.Generator(device=device).manual_seed(42),

36 frames/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py in decorate_context(*args, **kwargs)
122 # pyrefly: ignore [bad-context-manager]
123 with ctx_factory():
--> 124 return func(*args, **kwargs)
125
126 return decorate_context

/usr/local/lib/python3.12/dist-packages/diffusers/pipelines/flux2/pipeline_flux2.py in call(self, image, prompt, height, width, num_inference_steps, sigmas, guidance_scale, num_images_per_prompt, generator, latents, prompt_embeds, output_type, return_dict, attention_kwargs, callback_on_step_end, callback_on_step_end_tensor_inputs, max_sequence_length, text_encoder_out_layers, caption_upsample_temperature)
869 prompt, images=image, temperature=caption_upsample_temperature, device=device
870 )
--> 871 prompt_embeds, text_ids = self.encode_prompt(
872 prompt=prompt,
873 prompt_embeds=prompt_embeds,

/usr/local/lib/python3.12/dist-packages/diffusers/pipelines/flux2/pipeline_flux2.py in encode_prompt(self, prompt, device, num_images_per_prompt, prompt_embeds, max_sequence_length, text_encoder_out_layers)
586
587 if prompt_embeds is None:
--> 588 prompt_embeds = self._get_mistral_3_small_prompt_embeds(
589 text_encoder=self.text_encoder,
590 tokenizer=self.tokenizer,

/usr/local/lib/python3.12/dist-packages/diffusers/pipelines/flux2/pipeline_flux2.py in _get_mistral_3_small_prompt_embeds(text_encoder, tokenizer, prompt, dtype, device, max_sequence_length, system_message, hidden_states_layers)
337
338 # Forward pass through the model
--> 339 output = text_encoder(
340 input_ids=input_ids,
341 attention_mask=attention_mask,

/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in _wrapped_call_impl(self, *args, **kwargs)
1774 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1775 else:
-> 1776 return self._call_impl(*args, **kwargs)
1777
1778 # torchrec tests the code consistency with the following code

/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)
1785 or _global_backward_pre_hooks or _global_backward_hooks
1786 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1787 return forward_call(*args, **kwargs)
1788
1789 result = None

/usr/local/lib/python3.12/dist-packages/accelerate/hooks.py in new_forward(module, *args, **kwargs)
190 output = module._old_forward(*args, **kwargs)
191 else:
--> 192 output = module._old_forward(*args, **kwargs)
193 return module._hf_hook.post_forward(module, output)
194

/usr/local/lib/python3.12/dist-packages/transformers/utils/generic.py in wrapper(self, *args, **kwargs)
1000 outputs = func(self, *args, **kwargs)
1001 else:
-> 1002 outputs = func(self, *args, **kwargs)
1003 except TypeError as original_exception:
1004 # If we get a TypeError, it's possible that the model is not receiving the recordable kwargs correctly.

/usr/local/lib/python3.12/dist-packages/transformers/models/mistral3/modeling_mistral3.py in forward(self, input_ids, pixel_values, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict, cache_position, logits_to_keep, image_sizes, **kwargs)
444 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
445
--> 446 outputs = self.model(
447 input_ids=input_ids,
448 pixel_values=pixel_values,

/usr/local/lib/python3.12/dist-packages/transformers/models/mistral3/modeling_mistral3.py in forward(self, input_ids, pixel_values, attention_mask, position_ids, past_key_values, inputs_embeds, vision_feature_layer, use_cache, output_attentions, output_hidden_states, return_dict, cache_position, image_sizes, **kwargs)
323 inputs_embeds = inputs_embeds.masked_scatter(special_image_mask, image_features)
324
--> 325 outputs = self.language_model(
326 attention_mask=attention_mask,
327 position_ids=position_ids,

/usr/local/lib/python3.12/dist-packages/transformers/models/mistral/modeling_mistral.py in forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, use_cache, cache_position, **kwargs)
395
396 for decoder_layer in self.layers[: self.config.num_hidden_layers]:
--> 397 hidden_states = decoder_layer(
398 hidden_states,
399 attention_mask=causal_mask,

/usr/local/lib/python3.12/dist-packages/transformers/modeling_layers.py in call(self, *args, **kwargs)
91
92 return self._gradient_checkpointing_func(partial(super().call, **kwargs), *args)
---> 93 return super().call(*args, **kwargs)
94
95

/usr/local/lib/python3.12/dist-packages/transformers/utils/generic.py in wrapped_forward(*args, **kwargs)
953 if key == "hidden_states" and len(collected_outputs[key]) == 0:
954 collected_outputs[key] += (args[0],)
--> 955 output = orig_forward(*args, **kwargs)
956 if not isinstance(output, tuple):
957 collected_outputs[key] += (output,)

/usr/local/lib/python3.12/dist-packages/transformers/models/mistral/modeling_mistral.py in forward(self, hidden_states, attention_mask, position_ids, past_key_values, use_cache, cache_position, position_embeddings, **kwargs)
228 hidden_states = self.input_layernorm(hidden_states)
229 # Self Attention
--> 230 hidden_states, _ = self.self_attn(
231 hidden_states=hidden_states,
232 attention_mask=attention_mask,

/usr/local/lib/python3.12/dist-packages/transformers/models/mistral/modeling_mistral.py in forward(self, hidden_states, position_embeddings, attention_mask, past_key_values, cache_position, **kwargs)
151 hidden_shape = (*input_shape, -1, self.head_dim)
152
--> 153 query_states = self.q_proj(hidden_states).view(hidden_shape).transpose(1, 2)
154 key_states = self.k_proj(hidden_states).view(hidden_shape).transpose(1, 2)
155 value_states = self.v_proj(hidden_states).view(hidden_shape).transpose(1, 2)

/usr/local/lib/python3.12/dist-packages/bitsandbytes/nn/modules.py in forward(self, x)
554 weight = self.weight if getattr(quant_state, "packing_format_for_cpu", False) else self.weight.t()
555
--> 556 return bnb.matmul_4bit(x, weight, bias=bias, quant_state=quant_state).to(inp_dtype)
557
558

/usr/local/lib/python3.12/dist-packages/bitsandbytes/autograd/_functions.py in matmul_4bit(A, B, quant_state, out, bias)
399 return out
400 else:
--> 401 return MatMul4Bit.apply(A, B, out, bias, quant_state)

/usr/local/lib/python3.12/dist-packages/torch/autograd/function.py in apply(cls, *args, **kwargs)
581 # See NOTE: [functorch vjp and autograd interaction]
582 args = _functorch.utils.unwrap_dead_wrappers(args)
--> 583 return super().apply(*args, **kwargs) # type: ignore[misc]
584
585 if not is_setup_ctx_defined:

/usr/local/lib/python3.12/dist-packages/bitsandbytes/autograd/_functions.py in forward(ctx, A, B, out, bias, quant_state)
313 # 1. Dequantize
314 # 2. MatmulnN
--> 315 output = torch.nn.functional.linear(A, F.dequantize_4bit(B, quant_state).to(A.dtype).t(), bias)
316
317 # 3. Save state

/usr/local/lib/python3.12/dist-packages/bitsandbytes/functional.py in dequantize_4bit(A, quant_state, absmax, out, blocksize, quant_type)
1048 )
1049 else:
-> 1050 out = torch.ops.bitsandbytes.dequantize_4bit.default(
1051 A,
1052 absmax,

/usr/local/lib/python3.12/dist-packages/torch/_ops.py in call(self, *args, **kwargs)
817 # that are named "self". This way, all the aten ops can be called by kwargs.
818 def call(self, /, *args: _P.args, **kwargs: _P.kwargs) -> _T:
--> 819 return self._op(*args, **kwargs)
820
821 # Use positional-only argument to avoid naming collision with aten ops arguments

/usr/local/lib/python3.12/dist-packages/torch/_compile.py in inner(*args, **kwargs)
52 fn.__dynamo_disable = disable_fn # type: ignore[attr-defined]
53
---> 54 return disable_fn(*args, **kwargs)
55
56 return inner

/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py in _fn(*args, **kwargs)
1179 ):
1180 return fn(*args, **kwargs)
-> 1181 return fn(*args, **kwargs)
1182 finally:
1183 set_eval_frame(None)

/usr/local/lib/python3.12/dist-packages/torch/library.py in func_no_dynamo(*args, **kwargs)
740 @torch ._disable_dynamo
741 def func_no_dynamo(*args, **kwargs):
--> 742 return func(*args, **kwargs)
743
744 for key in keys:

/usr/local/lib/python3.12/dist-packages/bitsandbytes/backends/cuda/ops.py in _(A, absmax, blocksize, quant_type, shape, dtype)
361 dtype: torch.dtype,
362 ) -> torch.Tensor:
--> 363 out = torch.empty(shape, dtype=dtype, device=A.device)
364 _dequantize_4bit_impl(A, absmax, blocksize, quant_type, dtype, out=out)
365 return out

OutOfMemoryError: CUDA out of memory. Tried to allocate 40.00 MiB. GPU 0 has a total capacity of 14.56 GiB of which 17.81 MiB is free. Including non-PyTorch memory, this process has 14.54 GiB memory in use. Of the allocated memory 14.40 GiB is allocated by PyTorch, and 15.19 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)