Open image generation models are closer to closed-source quality than this sub thinks [D]
Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.
I run evaluations on generative image models as part of my workflow, mostly comparing coherence, prompt adherence, and compositional accuracy across different architectures. The consensus here seems to be that open models are still a generation behind closed APIs. Based on my recent benchmarks, that gap is way smaller than people assume.
On compositional control specifically, the latest open checkpoints handle multi-object scenes with spatial relationships about as reliably as the paid endpoints I've tested. Not perfect, but close enough that the failure modes are comparable. The thing that surprised me was text rendering in images, which used to be a disaster on open models. Recent architectures actually get it right roughly 70-80% of the time on short strings.
Generation speed is another misconception. People complain about inference time but I'm getting 2MP outputs in under two minutes on a single consumer GPU. Drop resolution and step count and you're at 30 seconds. Fine for iteration.
The structured prompting argument also falls flat. Everyone acts like having explicit scene control is a downside when it's literally what production pipelines need. Unstructured text prompts are the hack, not the other way around.
These models ship without community optimizations, no fine-tuning, no custom pipelines. The baseline is already competitive.
[link] [comments]
More from r/MachineLearning
-
Greater than 80% of researchers at CVPR are chinese. This speak volumes on the chinese nexus in research, and something needs to be done about it. [D]
Jun 8
-
How to find research opportunities in area of interest? [D]
Jun 8
-
ICML rejected paper visibility [D]
Jun 8
-
Software and ops skills for data scientists[D]
Jun 8
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.