Tag

Video Gen

80 articles archived under #video-gen

Hugging Face Daily Papers research 1mo ago

Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion

Abstract Pantheon360 enables high-fidelity 360° video generation for digital twins by combining 3D-aware diffusion with explicit geometric caching to ensure spatial-temporal consistency. AI-generated summary Generating complete digital twins from videos requires precise camera…

14
Hugging Face Daily Papers research 1mo ago

Soap2Soap: Long Cinematic Video Remaking via Multi-Agent Collaboration

Abstract A multi-agent framework called Soap2Soap is presented for long-horizon video-to-video generation that maintains narrative structure and character identity across extended sequences through consistent semantic backbone and visual reference anchors. AI-generated summary…

25
Hugging Face Daily Papers research 1mo ago

Geo-Align: Video Generation Alignment via Metric Geometry Reward

Abstract Geo-Align presents a reinforcement learning framework for camera-controlled video re-rendering that improves generalization through scale-aware perceptual rewards and metric 3D estimation for camera trajectory extraction. AI-generated summary Camera-controlled video…

20
r/LocalLLaMA community 1mo ago

meituan-longcat/LongCat-Video-Avatar-1.5 · Hugging Face

🚀 Model Introduction We are excited to announce the release of LongCat-Video-Avatar 1.5, an upgraded open-source framework that prioritizes extreme empirical optimization and production-readiness for audio-driven human video generation. Built upon the LongCat-Video foundation…

21
Hugging Face Daily Papers research 1mo ago

FlowLong: Inference-time Long Video Generation via Manifold-constrained Tweedie Matching

Abstract A novel inference-time method for long video generation using overlapping sliding windows with Tweedie matching and stochastic early-phase sampling to improve temporal consistency and visual quality. AI-generated summary Extending the generation horizon of video…

11
Hugging Face Daily Papers research 1mo ago

Bernini: Latent Semantic Planning for Video Diffusion

Abstract A unified video generation and editing framework combines multimodal large language models for semantic planning with diffusion models for pixel rendering, achieving state-of-the-art performance through semantic interface separation and enhanced positional embeddings.…

32
Hugging Face Daily Papers research 1mo ago

Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos

Abstract MIGA addresses long video generation challenges by reducing training-inference gaps and enhancing temporal consistency through dual consistency mechanisms. AI-generated summary Without incurring significant computational overhead, train-free long video generation aims…

5
Hugging Face Daily Papers research 1mo ago

Video Models Can Reason with Verifiable Rewards

Abstract VideoRLVR optimizes video diffusion models for verifiable reasoning tasks using reinforcement learning with rule-based rewards, achieving better performance than supervised methods in constraint-satisfying video generation. AI-generated summary Video diffusion models…

11
Hugging Face Daily Papers research 1mo ago

CogOmniControl: Reasoning-Driven Controllable Video Generation via Creative Intent Cognition

Abstract Diffusion models applied in compressed image space generate high-quality images with lower computational cost and support flexible inputs like text or boxes. AI-generated summary Recent diffusion models achieve strong photorealism and fluency in video generation, yet…

37
Hugging Face Daily Papers research 1mo ago

MSAVBench: Towards Comprehensive and Reliable Evaluation of Multi-Shot Audio-Video Generation

Abstract MSAVBench presents the first comprehensive benchmark and adaptive evaluation framework for multi-shot audio-video generation, addressing limitations in existing benchmarks through diverse task settings and advanced evaluation mechanisms. AI-generated summary Video…

25
Hugging Face Daily Papers research 1mo ago

Echo-Forcing: A Scene Memory Framework for Interactive Long Video Generation

Abstract Echo-Forcing addresses limitations in interactive long-video generation by decoupling historical memory and recent dynamics through hierarchical temporal memory, scene recall frames, and difference-aware memory decay mechanisms. AI-generated summary Autoregressive video…

5
Hugging Face official-blog 1mo ago

Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation

Published May 18, 2026. Authors: Ting-Yun Chang (NVIDIA), Miguel Martin (NVIDIA), Jonathan Allen (NVIDIA), Ke Ding…

11
Hugging Face Daily Papers research 1mo ago

OmniHumanoid: Streaming Cross-Embodiment Video Generation with Paired-Free Adaptation

Abstract OmniHumanoid enables cross-embodiment video generation by factorizing motion transfer and embodiment-specific adaptation, allowing scalable adaptation to new humanoid embodiments using unpaired data. AI-generated summary Cross-embodiment video generation aims to…

7
TechCrunch — AI news-outlet 1mo ago

Runway started by helping filmmakers. Now it wants to beat Google at AI.

AI video generation startup Runway is betting that video generation is the path to world models. And that being an AI outsider is an advantage, not a liability.

20
Hugging Face Daily Papers research 1mo ago

Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation

Abstract A novel causal consistency distillation method enables efficient frame-wise video generation with reduced latency and improved quality compared to existing chunk-wise approaches. AI-generated summary Real-time interactive video generation requires low-latency,…

6
Hugging Face Daily Papers research 1mo ago

Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video

Abstract A novel approach called Warp-as-History enables camera-controlled video generation by transforming camera-induced warps into pseudo-history representations, achieving zero-shot capability without training or test-time optimization. AI-generated summary Camera-controlled…

18
Hugging Face Daily Papers research 1mo ago

PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation

Abstract PhyMotion introduces a physics-grounded reward system for human motion generation that evaluates kinematic plausibility, contact consistency, and dynamic feasibility to improve video quality. AI-generated summary Generating realistic human motion is a central yet…

12
Hugging Face Daily Papers research 1mo ago

RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO

Abstract RAVEN enables real-time video generation through causal autoregressive extrapolation with improved training alignment, while CM-GRPO enhances performance via reinforcement learning applied to consistency model sampling. AI-generated summary Causal autoregressive video…

20
Hugging Face Daily Papers research 1mo ago

RoboEvolve: Co-Evolving Planner-Simulator for Robotic Manipulation with Limited Data

Abstract RoboEvolve combines vision-language and video generation models in a co-evolutionary framework to enable scalable robotic manipulation with improved data efficiency and continuous learning capabilities. AI-generated summary The scalability of robotic manipulation is…

29
The Information — AI news-outlet 1mo ago

Rent the Runway Cofounder to Step Down as CEO

Rent the Runway cofounder Jennifer Hyman will step down from the CEO role at the end of this week, the clothing rental company said Wednesday. Hyman will remain an advisor to the company through early 2027, and current board member and former Nordstrom executive Teri Bariquit…

35
Hugging Face Daily Papers research 1mo ago

FaithfulFaces: Pose-Faithful Facial Identity Preservation for Text-to-Video Generation

Abstract FaithfulFaces is a pose-faithful facial identity preservation framework that improves identity consistency in text-to-video generation through pose-shared alignment and explicit Euler angle embeddings. AI-generated summary Identity-preserving text-to-video generation…

38
arXiv — NLP / Computation & Language research 1mo ago

PresentAgent-2: Towards Generalist Multimodal Presentation Agents

arXiv:2605.11363v1 Announce Type: cross Abstract: Presentation generation is moving beyond static slide creation toward end-to-end presentation video generation with research grounding, multimodal media, and interactive delivery. We introduce PresentAgent-2, an agentic framework…

30
Hugging Face Daily Papers research 1mo ago

CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives

Abstract CausalCine enables interactive, multi-shot video generation by addressing limitations of autoregressive models through causal modeling, dynamic memory routing, and real-time distillation techniques. AI-generated summary Autoregressive video generation aims at real-time,…

38
Vercel — AI dev-tools 2mo ago

Seedance 2.0 Video Generation on AI Gateway

You can now access Bytedance's latest state-of-the-art video generation model, Seedance 2.0, via AI Gateway with no other provider accounts required. Seedance 2.0 is available on AI Gateway in two variants: Standard and Fast. Both share the same capabilities. Standard produces…

10
Google DeepMind official-blog 5mo ago

Veo 3.1 Ingredients to Video: More consistency, creativity and control

Our latest Veo update generates lively, dynamic clips that feel natural and engaging — and supports vertical video generation.

5
Google DeepMind official-blog 8mo ago

Behind “ANCESTRA”: combining Veo with live-action filmmaking

We partnered with Darren Aronofsky, Eliza McNitt and a team of more than 200 people to make a film using Veo and live-action filmmaking.

10
Google DeepMind official-blog 8mo ago

Introducing Veo 3.1 and advanced creative capabilities

We’re rolling out significant updates to Veo that give people even more creative control.

30
Google DeepMind official-blog 13mo ago

Fuel your creativity with new generative media models and tools

Introducing Veo 3 and Imagen 4, and a new tool for filmmaking called Flow.

12
Google DeepMind official-blog 14mo ago

Generate videos in Gemini and Whisk with Veo 2

Transform text-based prompts into high-resolution eight-second videos in Gemini Advanced and use Whisk Animate to turn images into eight-second animated clips.

26
Lil'Log (Lilian Weng) research 26mo ago

Diffusion Models for Video Generation

Diffusion models have demonstrated strong results on image synthesis in past years. Now the research community has started working on a harder task—using it for video generation. The task itself is a superset of the image case, since an image is a video of 1 frame, and it…

12
