Search - ResearchTracker

We introduce Reangle-A-Video, a unified framework for generating synchronized multi-view videos from a single input video. Unlike mainstream approaches that train multi-view video diffusion models on large-scale 4D datasets, our method reframes the multi-view video generation task as video-to-videos translation, leveraging publicly available image and video diffusion priors. In essence, Reangle-A-Video operates in two stages. (1) Multi-View Motion Learning: An image-to-video diffusion transformer is synchronously fine-tuned in a self-supervised manner to distill view-invariant motion from a set of warped videos. (2) Multi-View Consistent Image-to-Images Translation: The first frame of the input video is warped and inpainted into various camera perspectives under an inference-time cross-view consistency guidance using DUSt3R, generating multi-view consistent starting images. Extensive experiments on static view transport and dynamic camera control show that Reangle-A-Video surpasses existing methods, establishing a new solution for multi-view video generation. We will publicly release our code and data. Project page: https://hyeonho99.github.io/reangle-a-video/

Video-R1: Reinforcing Video Reasoning in MLLMs

arXiv, 2025-03-27. Authors: Kaituo Feng, Kaixiong Gong, Bohao Li

Inspired by DeepSeek-R1's success in eliciting reasoning abilities through rule-based reinforcement learning (RL), we introduce Video-R1 as the first attempt to systematically explore the R1 paradigm for incentivizing video reasoning within multimodal large language models (MLLMs). However, directly applying RL training with the GRPO algorithm to video reasoning presents two primary challenges: (i) a lack of temporal modeling for video reasoning, and (ii) the scarcity of high-quality video-reasoning data. To address these issues, we first propose the T-GRPO algorithm, which encourages models to utilize temporal information in videos for reasoning. Additionally, instead of relying solely on video data, we incorporate high-quality image-reasoning data into the training process. We have constructed two datasets: Video-R1-CoT-165k for SFT cold start and Video-R1-260k for RL training, both comprising image and video data. Experimental results demonstrate that Video-R1 achieves significant improvements on video reasoning benchmarks such as VideoMMMU and VSI-Bench, as well as on general video benchmarks including MVBench and TempCompass, etc. Notably, Video-R1-7B attains a 37.1% accuracy

Search results: Video

Reangle-A-Video: 4D Video Generation as Video-to-Video Translation

Video-R1: Reinforcing Video Reasoning in MLLMs

Unified Video Action Model

Video-Oasis: Rethinking Evaluation of Video Understanding

S^2VG: 3D Stereoscopic and Spatial Video Generation via Denoising Frame Matrix

Moment Sampling in Video LLMs for Long-Form Video QA

ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning

SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix

Progressive Growing of Video Tokenizers for Temporally Compact Latent Spaces

DQ-Ladder: A Deep Reinforcement Learning-based Bitrate Ladder for Adaptive Video Streaming

Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO

CompressedVQA-HDR: Generalized Full-reference and No-reference Quality Assessment Models for Compressed High Dynamic Range Videos

Imagen Video: High Definition Video Generation with Diffusion Models

Prompting Video-Language Foundation Models with Domain-specific Fine-grained Heuristics for Video Question Answering

Video-T1: Test-Time Scaling for Video Generation

FreeMask: Rethinking the Importance of Attention Masks for Zero-Shot Video Editing

Video Language Planning

Learned Scalable Video Coding For Humans and Machines

Learning to Compress Unmanned Aerial Vehicle (UAV) Captured Video: Benchmark and Analysis

Video-As-Prompt: Unified Semantic Control for Video Generation