Search results: Youre

20 results found

World In Your Hands: A Large-Scale and Open-Source Ecosystem for Learning Human-Centric Manipulation in the Wild

arXiv, 2025-12-30. Authors: Yupeng Zheng, Jichao Peng, Weize Li

We introduce World In Your Hands (WIYH), a large-scale open-source ecosystem comprising over 1,000 hours of human manipulation data collected in-the-wild with millimeter-scale motion accuracy. Specifically, WIYH includes (1) the Oracle Suite, a wearable data collection kit with an auto-labeling pipeline for accurate motion capture; (2) the WIYH Dataset, featuring over 1,000 hours of multimodal manipulation data across hundreds of skills in diverse real-world scenarios; and (3) extensive annotations and benchmarks supporting tasks from perception to action. Furthermore, experiments based on the WIYH ecosystem show that integrating WIYH's human-centric data improves robotic manipulation success rates from 8% to 60% in cluttered scenes. World In Your Hands provides a foundation for advancing human-centric data collection and cross-embodiment policy learning. All data and hardware design will be open-source.

Follow-Your-Instruction: A Comprehensive MLLM Agent for World Data Synthesis

arXiv, 2025-08-07. Authors: Kunyu Feng, Yue Ma, Xinhua Zhang

With the growing demands of AI-generated content (AIGC), the need for high-quality, diverse, and scalable data has become increasingly crucial. However, collecting large-scale real-world data remains costly and time-consuming, hindering the development of downstream applications. While some works attempt to collect task-specific data via a rendering process, most approaches still rely on manual scene construction, limiting their scalability and accuracy. To address these challenges, we propose Follow-Your-Instruction, a Multimodal Large Language Model (MLLM)-driven framework for automatically synthesizing high-quality 2D, 3D, and 4D data. Follow-Your-Instruction first collects assets and their associated descriptions through multimodal inputs using the MLLM-Collector. Then it constructs 3D layouts, and leverages Vision-Language Models (VLMs) for semantic refinement through multi-view scenes with the MLLM-Generator and MLLM-Optimizer, respectively. Finally, it uses MLLM-Planner to generate temporally coherent future frames. We evaluate the quality of the generated data through comprehensive experiments on the 2D, 3D, and 4D generative tasks. The results show that our sy

World In Your Hands: A Large-Scale and Open-Source Ecosystem for Learning Human-Centric Manipulation in the Wild

Follow-Your-Instruction: A Comprehensive MLLM Agent for World Data Synthesis

10 Simple Rules for Improving Your Standardized Fields and Terms

Follow-Your-Emoji-Faster: Towards Efficient, Fine-Controllable, and Expressive Freestyle Portrait Animation

Talk to Your Slides: High-Efficiency Slide Editing via Language-Driven Structured Data Manipulation

Video, How Do Your Tokens Merge?

Follow-Your-Motion: Video Motion Transfer via Efficient Spatial-Temporal Decoupled Finetuning

Control Your Robot: A Unified System for Robot Control and Policy Deployment

Your Reward Function for RL is Your Best PRM for Search: Unifying RL and Search-Based TTS

Edit-Your-Interest: Efficient Video Editing via Feature Most-Similar Propagation

Follow-Your-Color: Multi-Instance Sketch Colorization

Hear-Your-Click: Interactive Object-Specific Video-to-Audio Generation

Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing

Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework

Not Sure Your Car Withstands Cyberwarfare

Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation

DeepOps & SLURM: Your GPU Cluster Guide

Wanna hear your voice? A sample is all we need!

Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation

Ten (or more!) reasons to register your software with the Astrophysics Source Code Library