Search results: 400k

20 results found

EgoCS-400K: An Egocentric Gameplay Dataset for World Models

arXiv

The shift from video generation to interactive world modeling places new demands on data: beyond captioned videos, world models require temporally aligned video-action-language trajectories grounded in the actions, camera motion, states, and events that drive future scene changes. However, such data is difficult to obtain at scale. Web video datasets offer broad visual coverage but lack executable actions and reliable states; robotic datasets provide action and state supervision but are costly and limited in scene diversity; and existing simulators often lack large-scale human-driven interaction trajectories. In this paper, we introduce EgoCS-400K, a large-scale replay-grounded egocentric Counter-Strike dataset for world models, built from public professional CS and CS2 match demos that preserve human gameplay trajectories and enable parsing, replaying, rendering, and temporal alignment. We extract player states, view directions, movements, keyboard/button inputs, view-angle changes, weapon usage, game events, and round-level context, and render clean first-person videos from the same trajectories. EgoCS-400K contains over 400,000 first-person videos and 10,000 hours of gameplay fr

Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing

arXiv · 2025-10-22 · Authors: Yusu Qian, Eli Bocek-Rivele, Liangchen Song

Recent advances in multimodal models have demonstrated remarkable text-guided image editing capabilities, with systems like GPT-4o and Nano-Banana setting new benchmarks. However, the research community's progress remains constrained by the absence of large-scale, high-quality, and openly accessible datasets built from real images. We introduce Pico-Banana-400K, a comprehensive 400K-image dataset for instruction-based image editing. Our dataset is constructed by leveraging Nano-Banana to generate diverse edit pairs from real photographs in the OpenImages collection. What distinguishes Pico-Banana-400K from previous synthetic datasets is our systematic approach to quality and diversity. We employ a fine-grained image editing taxonomy to ensure comprehensive coverage of edit types while maintaining precise content preservation and instruction faithfulness through MLLM-based quality scoring and careful curation. Beyond single turn editing, Pico-Banana-400K enables research into complex editing scenarios. The dataset includes three specialized subsets: (1) a 72K-example multi-turn collection for studying sequential editing, reasoning, and planning across consecutive modifications; (2)

EgoCS-400K: An Egocentric Gameplay Dataset for World Models

Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing

ArchCAD-400K: A Large-Scale CAD drawings Dataset and New Baseline for Panoptic Symbol Spotting

ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video

Pledging another $400k to the Zig software foundation

Transformers for molecular property prediction: Domain adaptation efficiently improves performance

4DP-QA: Scalable QA for 4D Perception in Vision Language Models

EAGLE: Egocentric AGgregated Language-video Engine

Speedrunning ImageNet Diffusion

When Code Smells Meet ML: On the Lifecycle of ML-specific Code Smells in ML-enabled Systems

When People are Floods: Analyzing Dehumanizing Metaphors in Immigration Discourse with Large Language Models

MathWriting: A Dataset For Handwritten Mathematical Expression Recognition

Arctic-TILT. Business Document Understanding at Sub-Billion Scale

LLMBind: A Unified Modality-Task Integration Framework

SCORE: Syntactic Code Representations for Static Script Malware Detection

OBJECT 3DIT: Language-guided 3D-aware Image Editing

Low-Temperature Thermoelectric Performance and Optoelectronic Properties of Monolayer of WX2N4(X = Si, Ge)

HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling

Hidden Granular Superconductivity Above 500K in off-the-shelf graphite materials

Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages