Search results: Understanding

20 results found

Video Understanding: From Geometry and Semantics to Unified Models

arXiv

Video understanding aims to enable models to perceive, reason about, and interact with the dynamic visual world. In contrast to image understanding, video understanding inherently requires modeling temporal dynamics and evolving visual context, placing stronger demands on spatiotemporal reasoning and making it a foundational problem in computer vision. In this survey, we present a structured overview of video understanding by organizing the literature into three complementary perspectives: low-level video geometry understanding, high-level semantic understanding, and unified video understanding models. We further highlight a broader shift from isolated, task-specific pipelines toward unified modeling paradigms that can be adapted to diverse downstream objectives, enabling a more systematic view of recent progress. By consolidating these perspectives, this survey provides a coherent map of the evolving video understanding landscape, summarizes key modeling trends and design principles, and outlines open challenges toward building robust, scalable, and unified video foundation models.

A Contextual Approach to Technological Understanding and Its Assessment

arXiv, 2025-03-27. Authors: Eline de Jong, Sebastian De Haro

Technological understanding is not a singular concept but varies depending on context. Building on De Jong and De Haro's (2025) notion of technological understanding as the ability to realise an aim through the use of a technological artefact, this paper refines the concept as an ability that differs by context and degree. We extend the original specification developed for a design context by introducing two additional contexts: operation and innovation. Each context represents a distinct way of realising an aim through technology, yielding three types of technological understanding. To clarify the nature of technological understanding further, we propose an assessment framework based on counterfactual reasoning. Each type of understanding is associated with the ability to answer a specific set of what-if questions concerning changes in an artefact's structure, performance, or appropriateness. Distinguishing these different types helps focus efforts to improve technological understanding, clarifies the epistemic requirements of different forms of engagement with technology, and supports a pluralistic perspective on expertise.

Search results: Understanding

Video Understanding: From Geometry and Semantics to Unified Models

A Contextual Approach to Technological Understanding and Its Assessment

Harnessing Synthetic Preference Data for Enhancing Temporal Understanding of Video-LLMs

A Simple Baseline for Unifying Understanding, Generation, and Editing via Vanilla Next-token Prediction

EHWGesture -- A dataset for multimodal understanding of clinical gestures

Pisces: An Auto-regressive Foundation Model for Image Understanding and Generation

Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation

HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering

CoS: Chain-of-Shot Prompting for Long Video Understanding

TopoMaskV3: 3D Mask Head with Dense Offset and Height Predictions for Road Topology Understanding

Deep Learning Perspective of Scene Understanding in Autonomous Robots

One missing piece in Vision and Language: A Survey on Comics Understanding

Vision-Based Natural Language Scene Understanding for Autonomous Driving: An Extended Dataset and a New Model for Traffic Scene Description Generation

Probing Vision-Language Understanding through the Visual Entailment Task: promises and pitfalls

Incorporating Contextual Paralinguistic Understanding in Large Speech-Language Models

QueSTMaps: Queryable Semantic Topological Maps for 3D Scene Understanding

VLM@school -- Evaluation of AI image understanding on German middle school knowledge

Towards Fine-Grained Emotion Understanding via Skeleton-Based Micro-Gesture Recognition

CHATTER: A Character Attribution Dataset for Narrative Understanding

How the Solar Dynamics Observatory Revolutionized our Physical Understanding of the Sun