Search - ResearchTracker

Multimodal Large Language Models (MLLMs) have demonstrated strong performance on the UI-to-code task, which aims to generate UI code from design mock-ups. However, when applied to long and complex websites, they often struggle with fragmented segmentation, redundant code generation for repetitive components, and frequent UI inconsistencies. To systematically investigate and address these challenges, we introduce ComUIBench, a new multi-page complex webpage benchmark with component annotations, designed to evaluate MLLMs' ability to generate reusable UI code in realistic website scenarios. Building upon this benchmark, we propose ComUICoder, a component-based UI code generation framework that emphasizes semantic-aware segmentation, code reuse, and fine-grained refinement. Specifically, ComUICoder incorporates (1) Hybrid Semantic-aware Block Segmentation for accurate detection of semantically coherent UI blocks, (2) Visual-aware Graph-based Block Merge to consolidate structurally similar components within and across webpages for reusable implementation, and (3) Priority-based Element-wise Feedback to refine generated code and reduce element-level inconsistencies. Extensive experiments demons

What's Missing in Screen-to-Action? Towards a UI-in-the-Loop Paradigm for Multimodal GUI Reasoning

arXiv, 2026-04-08. Authors: Songze Li, Xiaoke Guo, Tianqi Liu

Existing Graphical User Interface (GUI) reasoning tasks remain challenging, particularly in UI understanding. Current methods typically rely on direct screen-based decision-making, which lacks interpretability and overlooks a comprehensive understanding of UI elements, ultimately leading to task failure. To enhance understanding of and interaction with UIs, we propose a GUI reasoning paradigm called UI-in-the-Loop (UILoop). Our approach treats the GUI reasoning task as a cyclic Screen, UI elements, Action process. By enabling Multimodal Large Language Models (MLLMs) to explicitly learn the localization, semantic functions, and practical usage of key UI elements, UILoop achieves precise element discovery and performs interpretable reasoning. Furthermore, we introduce a more challenging UI Comprehension task centered on UI elements, with three evaluation metrics. Correspondingly, we contribute a benchmark of 26K samples (UI Comprehension-Bench) to comprehensively evaluate existing methods' mastery of UI elements. Extensive experiments demonstrate that UILoop achieves state-of-the-art UI understanding performance while yielding superior results in GUI reasoning tasks.

Search results: Performative-UI

ComUICoder: Component-based Reusable UI Code Generation for Complex Websites via Semantic Segmentation and Element-wise Feedback

What's Missing in Screen-to-Action? Towards a UI-in-the-Loop Paradigm for Multimodal GUI Reasoning

UI-Venus-1.5 Technical Report

UI Placement as a Critical Design Factor for Augmented Reality During Locomotion

FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection

Beyond Screenshots: Evaluating VLMs' Understanding of UI Animations

UIS-Digger: Towards Comprehensive Research Agent Systems for Real-world Unindexed Information Seeking

UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience

UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning

LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training

RWKV-UI: UI Understanding with Enhanced Perception and Reasoning

UI2Code^N: UI-to-Code Generation as Interactive Visual Optimization

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction

UI-Venus Technical Report: Building High-performance UI Agents with RFT

Magentic-UI: Towards Human-in-the-loop Agentic Systems

UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning

From User Interface to Agent Interface: Efficiency Optimization of UI Representations for LLM Agents

UI-Evol: Automatic Knowledge Evolving for Computer Use Agents

Toward Autonomous UI Exploration: The UIExplorer Benchmark