Search - ResearchTracker

Existing Graphical User Interface (GUI) reasoning tasks remain challenging, particularly in UI understanding. Current methods typically rely on direct screen-based decision-making, which lacks interpretability and overlooks a comprehensive understanding of UI elements, ultimately leading to task failure. To enhance the understanding and interaction with UIs, we propose an innovative GUI reasoning paradigm called UI-in-the-Loop (UILoop). Our approach treats the GUI reasoning task as a cyclic Screen-UI elements-Action process. By enabling Multimodal Large Language Models (MLLMs) to explicitly learn the localization, semantic functions, and practical usage of key UI elements, UILoop achieves precise element discovery and performs interpretable reasoning. Furthermore, we introduce a more challenging UI Comprehension task centered on UI elements with three evaluation metrics. Correspondingly, we contribute a benchmark of 26K samples (UI Comprehension-Bench) to comprehensively evaluate existing methods' mastery of UI elements. Extensive experiments demonstrate that UILoop achieves state-of-the-art UI understanding performance while yielding superior results in GUI reasoning tasks.

AlignUI: A Method for Designing LLM-Generated UIs Aligned with User Preferences

arXiv, 2026-01-24. Authors: Yimeng Liu, Misha Sra, Chang Xiao

Designing user interfaces that align with user preferences is a time-consuming process, which requires iterative cycles of prototyping, user testing, and refinement. Recent advancements in LLM-based UI generation have enabled efficient UI generation to assist the UI design process. We introduce AlignUI, a method that aligns LLM-generated UIs with user tasks and preferences by using a user preference dataset to guide the LLM's reasoning process. The dataset was crowdsourced from 50 general users (the target users of generated UIs) and contained 720 UI control preferences on eight image-editing tasks. We evaluated AlignUI by generating UIs for six unseen tasks and conducting a user study with 72 additional general users. The results showed that the generated UIs closely align with multiple dimensions of user preferences. We conclude by discussing the applicability of our method to support user-aligned UI design for multiple task domains and user groups, as well as personalized user needs.

Search results: Ui sahak

What's Missing in Screen-to-Action? Towards a UI-in-the-Loop Paradigm for Multimodal GUI Reasoning

AlignUI: A Method for Designing LLM-Generated UIs Aligned with User Preferences

A Rule-Based Approach for UI Migration from Android to iOS

Macaron-A2UI: A Model for Generative UI in Personal Agents

UISim: An Interactive Image-Based UI Simulator for Dynamic Mobile Environments

UI-Mem: Self-Evolving Experience Memory for Online Reinforcement Learning in Mobile GUI Agents

UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

Lexi: Self-Supervised Learning of the UI Language

Agent+P: Guiding UI Agents via Symbolic Planning

UI Semantic Group Detection: Grouping UI Elements with Similar Semantics in Mobile Graphical User Interface

Generative UI: LLMs are Effective UI Generators

UI-Venus Technical Report: Building High-performance UI Agents with RFT

MUD: Towards a Large-Scale and Noise-Filtered UI Dataset for Modern Style UI Modeling

UI2Code^N: UI-to-Code Generation as Interactive Visual Optimization

Toward the Automated Localization of Buggy Mobile App UIs from Bug Descriptions

CrowdGenUI: Aligning LLM-Based UI Generation with Crowdsourced User Preferences

Automating UI Optimization through Multi-Agentic Reasoning

UI Layers Merger: Merging UI layers via Visual Learning and Boundary Prior

Bridging Gulfs in UI Generation through Semantic Guidance

Exploring the Impact of Integrating UI Testing in CI/CD Workflows on GitHub