Search - ResearchTracker

Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey

Robotic manipulation, a key frontier in robotics and embodied AI, requires precise motor control and multimodal understanding, yet traditional rule-based methods fail to scale or generalize in unstructured, novel environments. In recent years, Vision-Language-Action (VLA) models, built upon Large Vision-Language Models (VLMs) pretrained on vast image-text datasets, have emerged as a transformative paradigm. This survey provides the first systematic, taxonomy-oriented review of large VLM-based VLA models for robotic manipulation. We begin by clearly defining large VLM-based VLA models and delineating two principal architectural paradigms: (1) monolithic models, encompassing single-system and dual-system designs with differing levels of integration; and (2) hierarchical models, which explicitly decouple planning from execution via interpretable intermediate representations. Building on this foundation, we present an in-depth examination of large VLM-based VLA models: (1) integration with advanced domains, including reinforcement learning, training-free optimization, learning from human videos, and world model integration; (2) synthesis of distinctive characteristics, consolidating ar

Compute Only Once: UG-Separation for Efficient Large Recommendation Models

arXiv, 2026-02-11. Authors: Hui Lu, Zheng Chai, Shipeng Bai

Driven by scaling laws, recommender systems increasingly rely on larger-scale models to capture complex feature interactions and user behaviors, but this trend also leads to prohibitive training and inference costs. While long-sequence models can reuse user-side computation through KV caching, such reuse is difficult in TokenMixer-based dense feature interaction architectures, where user and group features are deeply entangled and mixed across layers. In this work, we present User-Group Separation (UG-Sep), an industrial large-scale framework that, for the first time, makes user-side computation reusable in TokenMixer-based dense interaction models. UG-Sep explicitly disentangles user-side and item-side information flows within token-mixing layers, ensuring that a subset of tokens preserves purely user-side representations across layers. This design allows the corresponding per-token computations to be reused across multiple samples, significantly reducing redundant inference cost. To compensate for the potential expressive capacity loss induced by masking, we further propose an Information Compensation strategy that adaptively reconstructs suppressed user-item interactions. Mor
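
The reuse idea in this abstract can be illustrated with a toy sketch. It is not the paper's implementation: it assumes a single linear token-mixing layer, and the names `masked_mix`, `user_rows_once`, and `score_candidates` are hypothetical. The only point it shows is that once user-side output tokens are masked from reading item-side tokens, those outputs are identical for every candidate item and can be computed once.

```python
# Toy illustration only: one linear token-mixing layer with a user/item mask.
# All function names here are hypothetical, not taken from the paper.

def masked_mix(weights, tokens, n_user):
    """Mix across the token axis; the first n_user output rows ignore item tokens."""
    n_tokens, dim = len(tokens), len(tokens[0])
    out = []
    for i in range(n_tokens):
        row = [0.0] * dim
        for j in range(n_tokens):
            if i < n_user and j >= n_user:
                continue  # mask: user-side outputs never read item-side tokens
            for d in range(dim):
                row[d] += weights[i][j] * tokens[j][d]
        out.append(row)
    return out


def user_rows_once(weights, user_tokens):
    """Compute the user-side output rows from user tokens alone."""
    n_user, dim = len(user_tokens), len(user_tokens[0])
    return [
        [sum(weights[i][j] * user_tokens[j][d] for j in range(n_user))
         for d in range(dim)]
        for i in range(n_user)
    ]


def score_candidates(weights, user_tokens, candidates):
    """Reuse the cached user rows for every candidate; recompute only item rows."""
    n_user, dim = len(user_tokens), len(user_tokens[0])
    cached = user_rows_once(weights, user_tokens)  # computed a single time
    results = []
    for item_tokens in candidates:
        tokens = user_tokens + item_tokens
        item_rows = [
            [sum(weights[i][j] * tokens[j][d] for j in range(len(tokens)))
             for d in range(dim)]
            for i in range(n_user, len(tokens))
        ]
        results.append(cached + item_rows)
    return results
```

Because the masked user rows depend only on user tokens, stacking several such layers would keep them user-only, which is the property that allows caching across samples. The masked-out user-item interactions are what the abstract's Information Compensation strategy is said to address; that part is not sketched here.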

Search results: Large

Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey

Compute Only Once: UG-Separation for Efficient Large Recommendation Models

Large Deviations Principle for Isoperimetry and Its Equivalence to Nonlinear Log-Sobolev Inequalities

Large Language Models Reasoning Abilities Under Non-Ideal Conditions After RL-Fine-Tuning

Item Level Exploration Traffic Allocation in Large-scale Recommendation Systems

ActionStudio: A Lightweight Framework for Data and Training of Large Action Models

Lecture notes on large deviations in non-equilibrium diffusive systems

Enhancing Human-Like Responses in Large Language Models

OphGLM: Training an Ophthalmology Large Language-and-Vision Assistant based on Instructions and Dialogue

Large Language Models Lack Understanding of Character Composition of Words

Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents

Fine-tuning with Very Large Dropout

A Critical Review of Causal Reasoning Benchmarks for Large Language Models

Is Self-knowledge and Action Consistent or Not: Investigating Large Language Model's Personality

Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models

Self-Cognition in Large Language Models: An Exploratory Study

Unmasking the Shadows of AI: Investigating Deceptive Capabilities in Large Language Models

PruneVid: Visual Token Pruning for Efficient Video Large Language Models

Sparks of Large Audio Models: A Survey and Outlook

Causal Reasoning in Large Language Models: A Knowledge Graph Approach