Search - ResearchTracker

Learning to Defer (L2D) enables a classifier to abstain from predictions and defer to an expert, and has recently been extended to multi-expert settings. In this work, we show that multi-expert L2D is fundamentally more challenging than the single-expert case. With multiple experts, the classifier's underfitting becomes inherent and seriously degrades prediction performance, whereas in the single-expert setting it arises only under specific conditions. We theoretically reveal that this stems from an intrinsic expert identifiability issue: learning which expert to trust from a diverse pool, a problem that is absent in the single-expert case and that renders existing underfitting remedies ineffective. To tackle this issue, we propose PiCCE (Pick the Confident and Correct Expert), a surrogate-based method that adaptively identifies a reliable expert based on empirical evidence. PiCCE effectively reduces multi-expert L2D to a single-expert-like learning problem, thereby resolving multi-expert underfitting. We further prove its statistical consistency and its ability to recover class probabilities and expert accuracies. Extensive experiments across diverse settings, including real-world expert scenarios,

Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss

arXiv, 2025-12-29. Authors: Ang Lv, Jin Ma, Yiyuan Ma

Mixture-of-Experts (MoE) models lack explicit constraints to ensure the router's decisions align well with the experts' capabilities, which ultimately limits model performance. To address this, we propose expert-router coupling (ERC) loss, a lightweight auxiliary loss that tightly couples the router's decisions with expert capabilities. Our approach treats each expert's router embedding as a proxy token for the tokens assigned to that expert, and feeds perturbed router embeddings through the experts to obtain intermediate activations. The ERC loss enforces two constraints on these activations: (1) Each expert must exhibit higher activation for its own proxy token than for the proxy tokens of any other expert. (2) Each proxy token must elicit stronger activation from its corresponding expert than from any other expert. These constraints jointly ensure that each router embedding faithfully represents its corresponding expert's capability, while each expert specializes in processing the tokens actually routed to it. The ERC loss is computationally efficient, operating only on $n^2$ activations, where $n$ is the number of experts. This represents a fixed cost independent of batch size,
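The two constraints described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything below is an assumption made for illustration: the hinge-style penalty, the scaling factor `alpha`, the additive Gaussian perturbation, the use of an L2 norm as the "activation" strength, the normalization by n^2, and the names `activation_matrix` and `erc_loss`. Each toy expert is a single linear map, and only an n x n matrix of activations is ever built, which matches the stated cost of n^2 activations independent of batch size.

```python
import math
import random


def activation_matrix(router_embeddings, expert_weights, noise=0.0, rng=None):
    """Build an n x n matrix where entry [i][j] is the activation strength
    of expert j on the (perturbed) proxy token of expert i.

    Assumptions: each expert is one linear map (a list of weight rows), the
    perturbation is additive Gaussian noise, and strength is the L2 norm.
    """
    rng = rng or random.Random(0)
    n = len(router_embeddings)
    matrix = []
    for i in range(n):
        # Perturb router embedding i to obtain its proxy token.
        proxy = [x + rng.gauss(0.0, noise) for x in router_embeddings[i]]
        row = []
        for j in range(n):
            hidden = [sum(w * x for w, x in zip(w_row, proxy))
                      for w_row in expert_weights[j]]
            row.append(math.sqrt(sum(h * h for h in hidden)))
        matrix.append(row)
    return matrix


def erc_loss(activations, alpha=1.0):
    """Hinge-style penalty (an assumed form) over the n x n activations."""
    n = len(activations)
    loss = 0.0
    for i in range(n):
        own = activations[i][i]
        for j in range(n):
            if i == j:
                continue
            # Constraint (2): proxy token i should activate expert i
            # more strongly than any other expert j.
            loss += max(0.0, activations[i][j] - alpha * own)
            # Constraint (1): expert i should respond more strongly to its
            # own proxy token than to the proxy token of any other expert j.
            loss += max(0.0, activations[j][i] - alpha * own)
    return loss / (n * n)


# Two toy experts, each sensitive to one coordinate of a 2-d embedding.
embeddings = [[1.0, 0.0], [0.0, 1.0]]
experts = [
    [[1.0, 0.0], [0.0, 0.0]],  # expert 0 reads the first coordinate
    [[0.0, 0.0], [0.0, 1.0]],  # expert 1 reads the second coordinate
]
aligned = activation_matrix(embeddings, experts)
```

In this toy setup the router embeddings and experts are already aligned, so the penalty is zero; a matrix whose off-diagonal entries exceed the diagonal is penalized.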

Search results: Expert

When More Experts Hurt: Underfitting in Multi-Expert Learning to Defer

Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss

Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging

TENP: Trapezoidal Expert Neuron Pruning For Mixture-of-Experts

Beyond Routing: Characterising Expert Tuning and Representation in Vision Mixture-of-Experts

EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models

MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts

The Expert Strikes Back: Interpreting Mixture-of-Experts Language Models at Expert Level

ReXMoE: Reusing Experts with Minimal Overhead in Mixture-of-Experts

GatePro: Parameter-Free Expert Selection Optimization for Mixture-of-Experts Models

LightMoE: Reducing Mixture-of-Experts Redundancy through Expert Replacing

An Expert is Worth One Token: Synergizing Multiple Expert LLMs as Generalist via Expert Token Routing

Expert Merging in Sparse Mixture of Experts with Nash Bargaining

Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference

Expert Mind: A Retrieval-Augmented Architecture for Expert Knowledge Preservation in the Energy Sector

Learning More Generalized Experts by Merging Experts in Mixture-of-Experts

Mixture-of-Experts with Expert Choice Routing

A hybrid approach for building fuzzy numbers based on data and expert knowledge

Expert Preference-based Evaluation of Automated Related Work Generation

Improving Expert Specialization in Mixture of Experts