Search - ResearchTracker

LLM routing aims to achieve a favorable quality-cost trade-off by dynamically assigning easy queries to smaller models and harder queries to stronger ones. However, across both unimodal and multimodal settings, we uncover a pervasive yet underexplored failure mode in existing routers: as the user's cost budget increases, routers systematically default to the most capable and most expensive model even when cheaper models already suffice. As a result, current routers under-utilize small models, wasting computation and monetary cost and undermining the core promise of routing; we term this phenomenon routing collapse. We attribute routing collapse to an objective-decision mismatch: many routers are trained to predict scalar performance scores, whereas routing decisions ultimately depend on discrete comparisons among candidate models. Consequently, small prediction errors can flip relative orderings and trigger suboptimal selections. To bridge this gap, we propose EquiRouter, a decision-aware router that directly learns model rankings, restoring the role of smaller models and mitigating routing collapse. On RouterBench, EquiRouter reduces cost by about 17% at GPT-4-level performance
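The objective-decision mismatch described in this abstract can be illustrated with a toy sketch. This is not the paper's method or EquiRouter itself: the two-model setup, the score values, and the `pick_best` helper are all hypothetical, chosen only to show how a small error in predicted scalar scores can flip a discrete routing choice.

```python
# Toy illustration (hypothetical numbers, not taken from the paper).
# Index 0 is a small, cheap model; index 1 is a large, expensive model.

def pick_best(scores):
    """Return the index of the highest score; ties go to the earlier (cheaper) model."""
    return max(range(len(scores)), key=lambda i: scores[i])

true_scores = [0.71, 0.70]       # the small model is actually (slightly) better here
predicted_scores = [0.70, 0.72]  # a score predictor with small per-model errors

# Largest absolute prediction error across the two models.
max_error = max(abs(p - t) for p, t in zip(predicted_scores, true_scores))

true_choice = pick_best(true_scores)         # 0: route to the small model
routed_choice = pick_best(predicted_scores)  # 1: route to the large model

print(f"max prediction error: {max_error:.2f}")
print(f"choice under true scores: {true_choice}, choice under predicted scores: {routed_choice}")
```

Under these assumed values the score predictor is accurate to within a few hundredths, yet the routing decision changes from the small model to the large one, because the decision depends on the ordering of the scores and not on their absolute accuracy.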

Routers in Vision Mixture of Experts: An Empirical Study

arXiv, 2024-01-29. Authors: Tianlin Liu, Mathieu Blondel, Carlos Riquelme

Mixture-of-Experts (MoE) models are a promising way to scale up model capacity without significantly increasing computational cost. A key component of MoEs is the router, which decides which subset of parameters (experts) process which feature embeddings (tokens). In this paper, we present a comprehensive study of routers in MoEs for computer vision tasks. We introduce a unified MoE formulation that subsumes different MoEs with two parametric routing tensors. This formulation covers both sparse MoE, which uses a binary or hard assignment between experts and tokens, and soft MoE, which uses a soft assignment between experts and weighted combinations of tokens. Routers for sparse MoEs can be further grouped into two variants: Token Choice, which matches experts to each token, and Expert Choice, which matches tokens to each expert. We conduct head-to-head experiments with 6 different routers, including existing routers from prior work and new ones we introduce. We show that (i) many routers originally developed for language modeling can be adapted to perform strongly in vision tasks, (ii) in sparse MoE, Expert Choice routers generally outperform Token Choice routers, and (iii) soft Mo

Search results: Routers

When Routing Collapses: On the Degenerate Convergence of LLM Routers

Routers in Vision Mixture of Experts: An Empirical Study

Rerouting LLM Routers

Attending to Routers Aids Indoor Wireless Localization

Router Upcycling: Leveraging Mixture-of-Routers in Mixture-of-Experts Upcycling

Optimizing MoE Routers: Design, Implementation, and Evaluation in Transformer Models

RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers

Routers Learn the Geometry of Their Experts: Geometric Coupling in Sparse Mixture-of-Experts

Federate the Router: Learning Language Model Routers with Sparse and Decentralized Evaluations

Scaling Routers with In-Package Optics and High-Bandwidth Memories

Mixture of Routers

The Proxy Knows Too Much: Sealing LLM API Routers with Attested TEEs

The Routing Plateau: Understanding and Breaking the Accuracy Limits of LLM Routers

Redesign Mixture-of-Experts Routers with Manifold Power Iteration

Route to Rome Attack: Directing LLM Routers to Expensive Models via Adversarial Suffix Optimization

Load Balancing Mixture of Experts with Similarity Preserving Routers

XDRI Attacks - and - How to Enhance Resilience of Residential Routers

Faulty towers: recovering a functioning quantum random access memory in the presence of defective routers

Open Source Routers: A Survey

Multipartite multiplexing strategies for quantum routers