Search - ResearchTracker

Training Mixture-of-Experts (MoE) models introduces sparse and highly imbalanced all-to-all communication that dominates iteration time. Conventional load-balancing methods fail to exploit the deterministic topology of Rail architectures, leaving multi-NIC bandwidth underutilized. We present RailS, a distributed load-balancing framework that minimizes all-to-all completion time in MoE training. RailS leverages the Rail topology's symmetry to prove that uniform sending ensures uniform receiving, transforming global coordination into local scheduling. Each node independently executes a Longest Processing Time First (LPT) spraying scheduler to proactively balance traffic using local information. RailS activates N parallel rails for fine-grained, topology-aware multipath transmission. Across synthetic and real-world MoE workloads, RailS improves bus bandwidth by 20% to 78% and reduces completion time by 17% to 78%. For Mixtral workloads, it shortens iteration time by 18% to 40% and achieves near-optimal load balance, fully exploiting architectural parallelism in distributed training.
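To illustrate the Longest Processing Time First idea named in the abstract, here is a minimal sketch of the classic LPT greedy heuristic applied to spreading flows over N rails. It is not the RailS scheduler itself: the function name `lpt_spray`, the flow-size list, and the notion of a per-rail "load" in arbitrary units are all assumptions made for illustration.

```python
import heapq


def lpt_spray(flow_sizes, num_rails):
    """Greedy LPT: take flows largest-first, place each on the least-loaded rail.

    flow_sizes: list of flow sizes (hypothetical units, e.g. bytes).
    num_rails:  number of parallel rails (the N in the abstract).
    Returns (assignment, loads), where assignment maps rail -> list of flow indices.
    """
    # Min-heap of (current load, rail index); the least-loaded rail is popped first.
    heap = [(0, rail) for rail in range(num_rails)]
    heapq.heapify(heap)
    assignment = {rail: [] for rail in range(num_rails)}

    # Visit flows in descending size order (the "longest processing time first" rule).
    ordered = sorted(enumerate(flow_sizes), key=lambda kv: kv[1], reverse=True)
    for flow_id, size in ordered:
        load, rail = heapq.heappop(heap)
        assignment[rail].append(flow_id)
        heapq.heappush(heap, (load + size, rail))

    loads = [sum(flow_sizes[i] for i in assignment[r]) for r in range(num_rails)]
    return assignment, loads


assignment, loads = lpt_spray([7, 5, 4, 3, 1], num_rails=2)
print(assignment, loads)
```

In this toy input the two rails end up with equal load. In general, LPT is a heuristic and does not guarantee a perfectly even split.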

Photonic Rails in ML Datacenters with Opus

arXiv, 2026-02-13. Authors: Eric Ding, Barry Lyu, Bhaskar Kataria

Rail-optimized network fabrics have become the de facto datacenter scale-out fabric for large-scale ML training. However, the use of high-radix electrical switches to provide all-to-all connectivity in rails imposes massive power and cost. We propose a rethinking of the rail abstraction by retaining its communication semantics, but realizing it using optical circuit switches. The key challenge is that optical switches support one-to-one connectivity at a time, limiting the fan-out of traffic in ML workloads using hybrid parallelisms. We overcome this through parallelism-driven rail reconfiguration, which exploits the non-overlapping communication phases of different parallelism dimensions. This time-multiplexes a single set of physical ports across circuit configurations tailored to each phase within a training iteration. We design and implement Opus, a control plane that orchestrates this in-job reconfiguration of photonic rails at parallelism phase boundaries, and evaluate it on a physical OCS testbed, the Perlmutter supercomputer, and in simulation at up to 2,048 GPUs. Our results show that photonic rails can achieve over 23× network power reduction and 4× c

Search results: Rails

RailS: Load Balancing for All-to-All Communication in Distributed Mixture-of-Experts Training

Photonic Rails in ML Datacenters with Opus

RAILS: Retrieval-Augmented Intelligence for Learning Software Development

NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails

RAILS: Risk-Aware Iterated Local Search for Joint SLA Decomposition and Service Provider Management in Multi-Domain Networks

Compliance-Aware Agentic Payments on Stablecoin Rails

StableSleep: Source-Free Test-Time Adaptation for Sleep Staging with Lightweight Safety Rails

Photonic Rails in ML Datacenters

Can One Safety Loop Guard Them All? Agentic Guard Rails for Federated Computing

Droplets sliding on soft solids shed elastocapillary rails

RAILS: A Robust Adversarial Immune-inspired Learning System

Accelerating, guiding, and compressing skyrmions by defect rails

Organ Shape Sensing using Pneumatically Attachable Flexible Rails in Robotic-Assisted Laparoscopic Surgery

Learning to drive from a world on rails

Traffic flow of interacting self-driven particles: rails and trails, vehicles and vesicles

Content Management in Ruby on Rails

Boosted linear-optical measurements on single-rail qubits with unentangled ancillas

Tradeoff between Efficiency and Melting for a High-Performance Electromagnetic Rail Gun

Doubling Qubits in a Trapped-Ion System via Vibrational Dual-Rail Encoding

Non-Destructive Rail Monitoring for Defect Identification