Search - ResearchTracker

Self-improvement at scale has been a longstanding goal for reasoning models, and there are two natural places to do it: at test time, through verification-refinement (V-R) loops; and at training time, through self-training methods. Both are gated by the same bottleneck: the verifier. V-R loops stall when verifier scores inflate while accuracy stagnates, and when feedback is too generic to act on; self-training fails similarly when bad self-generated data are added to training. Better verification would unlock both, but the capability we want to train, i.e., catching self-generated errors, lacks training signal. To address this challenge, we propose self-trained verification (STV). Our key observation is that, while a model cannot catch these errors alone, it can when shown the reference solution. We turn this asymmetry into a supervision target and train the verifier to imitate a more informed version of itself. At test time, STV substantially improves V-R loops on hard problems, while alternatives (e.g., SFT, RL on verifier scores, and even meta-verifiers) do not. STV roughly doubles accuracy on hard math and lifts it 14x on scientific reasoning tasks (1.5% to 21%). At training ti

Training Dynamics Impact Post-Training Quantization Robustness

arXiv, 2025-10-07. Authors: Albert Catalan-Tatjer, Niccolò Ajroldi, Jonas Geiping

While post-training quantization is widely adopted for efficient deployment of large language models, the mechanisms underlying quantization robustness remain unclear. We conduct a comprehensive analysis of quantization degradation across open-source language model training trajectories up to 32B parameters and 15T training tokens to accurately assess the relationship between training dynamics and quantization performance. Our key finding is that quantization errors in large-scale training runs are driven by a complex interplay between learning rate and other training hyperparameters. Specifically, once learning rates decay, validation loss and quantization error diverge, largely independent of training data scale. To investigate interventions on the training dynamics and identify specific configurations that can modulate quantization robustness favorably, we train our own models in controlled experiments up to 100B tokens. Our results challenge the assumption that increasing dataset scale inherently compromises quantization effectiveness, demonstrating instead that strategic training hyperparameter interventions can improve quantization quality at scale.

Search results: Train

Self-Trained Verification for Training- and Test-Time Self-Improvement

Training Dynamics Impact Post-Training Quantization Robustness

Comparing Unit Trains versus Manifest Trains for the Risk of Rail Transport of Hazardous Materials -- Part I: Risk Analysis Methodology

Tucker Tensor Train Taylor Series

pyTRAIN -- a modern TRAIN implementation

UnifiedNN: Efficient Neural Network Training on the Cloud

Train Unit Scheduling Optimization Considering Unit Ordering

Design Tasks and Their Complexity for the European Train Control System with Hybrid Train Detection

Regurgitative Training: The Value of Real Data in Training Large Language Models

Replica Tensor Train

Fixing the train-test resolution discrepancy

Train Tracks with Gaps: Applying the Probabilistic Method to Trains

Normalized tensor train decomposition

Utilizing Evolution Strategies to Train Transformers in Reinforcement Learning

Measuring Train Driver Performance as Key to Approval of Driverless Trains

The Influence of Macroscopic Pedestrian Structures on Train Boarding Efficiency

Topology-preserving Adversarial Training for Alleviating Natural Accuracy Degradation

Selfish Sparse RNN Training

A New Method for Inserting Train Paths into a Timetable

Real-time control of metro train dynamics with minimization of train time-headway variance