Search - ResearchTracker

User preferences evolve across months of interaction, and tracking them requires inferring when a stated preference has been changed by a subsequent life event. We define this problem as long-horizon personalization and observe that progress on it is limited by data availability and measurement, with no existing resource providing both naturalistic long-horizon interactions and the ground-truth provenance needed to diagnose why models fail. We introduce a data generator that produces conversations from a structured mental state graph, yielding ground-truth provenance for every preference change across 6-month timelines, and from it construct HorizonBench, a benchmark of 4,245 items from 360 simulated users with 6-month conversation histories averaging ~4,300 turns and ~163K tokens. HorizonBench provides a testbed for long-context modeling, memory-augmented architectures, theory-of-mind reasoning, and user modeling. Across 25 frontier models, the best model reaches 52.8% and most score at or below the 20% chance baseline. When these models err on evolved preferences, over a third of the time they select the user's originally stated value without tracking the updated user state. This

Why Do Few-Step Text Latents Fail When Image Latents Work? Non-Commitment at Sharp Categorical Readouts

arXiv, 2026-06-29. Author: Zhongyao Wang

Deterministic few-step generation succeeds on continuous image latents but collapses to incoherent text on continuous text latents, and we show the cause is geometric rather than a training or scaling deficiency: a smooth, regularity-limited deterministic map cannot resolve a discrete branch choice before a sharp categorical readout, so few-step failure is governed by decoder sharpness, not transport accuracy. In the overlapping regime of real text autoencoders, we prove (Theorem 3) that the posterior-mean terminal step flips tokens at the rate of the latent mass in an $O(s(t))$ tube around decision boundaries. Two diagnostics, DABI (readout sharpness) and CCI (categorical commitment), measured on published checkpoints show that four independently built continuous-text decoders amplify a boundary-aligned perturbation far beyond a norm-matched isotropic one (DABI from $5\times10^{2}$ to $>10^{5}$), while image decoders have DABI $\approx 1$. Two mechanisms escape the continuous bound: categorical commitment (autoregressive decoders succeed despite sharper readouts) and stochastic re-injection (deterministic ODE at $K=4$ gives PPL 294 versus SDE 50 on the same model). In the ideal

Search results: zhongyao

HorizonBench: Long-Horizon Personalization with Evolving Preferences

Why Do Few-Step Text Latents Fail When Image Latents Work? Non-Commitment at Sharp Categorical Readouts

Reinforcement Learning-Enabled Agent for Transmitter Optimization in Digital-Analog Radio-over-Fiber Fronthaul

CharacterFlywheel: Scaling Iterative Improvement of Engaging and Steerable LLMs in Production

Seeing Across Skies and Streets: Feedforward 3D Reconstruction from Satellite, Drone, and Ground Images

Self-Adaptive Scale Handling for Forecasting Time Series with Scale Heterogeneity

A unified resource-pool architecture for high-dimensional direct-detection optical communication

Equivariant Evidential Deep Learning for Interatomic Potentials

Privacy-Aware State Estimation: From Coarse to Precise Privacy Protection

How Humans Help LLMs: Assessing and Incentivizing Human Preference Annotators

Multilineage-differentiating stress-enduring cells alleviate neuropathic pain in mice by secreting TGF-b and IL-10

Covariance-Intersection-based Distributed Kalman Filtering: Stability Problems Revisited

A Physics-Informed Neural Network Framework for Simulating Creep Buckling in Growing Viscoelastic Biological Tissues

Global-Aware Monocular Semantic Scene Completion with State Space Models

Real-Time Distributed Optical Fiber Vibration Recognition via Extreme Lightweight Model and Cross-Domain Distillation

Iterative Pretraining Framework for Interatomic Potentials

Incentivizing High-Quality Human Annotations with Golden Questions

Multi-period Learning for Financial Time Series Forecasting

Refine-and-Contrast: Adaptive Instance-Aware BEV Representations for Multi-UAV Collaborative Object Detection

Energy-based physics-informed neural network for frictionless contact problems under large deformation