Search results: Simple

20 results found

SimpleGPT: Improving GPT via A Simple Normalization Strategy

arXiv

In this work, we revisit Transformer optimization through the lens of second-order geometry and establish a direct connection between architectural design, activation scale, the Hessian matrix, and the maximum tolerable learning rate. We introduce a simple normalization strategy, termed SimpleNorm, which stabilizes intermediate activation scales by construction. Then, by analyzing the Hessian of the loss with respect to network activations, we theoretically show that SimpleNorm significantly reduces the spectral norm of the Hessian, thereby permitting larger stable learning rates. We validate our theoretical findings through extensive experiments on large GPT models at parameter scales of 1B, 1.4B, 7B, and 8B. Empirically, SimpleGPT, our SimpleNorm-based network, tolerates learning rates 3× to 10× larger than standard convention, consistently demonstrates strong optimization stability, and achieves substantially better performance than well-established baselines. Specifically, when training 7B-scale models for 60K steps, SimpleGPT achieves a training loss 0.08 lower than that of LLaMA2 with QKNorm, reducing the loss from 2.290 to 2.208. Our source code will be released…
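The abstract links the maximum tolerable learning rate to the spectral norm of the Hessian. The textbook form of that link (not the paper's own analysis) is that gradient descent on a quadratic loss f(x) = ½ xᵀHx with positive-definite H is stable only when η < 2/λ_max(H), so a smaller top eigenvalue admits a larger step size. A minimal NumPy sketch, assuming a toy diagonal Hessian chosen purely for illustration:

```python
import numpy as np

def top_eigenvalue(H, iters=200, seed=0):
    """Estimate the largest eigenvalue of a symmetric PSD matrix by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(H.shape[0])
    for _ in range(iters):
        v = H @ v
        v /= np.linalg.norm(v)
    return float(v @ H @ v)  # Rayleigh quotient at the converged direction

def run_gd(H, lr, steps=200):
    """Plain gradient descent on f(x) = 0.5 * x^T H x; returns the final loss."""
    x = np.ones(H.shape[0])
    for _ in range(steps):
        x = x - lr * (H @ x)  # gradient of f is H x
    return float(0.5 * x @ H @ x)

# Toy Hessian (an assumption for illustration): eigenvalues 10, 1, 0.1.
H = np.diag([10.0, 1.0, 0.1])
lam = top_eigenvalue(H)
lr_max = 2.0 / lam  # stability threshold for gradient descent on this quadratic

print(run_gd(H, 0.9 * lr_max))  # just below the threshold: loss decays toward 0
print(run_gd(H, 1.1 * lr_max))  # just above the threshold: loss diverges
```

Halving λ_max in this toy setting doubles the admissible step size, which is the intuition behind "reducing the Hessian's spectral norm permits larger stable learning rates."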

A Simple Baseline for Streaming Video Understanding

arXiv, 2026-04-02. Authors: Yujiao Shen, Shulin Tian, Jingkang Yang

Recent streaming video understanding methods increasingly rely on complex memory mechanisms to handle long video streams. We challenge this trend with a simple finding: a sliding-window baseline that feeds only the most recent N frames to an off-the-shelf VLM already matches or surpasses published streaming models. We formalize this baseline as SimpleStream and evaluate it against 13 major offline and online video LLM baselines on OVO-Bench and StreamingBench. Despite its simplicity, SimpleStream delivers consistently strong performance. With only 4 recent frames, it reaches 67.7% average accuracy on OVO-Bench and 80.59% on StreamingBench. Controlled ablations further show that the value of longer context is backbone-dependent rather than uniformly increasing with model scale, and reveal a consistent perception-memory trade-off: adding more historical context can improve recall but often weakens real-time perception. This suggests that stronger memory, retrieval, or compression modules should not be taken as evidence of progress unless they clearly outperform SimpleStream under the same protocol. We therefore argue that future streaming benchmarks should separate recent-scene perception…
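The sliding-window baseline described above keeps no memory beyond the last N frames. A minimal sketch of that idea, assuming frames arrive one at a time and that `answer_fn` is a hypothetical stand-in for an off-the-shelf VLM call (the class and function names here are illustrative, not the authors' code):

```python
from collections import deque
from typing import Callable, Deque, List

class SlidingWindowStream:
    """Keep only the most recent `window` frames and pass them to a VLM on each query.

    `answer_fn` is a hypothetical stand-in for an off-the-shelf VLM call.
    """

    def __init__(self, answer_fn: Callable[[List[object], str], str], window: int = 4):
        self.answer_fn = answer_fn
        # deque(maxlen=...) silently drops the oldest frame once the window is full
        self.frames: Deque[object] = deque(maxlen=window)

    def push(self, frame: object) -> None:
        self.frames.append(frame)

    def ask(self, question: str) -> str:
        # No memory bank, retrieval, or compression: the model sees only the recent window.
        return self.answer_fn(list(self.frames), question)


def fake_vlm(frames, question):
    """Hypothetical placeholder VLM that just reports which frames it received."""
    return f"{question} -> saw frames {frames}"


stream = SlidingWindowStream(fake_vlm, window=4)
for t in range(10):  # integers stand in for decoded video frames
    stream.push(t)
print(stream.ask("What is happening now?"))  # What is happening now? -> saw frames [6, 7, 8, 9]
```

Using a bounded `deque` keeps per-frame cost O(1) and memory fixed at N frames regardless of stream length, which is what makes this baseline cheap to run.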

SimpleGPT: Improving GPT via A Simple Normalization Strategy

A Simple Baseline for Streaming Video Understanding

A New Simple-to-Configure Self-Perturbing Multivariable Extremum-Seeking Controller

Simple Tilings of Nilpotent Lie Groups

A New Overture to Classical Simple Type Theory, Ketonen-type Gentzen and Tableau Systems

An algorithm for accurate and simple-looking metaphorical maps

A Simple Deterministic Reduction From Gomory-Hu Tree to Maxflow and Expander Decomposition

Classification of simple quandles of small order

How simple can you go? An off-the-shelf transformer approach to molecular dynamics

A simple LAD-LASSO coordinate descent algorithm for interactive browser-based GPU applications

SimpleFusion: A Simple Fusion Framework for Infrared and Visible Images

Optimal Offline ORAM with Perfect Security via Simple Oblivious Priority Queues

Comparison of compression vs shearing near jamming, for a simple model of athermal frictionless disks in suspension

A Simple Framework for Open-Vocabulary Segmentation and Detection

A Simple and Efficient Baseline for Data Attribution on Images

Recognizing Simple-Triangle Graphs by Restricted 2-Chain Subgraph Cover

On groups of smooth maps into a simple compact Lie group, revisited

Posets arising as 1-skeleta of simple polytopes, the nonrevisiting path conjecture, and poset topology

PL 4-manifolds admitting simple crystallizations: framed links and regular genus

$L^1$-flat polynomials and simple Lebesgue spectrum for conservative maps exist: A simple proof