Search - ResearchTracker

The quadratic complexity of self-attention during the prefill phase impedes long-context inference in large language models. Existing sparse attention methods face a trade-off among context adaptivity, sampling overhead, and fine-tuning costs. We propose VSPrefill, a mechanism requiring lightweight training that uses the vertical-slash structural pattern in attention distributions. Our compact VSIndexer module predicts context-aware importance scores for vertical columns and slash diagonals from key-value representations augmented with RoPE. This approach constructs sparse masks with linear complexity without modifying the backbone parameters. During inference, an adaptive cumulative-threshold strategy allocates sparsity budgets per layer, while a fused kernel executes attention with on-the-fly index merging. Evaluated on Qwen3-4B-Instruct and LLaMA-3.1-8B-Instruct across the LongBench and RULER benchmarks, VSPrefill preserves 98.35% of the full attention accuracy while delivering a 4.95x average speedup at a context length of 128k. These results establish a new Pareto frontier in the trade-off between accuracy and efficiency.
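The mask construction described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the importance scores are taken as given (in the paper they come from the VSIndexer), the function names are hypothetical, and the threshold rule is one plausible reading of "cumulative threshold". The dense mask is built only for inspection; a real kernel would work from the index lists.

```python
def select_by_cumulative_threshold(scores, tau):
    """Pick the highest-scoring indices until their share of the total reaches tau.

    Assumption: `scores` are non-negative importance scores and `tau` is in (0, 1].
    """
    total = sum(scores)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen, running = [], 0.0
    for i in order:
        chosen.append(i)
        running += scores[i] / total
        if running >= tau:
            break
    return sorted(chosen)


def vertical_slash_mask(n, columns, offsets):
    """Causal n x n boolean mask (illustration only, quadratic in n).

    Entry (i, j) is kept when j is a selected vertical column
    or i - j is a selected slash (diagonal) offset.
    """
    cols, offs = set(columns), set(offsets)
    return [[j <= i and (j in cols or (i - j) in offs) for j in range(n)]
            for i in range(n)]


# Hypothetical scores for 4 key columns and 4 diagonal offsets.
columns = select_by_cumulative_threshold([6, 2, 1, 1], tau=0.7)   # [0, 1]
offsets = select_by_cumulative_threshold([5, 1, 3, 1], tau=0.75)  # [0, 2]
mask = vertical_slash_mask(4, columns, offsets)
```

Because the number of selected indices depends on how concentrated the scores are, a flatter score distribution yields a denser mask under the same threshold, which is the sense in which the budget adapts.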

Demystifying the Slash Pattern in Attention: The Role of RoPE

arXiv, 2026-01-13. Authors: Yuan Cheng, Fengzhuo Zhang, Yunlong Hou

Large Language Models (LLMs) often exhibit slash attention patterns, where attention scores concentrate along the $Δ$-th sub-diagonal for some offset $Δ$. These patterns play a key role in passing information across tokens. But why do they emerge? In this paper, we demystify the emergence of these Slash-Dominant Heads (SDHs) from both empirical and theoretical perspectives. First, by analyzing open-source LLMs, we find that SDHs are intrinsic to models and generalize to out-of-distribution prompts. To explain the intrinsic emergence, we analyze the queries, keys, and Rotary Position Embedding (RoPE), which jointly determine attention scores. Our empirical analysis reveals two characteristic conditions of SDHs: (1) Queries and keys are almost rank-one, and (2) RoPE is dominated by medium- and high-frequency components. Under these conditions, queries and keys are nearly identical across tokens, and interactions between medium- and high-frequency components of RoPE give rise to SDHs. Beyond empirical evidence, we theoretically show that these conditions are sufficient to ensure the emergence of SDHs by formalizing them as our modeling assumptions. Particularly, we analyze the trainin
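The relative-position property behind the slash pattern in this abstract can be checked numerically. The sketch below is an idealized illustration, not the paper's analysis: it takes the rank-one condition to its extreme by giving every token the same query and key vector, uses the standard RoPE frequencies with base 10000, and the vector values and helper names are arbitrary assumptions. Under those assumptions the pre-softmax score depends only on the offset i - j, so it is constant along each sub-diagonal.

```python
import cmath


def rope(vec, pos, thetas):
    """Apply RoPE to a vector stored as complex pairs: rotate pair k by pos * thetas[k]."""
    return [v * cmath.exp(1j * pos * t) for v, t in zip(vec, thetas)]


def score(q, k):
    """Real inner product of two vectors stored as complex pairs."""
    return sum((a * b.conjugate()).real for a, b in zip(q, k))


d_pairs = 4  # head dimension is 2 * d_pairs
thetas = [10000 ** (-p / d_pairs) for p in range(d_pairs)]

# Assumption: one shared query and one shared key for all tokens.
q = [1 + 2j, 0.5 - 1j, -0.3 + 0.7j, 2 + 0j]
k = [0.2 - 1j, 1 + 1j, 0.4 + 0.1j, -1 + 0.5j]


def slash_score(i, j):
    """Pre-softmax score between the query at position i and the key at position j."""
    return score(rope(q, i, thetas), rope(k, j, thetas))


# Same offset (2), different absolute positions: the scores agree up to rounding.
same_offset_gap = abs(slash_score(5, 3) - slash_score(9, 7))
```

This only shows why scores line up along diagonals; which offset dominates depends on the frequency content of RoPE, which the sketch does not model.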

Search results: slash

VSPrefill: Vertical-Slash Sparse Attention with Lightweight Indexing for Long-Context Prefilling

Demystifying the Slash Pattern in Attention: The Role of RoPE

Optimal Multilevel Slashing for Blockchains

SLASh: Simulation of LISLs Aboard LEO Satellite Shells

SLASH: Self-Supervised Speech Pitch Estimation Leveraging DSP-derived Absolute Pitch

Bounds on SMEFT affecting multi gauge and Higgs-gauge couplings using two and three body spin correlations in $e^-e^+\to 3l2j\slashed{E}$ process

slash: A Technique for Static Configuration-Logic Identification

PDStream: Slashing Long-Tail Delay in Interactive Video Streaming via Pseudo-Dual Streaming

Stable equivalences with endopermutation source, slash functors, and functorial equivalences

Strong cocomparability graphs and Slash-free orderings of matrices

SLASH: Embracing Probabilistic Circuits into Neural Answer Set Programming

Brauer-friendly modules and slash functors

Planar Heyting Algebras for Children 2: Local Operators, J-Operators, and Slashings

COmPOSER: Circuit Optimization of mm-wave/RF circuits with Performance-Oriented Synthesis for Efficient Realizations

FlashPrefill: Instantaneous Pattern Discovery and Thresholding for Ultra-Fast Long-Context Prefilling

Topology of a uniform spanning tree on a cylinder

Prism: Spectral-Aware Block-Sparse Attention

Distinguishing two dark matter component particles at $e^+e^-$ colliders

Bayesian estimation of a multivariate TAR model when the noise process distribution belongs to the class of Gaussian variance mixtures

On the concept of determinant for the differential operators of Quantum Physics