Search - ResearchTracker

Current vision-language models (VLMs) have demonstrated capabilities across a wide range of multimodal tasks. Typically, in a pretrained VLM, all layers are engaged by default to make predictions on downstream tasks. We find that intervening on a single layer, for example by zeroing its parameters, can improve performance on certain tasks, indicating that some layers hinder rather than help downstream tasks. We systematically investigate how individual layers influence different tasks via layer intervention. Specifically, we measure the change in performance relative to the base model after intervening on each layer and observe improvements when bypassing specific layers. This improvement generalizes across models and datasets, indicating the presence of Task-Interfering Layers that harm performance on downstream tasks. We introduce the Task-Layer Interaction Vector, which quantifies the effect of intervening on each layer of a VLM for a given task. These Task-Interfering Layers exhibit task-specific sensitivity patterns: tasks requiring similar capabilities show consistent response trends under layer interventions, as evidenced by the high similarity of their Task-Layer Interaction Vectors. Inspired by the
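The measurement described in this abstract (the per-layer change in task performance relative to the base model, compared across tasks) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the `evaluate` callback signature, the use of cosine similarity as the similarity measure, and the toy scores, which are made-up numbers and not results from the paper.

```python
import math
from typing import Callable, List, Optional, Sequence

# Hypothetical evaluator: returns a task score for the model with one layer
# bypassed (by index), or for the unmodified base model when given None.
Evaluator = Callable[[Optional[int]], float]


def interaction_vector(evaluate: Evaluator, num_layers: int) -> List[float]:
    """Per-layer score change relative to the base model (positive = bypassing helps)."""
    base = evaluate(None)
    return [evaluate(i) - base for i in range(num_layers)]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def make_toy_evaluator(base: float, bypassed: Sequence[float]) -> Evaluator:
    """Stand-in for a real benchmark run; just looks up fixed, made-up scores."""
    def evaluate(skipped: Optional[int] = None) -> float:
        return base if skipped is None else bypassed[skipped]
    return evaluate


# Made-up scores for two hypothetical tasks on a 4-layer toy model.
task_a = make_toy_evaluator(0.70, [0.60, 0.72, 0.65, 0.70])
task_b = make_toy_evaluator(0.50, [0.42, 0.53, 0.46, 0.50])

vec_a = interaction_vector(task_a, 4)
vec_b = interaction_vector(task_b, 4)

# Layers whose bypass raises the score are candidates for "task-interfering".
interfering_a = [i for i, delta in enumerate(vec_a) if delta > 0]
similarity = cosine_similarity(vec_a, vec_b)
```

In a real setting the evaluator would run the benchmark with the chosen layer zeroed or bypassed, which is far more expensive than the lookup used here.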

Transformer Layers as Painters

arXiv, 2024-07-12. Authors: Qi Sun, Marc Pickett, Aakash Kumar Nain

Despite their nearly universal adoption for large language models, the internal workings of transformers are not well understood. We aim to better understand the impact of removing or reorganizing information throughout the layers of a pretrained transformer. Such an understanding could both yield better usage of existing models and guide architectural improvements that produce new variants. We present a series of empirical studies on frozen models that show that the lower and final layers of pretrained transformers differ from middle layers, but that middle layers have a surprising amount of uniformity. We further show that some classes of problems are robust to skipping layers, running the layers in an order different from how they were trained, or running the layers in parallel. Our observations suggest that even frozen pretrained models may gracefully trade accuracy for latency by skipping layers or running layers in parallel.
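The three execution variants named in this abstract (skipping layers, reordering them, and running them in parallel) can be sketched on a toy residual stack. This is an illustrative assumption, not the authors' code: the layers are simple affine functions standing in for transformer blocks, the helper names are hypothetical, and averaging the parallel updates is one plausible way to combine them.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# A layer maps a hidden state to a residual update that is added back to it.
Layer = Callable[[Vector], Vector]


def run_sequential(layers: Sequence[Layer], x: Vector, order: Sequence[int]) -> Vector:
    """Apply layers in the given index order; omitting an index skips that layer."""
    for i in order:
        update = layers[i](x)
        x = [a + b for a, b in zip(x, update)]
    return x


def run_parallel(layers: Sequence[Layer], x: Vector, first: int, last: int) -> Vector:
    """Run layers [first, last) on the same input and average their updates.

    Layers before `first` and from `last` onward still run sequentially.
    """
    x = run_sequential(layers, x, range(0, first))
    updates = [layers[i](x) for i in range(first, last)]
    mean_update = [sum(column) / len(updates) for column in zip(*updates)]
    x = [a + b for a, b in zip(x, mean_update)]
    return run_sequential(layers, x, range(last, len(layers)))


def make_toy_layer(scale: float, bias: float) -> Layer:
    """Toy affine residual update; a stand-in for a real transformer block."""
    return lambda x: [scale * v + bias for v in x]


layers = [
    make_toy_layer(0.5, 0.0),
    make_toy_layer(0.25, 1.0),
    make_toy_layer(0.5, 0.0),
    make_toy_layer(0.25, 0.0),
]
x0 = [1.0, 0.0]

full = run_sequential(layers, x0, [0, 1, 2, 3])       # trained order
skipped = run_sequential(layers, x0, [0, 1, 3])       # layer 2 skipped
reordered = run_sequential(layers, x0, [0, 2, 1, 3])  # middle layers swapped
parallel = run_parallel(layers, x0, 1, 3)             # middle layers in parallel
```

On a real frozen model the same comparison would be made on benchmark accuracy rather than on raw hidden states, since the question is how much task performance each variant gives up.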

Search results: Layers

Do All Individual Layers Help? An Empirical Study of Task-Interfering Layers in Vision-Language Models

Transformer Layers as Painters

Fast Clifford Neural Layers

Skip a Layer or Loop It? Learning Program-of-Layers in LLMs

Safety Layers in Aligned Large Language Models: The Key to LLM Security

The Unreasonable Ineffectiveness of the Deeper Layers

Multichannel Orthogonal Transform-Based Perceptron Layers for Efficient ResNets

Unsupervised Learning Layers for Video Analysis

Instability of shear layers and Prandtl's boundary layers

Memory Layers at Scale

UI Layers Merger: Merging UI layers via Visual Learning and Boundary Prior

My model, it has three layers: a reduced model of the smectic transition in two dimensions

Colossal Layer Nernst Effect in Twisted Moiré Layers

Layers and stability

Layers of classicality in the compatibility of measurements

Not all layers are equally as important: Every Layer Counts BERT

Rethinking the adaptive relationship between Encoder Layers and Decoder Layers

Magnetization of ultrathin (Ga,Mn)As layers

Beyond Layer Importance in Layer-wise Sparsity: An Inter-Layer Perturbation-Absorption Perspective

Layer by Layer: Uncovering Hidden Representations in Language Models