Search - ResearchTracker

Despite their success, Large Vision-Language Models (LVLMs) remain vulnerable to hallucinations. While existing studies attribute hallucinations to insufficient visual attention to image tokens, our findings indicate that hallucinations also arise from interference by instruction tokens during decoding. Intuitively, certain instruction tokens continuously distort LVLMs' visual perception during decoding, hijacking their visual attention toward less discriminative visual regions. This distortion prevents them from integrating broader contextual information from images, ultimately leading to hallucinations. We term this phenomenon 'Attention Hijacking', where disruptive instruction tokens act as 'Attention Hijackers'. To address this, we propose a novel, training-free strategy, namely Attention HIjackers Detection and Disentanglement (AID), designed to isolate the influence of Hijackers, enabling LVLMs to rely on their context-aware intrinsic attention map. Specifically, AID consists of three components. First, Attention Hijackers Detection identifies Attention Hijackers by calculating instruction-driven visual salience. Next, an Attention Disentanglement mechanism is proposed

Vera Verto: Multimodal Hijacking Attack

arXiv, 2024-07-31. Authors: Minxing Zhang, Ahmed Salem, Michael Backes

The increasing cost of training machine learning (ML) models has led to the inclusion of new parties in the training pipeline, such as users who contribute training data and companies that provide computing resources. The involvement of these new parties in the ML training process has introduced new attack surfaces for an adversary to exploit. A recent attack in this domain is the model hijacking attack, whereby an adversary hijacks a victim model to implement their own, possibly malicious, hijacking tasks. However, the scope of the model hijacking attack has so far been limited to homogeneous-modality tasks. In this paper, we transform the model hijacking attack into a more general multimodal setting, where the hijacking and original tasks are performed on data of different modalities. Specifically, we focus on the setting where an adversary implements a natural language processing (NLP) hijacking task in an image classification model. To mount the attack, we propose a novel encoder-decoder based framework, namely the Blender, which relies on advanced image and language models. Experimental results show that our modal hijacking attack achieves strong performance in different

Search results: hijack

Attention Hijackers: Detect and Disentangle Attention Hijacking in LVLMs for Hallucination Mitigation

Vera Verto: Multimodal Hijacking Attack

Reasoning Hijacking: The Fragility of Reasoning Alignment in Large Language Models

Osmosis Distillation: Model Hijacking with the Fewest Samples

Unreal Thinking: Chain-of-Thought Hijacking via Two-stage Backdoor

Hijacking Large Audio-Language Models via Context-Agnostic and Imperceptible Auditory Prompt Injection

SnatchML: Hijacking ML models without Training Access

Is Crunching Public Data the Right Approach to Detect BGP Hijacks?

Model Hijacking Attack in Federated Learning

Automating Agent Hijacking via Structural Template Injection

Phi: Preference Hijacking in Multi-modal Large Language Models at Inference Time

CAMH: Advancing Model Hijacking Attack in Machine Learning

Universal Jailbreak Suffixes Are Strong Attention Hijackers

Beyond Crash: Hijacking Your Autonomous Vehicle for Fun and Profit

Breaking MCP with Function Hijacking Attacks: Novel Threats for Function Calling and Agentic Models

TRAP: Hijacking VLA CoT-Reasoning via Adversarial Patches

Goal Hijacking Attack on Large Language Models via Pseudo-Conversation Injection

On the Robustness of Transformers against Context Hijacking for Linear Classification

Semantic Router: On the Feasibility of Hijacking MLLMs via a Single Adversarial Perturbation

HijackRAG: Hijacking Attacks against Retrieval-Augmented Large Language Models