Search results: triggers

20 results found

Removing the Trigger, Not the Backdoor: Alternative Triggers and Latent Backdoors

arXiv, 2026-03-10. Authors: Gorka Abad, Ermes Franch, Stefanos Koffas

Current backdoor defenses assume that neutralizing a known trigger removes the backdoor. We show this trigger-centric view is incomplete: alternative triggers, patterns perceptually distinct from training triggers, reliably activate the same backdoor. We estimate the alternative trigger backdoor direction in feature space by contrasting clean and triggered representations, and then develop a feature-guided attack that jointly optimizes target prediction and directional alignment. First, we theoretically prove that alternative triggers exist and are an inevitable consequence of backdoor training. Then, we verify this empirically. Additionally, defenses that remove training triggers often leave backdoors intact, and alternative triggers can exploit the latent backdoor feature space. Our findings motivate defenses targeting backdoor directions in representation space rather than input-space triggers.

Universal Adversarial Triggers

arXiv, 2026-05-18. Authors: Benedict Florance Arockiaraj, Alexander Feng, Jianxiong Cai

Recent work has shown that modern NLP models trained for diverse tasks, ranging from sentiment analysis to language generation, succumb to universal adversarial attacks, a class of input-agnostic attacks in which a common trigger sequence is used to attack the model. Although these attacks are successful, the triggers they generate are ungrammatical and unnatural. Our work proposes a novel technique combining part-of-speech filtering and a perplexity-based loss function to generate sensible triggers that are closer to natural phrases. For sentiment analysis on the SST dataset, the method produces sensible triggers that reduce accuracy to as low as 0.04 and 0.12 when flipping positive predictions to negative and vice versa, respectively. To build robust models, we also perform adversarial training with the generated triggers, which increases the model's accuracy from 0.12 to 0.48. We aim to illustrate that adversarial attacks can be made difficult to detect by generating sensible triggers, and to facilitate robust model development through relevant defenses.

The Trigger in the Haystack: Extracting and Reconstructing LLM Backdoor Triggers

Are Triggers Needed for Document-Level Event Extraction?

PG-Triggers: Triggers for Property Graphs

DECADE: Decorrelated anomaly detection triggers to enhance the low-mass discovery potential of the LHC

Weak Scale Triggers in the SMEFT

Is It Possible to Backdoor Face Forgery Detection with Natural Triggers?

Backdoor Attack with Invisible Triggers Based on Model Architecture Modification

From Shortcuts to Triggers: Backdoor Defense with Denoised PoE

Backdoor Attacks with Input-unique Triggers in NLP

Why Do You Feel This Way? Summarizing Triggers of Emotions in Social Media Posts

Versatile Backdoor Attack with Visible, Semantic, Sample-Specific, and Compatible Triggers

Unsupervised Extractive Summarization of Emotion Triggers

MINIMAL: Mining Models for Data Free Universal Adversarial Triggers

Cause or Trigger? From Philosophy to Causal Modeling

GREAT: Generalizable Backdoor Attacks in RLHF via Emotion-Aware Trigger Synthesis

BadLLM-TG: A Backdoor Defender powered by LLM Trigger Generator

Multi-Trigger Poisoning Amplifies Backdoor Vulnerabilities in LLMs

Investigating Adversarial Trigger Transfer in Large Language Models