Search results: Before

20 results found

Parenteral nutrition before gastrointestinal surgery.

PubMed, 1982-03-20. Author: Tweedle De

No abstract available.

Nutrition and Dietetics; Pulmonary and Respiratory Medicine

Refusal Before Decoding: Detecting and Exploiting Refusal Signals in Intermediate LLM Activations

arXiv, 2026-05-27. Authors: Matteo Gioele Collu, Riccardo Conte, Alberto Giaretta

In this paper, we investigate whether refusal behavior can be predicted from LLM intermediate activations before decoding using linear probes trained on residual stream activations at each transformer block. We find that refusal is linearly decodable well before the final layer, indicating that safety-relevant behavior is represented in intermediate activations before output generation. To test whether this signal is actionable, we introduce Mechanistic AutoDAN, a probe-guided variant of AutoDAN that replaces full-model fitness evaluation with partial forward passes and probe-based scoring inside a genetic prompt search loop. Across the evaluated models, our method achieves attack success rates competitive with vanilla AutoDAN while reducing per-iteration search time by up to 72%, and probe-guided prompts match or exceed AutoDAN's cross-model transfer in several configurations. We further find that the usefulness of probe guidance increases with model scale. Our results show that refusal is not only observable at the output level, but is encoded as a structured and actionable signal in intermediate LLM activations.

Parenteral nutrition before gastrointestinal surgery.

Refusal Before Decoding: Detecting and Exploiting Refusal Signals in Intermediate LLM Activations

Train-before-Test Harmonizes Language Model Rankings

Preference for redistribution and institutional trust: Comparison before and after COVID-19

AI-exposed jobs deteriorated before ChatGPT

Marginals Before Conditionals

What Information Should Be Shared with Whom "Before and During Training"?

MuonEq: Balancing Before Orthogonalization with Lightweight Equilibration

NLP Case Study on Predicting the Before and After of the Ukraine-Russia and Hamas-Israel Conflicts

Knowing Before Saying: LLM Representations Encode Information About Chain-of-Thought Success Before Completion

On the mapping after and before truncation in the boson expansion theory

Inhomogeneous stellar mixing in the final hours before the Cassiopeia A supernova

GTP before ATP: The energy currency at the origin of genes

Optical spectropolarimetry of binary asteroid Didymos-Dimorphos before and after the DART impact

Before-after safety analysis of a shared space implementation

Higgs-like field interactions before symmetry breaking

Coronal Behavior Before the Large Flare Onset

Time and nonlocal realism: Consequences of the before-before experiment

Weather conditions several hours before the strong earthquake

Neural Network Panning: Screening the Optimal Sparse Network Before Training