Search - ResearchTracker

Deploying LLMs in multi-turn dialogues facilitates jailbreak attacks that distribute harmful intent across seemingly benign turns. Recent training-based multi-turn jailbreak methods learn long-horizon attack strategies from interaction feedback, but often rely on coarse trajectory-level outcome signals that broadcast uniformly to every turn. However, we find that turn-level contributions in multi-turn jailbreaking are non-uniform, phase-dependent, and target-specific. Such coarse outcome supervision induces a credit assignment problem, leading to over-rewarding redundant turns in successful trajectories and under-crediting useful intermediate turns in failed ones. To address this, we propose TRACE, a turn-aware credit assignment framework for reinforcement learning (RL)-based multi-turn jailbreaking. For successful trajectories, TRACE estimates turn-level contributions via leave-one-turn-out semantic masking; for failed ones, TRACE assigns penalties based on prompt harmfulness and semantic relevance, with an additional local refusal-aware penalty. Furthermore, we reuse the attack-side credit signal for multi-turn defense alignment. Extensive experiments on open-source and closed-so

Confidence Should Be Calibrated More Than One Turn Deep

arXiv · 2026-04-07 · Authors: Zhaohan Zhang, Chengzhengxu Li, Xiaoming Liu

Large Language Models (LLMs) are increasingly applied in high-stakes domains such as finance, healthcare, and education, where reliable multi-turn interactions with users are essential. However, existing work on confidence estimation and calibration, a major approach to building trustworthy LLM systems, largely focuses on single-turn settings and overlooks the risks and potential of multi-turn conversations. In this work, we introduce the task of multi-turn calibration to reframe calibration from a static property into a dynamic challenge central to reliable multi-turn conversation, where calibrating model confidence at each turn conditioned on the conversation history is required. We first reveal the risks of this setting: using Expected Calibration Error at turn T (ECE@T), a new metric that tracks calibration dynamics over turns, we show that user feedback (e.g., persuasion) can degrade multi-turn calibration. To address this, we propose MTCal, which minimises ECE@T via a surrogate calibration target, and further leverage calibrated confidence in ConfChat, a decoding strategy that improves both factuality and consistency of the model response in multi-turn interactions. Extensive

Search results: turn

Not All Turns Matter: Credit Assignment for Multi-Turn Jailbreaking

Confidence Should Be Calibrated More Than One Turn Deep

Easy Turn: Integrating Acoustic and Linguistic Modalities for Robust Turn-Taking in Full-Duplex Spoken Dialogue Systems

TED: Turn Emphasis with Dialogue Feature Attention for Emotion Recognition in Conversation

Another Turn, Better Output? A Turn-Wise Analysis of Iterative LLM Prompting

One Turn Too Late: Response-Aware Defense Against Hidden Malicious Intent in Multi-Turn Dialogue

CAMP: Cumulative Agentic Masking and Pruning for Privacy Protection in Multi-Turn LLM Conversations

A$^2$TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping

Empowering Multi-Turn Tool-Integrated Agentic Reasoning with Group Turn Policy Optimization

Frenet turns

Discrete turn strategies emerge in information-limited navigation

Turn: A Language for Agentic Computation

Speak Out of Turn: Safety Vulnerability of Large Language Models in Multi-turn Dialogue

The Omega Turn: A General Turning Template for Elongate Robots

Ultra slow-turn inflation

Beyond the Assistant Turn: User Turn Generation as a Probe of Interaction Awareness in Language Models

FlowKV: Enhancing Multi-Turn Conversational Coherence in LLMs via Isolated Key-Value Cache Management

Building Math Agents with Multi-Turn Iterative Preference Learning

Your Turn: At Home Turning Angle Estimation for Parkinson's Disease Severity Assessment

A Static and Dynamic Attention Framework for Multi Turn Dialogue Generation