Search - ResearchTracker

Although Large Language Models (LLMs) have demonstrated strong ability, in real-world scenarios they must also be controlled and guided by rules to be safe, accurate, and intelligent. This requires LLMs to possess inferential rule-following capability. However, no prior work has clearly evaluated this capability. Previous studies that attempt to evaluate it fail to distinguish inferential rule-following scenarios from instruction-following scenarios. Therefore, this paper first clarifies the concept of inferential rule-following and proposes a comprehensive benchmark, RuleBench, to evaluate a diverse range of inferential rule-following abilities. Our experimental results on a variety of LLMs show that they are still limited in following rules. Our analysis of the evaluation results provides insights into how LLMs can be improved toward better inferential rule-following intelligent agents. We further propose Inferential Rule-Following Tuning (IRFT). The experimental results show that through IRFT, LLMs can learn abstract rule-following abilities from purely synthetic data and then general

RIFT: Reordered Instruction Following Testbed To Evaluate Instruction Following in Singular Multistep Prompt Structures

arXiv, 2026-01-26. Authors: Andrew Jaffe, Noah Reicin, Jinho D. Choi

Large Language Models (LLMs) are increasingly relied upon for complex workflows, yet their ability to maintain flow of instructions remains underexplored. Existing benchmarks conflate task complexity with structural ordering, making it difficult to isolate the impact of prompt topology on performance. We introduce RIFT, Reordered Instruction Following Testbed, to assess instruction following by disentangling structure from content. Using rephrased Jeopardy! question-answer pairs, we test LLMs across two prompt structures: linear prompts, which progress sequentially, and jumping prompts, which preserve identical content but require non-sequential traversal. Across 10,000 evaluations spanning six state-of-the-art open-source LLMs, accuracy dropped by up to 72% under jumping conditions (compared to baseline), revealing a strong dependence on positional continuity. Error analysis shows that approximately 50% of failures stem from instruction-order violations and semantic drift, indicating that current architectures internalize instruction following as a sequential pattern rather than a reasoning skill. These results reveal structural sensitivity as a fundamental limitation in current a

Search results: Following

Beyond Instruction Following: Evaluating Inferential Rule Following of Large Language Models

RIFT: Reordered Instruction Following Testbed To Evaluate Instruction Following in Singular Multistep Prompt Structures

WildIFEval: Instruction Following in the Wild

Musical Score Following using Statistical Inference

Generalizing Verifiable Instruction Following

Multilingual Long-Form Speech Instruction Following: KIT's Submission to IWSLT 2026

Analysis of the long-term behavior of the "Bando-follow-the-leader" car-following model

Multi-Modal Gaze Following in Conversational Scenarios

Leader-Follower Identification with Vehicle-Following Calibration for Non-Lane-Based Traffic

Training with Pseudo-Code for Instruction Following

Follow Everything: A Leader-Following and Obstacle Avoidance Framework with Goal-Aware Adaptation

EP 250108a/SN 2025kg: Observations of the most nearby Broad-Line Type Ic Supernova following an Einstein Probe Fast X-ray Transient

Situated Instruction Following

Follow-Bench: A Unified Motion Planning Benchmark for Socially-Aware Robot Person Following

Boosting Instruction Following at Scale

Instruction Following without Instruction Tuning

Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models

Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection

KCIF: Knowledge-Conditioned Instruction Following

Robot Person Following Under Partial Occlusion