Search - ResearchTracker

Retrieval systems often fail when user queries differ stylistically or semantically from the language used in domain documents. Query rewriting has been proposed to bridge this gap, improving retrieval by reformulating user queries into semantically equivalent forms. However, most existing approaches overlook the stylistic characteristics of target documents (their domain-specific phrasing, tone, and structure), which are crucial for matching real-world data distributions. We introduce a retrieval feedback-driven dataset generation framework that automatically identifies failed retrieval cases, leverages large language models to rewrite queries in the style of relevant documents, and verifies improvement through re-retrieval. The resulting corpus of (original, rewritten) query pairs enables the training of rewriter models that are explicitly aware of document style and retrieval feedback. This work highlights a new direction in data-centric information retrieval, emphasizing how feedback loops and document-style alignment can enhance the reasoning and adaptability of RAG systems in real-world, domain-specific contexts.
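The feedback loop described in this abstract (detect a failed retrieval, rewrite the query in the style of the relevant document, keep the pair only if re-retrieval succeeds) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the token-overlap `retrieve` function stands in for a real retriever, `toy_rewrite` with its small glossary stands in for the LLM rewriter, and the corpus and queries are invented. It is not the authors' implementation.

```python
from typing import Callable, Dict, List, Tuple


def retrieve(query: str, corpus: Dict[str, str], k: int = 1) -> List[str]:
    """Toy retriever (assumption): rank documents by shared lowercase tokens."""
    q_tokens = set(query.lower().split())
    ranked = sorted(
        corpus,
        # Higher overlap first; ties broken by document id for determinism.
        key=lambda d: (-len(q_tokens & set(corpus[d].lower().split())), d),
    )
    return ranked[:k]


def build_pairs(
    queries: List[Tuple[str, str]],
    corpus: Dict[str, str],
    rewrite: Callable[[str, str], str],
    k: int = 1,
) -> List[Tuple[str, str]]:
    """Collect (original, rewritten) pairs that turn a retrieval miss into a hit."""
    pairs = []
    for query, gold_id in queries:
        if gold_id in retrieve(query, corpus, k):
            continue  # retrieval already succeeds, nothing to learn from
        candidate = rewrite(query, corpus[gold_id])
        # Verification step: keep the rewrite only if re-retrieval now succeeds.
        if gold_id in retrieve(candidate, corpus, k):
            pairs.append((query, candidate))
    return pairs


def toy_rewrite(query: str, doc_text: str) -> str:
    """Hypothetical stand-in for an LLM rewriter: swap lay terms for formal ones."""
    glossary = {"heart attack": "myocardial infarction", "care": "treatment"}
    for lay, formal in glossary.items():
        query = query.replace(lay, formal)
    return query


corpus = {
    "d1": "myocardial infarction treatment guidelines",
    "d2": "heart shaped cookie recipe",
}
queries = [("heart attack care", "d1"), ("cookie recipe", "d2")]
pairs = build_pairs(queries, corpus, toy_rewrite)
print(pairs)
```

In this toy run only the first query fails its initial retrieval, so it is the only one that yields a training pair; the second query is skipped because its relevant document is already ranked first.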

DS@GT at TREC TOT 2025: Bridging Vague Recollection with Fusion Retrieval and Learned Reranking

arXiv, 2026-01-21. Authors: Wenxin Zhou, Ritesh Mehta, Anthony Miyaguchi

We develop a two-stage retrieval system that combines multiple complementary retrieval methods with a learned reranker and LLM-based reranking, to address the TREC Tip-of-the-Tongue (ToT) task. In the first stage, we employ hybrid retrieval that merges LLM-based retrieval, sparse (BM25), and dense (BGE-M3) retrieval methods. We also introduce topic-aware multi-index dense retrieval that partitions the Wikipedia corpus into 24 topical domains. In the second stage, we evaluate both a trained LambdaMART reranker and LLM-based reranking. To support model training, we generate 5000 synthetic ToT queries using LLMs. Our best system achieves recall of 0.66 and NDCG@1000 of 0.41 on the test set by combining hybrid retrieval with Gemini-2.5-flash reranking, demonstrating the effectiveness of fusion retrieval.
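The abstract does not say how the three first-stage rankings are merged. As one common option, the sketch below uses reciprocal rank fusion (RRF), which scores each document by the sum of 1 / (k + rank) over all input rankings. The choice of RRF, the constant k = 60, and the example document ids are assumptions for illustration, not details taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into one list via RRF."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more credit the higher it sits in each list.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical top-3 outputs from the three first-stage retrievers.
bm25_ranking = ["A", "B", "C"]
dense_ranking = ["B", "C", "A"]
llm_ranking = ["B", "A", "D"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking, llm_ranking])
print(fused)
```

Because RRF uses only ranks, it needs no score normalization across retrievers whose raw scores are on different scales, which is one reason it is a frequent baseline for hybrid sparse and dense retrieval.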

Search results: Retrieval

ReFeed: Retrieval Feedback-Guided Dataset Construction for Style-Aware Query Rewriting

DS@GT at TREC TOT 2025: Bridging Vague Recollection with Fusion Retrieval and Learned Reranking

When to Retrieve During Reasoning: Adaptive Retrieval for Large Reasoning Models

Spike Hijacking in Late-Interaction Retrieval

Retrieval Models Aren't Tool-Savvy: Benchmarking Tool Retrieval for Large Language Models

Scalable Music Cover Retrieval Using Lyrics-Aligned Audio Embeddings

IGMiRAG: Intuition-Guided Retrieval-Augmented Generation with Adaptive Mining of In-Depth Memory

Utilizing Metadata for Better Retrieval-Augmented Generation

LLandMark: A Multi-Agent Framework for Landmark-Aware Multimodal Interactive Video Retrieval

Reconstructing Context: Evaluating Advanced Chunking Strategies for Retrieval-Augmented Generation

Task-Adaptive Retrieval over Agentic Multi-Modal Web Histories via Learned Graph Memory

Identity-Decoupled Anonymization for Visual Evidence in Multi-modal Retrieval-Augmented Generation

Constrained Auto-Regressive Decoding Constrains Generative Retrieval

Lost in Decoding? Reproducing and Stress-Testing the Look-Ahead Prior in Generative Retrieval

RAGPart & RAGMask: Retrieval-Stage Defenses Against Corpus Poisoning in Retrieval-Augmented Generation

Do the Findings of Document and Passage Retrieval Generalize to the Retrieval of Responses for Dialogues?

A Surprisingly Simple yet Effective Multi-Query Rewriting Method for Conversational Passage Retrieval

How important is Recall for Measuring Retrieval Quality?

FiCo-ITR: bridging fine-grained and coarse-grained image-text retrieval for comparative performance analysis

Satisfactory Medical Consultation based on Terminology-Enhanced Information Retrieval and Emotional In-Context Learning