Search — ResearchTracker

Ontologies enable scalable energy services in buildings by supporting interoperability and automation. Project Haystack is a building ontology that is widely adopted due to its flexible, tag-based semantic model, openness, and extensibility, but it suffers from ambiguous tag usage and limited automated validation. Although Project Haystack is formally open, its reliance on custom file formats and domain-specific languages that originate from the Haxall ecosystem creates a de facto barrier to integration. In this paper, we address these limitations by introducing a Python-based toolchain for Haystack. We present (i) a parser for Haystack definition files (Trio file format), and (ii) a code generator that derives Pydantic models and JSON Schema definitions from these parsed specifications. The resulting models enable static type checking and structural validation of Haystack grids within Python, as well as schema-based validation of JSON representations outside the Python ecosystem. All tools, generated models, and schemas are released publicly under an open-source license, with the goal of strengthening the Haystack ecosystem and opening a practical pathway beyond its current te

Haystack Engineering: Context Engineering for Heterogeneous and Agentic Long-Context Evaluation

arXiv, 2025-10-08. Authors: Mufei Li, Dongqi Fu, Limei Wang

Modern long-context large language models (LLMs) perform well on synthetic "needle-in-a-haystack" (NIAH) benchmarks, but such tests overlook how noisy contexts arise from biased retrieval and agentic workflows. We argue that haystack engineering is necessary to construct noisy long contexts that faithfully capture key real-world factors -- distraction from heterogeneous biased retrievers and cascading errors in agentic workflows -- to test models' long-context robustness. We instantiate it through HaystackCraft, a new NIAH benchmark built on the full English Wikipedia hyperlink network with multi-hop questions. HaystackCraft evaluates how heterogeneous retrieval strategies (e.g., sparse, dense, hybrid, and graph-based) affect distractor composition, haystack ordering, and downstream LLM performance. HaystackCraft further extends NIAH to dynamic, LLM-dependent settings that simulate agentic operations, where models refine queries, reflect on their past reasonings, and decide when to stop. Experiments with 15 long-context models show that (1) while stronger dense retrievers can introduce more challenging distractors, graph-based reranking simultaneously improves retrieval effectivene

Search results: Haystack

Type Checking Project Haystack Grids using JSON Schema and Pydantic

Haystack Engineering: Context Engineering for Heterogeneous and Agentic Long-Context Evaluation

Invoice Haystack: Benchmarking Document Retrieval and Visual Question Answering Under Strong Visual Homogeneity

Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems

Needle in a Haystack: Tracking UAVs from Massive Noise in Real-World 5G-A Base Station Data

Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents

Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark

Finding Needles in the Haystack: Transductive Active Labeling in Ecology

GOLD PANNING: Strategic Context Shuffling for Needle-in-Haystack Reasoning

Stress-Testing Long-Context Language Models with Lifelong ICL and Task Haystack

Humans Hallucinate Too: Language Models Identify and Correct Subjective Annotation Errors With Label-in-a-Haystack Prompts

Reasoning on Multiple Needles In A Haystack

Semantic Needles in Document Haystacks: Sensitivity Testing of LLM-as-a-Judge Similarity Scoring

Finding Needles in Emb(a)dding Haystacks: Legal Document Retrieval via Bagging and SVR Ensembles

TS-Haystack: A Multi-Task Retrieval Benchmark for Long-Context Time-Series Reasoning

Jailbreaking in the Haystack

Beyond Needle(s) in the Embodied Haystack: Environment, Architecture, and Training Considerations for Long Context Reasoning

U-NIAH: Unified RAG and LLM Evaluation for Long Context Needle-In-A-Haystack

Hidden in the Haystack: Smaller Needles are More Difficult for LLMs to Find

Large Language Models Struggle to Describe the Haystack without Human Help: Human-in-the-loop Evaluation of Topic Models