Search - ResearchTracker

Long-term memory is important for agents, and insights play a crucial role in it. However, irrelevant insights and a lack of general insights can greatly undermine their effectiveness. To address this, we introduce Multi-Scale Insight Agent (MSI-Agent), an embodied agent designed to improve LLMs' planning and decision-making ability by summarizing and utilizing insight effectively across different scales. MSI achieves this through an experience selector, an insight generator, and an insight selector. Using this three-part pipeline, MSI can generate task-specific and high-level insight, store it in a database, and then retrieve relevant insight to aid decision-making. Our experiments show that MSI outperforms another insight strategy when planning with GPT3.5. Moreover, we delve into strategies for selecting seed experience and insight, aiming to provide the LLM with more useful and relevant insight for better decision-making. Our observations also indicate that MSI exhibits better robustness in domain-shifting scenarios.

InsightEval: An Expert-Curated Benchmark for Assessing Insight Discovery in LLM-Driven Data Agents

arXiv, 2025-11-28. Authors: Zhenghao Zhu, Yuanfeng Song, Xin Chen

Data analysis has become an indispensable part of scientific research. To discover the latent knowledge and insights hidden within massive datasets, we need to perform deep exploratory analysis to realize their full value. With the advent of large language models (LLMs) and multi-agent systems, more and more researchers are making use of these technologies for insight discovery. However, there are few benchmarks for evaluating insight discovery capabilities. As one of the most comprehensive existing frameworks, InsightBench also suffers from many critical flaws: format inconsistencies, poorly conceived objectives, and redundant insights. These issues may significantly affect the quality of data and the evaluation of agents. To address these issues, we thoroughly investigate shortcomings in InsightBench and propose essential criteria for a high-quality insight benchmark. Regarding this, we develop a data-curation pipeline to construct a new dataset named InsightEval. We further introduce a novel metric to measure the exploratory performance of agents. Through extensive experiments on InsightEval, we highlight prevailing challenges in automated insight discovery and raise some key fi

Search results: JCI insight

MSI-Agent: Incorporating Multi-Scale Insight into Embodied Agents for Superior Planning and Decision-Making

InsightEval: An Expert-Curated Benchmark for Assessing Insight Discovery in LLM-Driven Data Agents

Learning to Reason with Insight for Informal Theorem Proving

Data Insights as Data: Quick Overview and Exploration of Automated Data Insights

DataSage: Multi-agent Collaboration for Insight Discovery with External Knowledge Retrieval, Multi-role Debating, and Multi-path Reasoning

MedInsightBench: Evaluating Medical Analytics Agents Through Multi-Step Insight Discovery in Multimodal Medical Data

Representing Visualization Insights as a Dense Insight Network

An LLM-Based Approach for Insight Generation in Data Analysis

Beyond Description: A Multimodal Agent Framework for Insightful Chart Summarization

InsightLens: Augmenting LLM-Powered Data Analysis with Interactive Insight Management and Navigation

GIANTS: Generative Insight Anticipation from Scientific Literature

Leveraging Speech to Identify Signatures of Insight and Transfer in Problem Solving

Insight Agents: An LLM-Based Multi-Agent System for Data Insights

Insight: Enhancing Mobile Accessibility for Blind and Visually Impaired Users with LLMs

What Exactly is an Insight? A Literature Review

INSIGHT: Bridging the Student-Teacher Gap in Times of Large Language Models

Federation over Text: Insight Sharing for Multi-Agent Reasoning

INSIGHT: Universal Neural Simulator for Analog Circuits Harnessing Autoregressive Transformers

New insight into the Rapid Burster by Insight-HXMT

Japanese Children's Riddles as a Benchmark for Machine Insight and Metacognition