Long-term memory is significant for agents, in which insights play a crucial role. However, the emergence of irrelevant insight and the lack of general insight can greatly undermine the effectiveness of insight. To solve this problem, in this paper, we introduce Multi-Scale Insight Agent (MSI-Agent), an embodied agent designed to improve LLMs' planning and decision-making ability by summarizing and utilizing insight effectively across different scales. MSI achieves this through the experience selector, insight generator, and insight selector. Leveraging a three-part pipeline, MSI can generate task-specific and high-level insight, store it in a database, and then use relevant insight from it to aid in decision-making. Our experiments show that MSI outperforms another insight strategy when planning by GPT3.5. Moreover, We delve into the strategies for selecting seed experience and insight, aiming to provide LLM with more useful and relevant insight for better decision-making. Our observations also indicate that MSI exhibits better robustness when facing domain-shifting scenarios.
Data analysis has become an indispensable part of scientific research. To discover the latent knowledge and insights hidden within massive datasets, we need to perform deep exploratory analysis to realize their full value. With the advent of large language models (LLMs) and multi-agent systems, more and more researchers are making use of these technologies for insight discovery. However, there are few benchmarks for evaluating insight discovery capabilities. As one of the most comprehensive existing frameworks, InsightBench also suffers from many critical flaws: format inconsistencies, poorly conceived objectives, and redundant insights. These issues may significantly affect the quality of data and the evaluation of agents. To address these issues, we thoroughly investigate shortcomings in InsightBench and propose essential criteria for a high-quality insight benchmark. Regarding this, we develop a data-curation pipeline to construct a new dataset named InsightEval. We further introduce a novel metric to measure the exploratory performance of agents. Through extensive experiments on InsightEval, we highlight prevailing challenges in automated insight discovery and raise some key fi
Although most of the automated theorem-proving approaches depend on formal proof systems, informal theorem proving can align better with large language models' (LLMs) strength in natural language processing. In this work, we identify a primary bottleneck in informal theorem proving as a lack of insight, namely the difficulty of recognizing the core techniques required to solve complex problems. To address this, we propose $\texttt{DeepInsight}$, a unified training framework designed to cultivate this essential reasoning skill and enable LLMs to perform insightful reasoning. Our framework consists of three components: (1) $\texttt{DeepInsightTheorem}$, a hierarchical dataset that structures informal proofs by explicitly extracting core techniques and proof sketches alongside the final proof; (2) a Progressive Multi-Stage SFT strategy that mimics the human learning process, teaching the model proof writing, planning, and insight identification; and (3) $\texttt{InsightPO}$, a policy optimization method that assigns structured rewards over this insight hierarchy. Our experiments on challenging mathematical benchmarks demonstrate that this insight-aware generation strategy significantl
Automated data insight mining and visualization have been widely used in various business intelligence applications (e.g., market analysis and product promotion). However, automated insight mining techniques often output the same mining results to different analysts without considering their personal preferences, while interactive insight discovery requires significant manual effort. This paper fills the gap by integrating automated insight mining with interactive data visualization and striking a proper balance between them to facilitate insight discovery and exploration. Specifically, we regard data insights as a special type of data and further present InsightMap, a novel visualization approach that uses the map metaphor to provide a quick overview and in-depth exploration of different data insights, where a metric is proposed to measure the similarity between different insights. The effectiveness and usability of InsightMap are demonstrated through extensive case studies and in-depth user interviews.
In today's data-driven era, fully automated end-to-end data analytics, particularly insight discovery, is critical for discovering actionable insights that assist organizations in making effective decisions. With the rapid advancement of large language models (LLMs), LLM-driven agents have emerged as a promising paradigm for automating data analysis and insight discovery. However, existing data insight agents remain limited in several key aspects, often failing to deliver satisfactory results due to: (1) insufficient utilization of domain knowledge, (2) shallow analytical depth, and (3) error-prone code generation during insight generation. To address these issues, we propose DataSage, a novel multi-agent framework that incorporates three innovative features including external knowledge retrieval to enrich the analytical context, a multi-role debating mechanism to simulate diverse analytical perspectives and deepen analytical depth, and multi-path reasoning to improve the accuracy of the generated code and insights. Extensive experiments on InsightBench demonstrate that DataSage consistently outperforms existing data insight agents across all difficulty levels, offering an effectiv
In medical data analysis, extracting deep insights from complex, multi-modal datasets is essential for improving patient care, increasing diagnostic accuracy, and optimizing healthcare operations. However, there is currently a lack of high-quality datasets specifically designed to evaluate the ability of large multi-modal models (LMMs) to discover medical insights. In this paper, we introduce MedInsightBench, the first benchmark that comprises 332 carefully curated medical cases, each annotated with thoughtfully designed insights. This benchmark is intended to evaluate the ability of LMMs and agent frameworks to analyze multi-modal medical image data, including posing relevant questions, interpreting complex findings, and synthesizing actionable insights and recommendations. Our analysis indicates that existing LMMs exhibit limited performance on MedInsightBench, which is primarily attributed to their challenges in extracting multi-step, deep insights and the absence of medical expertise. Therefore, we propose MedInsightAgent, an automated agent framework for medical data analysis, composed of three modules: Visual Root Finder, Analytical Insight Agent, and Follow-up Question Compo
We propose a dense insight network framework to encode the relationships between automatically generated insights from a complex dashboard based on their shared characteristics. Our insight network framework includes five high-level categories of relationships (e.g., type, topic, value, metadata, and compound scores). The goal of this insight network framework is to provide a foundation for implementing new insight interpretation and exploration strategies, including both user-driven and automated approaches. To illustrate the complexity and flexibility of our framework, we first describe a visualization playground to directly visualize key network characteristics; this playground also demonstrates potential interactive capabilities for decomposing the dense insight network. Then, we discuss a case study application for ranking insights based on the underlying network characteristics captured by our framework, before prompting a large language model to generate a concise, natural language summary. Finally, we reflect on next steps for leveraging our insight network framework to design and evaluate new systems.
Generating insightful and actionable information from databases is critical in data analysis. This paper introduces a novel approach using Large Language Models (LLMs) to automatically generate textual insights. Given a multi-table database as input, our method leverages LLMs to produce concise, text-based insights that reflect interesting patterns in the tables. Our framework includes a Hypothesis Generator to formulate domain-relevant questions, a Query Agent to answer such questions by generating SQL queries against a database, and a Summarization module to verbalize the insights. The insights are evaluated for both correctness and subjective insightfulness using a hybrid model of human judgment and automated metrics. Experimental results on public and enterprise databases demonstrate that our approach generates more insightful insights than other approaches while maintaining correctness.
Chart summarization is crucial for enhancing data accessibility and the efficient consumption of information. However, existing methods, including those with Multimodal Large Language Models (MLLMs), primarily focus on low-level data descriptions and often fail to capture the deeper insights which are the fundamental purpose of data visualization. To address this challenge, we propose Chart Insight Agent Flow, a plan-and-execute multi-agent framework effectively leveraging the perceptual and reasoning capabilities of MLLMs to uncover profound insights directly from chart images. Furthermore, to overcome the lack of suitable benchmarks, we introduce ChartSummInsights, a new dataset featuring a diverse collection of real-world charts paired with high-quality, insightful summaries authored by human data analysis experts. Experimental results demonstrate that our method significantly improves the performance of MLLMs on the chart summarization task, producing summaries with deep and diverse insights.
The proliferation of large language models (LLMs) has revolutionized the capabilities of natural language interfaces (NLIs) for data analysis. LLMs can perform multi-step and complex reasoning to generate data insights based on users' analytic intents. However, these insights often entangle with an abundance of contexts in analytic conversations such as code, visualizations, and natural language explanations. This hinders efficient recording, organization, and navigation of insights within the current chat-based LLM interfaces. In this paper, we first conduct a formative study with eight data analysts to understand their general workflow and pain points of insight management during LLM-powered data analysis. Accordingly, we introduce InsightLens, an interactive system to overcome such challenges. Built upon an LLM-agent-based framework that automates insight recording and organization along with the analysis process, InsightLens visualizes the complex conversational contexts from multiple aspects to facilitate insight navigation. A user study with twelve data analysts demonstrates the effectiveness of InsightLens, showing that it significantly reduces users' manual and cognitive ef
Scientific breakthroughs often emerge from synthesizing prior ideas into novel contributions. While language models (LMs) show promise in scientific discovery, their ability to perform this targeted, literature-grounded synthesis remains underexplored. We introduce insight anticipation, a generation task in which a model predicts a downstream paper's core insight from its foundational parent papers. To evaluate this capability, we develop GiantsBench, a benchmark of 17k examples across eight scientific domains, where each example consists of a set of parent papers paired with the core insight of a downstream paper. We evaluate models using an LM judge that scores similarity between generated and ground-truth insights, and show that these similarity scores correlate with expert human ratings. Finally, we present GIANTS-4B, an LM trained via reinforcement learning (RL) to optimize insight anticipation using these similarity scores as a proxy reward. Despite its smaller open-source architecture, GIANTS-4B outperforms proprietary baselines and generalizes to unseen domains, achieving a 34% relative improvement in similarity score over gemini-3-pro. Human evaluations further show that G
Many problems seem to require a flash of insight to solve. What form do these sudden insights take, and what impact do they have on how people approach similar problems in the future? In this work, we prompted participants (N = 189) to think aloud as they attempted to solve a sequence of five "matchstick-arithmetic" problems. These problems either all relied on the same kind of non-obvious solution (Same group) or a different kind each time (Different group). Our first observation was that Same participants improved more rapidly than Different participants. We then leveraged techniques from natural language processing to analyze participants' speech, and found that this accelerated improvement for Same participants was accompanied by changes in both how much they spoke and what they said. In particular, they were more likely to spontaneously label the kind of problem they were working on. Taken together, these findings suggest that a hallmark of transferable insights is their accessibility for verbal report, even if the underlying precursors of insight remain difficult to articulate.
Today, E-commerce sellers face several key challenges, including difficulties in discovering and effectively utilizing available programs and tools, and struggling to understand and utilize rich data from various tools. We therefore aim to develop Insight Agents (IA), a conversational multi-agent Data Insight system, to provide E-commerce sellers with personalized data and business insights through automated information retrieval. Our hypothesis is that IA will serve as a force multiplier for sellers, thereby driving incremental seller adoption by reducing the effort required and increase speed at which sellers make good business decisions. In this paper, we introduce this novel LLM-backed end-to-end agentic system built on a plan-and-execute paradigm and designed for comprehensive coverage, high accuracy, and low latency. It features a hierarchical multi-agent structure, consisting of manager agent and two worker agents: data presentation and insight generation, for efficient information retrieval and problem-solving. We design a simple yet effective ML solution for manager agent that combines Out-of-Domain (OOD) detection using a lightweight encoder-decoder model and agent routin
This research paper addresses the limitations of current mobile accessibility services like TalkBack, which provide manual gesture-based sequential feedback to BVI users. Motivated by the promise of large language models (LLMs), this paper introduces Insight, an Android accessibility service that provides natural language interaction and real-time summarization of the screen. The paper performs a within-subject experimental study with users to compare Insight and TalkBack on usability factors. Results show Insight reduced mental effort and task time, and was preferred because of its dialogue interface, but users felt the need for interruption management. Results show LLM-based interfaces can significantly improve mobile accessibility, and describe the potential of hybrid solutions combining gesture and dialogue modalities towards more inclusive design.
Insights are often considered the ideal outcome of visual analysis sessions. However, there is no single definition of what an insight is. Some scholars define insights as correlations, while others define them as hypotheses or aha moments. This lack of a clear definition can make it difficult to build visualization tools that effectively support insight discovery. In this paper, we contribute a comprehensive literature review that maps the landscape of existing insight definitions. We summarize key themes regarding how insight is defined, with the goal of helping readers identify which definitions of insight align closely with their research and tool development goals. Based on our review, we also suggest interesting research directions, such as synthesizing a unified formalism for insight and connecting theories of insight to other critical concepts in visualization research.
The rise of AI, especially Large Language Models, presents challenges and opportunities to integrate such technology into the classroom. AI has the potential to revolutionize education by helping teaching staff with various tasks, such as personalizing their teaching methods, but it also raises concerns, for example, about the degradation of student-teacher interactions and user privacy. Based on interviews with teaching staff, this paper introduces INSIGHT, a proof of concept to combine various AI tools to assist teaching staff and students in the process of solving exercises. INSIGHT has a modular design that allows it to be integrated into various higher education courses. We analyze students' questions to an LLM by extracting keywords, which we use to dynamically build an FAQ from students' questions and provide new insights for the teaching staff to use for more personalized face-to-face support. Future work could build upon INSIGHT by using the collected data to provide adaptive learning and adjust content based on student progress and learning styles to offer a more interactive and inclusive learning experience.
We propose a federated learning-like framework, Federation over Text (FoT), that enables multiple clients solving different tasks to collectively generate a shared library of metacognitive insights by iteratively federating their local reasoning processes without sharing actual problem instances or task instructions. Instead of federation over gradients (e.g., as in distributed training), FoT operates at the semantic level without any gradient optimization or supervision signal. Iteratively, each client runs an LLM agent that does local thinking and self-improvement on their specific tasks independently, and shares reasoning traces with a central server, which aggregates and distills them into a cross-task (and cross-domain) insight library that existing and future agents can leverage to improve performance on related tasks. Experiments show that FoT improves reasoning effectiveness and efficiency across a wide range of challenging applications, including mathematical problem solving, cross-domain collaboration, real-world daily tasks, and machine learning research insight discovery. Specifically, it improves average performance scores by 25% while reducing the reasoning tokens by
Analog front-end design heavily relies on specialized human expertise and costly trial-and-error simulations, which motivated many prior works on analog design automation. However, efficient and effective exploration of the vast and complex design space remains constrained by the time-consuming nature of SPICE simulations, making effective design automation a challenging endeavor. In this paper, we introduce INSIGHT, a GPU-powered, technology-agnostic, effective universal neural simulator in the analog front-end design automation loop. INSIGHT accurately predicts the performance metrics of analog circuits across various technologies with just a few microseconds of inference time. Notably, its autoregressive capabilities enable INSIGHT to accurately predict simulation-costly critical transient specifications leveraging less expensive performance metric information. The low cost and high fidelity feature make INSIGHT a good substitute for standard simulators in analog front-end optimization frameworks. INSIGHT is compatible with any optimization framework, facilitating enhanced design space exploration for sample efficiency through sophisticated offline learning and adaptation techni
We report the timing and spectral analyses upon of the type II X-ray bursts from the Rapid Burster (MXB 1730--335) observed by Insight-HXMT and Swift/XRT. By stacking the long-duration bursts, we find for the first time that the hard X-rays are lagging than the soft X-rays by 3 seconds. However, such a lag is not visible for the short-duration bursts, probably because of the poor statistics. For all bursts the energy spectrum is found to be non-thermal, thanks to the broad band coverage of Insight-HXMT. These findings put new insights into the type-II bursts and require a temporally showing-up corona for possible interpretation.
Benchmark saturation and contamination have obscured genuine advances in reasoning for large language models (LLMs). We introduce NazoNazo Benchmark, a low-cost, renewable test built from Japanese children's riddles that demand insight-based reasoning, or representational shifts rather than knowledge recall. We evaluate 38 frontier LLMs (2023-2025) on 201 riddles and a 120-item human-comparison subset, finding that non-reasoning models average 7.6%, reasoning models 17.6%, and humans ~53% accuracy. Importantly, thought-log analysis reveals that reasoning in Japanese did not necessarily improve accuracy, indicating that language understanding alone is insufficient for insight reasoning. Notably, models sometimes generated correct candidates but failed to endorse them, suggesting weak metacognitive control rather than a lack of knowledge. This "verification failure" indicates that CoT outputs can reflect genuine intermediate reasoning states rather than post-hoc rationalizations. By exposing this metacognitive bottleneck - models' inability to recognize when they are right - the benchmark provides a scalable, cross-linguistic testbed for studying machine insight, confidence calibrati