共找到 20 条结果
The psychological science of artificial intelligence (AI) can be broadly defined as an emerging field of psychology that examines all AI-related mental and behavioral processes from the perspective of psychology. This field has been growing exponentially in the recent decade. This review synthesizes the existing literature on the psychological science of AI with a goal to provide a comprehensive conceptual framework for planning, conducting, and assessing scientific research in the field. It consists of six parts, starting with an overview of the entire field of the psychological science of artificial intelligence, then synthesizing the literature in each of the four specific areas (i.e., Psychology of designing AI, psychology of using AI, AI for examining psychological processes, and AI for advancing psychological methods), and concluding with an outlook on the field in the future.
Foundation models have been increasingly applied to behavioral science domains such as psychology, sociology, and economics. While these models show promise in individual tasks such as survey response prediction and human-subject experiment simulation, there remains no systematic understanding of how well they perform across diverse behavioral science tasks, contexts, and populations. We introduce BehaviorBench, a comprehensive benchmark that evaluates foundation models along four core capabilities: (1) behavior prediction and simulation, (2) strategic decision-making, (3) subject-trait inference, and (4) behavioral knowledge application. Crucially, BehaviorBench evaluates model outputs at both the individual and distributional levels, capturing not only per-subject accuracy but also population-level alignment, an essential requirement for behavioral validity. Leveraging the tasks in BehaviorBench, we further develop Be.FM-1.5, extending the Be.FM family of behavioral foundation models fine-tuned on behavioral data. Our results reveal a considerable gap: proprietary general-purpose models excel at individual-level prediction and knowledge-intensive tasks, whereas behavioral foundat
Traditional in-person psychological counseling remains primarily niche, often chosen by individuals with psychological issues, while online automated counseling offers a potential solution for those hesitant to seek help due to feelings of shame. Cognitive Behavioral Therapy (CBT) is an essential and widely used approach in psychological counseling. The advent of large language models (LLMs) and agent technology enables automatic CBT diagnosis and treatment. However, current LLM-based CBT systems use agents with a fixed structure, limiting their self-optimization capabilities, or providing hollow, unhelpful suggestions due to redundant response patterns. In this work, we utilize Quora-like and YiXinLi single-round consultation models to build a general agent framework that generates high-quality responses for single-turn psychological consultation scenarios. We use a bilingual dataset to evaluate the quality of single-response consultations generated by each framework. Then, we incorporate dynamic routing and supervisory mechanisms inspired by real psychological counseling to construct a CBT-oriented autonomous multi-agent framework, demonstrating its general applicability. Experim
Behavioral evaluation is the dominant paradigm for assessing alignment in large language models (LLMs). In current practice, observed compliance under finite evaluation protocols is treated as evidence of latent alignment. However, the inference from bounded behavioral evidence to claims about global latent properties is rarely analyzed as an identifiability problem. In this paper, we study alignment evaluation through the lens of statistical identifiability under partial observability. We allow agent policies to condition their behavior on observable signals correlated with the evaluation regime, a phenomenon we term evaluation awareness. Within this framework, we formalize the Alignment Verifiability Problem and introduce Normative Indistinguishability, which arises when distinct latent alignment hypotheses induce identical distributions over evaluator-accessible observations. Our main theoretical contribution is a conditional impossibility result: under finite behavioral evaluation and evaluation-aware policies, observed compliance does not uniquely identify latent alignment, but only membership in an equivalence class of conditionally compliant policies, under explicit assumpti
Data science pipelines inform and influence many daily decisions, from what we buy to who we work for and even where we live. When designed incorrectly, these pipelines can easily propagate social inequity and harm. Traditional solutions are technical in nature; e.g., mitigating biased algorithms. In this vision paper, we introduce a novel lens for promoting responsible data science using theories of behavior change that emphasize not only technical solutions but also the behavioral responsibility of practitioners. By integrating behavior change theories from cognitive psychology with data science workflow knowledge and ethics guidelines, we present a new perspective on responsible data science. We present example data science interventions in machine learning and visual data analysis, contextualized in behavior change theories that could be implemented to interrupt and redirect potentially suboptimal or negligent practices while reinforcing ethically conscious behaviors. We conclude with a call to action to our community to explore this new research area of behavior change interventions for responsible data science.
The intersection of artificial intelligence and psychological science has experienced remarkable growth, with annual publications expanding from 859 papers in 2000 to 29,979 by 2025. However, this rapid evolution has created methodological fragmentation where similar computational techniques are independently developed across isolated psychological domains. This survey introduces the first systematic taxonomy that organizes AI-driven psychology tasks by computational processing patterns rather than application domains, categorizing them into four fundamental types: classification, regression, structured relational, and generative interactive tasks. Through analysis of over 300 representative works spanning the pre-trained model era and large language model era, we examine how computational approaches evolved from task-specific feature engineering to transfer learning and few-shot adaptation. We provide systematic coverage of datasets, evaluation metrics, and benchmarks while addressing fundamental challenges including interpretability, label uncertainty, privacy constraints, and cross-cultural validity. This computational perspective reveals transferable methodological patterns pre
Mauve is a low-cost small satellite developed and operated by Blue Skies Space Ltd. The payload features a 13 cm telescope connected with a fibre that feeds into a UV-Vis spectrometer. The detector covers the 200-700 nm range in a single shot, obtaining low resolution spectra at R~20-65. Mauve has launched on 28th November 2025, reaching a 510 km Low-Earth Sun-synchronous orbit. The satellite will enable UV and visible observations of a variety of stellar objects in our Galaxy, filling the gaps in the ultraviolet space-based data. The researchers that have already joined the mission have defined the science themes, observational strategy and targets that Mauve will observe in the first year of operations. To date 10 science themes have been developed by the Mauve science collaboration for year 1, with observational strategies that include both long duration monitoring and short cadence snapshots. Here, we describe these themes and the science that Mauve will undertake in its first year of operations.
The large instantaneous sensitivity, a wide frequency coverage and flexible observation modes with large number of beams in the sky are the main features of the SKA observatory's two telescopes, the SKA-Low and the SKA-Mid, which are located on two different continents. Owing to these capabilities, the SKAO telescopes are going to be a game-changer for radio astronomy in general and pulsar astronomy in particular. The eleven articles in this special issue on pulsar science with the SKA Observatory describe its impact on different areas of pulsar science. In this lead article, a brief description of the two telescopes highlighting the relevant features for pulsar science is presented followed by an overview of each accompanying article, exploring the inter-relationship between different pulsar science use cases.
Large language models (LLMs) offer emerging opportunities for psychological and behavioral research, but methodological guidance is lacking. This article provides a framework for using LLMs as psychological simulators across two primary applications: simulating roles and personas to explore diverse contexts, and serving as computational models to investigate cognitive processes. For simulation, we present methods for developing psychologically grounded personas that move beyond demographic categories, with strategies for validation against human data and use cases ranging from studying inaccessible populations to prototyping research instruments. For cognitive modeling, we synthesize emerging approaches for probing internal representations, methodological advances in causal interventions, and strategies for relating model behavior to human cognition. We address overarching challenges including prompt sensitivity, temporal limitations from training data cutoffs, and ethical considerations that extend beyond traditional human subjects review. Throughout, we emphasize the need for transparency about model capabilities and constraints. Together, this framework integrates emerging empir
GREX-PLUS (Galaxy Reionization EXplorer and PLanetary Universe Spectrometer) is a mission candidate for a JAXA strategic L-class mission to be launched in the 2030s. Its primary science goals are two-fold: galaxy formation and evolution, and planetary system formation and evolution. The GREX-PLUS spacecraft will carry a telescope with a 1 m primary mirror aperture cooled down to 50 K. The two science instruments will be onboard: a wide-field camera in the 2--8 $μ$m wavelength band and a high-resolution spectrometer with a wavelength resolution of 30,000 in the 10--18 $μ$m band. The GREX-PLUS wide-field camera aims to detect the first generation of galaxies at redshift $z>15$. The GREX-PLUS high-resolution spectrometer aims to identify the location of the water ``snowline'' in protoplanetary disks. Both instruments will provide unique datasets for a broad range of scientific topics, including galaxy mass assembly, the origin of supermassive blackholes, infrared background radiation, molecular spectroscopy in the interstellar medium, transit spectroscopy of exoplanet atmospheres, planetary atmospheres in the Solar System, and so on. This document is the second version of a collect
The psychology of science is the least developed member of the family of science studies. It is growing, however, increasingly into a promising discipline. After a very brief review of this emerging sub-field of psychology, we call for it to be invited into the collection of social sciences that constitute the interdisciplinary field of science policy. Discussing the classic issue of resource allocation, this paper tries to indicate how prolific a new psychological conceptualization of this problem would be. Further, from a psychological perspective, this research will argue in favor of a more realistic conception of science which would be a complement to the existing one in science policy.
Training and education in human-centered fields require authentic practice, yet realistic simulations of human behavior have remained limited. We present a multi-agent psychological simulation system that models internal cognitive-affective processes to generate believable human behaviors. In contrast to black-box neural models, this system is grounded in established psychological theories (e.g., self-efficacy, mindset, social constructivism) and explicitly simulates an ``inner parliament'' of agents corresponding to key psychological factors. These agents deliberate and interact to determine the system's output behavior, enabling unprecedented transparency and alignment with human psychology. We describe the system's architecture and theoretical foundations, illustrate its use in teacher training and research, and discuss how it embodies principles of social learning, cognitive apprenticeship, deliberate practice, and meta-cognition.
Using intelligent systems to perceive psychological and social behaviors, that is, the underlying affective, cognitive, and pathological states that are manifested through observable behaviors and social interactions, remains a challenge due to their complex, multifaceted, and personalized nature. Existing work tackling these dimensions through specialized datasets and single-task systems often miss opportunities for scalability, cross-task transfer, and broader generalization. To address this gap, we curate Human Behavior Atlas, a unified benchmark of diverse behavioral tasks designed to support the development of foundation models for understanding psychological and social behaviors. Human Behavior Atlas comprises over 100,000 samples spanning text, audio, and visual modalities, covering tasks on affective states, cognitive states, pathologies, and social processes. Our unification efforts can reduce redundancy and cost, enable training to scale efficiently across tasks, and enhance generalization of behavioral features across domains. On Human Behavior Atlas, we train three models: Omnisapiens-7B SFT, Omnisapiens-7B BAM, and Omnisapiens-7B RL. We show that training on Human Beha
Generative agents have made significant progress in simulating human behavior, but existing frameworks often simplify emotional modeling and focus primarily on specific tasks, limiting the authenticity of the simulation. Our work proposes the Psychological-mechanism Agent (PSYA) framework, based on the Cognitive Triangle (Feeling-Thought-Action), designed to more accurately simulate human behavior. The PSYA consists of three core modules: the Feeling module (using a layer model of affect to simulate changes in short-term, medium-term, and long-term emotions), the Thought module (based on the Triple Network Model to support goal-directed and spontaneous thinking), and the Action module (optimizing agent behavior through the integration of emotions, needs and plans). To evaluate the framework's effectiveness, we conducted daily life simulations and extended the evaluation metrics to self-influence, one-influence, and group-influence, selection five classic psychological experiments for simulation. The results show that the PSYA framework generates more natural, consistent, diverse, and credible behaviors, successfully replicating human experimental outcomes. Our work provides a riche
The deployment of large language models (LLMs) in diverse applications requires a thorough understanding of their decision-making strategies and behavioral patterns. As a supplement to a recent study on the behavioral Turing test, this paper presents a comprehensive analysis of five leading LLM-based chatbot families as they navigate a series of behavioral economics games. By benchmarking these AI chatbots, we aim to uncover and document both common and distinct behavioral patterns across a range of scenarios. The findings provide valuable insights into the strategic preferences of each LLM, highlighting potential implications for their deployment in critical decision-making roles.
With the growing adoption of electric vehicles (EVs), understanding user charging behavior has become critical for grid stability and transportation planning. This study investigates the behavioral heterogeneity of EV taxi drivers by analyzing the interaction between psychological traits and situational triggers within dynamic travel contexts. Leveraging large language models (LLMs) as a core simulation tool, a novel framework with statistical enhancement is developed to replicate and analyze the charging behaviors of taxi drivers. LLMs simulate personalized decision-making processes by leveraging natural language reasoning and role-playing capabilities, accounting for factors such as time sensitivity, price awareness, and range anxiety. Simulation results indicate that the framework reliably reproduces real-world charging behaviors across multiple urban environments. his fidelity arises from integrating statistical priors into the reasoning process, allowing the model to anchor its decisions in empirical behavioral patterns. Further analysis highlights the joint influence of environmental and psychological variables on charging decisions and reveals the heterogeneity of different
Large language models (LLMs) have exhibited exceptional capabilities in natural language understanding and generation, image recognition, and multimodal tasks, charting a course towards AGI and emerging as a central issue in the global technological race. This manuscript conducts a comprehensive review of the core technologies that support LLMs from a user standpoint, including prompt engineering, knowledge-enhanced retrieval augmented generation, fine tuning, pretraining, and tool learning. Additionally, it traces the historical development of Science of Science (SciSci) and presents a forward looking perspective on the potential applications of LLMs within the scientometric domain. Furthermore, it discusses the prospect of an AI agent based model for scientific evaluation, and presents new research fronts detection and knowledge graph building methods with LLMs.
Data science and technology offer transformative tools and methods to science. This review article highlights latest development and progress in the interdisciplinary field of data-driven plasma science (DDPS). A large amount of data and machine learning algorithms go hand in hand. Most plasma data, whether experimental, observational or computational, are generated or collected by machines today. It is now becoming impractical for humans to analyze all the data manually. Therefore, it is imperative to train machines to analyze and interpret (eventually) such data as intelligently as humans but far more efficiently in quantity. Despite the recent impressive progress in applications of data science to plasma science and technology, the emerging field of DDPS is still in its infancy. Fueled by some of the most challenging problems such as fusion energy, plasma processing of materials, and fundamental understanding of the universe through observable plasma phenomena, it is expected that DDPS continues to benefit significantly from the interdisciplinary marriage between plasma science and data science into the foreseeable future.
The areas of machine learning and knowledge discovery in databases have considerably matured in recent years. In this article, we briefly review recent developments as well as classical algorithms that stood the test of time. Our goal is to provide a general introduction into different tasks such as learning from tabular data, behavioral data, or textual data, with a particular focus on actual and potential applications in behavioral sciences. The supplemental appendix to the article also provides practical guidance for using the methods by pointing the reader to proven software implementations. The focus is on R, but we also cover some libraries in other programming languages as well as systems with easy-to-use graphical interfaces.
Actionable analytics are those that humans can understand, and operationalize. What kind of data mining models generate such actionable analytics? According to psychological scientists, humans understand models that most match their own internal models, which they characterize as lists of "heuristic" (i.e., lists of very succinct rules). One such heuristic rule generator is the Fast-and-Frugal Trees (FFT) preferred by psychological scientists. Despite their successful use in many applied domains, FFTs have not been applied in software analytics. Accordingly, this paper assesses FFTs for software analytics. We find that FFTs are remarkably effective. Their models are very succinct (5 lines or less describing a binary decision tree). These succinct models outperform state-of-the-art defect prediction algorithms defined by Ghortra et al. at ICSE'15. Also, when we restrict training data to operational attributes (i.e., those attributes that are frequently changed by developers), FFTs perform much better than standard learners. Our conclusions are two-fold. Firstly, there is much that software analytics community could learn from psychological science. Secondly, proponents of complex me