Computational notebooks are intended to prioritize the needs of scientists, but little is known about how scientists interact with notebooks, what requirements drive scientists' software development processes, or what tactics scientists use to meet their requirements. We conducted an observational study of 20 scientists using Jupyter notebooks for their day-to-day tasks, finding that scientists prioritize different quality attributes depending on their goals. A qualitative analysis of their usage shows (1) a collection of goals scientists pursue with Jupyter notebooks, (2) a set of quality attributes that scientists value when they write software, and (3) tactics that scientists leverage to promote quality. In addition, we identify ways scientists incorporated AI tools into their notebook work. From our observations, we derive design recommendations for improving computational notebooks and future programming systems for scientists. Key opportunities pertain to helping scientists create and manage state, dependencies, and abstractions in their software, enabling more effective reuse of clearly defined components.
Given the recent targeting of Chinese scientists by the Department of Justice and the sizable contributions of Chinese scientists to American science, it is urgent to investigate the presence and particulars of anti-Chinese discrimination in the American academy. Across a sample of all faculty in the top 100 departments of sociology, economics, chemistry, and physics in the United States, we show that female Chinese scientists comprise a much higher percentage of the female professoriate than male Chinese scientists do of the male professoriate. Using an exact matching approach, we then find that male Chinese scientists suffer a dramatic citation penalty but that female Chinese scientists enjoy a persistent citation bonus. On average, female Chinese scientists require fewer citations than non-Chinese women, whereas male Chinese scientists require more citations than their non-Chinese counterparts, to attain a tenure-track professorial job of a given prestige rating.
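The exact-matching logic mentioned in this abstract can be sketched in miniature: units are grouped into strata of identical covariate values, and outcomes are compared only within strata containing both groups. The field names and toy data below are hypothetical illustrations, not the study's actual variables.

```python
from collections import defaultdict

def exact_match_gap(records, covariates, group_key, outcome_key):
    """Exact matching sketch: compare mean outcomes between two groups
    within strata of identical covariate values. Strata containing
    only one group are discarded (unmatched units)."""
    strata = defaultdict(lambda: defaultdict(list))
    for r in records:
        stratum = tuple(r[c] for c in covariates)
        strata[stratum][r[group_key]].append(r[outcome_key])
    gaps = []
    for groups in strata.values():
        if len(groups) == 2:  # both groups present in this stratum
            (_, xs), (_, ys) = sorted(groups.items())
            gaps.append(sum(xs) / len(xs) - sum(ys) / len(ys))
    # Average within-stratum gap, or None if nothing matched
    return sum(gaps) / len(gaps) if gaps else None

# Hypothetical records: match on field and department rank
records = [
    {"field": "soc", "rank": 1, "chinese": 0, "cit": 10},
    {"field": "soc", "rank": 1, "chinese": 1, "cit": 14},
    {"field": "soc", "rank": 2, "chinese": 0, "cit": 20},  # unmatched
]
gap = exact_match_gap(records, ["field", "rank"], "chinese", "cit")
# → -4.0 (group 0 mean minus group 1 mean, within the matched stratum)
```

The unmatched third record is dropped, which is the defining trade-off of exact matching: strong covariate balance at the cost of sample size.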
When two AI models are trained on the same scientific task, do they learn the same theory or two different theories? Throughout the history of science, we have witnessed the rise and fall of theories driven by experimental validation or falsification: many theories may co-exist when experimental data is lacking, but the space of surviving theories becomes more constrained as more experimental data becomes available. We show the same story is true for AI scientists. With increasingly many systems provided in training data, AI scientists tend to converge in the theories they learn, although sometimes they form distinct groups corresponding to different theories. To mechanistically interpret what theories AI scientists learn and quantify their agreement, we propose MASS, Hamiltonian-Lagrangian neural networks as AI Scientists, trained on standard problems in physics, aggregating training results across many seeds simulating the different configurations of AI scientists. Our findings suggest that AI scientists switch from learning a Hamiltonian theory in simple setups to a Lagrangian formulation when more complex systems are introduced. We also observe strong seed dependence of the trai
AI scientists are emerging computational systems that serve as collaborative partners in discovery. These systems remain difficult to build because they are bespoke, tied to rigid workflows, and lack shared environments that unify tools, data, and analyses into a common ecosystem. In genomics, unified ecosystems have transformed research by enabling interoperability, reuse, and community-driven development; AI scientists require comparable infrastructure. We present ToolUniverse, an ecosystem for building AI scientists from any language or reasoning model across open- and closed-weight models. ToolUniverse standardizes how AI scientists identify and call tools by providing more than 600 machine learning models, datasets, APIs, and scientific packages for data analysis, knowledge retrieval, and experimental design. It automatically refines tool interfaces for correct use by AI scientists, generates new tools from natural language descriptions, iteratively optimizes tool specifications, and composes tools into agentic workflows. In a case study of hypercholesterolemia, ToolUniverse was used to create an AI scientist to identify a potent analog of a drug with favorable predicted prope
Despite broad acclaim for basic research, science is undergoing an applied shift that marginalizes basic scientists. This gap reflects an incomplete understanding of their distinctive roles, which prevents translating philosophical appreciation into effective support. We introduce a scalable metric--the application score--to position research along the basic-applied spectrum and apply it to 62 million publications (1970-2023) to reveal the distinctive contributions of basic scientists. We find a structural asymmetry: involvement of basic scientists substantially increases citation impact, even more so in applied contexts, while applied scientists show no such effect in basic domains. This asymmetric effect arises from their intellectual leadership in conceptualization, writing, and experimental design, amplified in large, multidisciplinary, and intermediate career teams. Yet basic scientists remain concentrated in historically prestigious institutions, while new entrants shift toward applied work, indicating critical undersupply. These findings provide large-scale evidence for the indispensable role of basic scientists, guiding policy and institutional strategy to sustain the found
Recent advances in large language models have enabled the emergence of AI scientists that aim to autonomously analyze biological data and assist scientific discovery. Despite rapid progress, it remains unclear to what extent these systems can extract meaningful biological insights from real experimental data. Existing benchmarks either evaluate reasoning in the absence of data or focus on predefined analytical outputs, failing to reflect realistic, data-driven biological research. Here, we introduce BAISBench (Biological AI Scientist Benchmark), a benchmark for evaluating AI scientists on real single-cell transcriptomic datasets. BAISBench comprises two tasks: cell type annotation across 15 expert-labeled datasets, and scientific discovery through 193 multiple-choice questions derived from biological conclusions reported in 41 published single-cell studies. We evaluated several representative AI scientists using BAISBench and, to provide a human performance baseline, invited six graduate-level bioinformaticians to collectively complete the same tasks. The results show that while current AI scientists fall short of fully autonomous biological discovery, they already demonstrate subs
Scientists across disciplines write code for critical activities like data collection and generation, statistical modeling, and visualization. As large language models that can generate code have become widely available, scientists may increasingly use these models during research software development. We investigate the characteristics of scientists who are early adopters of code-generating models and conduct interviews with scientists at a public, research-focused university. Through interviews and reviews of user interaction logs, we see that scientists often use code-generating models as an information retrieval tool for navigating unfamiliar programming languages and libraries. We present findings about their verification strategies and discuss potential vulnerabilities that may emerge from code generation practices unknowingly influencing the parameters of scientific analyses.
In scientific research, collaboration is one of the most effective ways to take advantage of new ideas, skills, and resources, and to perform interdisciplinary research. Although collaboration networks have been intensively studied, the question of how individual scientists choose collaborators to study a new research topic remains almost unexplored. Here, we investigate the statistics and mechanisms of collaborations of individual scientists along their careers, revealing that, in general, collaborators are involved in significantly fewer topics than expected from controlled surrogates. In particular, we find that highly productive scientists tend to have a higher fraction of single-topic collaborators, while highly cited, i.e., impactful, scientists have a higher fraction of multi-topic collaborators. We also suggest a plausible mechanism for this distinction. Moreover, we investigate the cases where scientists involve existing collaborators in a new topic. We find that, compared to productive scientists, impactful scientists show a strong preference for collaborating with high-impact scientists on a new topic. Finally, we validate our findings by investigating active scientists in diff
Star scientists are highly influential researchers who have made significant contributions to their field, gained widespread recognition, and often attracted substantial research funding. They are critical for the advancement of science and innovation, and they have a significant influence on the transfer of knowledge and technology to industry. Identifying potential star scientists before their performance becomes outstanding is important for recruitment, collaboration, networking, and research funding decisions. Using machine learning techniques, this study proposes a model to predict star scientists in the field of artificial intelligence while highlighting features related to their success. Our results confirm that rising stars follow different patterns compared to their non-rising-star counterparts in almost all early-career features. We also found that certain features such as gender and ethnic diversity play important roles in scientific collaboration and can significantly impact an author's career development and success. The most important features in predicting star scientists in the field of artificial intelligence were the number of articles, group discipl
In this article, I trace the early historical developments that ultimately led to the creation of the atomic bomb. Even after the weapons were completed, many scientists continued to argue that nuclear armaments were indispensable for maintaining the global balance of political power [1]. This study focuses on several scientists who confronted profound moral dilemmas concerning the use of the bombs against Japan. Some openly opposed their deployment. Others sought to warn a Japanese physicist in the hope of averting further devastation. Still others expressed deep remorse in the aftermath. In addition, the experience of an individual directly affected by the bombing is discussed. By examining these episodes, this article aims to contribute to the ongoing discourse on how scientific research should be guided by ethical principles in the future.
We, as researchers in quantum science and technology, are publishing this manifesto to express our deep concerns about the current geopolitical situation and the global race to rearm. We firmly oppose all forms of militarization in our societies and, in particular, within the academic world. We categorically reject the use of our research for military applications, population control, or surveillance. We stand against the practice of military funding for research. This manifesto is a call to action: to confront the elephant in the room of quantum research, and to unite all researchers who share our views. Our main goals are: i) To express, as a unified collective, our rejection of the use of our research for military purposes; ii) To open a debate in our community about the ethical implications of quantum research for military purposes; iii) To create a forum where concerned scientists can share their opinions and join forces in support of demilitarized research; iv) To advocate for the establishment of a public database listing all research projects at public universities funded by military or defense agencies. In what follows, we lay out our concerns and the rationale behind our
The emergence of the Artificial Intelligence (AI) Scientist represents a paradigm shift in scientific discovery, with large language models (LLMs) taking the lead as the primary executor in the entire scientific workflow from idea generation to experiment implementation. Recent AI Scientist studies demonstrate sufficient capabilities for independent scientific discovery, with generated research reports gaining acceptance at the ICLR 2025 workshop and ACL 2025, suggesting that a human-level AI Scientist, capable of uncovering phenomena previously unknown to humans, may be imminent. Despite this substantial progress, AI Scientists have yet to produce a groundbreaking achievement in the domain of computer science on par with automated scientific tools. Based on extensive quantitative evidence from existing benchmarks in complex engineering tasks and a systematic evaluation assessing 28 research papers generated by five advanced AI Scientist systems, we argue that \textbf{the fundamental bottleneck for AI Scientists lies in their capability to execute the requisite verification procedures.} Current AI Scientist systems lack the execution capabilities needed to conduct rigorous experiments and
Keeping up with the research literature plays an important role in the workflow of scientists, allowing them to understand a field, formulate the problems they focus on, and develop the solutions that they contribute, which in turn shape the nature of the discipline. In this paper, we examine the literature review practices of data scientists. Data science is a field seeing an exponential rise in papers, one that increasingly draws on and is applied in numerous diverse disciplines. Recent efforts have seen the development of several tools intended to help data scientists cope with the deluge of research, alongside coordinated efforts to develop AI tools intended to uncover the research frontier. Despite these trends, indicative of the information overload faced by data scientists, no prior work has examined the specific practices and challenges faced by these scientists in an interdisciplinary field with evolving scholarly norms. In this paper, we close this gap through a set of semi-structured interviews and think-aloud protocols with industry and academic data scientists (N = 20). Our results, while corroborating other knowledge workers' practices, uncover several novel findings: indi
The emergence of large language models (LLMs) is propelling automated scientific discovery to the next level, with LLM-based Artificial Intelligence (AI) Scientist systems now taking the lead in scientific research. Several influential works have already appeared in the field of AI Scientist systems, with AI-generated research papers having been accepted at the ICLR 2025 workshop, suggesting that a human-level AI Scientist, capable of uncovering phenomena previously unknown to humans, may soon become a reality. In this survey, we focus on the central question: How far are AI scientists from changing the world and reshaping the scientific research paradigm? To answer this question, we provide a prospect-driven review that comprehensively analyzes the current achievements of AI Scientist systems, identifying key bottlenecks and the critical components required for the emergence of a scientific agent capable of producing ground-breaking discoveries that solve grand challenges. We hope this survey will contribute to a clearer understanding of the limitations of current AI Scientist systems, showing where we are, what is missing, and what the ultimate goals for scientific AI should be.
How the general public perceives scientists has been of interest to science educators for decades. While many factors shape this perception, the impact of recent generative artificial intelligence (AI) models is noteworthy, as these models are rapidly changing how people acquire information. This report presents a pilot study examining how modern generative AI represents images of scientists, using the Draw-A-Scientist Test (DAST). As data, 1,100 images of scientists were generated using Midjourney v6.1. One hundred of these images were analyzed by a science education scholar using a DAST scoring rubric. Using these data, the researcher carried out prompt engineering to instruct gpt-4.1-mini to automatically analyze the remaining 1,000 images. The results show that generative AI produces stereotypical images of scientists, such as lab coats (97%), eyeglasses (97%), male gender (81%), and Caucasian appearance (85%), in the 100 images analyzed by the researcher. gpt-4.1-mini detected those stereotypes with an accuracy of 79% in the same 100 images. gpt-4.1-mini also analyzed the remaining 1,000 images and found stereotypical features in the images (lab coat: 97%, eyeglasses: 95%, male g
AI scientists powered by large language models have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines. While their capabilities are promising, these agents also introduce novel vulnerabilities that require careful consideration for safety. However, there has been limited comprehensive exploration of these vulnerabilities. This perspective examines vulnerabilities in AI scientists, shedding light on potential risks associated with their misuse, and emphasizing the need for safety measures. We begin by providing an overview of the potential risks inherent to AI scientists, taking into account user intent, the specific scientific domain, and their potential impact on the external environment. Then, we explore the underlying causes of these vulnerabilities and provide a scoping review of the limited existing works. Based on our analysis, we propose a triadic framework involving human regulation, agent alignment, and an understanding of environmental feedback (agent regulation) to mitigate these identified risks. Furthermore, we highlight the limitations and challenges associated with safeguarding A
Emergent patterns of collective attention towards scientists and their research may function as a proxy for scientific impact, which traditionally is assessed via committees that award prizes to scientists. Therefore, it is crucial to understand the relationships between scientific impact and the online demand and supply for information about scientists and their work. In this paper, we compare the temporal patterns of information supply (article creations) and information demand (article views) on Wikipedia for two groups of scientists: scientists who received one of the most prestigious awards in their field and influential scientists from the same field who did not receive an award. Our research highlights that awards function as external shocks which increase supply and demand for information about scientists, but hardly affect information supply and demand for their research topics. Further, we find interesting differences in the temporal ordering of information supply between the two groups: (i) award winners have a higher probability that interest in them precedes interest in their work; (ii) for award winners, interest in articles about them and their work is temporally more cluste
Despite persistent efforts to reveal the temporal patterns in scientific careers, little attention has been paid to the spatial patterns of scientific activities in the knowledge space. Here, drawing on millions of papers in six disciplines, we consider scientists' publication sequences as "walks" on a quantifiable epistemic landscape constructed from large-scale bibliometric corpora by combining embedding and manifold learning algorithms, aiming to reveal individual research topic dynamics and the association between research radius and academic performance along their careers. Intuitively, the visualization shows the localized and bounded nature of mobility trajectories. We further find that the distributions of scientists' transition radius and transition pace are both left-skewed compared with the results of controlled experiments. Then, we observe a mixed exploration-exploitation pattern and the corresponding strategic trade-off in research transitions, where scientists both deepen their previous research with a frequency bias and explore new research with a knowledge proximity bias. We further develop a bounded exploration-exploitation (BEE) model to reproduce the ob
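The transition radius and pace described in this abstract reduce to distances between consecutive points on the learned landscape. A minimal sketch, assuming papers have already been embedded into 2-D coordinates (the function name and inputs are illustrative, not the paper's implementation):

```python
import math

def transition_stats(trajectory, years):
    """Given a scientist's publication sequence as points on a 2-D
    epistemic landscape (e.g. produced by embedding + manifold
    learning), compute the transition radius (distance between
    consecutive papers) and transition pace (radius per unit time)."""
    radii, paces = [], []
    for (x0, y0), (x1, y1), t0, t1 in zip(
        trajectory, trajectory[1:], years, years[1:]
    ):
        r = math.hypot(x1 - x0, y1 - y0)  # Euclidean step length
        radii.append(r)
        paces.append(r / (t1 - t0))       # step length per year
    return radii, paces

# Hypothetical three-paper career: one jump, then staying put
radii, paces = transition_stats([(0, 0), (3, 4), (3, 4)], [2000, 2001, 2003])
# → radii == [5.0, 0.0], paces == [5.0, 0.0]
```

Left-skewed distributions of these two quantities across many scientists are what the abstract contrasts against controlled (shuffled) careers.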
High-skill labour is an important factor underpinning the competitive advantage of modern economies. Therefore, attracting and retaining scientists has become a major concern for migration policy. In this work, we study the migration of scientists on a global scale by combining two large data sets covering the publications of 3.5 million scientists over 60 years. We analyse the geographical distances they moved for a new affiliation and their age when moving, in this way reconstructing their geographical "career paths". These paths are used to derive the world network of scientists' mobility between cities and to analyse its topological properties. We further develop and calibrate an agent-based model such that it reproduces the empirical findings both at the level of individual scientists and of the global network. Our model takes into account that the academic hiring process is largely demand-driven and demonstrates that the probability that scientists relocate decreases both with age and with distance. Our results allow interpreting the model assumptions as micro-based decision rules that can explain the observed mobility patterns of scientists.
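The model assumption that relocation probability decreases with both age and distance can be illustrated with a toy exponential form; the functional form, parameter values, and names below are assumptions for illustration, not the paper's calibrated model.

```python
import math
import random

def relocation_probability(age, distance_km, beta_age=0.05, gamma_dist=0.001):
    """Toy acceptance probability for a move: decreasing in both the
    agent's age and the distance to the offering city (illustrative
    exponential form; parameters are assumptions)."""
    return math.exp(-beta_age * age - gamma_dist * distance_km)

def simulate_moves(agents, cities, rng):
    """Demand-driven hiring sketch: each agent receives one offer from
    a random city and relocates with the probability above."""
    moves = 0
    for agent in agents:
        city, dist = rng.choice(cities)
        if rng.random() < relocation_probability(agent["age"], dist):
            agent["city"] = city
            moves += 1
    return moves

agents = [{"age": 30, "city": "A"}, {"age": 60, "city": "B"}]
cities = [("Boston", 300.0), ("Tokyo", 9000.0)]
moves = simulate_moves(agents, cities, random.Random(0))
```

Monotonic decrease in both arguments is the only property the sketch commits to; calibration against the empirical mobility network is what the abstract's model adds.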
National science systems have become embedded in global science, and countries do everything they can to harness global knowledge to national economic needs. However, accessing and using the riches of global knowledge can occur only through scientists. Consequently, the research power of nations relies on the research power of individual scientists. Their capacity to collaborate internationally and to tap into globally networked science is key. The constantly evolving, bottom-up, autonomous, self-regulating, and self-focused nature of global science requires deeper understanding, and the best way to understand its dynamics is to understand what drives academic scientists in their work. The idea that science remains state-driven rather than curiosity-driven is difficult to sustain. In empirical terms, we describe the globalization of science using selected publication, collaboration, and citation data from 2000-2020. The globalization of science implies two different processes in two different system types: the growth of science in the Western world is almost entirely attributable to internationally co-authored publications; its growth in the developing world, in contrast, is dri