Generative AI has greatly transformed creative work in various domains, such as screenwriting. To understand this transformation, prior research often focused on capturing a snapshot of human-AI co-creation practice at a specific moment, with less attention to how humans mobilize, regulate, and reflect to form the practice gradually. Motivated by Bandura's theory of human agency, we conducted a two-week study with 19 professional screenwriters to investigate how they embraced AI in their creation process. Our findings revealed that screenwriters not only mindfully planned, foresaw, and responded to AI usage, but, more importantly, through reflections on practice, they developed themselves and human-AI co-creation paradigms, such as cognition, strategies, and workflows. They also expressed various expectations for how future AI should better support their agency. Based on our findings, we conclude this paper with extensive discussion and actionable suggestions to screenwriters, tool developers, and researchers for sustainable human-AI co-creation.
This paper argues for recognizing an emerging paradigm of causal learning by wisdom of the crowd. Recent developments in government, industry, and research point to the rise of decentralized and crowd-based approaches within causal modeling, where causal knowledge distributed across many contributors can be systematically elicited and integrated with causal learning workflows. In this paradigm, causal learning becomes a distributed decision-making problem: each participant contributes partial and potentially noisy knowledge, while collective contributions help construct a global causal structure. This direction is enabled by advances in crowdsourcing platforms, expert knowledge elicitation, aggregation techniques, and large language model (LLM)-augmented information acquisition. Its promise is increasingly visible in early research and emerging real-world practices. Building on this momentum, we outline a framework for crowd-based causal learning spanning elicitation, modeling, aggregation, and optimization. We further discuss the opportunities and challenges introduced by this paradigm and call for interdisciplinary collaboration across causal learning, collective intelligence, hu
Everyone from AI executives and researchers to doomsayers, politicians, and activists is talking about Artificial General Intelligence (AGI). Yet, they often don't seem to agree on its exact definition. One common definition of AGI is an AI that can do everything a human can do, but are humans truly general? In this paper, we address what's wrong with our conception of AGI, and why, even in its most coherent formulation, it is a flawed concept to describe the future of AI. We explore whether the most widely accepted definitions are plausible, useful, and truly general. We argue that AI must embrace specialization, rather than strive for generality, and in its specialization strive for superhuman performance, and introduce Superhuman Adaptable Intelligence (SAI). SAI is defined as intelligence that can learn to exceed humans at anything important that we can do, and that can fill in the skill gaps where humans are incapable. We then lay out how SAI can help hone a discussion around AI that was blurred by an overloaded definition of AGI, and extrapolate the implications of using it as a guide for the future.
Shaping inclusive representations that embrace diversity and ensure fair participation and reflections of values is at the core of many conversation-based models. However, many existing methods rely on surface inclusion using mention of user demographics or behavioral attributes of social groups. Such methods overlook the nuanced, implicit expression of opinion embedded in conversations. Furthermore, the over-reliance on overt cues can exacerbate misalignment and reinforce harmful or stereotypical representations in model outputs. Thus, we took a step back and recognized that equitable inclusion needs to account for the implicit expression of opinion and use the stance of responses to validate the normative alignment. This study aims to evaluate how opinions are represented in NLP or computational models by introducing an alignment evaluation framework that foregrounds implicit, often overlooked conversations and evaluates the normative social views and discourse. Our approach models the stance of responses as a proxy for the underlying opinion, enabling a considerate and reflective representation of diverse social viewpoints. We evaluate the framework using both (i) positive-unlab
Multi-agent debate (MAD) has gained significant attention as a promising line of research to improve the factual accuracy and reasoning capabilities of large language models (LLMs). Despite its conceptual appeal, current MAD research suffers from critical limitations in evaluation practices, including limited benchmark coverage, weak baseline comparisons, and inconsistent setups. This paper presents a systematic evaluation of 5 representative MAD methods across 9 benchmarks using 4 foundational models. Surprisingly, our findings reveal that MAD often fail to outperform simple single-agent baselines such as Chain-of-Thought and Self-Consistency, even when consuming significantly more inference-time computation. To advance MAD research, we further explore the role of model heterogeneity and find it as a universal antidote to consistently improve current MAD frameworks. Based on our findings, we argue that the field must stop overvaluing MAD in its current form; for true advancement, we must critically rethink evaluation paradigms and actively embrace model heterogeneity as a core design principle.
Recent advanced vision-language models(VLMs) have demonstrated strong performance on passive, offline image and video understanding tasks. However, their effectiveness in embodied settings, which require online interaction and active scene understanding remains limited. In such scenarios, an agent perceives the environment from a first-person perspective, with each action dynamically shaping subsequent observations. Even state-of-the-art models such as GPT-4o, Claude 3.5 Sonnet, and Gemini 2.5 Pro struggle in open-environment interactions, exhibiting clear limitations in spatial reasoning and long-horizon planning. To address this gap, we introduce EmRACE-3K, a dataset of over 3,000 language-guided tasks situated in diverse, photorealistic environments constructed using Unreal Engine and the UnrealCV-Zoo framework. The tasks encompass a wide range of embodied challenges, including navigation, object manipulation, and multi-stage goal execution. Each task unfolds as a multi-step trajectory, pairing first-person visual observations with high-level instructions, grounded actions, and natural language rationales that express the agent's intent at every step. Using EmRACE-3K, we establi
This position paper contends that modern AI research must adopt an antifragile perspective on safety -- one in which the system's capacity to guarantee long-term AI safety such as handling rare or out-of-distribution (OOD) events expands over time. Conventional static benchmarks and single-shot robustness tests overlook the reality that environments evolve and that models, if left unchallenged, can drift into maladaptation (e.g., reward hacking, over-optimization, or atrophy of broader capabilities). We argue that an antifragile approach -- Rather than striving to rapidly reduce current uncertainties, the emphasis is on leveraging those uncertainties to better prepare for potentially greater, more unpredictable uncertainties in the future -- is pivotal for the long-term reliability of open-ended ML systems. In this position paper, we first identify key limitations of static testing, including scenario diversity, reward hacking, and over-alignment. We then explore the potential of antifragile solutions to manage rare events. Crucially, we advocate for a fundamental recalibration of the methods used to measure, benchmark, and continually improve AI safety over the long term, compleme
This study examines the role of vagueness in the design process and its strategic management for the effective human-AI interaction. While vagueness in the generation of design ideas promotes diverse interpretations and prevents fixation, excessive vagueness can lead to scattered results. Designers attempt to use image search tools or generative AIs (e.g., Dall-E) for their work but often fail to achieve satisfactory results because the level of vagueness is not properly managed in these technologies. In this work, we identified how designers coordinate vagueness in their design process and applied key components of the process to the design of CLAY, an interactive system that balances vagueness through iterative prompt refinement by integrating the strengths of text-to-image generative AI. Results from our user study with 10 fashion designers showed that CLAY effectively supported their design process, reducing design time, and expanding creative possibilities compared to their existing practice, by allowing them to both embrace and avoid vagueness as needed. Our study highlights the importance of identifying key characteristics of the target user and domain, and exploring ways to
This paper explores the implications of universities' rapid adoption of large language models (LLMs) for studying, teaching, and research by analyzing the logics underpinning their representation space. It argues that by uncritically adopting LLMs, the University surrenders its autonomy to a field of heteronomy, that of generative AI, whose norms are not democratically shaped. Unlike earlier forms of rule-based AI, which sought to exclude human judgment and interpretation, generative AI's new normative rationality is explicitly based on the automation of moral judgment, valuation, and interpretation. By integrating LLMs into pedagogical and research contexts before establishing a critical framework for their use, the University subjects itself to being governed by contingent, ever-evolving, and domain-non-specific norms that structure the model's virtual representation space and thus everything it generates.
The hybrid architecture of convolution neural networks (CNN) and Transformer has been the most popular method for medical image segmentation. However, the existing networks based on the hybrid architecture suffer from two problems. First, although the CNN branch can capture image local features by using convolution operation, the vanilla convolution is unable to achieve adaptive extraction of image features. Second, although the Transformer branch can model the global information of images, the conventional self-attention only focuses on the spatial self-attention of images and ignores the channel and cross-dimensional self-attention leading to low segmentation accuracy for medical images with complex backgrounds. To solve these problems, we propose vision Transformer embrace convolutional neural networks for medical image segmentation (TEC-Net). Our network has two advantages. First, dynamic deformable convolution (DDConv) is designed in the CNN branch, which not only overcomes the difficulty of adaptive feature extraction using fixed-size convolution kernels, but also solves the defect that different inputs share the same convolution kernel parameters, effectively improving the f
This study introduces a prescriptive annotation benchmark grounded in humanities research to ensure consistent, unbiased labeling of offensive language, particularly for casual and non-mainstream language uses. We contribute two newly annotated datasets that achieve higher inter-annotator agreement between human and language model (LLM) annotations compared to original datasets based on descriptive instructions. Our experiments show that LLMs can serve as effective alternatives when professional annotators are unavailable. Moreover, smaller models fine-tuned on multi-source LLM-annotated data outperform models trained on larger, single-source human-annotated datasets. These findings highlight the value of structured guidelines in reducing subjective variability, maintaining performance with limited data, and embracing language diversity. Content Warning: This article only analyzes offensive language for academic purposes. Discretion is advised.
Recent advancements in deep learning have brought significant improvements to plant disease recognition. However, achieving satisfactory performance often requires high-quality training datasets, which are challenging and expensive to collect. Consequently, the practical application of current deep learning-based methods in real-world scenarios is hindered by the scarcity of high-quality datasets. In this paper, we argue that embracing poor datasets is viable and aim to explicitly define the challenges associated with using these datasets. To delve into this topic, we analyze the characteristics of high-quality datasets, namely large-scale images and desired annotation, and contrast them with the \emph{limited} and \emph{imperfect} nature of poor datasets. Challenges arise when the training datasets deviate from these characteristics. To provide a comprehensive understanding, we propose a novel and informative taxonomy that categorizes these challenges. Furthermore, we offer a brief overview of existing studies and approaches that address these challenges. We believe that our paper sheds light on the importance of embracing poor datasets, enhances the understanding of the associate
People form impressions of their dialogue partners, be they other people or machines, based on cues drawn from their communicative style. Recent work has suggested that the gulf between people's expectations and the reality of CUI interaction widens when these impressions are misaligned with the actual capabilities of conversational user interfaces (CUIs). This has led some to rally against a perceived overriding concern for naturalness, calling instead for more representative, or appropriate communicative cues. Indeed, some have argued for a move away from naturalness as a goal for CUI design and communication. We contend that naturalness need not be abandoned, if we instead aim for ecologically grounded design. We also suggest a way this might be achieved and call on CUI designers to embrace incompetence! By letting CUIs express uncertainty and embarrassment through ecologically valid and appropriate cues that are ubiquitous in human communication - CUI designers can achieve more appropriate communication without turning away from naturalness entirely.
Probabilistic time series forecasting (PTSF) aims to model the full predictive distribution of future observations, enabling both accurate forecasting and principled uncertainty quantification. A central requirement of PTSF is to embrace heteroscedasticity, as real-world time series exhibit time-varying conditional variances induced by nonstationary dynamics, regime changes, and evolving external conditions. However, most existing non-autoregressive generative approaches to PTSF, such as TimeVAE and $K^2$VAE, rely on MSE-based training objectives that implicitly impose a homoscedastic assumption, thereby fundamentally limiting their ability to model temporal heteroscedasticity. To address this limitation, we propose the Location-Scale Gaussian VAE (LSG-VAE), a simple but effective framework that explicitly parameterizes both the predictive mean and time-dependent variance through a location-scale likelihood formulation. This design enables LSG-VAE to faithfully capture heteroscedastic aleatoric uncertainty and introduces an adaptive attenuation mechanism that automatically down-weights highly volatile observations during training, leading to improved robustness in trend prediction.
This article is a reflection on the themes of the Faraday Discussion meeting on "Biological and bio-inspired optics" held from 20 to 22 July 2020. It is a personal perspective on the nature of this field as a broad and interdisciplinary field that has led to a sound understanding of the material properties of biological nanostructured and optical materials. The article describes how the nature of the field and the themes of the conference are reflected in particular in work on the 3D bicontinuous biophotonic nanostructures known as single gyroids and in bicontinuous structures more broadly. Such single gyroid materials are found for example in the butterfly Thecla opisena, where the questions of biophotonic response, of bio-inspired optics, of the relationship between structure and function, and of the relationship between natural and synthetic realisations are closely interlinked. This multitude of facets of research on single gyroid structures reflects the beauty of the broader field of biophotonics, namely as a field that lives through embracing the serendipitous discovery of the biophotonic marvels that nature offers to us as seeds for in-depth analysis and understanding. The m
We present FLOP (Fast Learning of Order and Parents), a score-based causal discovery algorithm for linear models. It pairs fast parent selection with iterative Cholesky-based score updates, cutting run-times over prior algorithms. This makes it feasible to fully embrace discrete search, enabling iterated local search with principled order initialization to find graphs with scores at or close to the global optimum. The resulting structures are highly accurate across benchmarks, with near-perfect recovery in standard settings. This performance calls for revisiting discrete search over graphs as a reasonable approach to causal discovery.
Generative AI (GAI) technologies are disrupting professional writing, challenging traditional practices. Recent studies explore GAI adoption experiences of creative practitioners, but we know little about how these experiences evolve into established practices and how GAI resistance alters these practices. To address this gap, we conducted 25 semi-structured interviews with writing professionals who adopted and/or resisted GAI. Using the theoretical lens of Job Crafting, we identify four strategies professionals employ to reshape their roles. Writing professionals employed GAI resisting strategies to maximize human potential, reinforce professional identity, carve out a professional niche, and preserve credibility within their networks. In contrast, GAI-enabled strategies allowed writers who embraced GAI to enhance desirable workflows, minimize mundane tasks, and engage in new AI-managerial labor. These strategies amplified their collaborations with GAI while reducing their reliance on other people. We conclude by discussing implications of GAI practices on writers' identity and practices as well as crafting theory.
This position paper argues that the theoretical inconsistency often observed among Responsible AI (RAI) metrics, such as differing fairness definitions or tradeoffs between accuracy and privacy, should be embraced as a valuable feature rather than a flaw to be eliminated. We contend that navigating these inconsistencies, by treating metrics as divergent objectives, yields three key benefits: (1) Normative Pluralism: Maintaining a full suite of potentially contradictory metrics ensures that the diverse moral stances and stakeholder values inherent in RAI are adequately represented. (2) Epistemological Completeness: The use of multiple, sometimes conflicting, metrics allows for a more comprehensive capture of multifaceted ethical concepts, thereby preserving greater informational fidelity about these concepts than any single, simplified definition. (3) Implicit Regularization: Jointly optimizing for theoretically conflicting objectives discourages overfitting to one specific metric, steering models towards solutions with enhanced generalization and robustness under real-world complexities. In contrast, efforts to enforce theoretical consistency by simplifying or pruning metrics risk
Evaluations of generative AI models often collapse nuanced behaviour into a single number computed for a single decoding configuration. Such point estimates obscure tail risks, demographic disparities, and the existence of multiple near-optimal operating points. We propose a unified framework that embraces multiplicity by modelling the distribution of harmful behaviour across the entire space of decoding knobs and prompts, quantifying risk through tail-focused metrics, and integrating stakeholder preferences. Our technical contributions are threefold: (i) we formalise decoding Rashomon sets, regions of knob space whose risk is near-optimal under given criteria and measure their size and disagreement; (ii) we develop a dependent Dirichlet process (DDP) mixture with stakeholder-conditioned stick-breaking weights to learn multi-modal harm surfaces; and (iii) we introduce an active sampling pipeline that uses Bayesian deep learning surrogates to explore knob space efficiently. Our approach bridges multiplicity theory, Bayesian nonparametrics, and stakeholder-aligned sensitivity analysis, paving the way for trustworthy deployment of generative models.
We explore the challenges and opportunities of transitioning towards sustainable food systems through the lens of democratic food governance fostering inclusive and systemic transformation. Drawing on concepts of wicked problems and systems thinking, we propose a theory of change represented as a 'turtle model' that embraces the diversity of citizens' values and knowledge to highlight multiple avenues of transformation. As quadruple helix innovation and governance hubs, cities can be hotspots for food system transformations. We illustrate this for Dublin, Ireland, where local citizens' value-based food identities were galvanized to activate ecological awareness and promote sustainable seafood consumption. Within this democratic food governance framework, approaches such as open science, transdisciplinarity, and citizen engagement are fit-for-purpose to engage diverse food actors from government, industry, academia, and civil society in shared dialogue and action to transform food systems.