Affect modeling is viewed, traditionally, as the process of mapping measurable affect manifestations from multiple modalities of user input to affect labels. That mapping is usually inferred through end-to-end (manifestation-to-affect) machine learning processes. What if, instead, one trains general, subject-invariant representations that consider affect information and then uses such representations to model affect? In this paper we assume that affect labels form an integral part, and not just the training signal, of an affect representation and we explore how the recent paradigm of contrastive learning can be employed to discover general high-level affect-infused representations for the purpose of modeling affect. We introduce three different supervised contrastive learning approaches for training representations that consider affect information. In this initial study we test the proposed methods for arousal prediction in the RECOLA dataset based on user information from multiple modalities. Results demonstrate the representation capacity of contrastive learning and its efficiency in boosting the accuracy of affect models. Beyond their evidenced higher performance compared to end
This paper surveys the current state of the art in affective computing principles, methods and tools as applied to games. We review this emerging field, namely affective game computing, through the lens of the four core phases of the affective loop: game affect elicitation, game affect sensing, game affect detection and game affect adaptation. In addition, we provide a taxonomy of terms, methods and approaches used across the four phases of the affective game loop and situate the field within this taxonomy. We continue with a comprehensive review of available affect data collection methods with regards to gaming interfaces, sensors, annotation protocols, and available corpora. The paper concludes with a discussion on the current limitations of affective game computing and our vision for the most promising future research directions in the field.
Study of affect in speech requires suitable data, as emotional expression and perception vary across languages. Until now, no corpus has existed for natural expression of affect in spontaneous Finnish, existing data being acted or from a very specific communicative setting. This paper presents the first such corpus, created by annotating 12,000 utterances for emotional arousal and valence, sampled from three large-scale Finnish speech corpora. To ensure diverse affective expression, sample selection was conducted with an affect mining approach combining acoustic, cross-linguistic speech emotion, and text sentiment features. We compare this method to random sampling in terms of annotation diversity, and conduct post-hoc analyses to identify sampling choices that would have maximized the diversity. As an outcome, the work introduces a spontaneous Finnish affective speech corpus and informs sampling strategies for affective speech corpus creation in other languages or domains.
How can we reliably transfer affect models trained in controlled laboratory conditions (in-vitro) to uncontrolled real-world settings (in-vivo)? The information gap between in-vitro and in-vivo applications defines a core challenge of affective computing. This gap is caused by limitations related to affect sensing including intrusiveness, hardware malfunctions and availability of sensors. As a response to these limitations, we introduce the concept of privileged information for operating affect models in real-world scenarios (in the wild). Privileged information enables affect models to be trained across multiple modalities available in a lab, and ignore, without significant performance drops, those modalities that are not available when they operate in the wild. Our approach is tested in two multimodal affect databases one of which is designed for testing models of affect in the wild. By training our affect models using all modalities and then using solely raw footage frames for testing the models, we reach the performance of models that fuse all available modalities for both training and testing. The results are robust across both classification and regression affect modeling tas
Studying psychiatric illness has often been limited by difficulties in connecting symptoms and behavior to neurobiology. Computational psychiatry approaches promise to bridge this gap by providing formal accounts of the latent information processing changes that underlie the development and maintenance of psychiatric phenomena. Models based on these theories generate individual-level parameter estimates which can then be tested for relationships to neurobiology. In this review, we explore computational modelling approaches to one key aspect of health and illness: affect. We discuss strengths and limitations of key approaches to modelling affect, with a focus on reinforcement learning, active inference, the hierarchical gaussian filter, and drift-diffusion models. We find that, in this literature, affect is an important source of modulation in decision making, and has a bidirectional influence on how individuals infer both internal and external states. Highlighting the potential role of affect in information processing changes underlying symptom development, we extend an existing model of psychosis, where affective changes are influenced by increasing cortical noise and consequent i
Affective computing strives to unveil the unknown relationship between affect elicitation, manifestation of affect and affect annotations. The ground truth of affect, however, is predominately attributed to the affect labels which inadvertently include biases inherent to the subjective nature of emotion and its labeling. The response to such limitations is usually augmenting the dataset with more annotations per data point; however, this is not possible when we are interested in self-reports via first-person annotation. Moreover, outlier detection methods based on inter-annotator agreement only consider the annotations themselves and ignore the context and the corresponding affect manifestation. This paper reframes the ways one may obtain a reliable ground truth of affect by transferring aspects of causation theory to affective computing. In particular, we assume that the ground truth of affect can be found in the causal relationships between elicitation, manifestation and annotation that remain \emph{invariant} across tasks and participants. To test our assumption we employ causation inspired methods for detecting outliers in affective corpora and building affect models that are r
Second-person neuroscience holds social cognition as embodied meaning co-regulation through reciprocal interaction, modeled here as coupled active inference with affect emerging as inference over identity-relevant surprise. Each agent maintains a self-model that tracks violations in its predictive coherence while recursively modeling the other. Valence is computed from self-model prediction error, weighted by self-relevance, and modulated by prior affective states and by what we term temporal aiming, which captures affective appraisal over time. This accommodates shifts in the self-other boundary, allowing affect to emerge at individual and dyadic levels. We propose a novel method termed geometric hyperscanning, based on the Forman-Ricci curvature, to empirically operationalize these processes: it tracks topological reconfigurations in inter-brain networks, with its entro-py serving as a proxy for affective phase transitions such as rupture, co-regulation, and re-attunement.
Mechanistic interpretability research on emotion in large language models -- linear probing, activation patching, sparse autoencoder (SAE) feature analysis, causal ablation, steering vector extraction -- depends on stimuli that contain the words for the emotions they test. When a probe fires on "I am furious", it is unclear whether the model has detected anger or detected the word "furious". The two readings have very different consequences for every downstream claim about emotion circuits, features, and interventions. We release AIPsy-Affect, a 480-item clinical stimulus battery that removes the confound at the stimulus level: 192 keyword-free vignettes evoking each of Plutchik's eight primary emotions through narrative situation alone, 192 matched neutral controls that share characters, setting, length, and surface structure with the affect surgically removed, plus moderate-intensity and discriminant-validity splits. The matched-pair structure supports linear probing, activation patching, SAE feature analysis, causal ablation, and steering vector extraction under a strong methodological guarantee: any internal representation that distinguishes a clinical item from its matched neu
Affective computing is an emerging interdisciplinary field where computational systems are developed to analyze, recognize, and influence the affective states of a human. It can generally be divided into two subproblems: affective recognition and affective generation. Affective recognition has been extensively reviewed multiple times in the past decade. Affective generation, however, lacks a critical review. Therefore, we propose to provide a comprehensive review of affective generation models, as models are most commonly leveraged to affect others' emotional states. Affective computing has gained momentum in various fields and applications, thanks to the leap of machine learning, especially deep learning since 2015. With critical models introduced, this work is believed to benefit future research on affective generation. We conclude this work with a brief discussion on existing challenges.
Due to its expressivity, natural language is paramount for explicit and implicit affective state communication among humans. The same linguistic inquiry (e.g., How are you?) might induce responses with different affects depending on the affective state of the conversational partner(s) and the context of the conversation. Yet, most dialog systems do not consider affect as constitutive aspect of response generation. In this paper, we introduce AffectON, an approach for generating affective responses during inference. For generating language in a targeted affect, our approach leverages a probabilistic language model and an affective space. AffectON is language model agnostic, since it can work with probabilities generated by any language model (e.g., sequence-to-sequence models, neural language models, n-grams). Hence, it can be employed for both affective dialog and affective language generation. We experimented with affective dialog generation and evaluated the generated text objectively and subjectively. For the subjective part of the evaluation, we designed a custom user interface for rating and provided recommendations for the design of such interfaces. The results, both subjecti
Large language models appear to develop internal representations of emotion -- "emotion circuits," "emotion neurons," and structured emotional manifolds have been reported across multiple model families. But every study making these claims uses stimuli signalled by explicit emotion keywords, leaving a fundamental question unanswered: do these circuits detect genuine emotional meaning, or do they detect the word "devastated"? We present the first clinical validity test of emotion circuit claims using mechanistic interpretability methods grounded in clinical psychology -- clinical vignettes that evoke emotions through situational and behavioural cues alone, emotion keywords removed. Across six models (Llama-3.2-1B, Llama-3-8B, Gemma-2-9B; base and instruct variants), we apply four convergent mechanistic interpretability methods -- linear probing, causal activation patching, knockout experiments, and representational geometry -- and discover two dissociable emotion processing mechanisms. Affect reception -- detecting emotionally significant content -- operates with near-perfect accuracy (AUROC 1.000), consistent with early-layer saturation, and replicates across all six models. Emotio
In our multicultural world, affect-aware AI systems that support humans need the ability to perceive affect across variations in emotion expression patterns across cultures. These systems must perform well in cultural contexts without annotated affect datasets available for training models. A standard assumption in affective computing is that affect recognition models trained and used within the same culture (intracultural) will perform better than models trained on one culture and used on different cultures (intercultural). We test this assumption and present the first systematic study of intercultural affect recognition models using videos of real-world dyadic interactions from six cultures. We develop an attention-based feature selection approach under temporal causal discovery to identify behavioral cues that can be leveraged in intercultural affect recognition models. Across all six cultures, our findings demonstrate that intercultural affect recognition models were as effective or more effective than intracultural models. We identify and contribute useful behavioral features for intercultural affect recognition; facial features from the visual modality were more useful than t
Understanding the nuances of speech emotion dataset curation and labeling is essential for assessing speech emotion recognition (SER) model potential in real-world applications. Most training and evaluation datasets contain acted or pseudo-acted speech (e.g., podcast speech) in which emotion expressions may be exaggerated or otherwise intentionally modified. Furthermore, datasets labeled based on crowd perception often lack transparency regarding the guidelines given to annotators. These factors make it difficult to understand model performance and pinpoint necessary areas for improvement. To address this gap, we identified the Switchboard corpus as a promising source of naturalistic conversational speech, and we trained a crowd to label the dataset for categorical emotions (anger, contempt, disgust, fear, sadness, surprise, happiness, tenderness, calmness, and neutral) and dimensional attributes (activation, valence, and dominance). We refer to this label set as Switchboard-Affect (SWB-Affect). In this work, we present our approach in detail, including the definitions provided to annotators and an analysis of the lexical and paralinguistic cues that may have played a role in their
Game environments offer a unique opportunity for training virtual agents due to their interactive nature, which provides diverse play traces and affect labels. Despite their potential, no reinforcement learning framework incorporates human affect models as part of their observation space or reward mechanism. To address this, we present the \emph{Affectively Framework}, a set of Open-AI Gym environments that integrate affect as part of the observation space. This paper introduces the framework and its three game environments and provides baseline experiments to validate its effectiveness and potential.
Some previous studies based on IceCube neutrinos had found intriguing preliminary evidence that some of them might be GRB neutrinos with travel times affected by quantum properties of spacetime delaying them proportionally to their energy, an effect often labeled as "quantum-spacetime-induced in-vacuo dispersion". Those previous studies looked for candidate GRB neutrinos in a fixed (neutrino-energy-independent) time window after the GRB onset and relied rather crucially on crude estimates of the redshift of GRBs whose redshift has not been measured. We here introduce a complementary approach to the search of quantum-spacetime-affected GRB neutrinos which restricts the analysis to GRBs of sharply known redshift, and, in a way that we argue is synergistic with having sharp information on redshift, adopts a neutrino-energy-dependent time window. We find that knowing the redshift of the GRBs strengthens the analysis enough to compensate for the fact that of course the restriction to GRBs of known redshift reduces the number of candidate GRB neutrinos. And rather remarkably our estimate of the magnitude of the in-vacuo-dispersion effects is fully consistent with what had been found usin
Affective video indexing is the area of research that develops techniques to automatically generate descriptions of video content that encode the emotional reactions which the video content evokes in viewers. This paper provides a set of corpus development guidelines based on state-of-the-art practice intended to support researchers in this field. Affective descriptions can be used for video search and browsing systems offering users affective perspectives. The paper is motivated by the observation that affective video indexing has yet to fully profit from the standard corpora (data sets) that have benefited conventional forms of video indexing. Affective video indexing faces unique challenges, since viewer-reported affective reactions are difficult to assess. Moreover affect assessment efforts must be carefully designed in order to both cover the types of affective responses that video content evokes in viewers and also capture the stable and consistent aspects of these responses. We first present background information on affect and multimedia and related work on affective multimedia indexing, including existing corpora. Three dimensions emerge as critical for affective video cor
A cluster of research in Affective Computing suggests that it is possible to infer some characteristics of users' affective states by analyzing their electrophysiological activity in real-time. However, it is not clear how to use the information extracted from electrophysiological signals to create visual representations of the affective states of Virtual Reality (VR) users. Visualization of users' affective states in VR can lead to biofeedback therapies for mental health care. Understanding how to visualize affective states in VR requires an interdisciplinary approach that integrates psychology, electrophysiology, and audio-visual design. Therefore, this review aims to integrate previous studies from these fields to understand how to develop virtual environments that can automatically create visual representations of users' affective states. The manuscript addresses this challenge in four sections: First, theories related to emotion and affect are summarized. Second, evidence suggesting that visual and sound cues tend to be associated with affective states are discussed. Third, some of the available methods for assessing affect are described. The fourth and final section contains
Continuous affect prediction involves the discrete time-continuous regression of affect dimensions. Dimensions to be predicted often include arousal and valence. Continuous affect prediction researchers are now embracing multimodal model input. This provides motivation for researchers to investigate previously unexplored affective cues. Speech-based cues have traditionally received the most attention for affect prediction, however, non-verbal inputs have significant potential to increase the performance of affective computing systems and in addition, allow affect modelling in the absence of speech. However, non-verbal inputs that have received little attention for continuous affect prediction include eye and head-based cues. The eyes are involved in emotion displays and perception while head-based cues have been shown to contribute to emotion conveyance and perception. Additionally, these cues can be estimated non-invasively from video, using modern computer vision tools. This work exploits this gap by comprehensively investigating head and eye-based features and their combination with speech for continuous affect prediction. Hand-crafted, automatically generated and CNN-learned fe
This paper introduces a paradigm shift by viewing the task of affect modeling as a reinforcement learning (RL) process. According to the proposed paradigm, RL agents learn a policy (i.e. affective interaction) by attempting to maximize a set of rewards (i.e. behavioral and affective patterns) via their experience with their environment (i.e. context). Our hypothesis is that RL is an effective paradigm for interweaving affect elicitation and manifestation with behavioral and affective demonstrations. Importantly, our second hypothesis-building on Damasio's somatic marker hypothesis-is that emotion can be the facilitator of decision-making. We test our hypotheses in a racing game by training Go-Blend agents to model human demonstrations of arousal and behavior; Go-Blend is a modified version of the Go-Explore algorithm which has recently showcased supreme performance in hard exploration tasks. We first vary the arousal-based reward function and observe agents that can effectively display a palette of affect and behavioral patterns according to the specified reward. Then we use arousal-based state selection mechanisms in order to bias the strategies that Go-Blend explores. Our finding
Machine learning is frequently used in affective computing, but presents challenges due the opacity of state-of-the-art machine learning methods. Because of the impact affective machine learning systems may have on an individual's life, it is important that models be made transparent to detect and mitigate biased decision making. In this regard, affective machine learning could benefit from the recent advancements in explainable artificial intelligence (XAI) research. We perform a structured literature review to examine the use of interpretability in the context of affective machine learning. We focus on studies using audio, visual, or audiovisual data for model training and identified 29 research articles. Our findings show an emergence of the use of interpretability methods in the last five years. However, their use is currently limited regarding the range of methods used, the depth of evaluations, and the consideration of use-cases. We outline the main gaps in the research and provide recommendations for researchers that aim to implement interpretable methods for affective machine learning.