Whilst a majority of affective computing research focuses on inferring emotions, examining mood or understanding the \textit{mood-emotion interplay} has received significantly less attention. Building on prior work, we (a) deduce and incorporate emotion-change ($Δ$) information for inferring mood, without resorting to annotated labels, and (b) attempt mood prediction for long duration video clips, in alignment with the characterisation of mood. We generate the emotion-change ($Δ$) labels via metric learning from a pre-trained Siamese Network, and use these in addition to mood labels for mood classification. Experiments evaluating \textit{unimodal} (training only using mood labels) vs \textit{multimodal} (training using mood plus $Δ$ labels) models show that mood prediction benefits from the incorporation of emotion-change information, emphasising the importance of modelling the mood-emotion interplay for effective mood inference.
Mood instability is a key behavioral indicator of mental health, yet traditional assessments rely on infrequent and retrospective reports that fail to capture its continuous nature. Smartphone-based mobile sensing enables passive, in-the-wild mood inference from everyday behaviors; however, deploying such systems at scale remains challenging due to privacy constraints, uneven sensing availability, and substantial variability in behavioral patterns. In this work, we study mood inference using smartphone sensing data in a cross-country federated learning setting, where each country participates as an independent client while retaining local data. We introduce FedFAP, a feature-aware personalized federated framework designed to accommodate heterogeneous sensing modalities across regions. Evaluations across geographically and culturally diverse populations show that FedFAP achieves an AUROC of 0.744, outperforming both centralized approaches and existing personalized federated baselines. Beyond inference, our results offer design insights for mood-aware systems, demonstrating how population-aware personalization and privacy-preserving learning can enable scalable and mood-aware mobile
Group mood plays a crucial role in shaping workspace experiences, influencing group dynamics, team performance, and creativity. The perceived group mood depends on many, often subconscious, aspects such as individual emotional states or group life, which make it challenging to maintain a positive atmosphere. Intelligent technology could support mood regulation in physical office environments, for example, as adaptive ambient lighting for mood regulation. However, little is known about the relationship between the physical workspace and group mood dynamics. To address this knowledge gap, we conducted a qualitative user study (N=8 workgroups and overall 26 participants) to explore how the physical workspace shapes group mood experiences and investigate employees' perspectives on intelligent mood-aware technologies. Our findings reveal key factors influencing group mood, and participants' expectations for supportive technology to preserve privacy and autonomy. Our work highlights the potential of adaptive and responsive workspaces while also emphasizing the need for human-centered, technology-driven interventions that benefit group well-being.
The interplay between mood and eating episodes has been extensively researched, revealing a connection between the two. Previous studies have relied on questionnaires and mobile phone self-reports to investigate the relationship between mood and eating. However, current literature exhibits several limitations: a lack of investigation into the generalization of mood inference models trained with data from various everyday life situations to specific contexts like eating; an absence of studies using sensor data to explore the intersection of mood and eating; and inadequate examination of model personalization techniques within limited label settings, a common challenge in mood inference (i.e., far fewer negative mood reports compared to positive or neutral reports). In this study, we sought to examine everyday eating behavior and mood using two datasets of college students in Mexico (N_mex = 84, 1843 mood-while-eating reports) and eight countries (N_mul = 678, 24K mood-while-eating reports), which contain both passive smartphone sensing and self-report data. Our results indicate that generic mood inference models experience a decline in performance in specific contexts, such as durin
Emotion is vital to information and message processing, playing a key role in attitude formation. Consequently, creating a mood that evokes an emotional response is essential to any compelling piece of outreach communication. Many nonprofits and charities, despite having established messages, face challenges in creating advocacy campaign videos for social media. It requires significant creative and cognitive efforts to ensure that videos achieve the desired mood across multiple dimensions: script, visuals, and audio. We introduce MoodSmith, an AI-powered system that helps users explore mood possibilities for their message and create advocacy campaigns that are mood-consistent across dimensions. To achieve this, MoodSmith uses emotive language and plotlines for scripts, artistic style and color palette for visuals, and positivity and energy for audio. Our studies show that MoodSmith can effectively achieve a variety of moods, and the produced videos are consistent across media dimensions.
Recommendation systems have become essential in modern music streaming platforms, due to the vast amount of content available. A common approach in recommendation systems is collaborative filtering, which suggests content to users based on the preferences of others with similar patterns. However, this method performs poorly in domains where interactions are sparse, such as music. Content-based filtering is an alternative approach that examines the qualities of the items themselves. Prior work has explored a range of content-filtering techniques for music, including genre classification, instrument detection, and lyrics analysis. In the literature review component of this work, we examine these methods in detail. Music emotion recognition is a type of content-based filtering that is less explored but has significant potential. Since a user's emotional state influences their musical choices, incorporating user mood into recommendation systems is an alternative way to personalize the listening experience. In this study, we explore a mood-assisted recommendation system that suggests songs based on the desired mood using the energy-valence spectrum. Single-blind experiments are conducte
Expressing complex concepts is easy when they can be labeled or quantified, but many ideas are hard to define yet instantly recognizable. We propose a Mood Board, where users convey abstract concepts with examples that hint at the intended direction of attribute changes. We compute an underlying Mood Space that 1) factors out irrelevant features and 2) finds the connections between images, thus bringing relevant concepts closer. We invent a fibration computation to compress/decompress pre-trained features into/from a compact space, 50-100x smaller. The main innovation is learning to mimic the pairwise affinity relationship of the image tokens across exemplars. To focus on the coarse-to-fine hierarchical structures in the Mood Space, we compute the top eigenvector structure from the affinity matrix and define a loss in the eigenvector space. The resulting Mood Space is locally linear and compact, allowing image-level operations, such as object averaging, visual analogy, and pose transfer, to be performed as a simple vector operation in Mood Space. Our learning is efficient in computation without any fine-tuning, needs only a few (2-20) exemplars, and takes less than a minute to lear
Music representations are the backbone of modern recommendation systems, powering playlist generation, similarity search, and personalized discovery. Yet most embeddings offer little control for adjusting a single musical attribute, e.g., changing only the mood of a track while preserving its genre or instrumentation. In this work, we address the problem of controllable music retrieval through embedding-based transformation, where the objective is to retrieve songs that remain similar to a seed track but are modified along one chosen dimension. We propose a novel framework for mood-guided music embedding transformation, which learns a mapping from a seed audio embedding to a target embedding guided by mood labels, while preserving other musical attributes. Because mood cannot be directly altered in the seed audio, we introduce a sampling mechanism that retrieves proxy targets to balance diversity with similarity to the seed. We train a lightweight translation model using this sampling strategy and introduce a novel joint objective that encourages transformation and information preservation. Extensive experiments on two datasets show strong mood transformation performance while reta
We present an intelligent wearable system to monitor and predict mood states of elderly people during their daily life activities. Our system is composed of a wristband to record different physiological activities together with a mobile app for ecological momentary assessment (EMA). Machine learning is used to train a classifier to automatically predict different mood states based on the smart band only. Our approach shows promising results on mood accuracy and provides results comparable with the state of the art in the specific detection of happiness and activeness.
This study centers on overcoming the challenge of selecting the best annotators for each query in Active Learning (AL), with the objective of minimizing misclassifications. AL recognizes the challenges related to cost and time when acquiring labeled data, and decreases the number of labeled data needed. Nevertheless, there is still the necessity to reduce annotation errors, aiming to be as efficient as possible, to achieve the expected accuracy faster. Most strategies for query-annotator pairs do not consider internal factors that affect productivity, such as mood, attention, motivation, and fatigue levels. This work addresses this gap in the existing literature, by not only considering how the internal factors influence annotators (mood and fatigue levels) but also presenting a new query-annotator pair strategy, using a Knowledge-Based Recommendation System (RS). The RS ranks the available annotators, allowing to choose one or more to label the queried instance using their past accuracy values, and their mood and fatigue levels, as well as information about the instance queried. This work bases itself on existing literature on mood and fatigue influence on human performance, simul
MoodPupilar introduces a novel method for mood evaluation using pupillary response captured by a smartphone's front-facing camera during daily use. Over a four-week period, data was gathered from 25 participants to develop models capable of predicting daily mood averages. Utilizing the GLOBEM behavior modeling platform, we benchmarked the utility of pupillary response as a predictor for mood. Our proposed model demonstrated a Matthew's Correlation Coefficient (MCC) score of 0.15 for Valence and 0.12 for Arousal, which is on par with or exceeds those achieved by existing behavioral modeling algorithms supported by GLOBEM. This capability to accurately predict mood trends underscores the effectiveness of pupillary response data in providing crucial insights for timely mental health interventions and resource allocation. The outcomes are encouraging, demonstrating the potential of real-time and predictive mood analysis to support mental health interventions.
Background: Large Language Models show promise in psychiatry but are English-centric. Their ability to understand mood states in other languages is unclear, as different languages have their own idioms of distress. Aim: To quantify the ability of language models to faithfully represent phrases (idioms of distress) of four distinct mood states (depression, euthymia, euphoric mania, dysphoric mania) expressed in Indian languages. Methods: We collected 247 unique phrases for the four mood states across 11 Indic languages. We tested seven experimental conditions, comparing k-means clustering performance on: (a) direct embeddings of native and Romanised scripts (using multilingual and Indic-specific models) and (b) embeddings of phrases translated to English and Chinese. Performance was measured using a composite score based on Adjusted Rand Index, Normalised Mutual Information, Homogeneity and Completeness. Results: Direct embedding of Indic languages failed to cluster mood states (Composite Score = 0.002). All translation-based approaches showed significant improvement. High performance was achieved using Gemini-translated English (Composite=0.60) and human-translated English (Composi
Many sexually mature females suffer from premenstrual syndrome (PMS), but effective coping methods for PMS are limited due to the complexity of symptoms and unclear pathogenesis. Awareness has shown promise in alleviating PMS symptoms but faces challenges in long-term recording and consistency. Our research goal is to establish a convenient and simple method to make individual female aware of their own psychological, and autonomic conditions. In previous research, we demonstrated that participants could be classified into non-PMS and PMS groups based on mood scores obtained during the follicular phase. However, the properties of neurophysiological activity in the participants classified by mood scores have not been elucidated. This study aimed to classify participants based on their scores on a mood questionnaire during the follicular phase and to evaluate their autonomic nervous system (ANS) activity using a simple device that measures pulse waves from the earlobe. Participants were grouped into Cluster I (high positive mood) and Cluster II (low mood). Cluster II participants showed reduced parasympathetic nervous system activity from the follicular to the menstrual phase, indicat
While emotion and mood interchangeably used, they differ in terms of duration, intensity and attributes. Even as multiple psychology studies examine the mood-emotion relationship, mood prediction has barely been studied. Recent machine learning advances such as the attention mechanism to focus on salient parts of the input data, have only been applied to infer emotions rather than mood. We perform mood prediction by incorporating both mood and emotion change information. We additionally explore spatial and temporal attention, and parallel/sequential arrangements of the spatial and temporal attention modules to improve mood prediction performance. To examine generalizability of the proposed method, we evaluate models trained on the AFEW dataset with EMMA. Experiments reveal that (a) emotion change information is inherently beneficial to mood prediction, and (b) prediction performance improves with the integration of sequential and parallel spatial-temporal attention modules.
MoodCam introduces a novel method for assessing mood by utilizing facial affect analysis through the front-facing camera of smartphones during everyday activities. We collected facial behavior primitives during 15,995 real-world phone interactions involving 25 participants over four weeks. We developed three models for timely intervention: momentary, daily average, and next day average. Notably, our models exhibit AUC scores ranging from 0.58 to 0.64 for Valence and 0.60 to 0.63 for Arousal. These scores are comparable to or better than those from some previous studies. This predictive ability suggests that MoodCam can effectively forecast mood trends, providing valuable insights for timely interventions and resource planning in mental health management. The results are promising as they demonstrate the viability of using real-time and predictive mood analysis to aid in mental health interventions and potentially offer preemptive support during critical periods identified through mood trend shifts.
Psychological studies observe that emotions are rarely expressed in isolation and are typically influenced by the surrounding context. While recent studies effectively harness uni- and multimodal cues for emotion inference, hardly any study has considered the effect of long-term affect, or \emph{mood}, on short-term \emph{emotion} inference. This study (a) proposes time-continuous \emph{valence} prediction from videos, fusing multimodal cues including \emph{mood} and \emph{emotion-change} ($Δ$) labels, (b) serially integrates spatial and channel attention for improved inference, and (c) demonstrates algorithmic generalisability with experiments on the \emph{EMMA} and \emph{AffWild2} datasets. Empirical results affirm that utilising mood labels is highly beneficial for dynamic valence prediction. Comparing \emph{unimodal} (training only with mood labels) vs \emph{multimodal} (training with mood and $Δ$ labels) results, inference performance improves for the latter, conveying that both long and short-term contextual cues are critical for time-continuous emotion inference.
Although the terms mood and emotion are closely related and often used interchangeably, they are distinguished based on their duration, intensity and attribution. To date, hardly any computational models have (a) examined mood recognition, and (b) modelled the interplay between mood and emotional state in their analysis. In this paper, as a first step towards mood prediction, we propose a framework that utilises both dominant emotion (or mood) labels, and emotional change labels on the AFEW-VA database. Experiments evaluating unimodal (trained only using mood labels) and multimodal (trained with both mood and emotion change labels) convolutional neural networks confirm that incorporating emotional change information in the network training process can significantly improve the mood prediction performance, thus highlighting the importance of modelling emotion and mood simultaneously for improved performance in affective state recognition.
Music can evoke various emotions, and with the advancement of technology, it has become more accessible to people. Bangla music, which portrays different human emotions, lacks sufficient research. The authors of this article aim to analyze Bangla songs and classify their moods based on the lyrics. To achieve this, this research has compiled a dataset of 4000 Bangla song lyrics, genres, and used Natural Language Processing and the Bert Algorithm to analyze the data. Among the 4000 songs, 1513 songs are represented for the sad mood, 1362 for the romantic mood, 886 for happiness, and the rest 239 are classified as relaxation. By embedding the lyrics of the songs, the authors have classified the songs into four moods: Happy, Sad, Romantic, and Relaxed. This research is crucial as it enables a multi-class classification of songs' moods, making the music more relatable to people's emotions. The article presents the automated result of the four moods accurately derived from the song lyrics.
In this work, we study the association between song lyrics and mood through a data-driven analysis. Our data set consists of nearly one million songs, with song-mood associations derived from user playlists on the Spotify streaming platform. We take advantage of state-of-the-art natural language processing models based on transformers to learn the association between the lyrics and moods. We find that a pretrained transformer-based language model in a zero-shot setting -- i.e., out of the box with no further training on our data -- is powerful for capturing song-mood associations. Moreover, we illustrate that training on song-mood associations results in a highly accurate model that predicts these associations for unseen songs. Furthermore, by comparing the prediction of a model using lyrics with one using acoustic features, we observe that the relative importance of lyrics for mood prediction in comparison with acoustics depends on the specific mood. Finally, we verify if the models are capturing the same information about lyrics and acoustics as humans through an annotation task where we obtain human judgments of mood-song relevance based on lyrics and acoustics.
Behavioral economics tells us that emotions can profoundly affect individual behavior and decision-making. Does this also apply to societies at large, i.e., can societies experience mood states that affect their collective decision making? By extension is the public mood correlated or even predictive of economic indicators? Here we investigate whether measurements of collective mood states derived from large-scale Twitter feeds are correlated to the value of the Dow Jones Industrial Average (DJIA) over time. We analyze the text content of daily Twitter feeds by two mood tracking tools, namely OpinionFinder that measures positive vs. negative mood and Google-Profile of Mood States (GPOMS) that measures mood in terms of 6 dimensions (Calm, Alert, Sure, Vital, Kind, and Happy). We cross-validate the resulting mood time series by comparing their ability to detect the public's response to the presidential election and Thanksgiving day in 2008. A Granger causality analysis and a Self-Organizing Fuzzy Neural Network are then used to investigate the hypothesis that public mood states, as measured by the OpinionFinder and GPOMS mood time series, are predictive of changes in DJIA closing val