Search - ResearchTracker

We often express emotions verbally in a multifaceted manner: they may vary in intensity and may be expressed not as a single emotion but as a mixture of emotions. This wide spectrum of emotions is well studied in the structural model of emotions, which represents a variety of emotions as derivative products of primary emotions with varying degrees of intensity. In this paper, we propose an emotional text-to-speech design to simulate a wider spectrum of emotions grounded in the structural model. Our proposed design, Daisy-TTS, incorporates a prosody encoder to learn an emotionally separable prosody embedding as a proxy for emotion. This emotion representation allows the model to simulate: (1) primary emotions, as learned from the training samples, (2) secondary emotions, as a mixture of primary emotions, (3) intensity level, by scaling the emotion embedding, and (4) emotion polarity, by negating the emotion embedding. Through a series of perceptual evaluations, Daisy-TTS demonstrated overall higher emotional speech naturalness and emotion perceivability compared to the baseline.
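The three embedding operations named in the abstract (mixing, scaling, negating) can be illustrated with plain vector arithmetic. This is only a minimal sketch under stated assumptions: the function names, the weighted-average mixing rule, and the toy 2-dimensional vectors are hypothetical and are not taken from Daisy-TTS itself, whose embeddings come from a trained prosody encoder.

```python
from typing import List

Vector = List[float]

def mix_emotions(a: Vector, b: Vector, weight: float = 0.5) -> Vector:
    """Blend two primary-emotion embeddings (assumed rule: weighted average)."""
    return [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]

def scale_intensity(embedding: Vector, factor: float) -> Vector:
    """Raise or lower intensity by scaling the embedding."""
    return [factor * x for x in embedding]

def flip_polarity(embedding: Vector) -> Vector:
    """Negate the embedding to move toward the opposite emotion."""
    return [-x for x in embedding]

# Toy embeddings, purely illustrative (real ones are learned).
joy: Vector = [1.0, 0.0]
trust: Vector = [0.0, 1.0]

secondary = mix_emotions(joy, trust)        # equal-weight mixture
stronger = scale_intensity(secondary, 2.0)  # higher intensity
opposite = flip_polarity(joy)               # reversed polarity
```

In this sketch the resulting vector would be passed to the synthesizer in place of a learned primary-emotion embedding; how the actual model consumes it is not specified in the abstract.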

EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis

arXiv, 2026-01-30. Authors: Li Zhou, Hao Jiang, Junjie Li

Achieving precise and controllable emotional expression is crucial for producing natural and context-appropriate speech in text-to-speech (TTS) synthesis. However, many emotion-aware TTS systems, including large language model (LLM)-based designs, rely on scaling fixed emotion embeddings or external guidance, limiting their ability to model emotion-specific latent characteristics. To address this gap, we present EmoShift, a lightweight activation-steering framework incorporating an EmoSteer layer, which learns a steering vector for each target emotion in the output embedding space to capture its latent offset and maintain stable, appropriate expression across utterances and categories. With only 10M trainable parameters, less than 1/30 of full fine-tuning, EmoShift outperforms zero-shot and fully fine-tuned baselines in objective and subjective evaluations, enhancing emotional expressiveness while preserving naturalness and speaker similarity. Further analysis confirms the proposed EmoSteer layer's effectiveness and reveals its potential for controllable emotional intensity in speech synthesis.
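The idea of a per-emotion steering vector can be sketched as an offset applied to an output embedding. The abstract does not say how the offset is applied, so this sketch assumes simple addition. The class name `SteeringLayer`, the `strength` parameter, and the zero initialization are likewise hypothetical, and training of the vectors is omitted.

```python
from typing import Dict, List

Vector = List[float]

class SteeringLayer:
    """Hypothetical stand-in for a steering layer: one offset vector per emotion."""

    def __init__(self, dim: int, emotions: List[str]) -> None:
        # In a real system these vectors would be trainable parameters;
        # here they start at zero and are set by hand for illustration.
        self.vectors: Dict[str, Vector] = {e: [0.0] * dim for e in emotions}

    def __call__(self, hidden: Vector, emotion: str, strength: float = 1.0) -> Vector:
        # Assumed rule: add the emotion's offset, scaled by `strength`.
        offset = self.vectors[emotion]
        return [h + strength * o for h, o in zip(hidden, offset)]

layer = SteeringLayer(dim=2, emotions=["happy", "sad"])
layer.vectors["happy"] = [1.0, -1.0]  # toy values, not learned

steered = layer([0.5, 0.5], "happy")
unchanged = layer([0.5, 0.5], "happy", strength=0.0)
```

Only the small set of steering vectors would be trained in such a design while the base model stays frozen, which is one plausible reading of the small trainable-parameter count reported in the abstract.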

Search results: Emotion (Washington, D.C.)

Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition

EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis

E-ICL: Enhancing Fine-Grained Emotion Recognition through the Lens of Prototype Theory

Benchmarking PyCaret AutoML Against BiLSTM for Fine-Grained Emotion Classification: A Comparative Study on 20-Class Emotion Detection

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

A Unified and Interpretable Emotion Representation and Expression Generation

Human-Guided Reasoning with Large Language Models for Vietnamese Speech Emotion Recognition

Moral Outrage Shapes Commitments Beyond Attention: Multimodal Moral Emotions on YouTube in Korea and the US

Performance Evaluation of Emotion Classification in Japanese Using RoBERTa and DeBERTa

Contextual Emotion Estimation from Image Captions

Natural Language Processing for Cognitive Analysis of Emotions

Computer Vision Estimation of Emotion Reaction Intensity in the Wild

EMOVOME: A Dataset for Emotion Recognition in Spontaneous Real-Life Speech

Emotional Theory of Mind: Bridging Fast Visual Processing with Slow Linguistic Reasoning

Enhancing Student Engagement in Online Learning through Facial Expression Analysis and Complex Emotion Recognition using Deep Learning

Towards Emotion-Based Synthetic Consciousness: Using LLMs to Estimate Emotion Probability Vectors

Best Practices in the Creation and Use of Emotion Lexicons

Emotional Voice Messages (EMOVOME) database: emotion recognition in spontaneous voice messages

Contextual Emotion Recognition using Large Vision Language Models

Towards Fine-Grained Emotion Understanding via Skeleton-Based Micro-Gesture Recognition