Search - ResearchTracker

We often verbally express emotions in a multifaceted manner: they may vary in intensity and may be expressed not as a single emotion but as a mixture of emotions. This wide spectrum of emotions is well studied in the structural model of emotions, which represents a variety of emotions as derivative products of primary emotions with varying degrees of intensity. In this paper, we propose an emotional text-to-speech design to simulate a wider spectrum of emotions grounded in the structural model. Our proposed design, Daisy-TTS, incorporates a prosody encoder to learn an emotionally separable prosody embedding as a proxy for emotion. This emotion representation allows the model to simulate: (1) primary emotions, as learned from the training samples, (2) secondary emotions, as a mixture of primary emotions, (3) intensity level, by scaling the emotion embedding, and (4) emotion polarity, by negating the emotion embedding. Through a series of perceptual evaluations, Daisy-TTS demonstrated overall higher emotional speech naturalness and emotion perceivability compared to the baseline.
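The four operations listed in the abstract above (primary, mixture, scaling, negation) can be illustrated with plain vector arithmetic. The following is only a minimal sketch, not the Daisy-TTS implementation: the three-dimensional vectors in `PRIMARY`, the helper names `mix`, `scale`, and `negate`, and the choice of an unweighted average for mixing are all assumptions made for illustration. A real prosody encoder would learn its embeddings from speech.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical toy embeddings (assumption); a trained prosody encoder
# would produce learned, higher-dimensional vectors instead.
PRIMARY: Dict[str, Vector] = {
    "joy": [1.0, 0.0, 0.0],
    "sadness": [-1.0, 0.0, 0.0],
    "anger": [0.0, 1.0, 0.0],
    "surprise": [0.0, 0.0, 1.0],
}


def mix(a: Vector, b: Vector) -> Vector:
    """Secondary emotion as an unweighted average of two primaries (assumed rule)."""
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def scale(v: Vector, intensity: float) -> Vector:
    """Intensity control by multiplying the embedding by a scalar."""
    return [intensity * x for x in v]


def negate(v: Vector) -> Vector:
    """Polarity flip by negating the embedding."""
    return scale(v, -1.0)


if __name__ == "__main__":
    print(mix(PRIMARY["joy"], PRIMARY["surprise"]))
    print(scale(PRIMARY["anger"], 2.0))
    print(negate(PRIMARY["joy"]))
```

In this toy setup, negating the "joy" vector yields the "sadness" vector only because the two were defined as opposites by hand; whether learned embeddings behave this way is an empirical question that the abstract addresses through perceptual evaluation.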

Emo-LiPO: Listwise Preference Optimization for Fine-Grained Emotion Intensity Control in LLM-based Text-to-Speech

arXiv, 2026-06-11. Authors: Yihang Lin, Li Zhou, Congwei Cao

Large language model (LLM)-based text-to-speech (TTS) systems enable prompt-conditioned emotional control but struggle with fine-grained emotion intensity due to the semantic-acoustic gap between text and speech. To address this challenge, we formulate emotion intensity control in LLM-based TTS as a learning-to-rank problem and propose Emo-LiPO, a listwise preference optimization framework that aligns prompt-conditioned speech generation with relative emotion intensity expressed in text. Emo-LiPO explicitly models global intensity ordering within each emotion under fixed transcripts, enabling more faithful and continuous emotional expression. We further construct ESD-plus, a multi-speaker dataset with explicit emotion intensity variations, to support fine-grained emotion modeling and evaluation. Experiments on ESD-plus demonstrate that Emo-LiPO significantly improves emotion accuracy and intensity controllability over both supervised- and DPO-based LLM TTS baselines, with particularly pronounced gains at high intensity levels.
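To make the learning-to-rank framing in the abstract above concrete, here is a minimal sketch of a generic listwise ranking loss: the Plackett-Luce negative log-likelihood, as used in ListMLE-style objectives. It is not claimed to be the Emo-LiPO objective, which the abstract does not specify. The function name `plackett_luce_nll` and the idea of one scalar score per candidate utterance are assumptions for illustration.

```python
import math
from typing import List


def plackett_luce_nll(scores_in_target_order: List[float]) -> float:
    """Negative log-likelihood of a target ordering under Plackett-Luce.

    scores_in_target_order[0] is the (hypothetical) model score of the
    candidate that should rank first, e.g. the highest-intensity sample.
    """
    nll = 0.0
    for i, score in enumerate(scores_in_target_order):
        tail = scores_in_target_order[i:]
        # Log-sum-exp over the remaining candidates, shifted for stability.
        m = max(tail)
        log_z = m + math.log(sum(math.exp(s - m) for s in tail))
        nll += log_z - score
    return nll


if __name__ == "__main__":
    agrees = plackett_luce_nll([3.0, 2.0, 1.0])    # scores follow the target order
    reversed_ = plackett_luce_nll([1.0, 2.0, 3.0])  # scores contradict it
    print(agrees < reversed_)
```

The loss is lower when the scores decrease along the target ordering, so minimizing it pushes a model to score the whole list consistently rather than comparing one pair at a time, which is the distinction between listwise and pairwise (DPO-style) preference objectives.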

Search results: emotion

Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition

Emo-LiPO: Listwise Preference Optimization for Fine-Grained Emotion Intensity Control in LLM-based Text-to-Speech

EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis

EPIG: Emotion-Based Prompting for Personalised Image Generation

Benchmarking PyCaret AutoML Against BiLSTM for Fine-Grained Emotion Classification: A Comparative Study on 20-Class Emotion Detection

E-ICL: Enhancing Fine-Grained Emotion Recognition through the Lens of Prototype Theory

A Cross-Cultural Analysis of Animated Representations of Emotions for Wearable Interfaces

Speech Emotion Diarization: Which Emotion Appears When?

Contextual Emotion Estimation from Image Captions

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

Moral Outrage Shapes Commitments Beyond Attention: Multimodal Moral Emotions on YouTube in Korea and the US

EmoLoom-2B: Fast Base-Model Screening for Emotion Classification and VAD with Lexicon-Weak Supervision and KV-Off Evaluation

Performance Evaluation of Emotion Classification in Japanese Using RoBERTa and DeBERTa

Natural Language Processing for Cognitive Analysis of Emotions

Human-Guided Reasoning with Large Language Models for Vietnamese Speech Emotion Recognition

A Unified and Interpretable Emotion Representation and Expression Generation

Towards Emotion-Based Synthetic Consciousness: Using LLMs to Estimate Emotion Probability Vectors

Emotional Theory of Mind: Bridging Fast Visual Processing with Slow Linguistic Reasoning

EMOVOME: A Dataset for Emotion Recognition in Spontaneous Real-Life Speech

Enhancing Student Engagement in Online Learning through Facial Expression Analysis and Complex Emotion Recognition using Deep Learning