Search - ResearchTracker

Laughter is a complex social signal that conveys communicative intent beyond amusement. While prior work has focused on isolated laughter analysis tasks, a comprehensive understanding of laughter in real-world scenarios remains underexplored. Therefore, we introduce SMILE-Next, a dataset for real-world laughter understanding with multimodal textual representations and question-answer annotations across three tasks: laughter detection, laughter type classification, and laughter reasoning. Building upon SMILE-Next, we aim to develop a laughter-specialized large language model capable of nuanced understanding of laughter in real-world contexts. To this end, we propose two key components: laughter-specific Self-Instruct and the Mixture-of-Laugh-Experts (MoLE) framework. Laughter-specific Self-Instruct enhances generalization across tasks and domains by automatically synthesizing diverse laughter-centric instructions. MoLE introduces a task-adaptive expert routing mechanism that dynamically selects specialized experts tailored to each laughter-related task, improving task-specific performance and efficiency. Experimental results show that the combination of our proposed components subst

MTLLFM: Multimodal-Temporal Laughter Localization: UR-FUNNY-Temporal and SMILE-Temporal Benchmarks with an Adaptive Multimodal Fusion Model

arXiv, 2026-05-25. Authors: Eyal Hanania, Nadav Kirsch, Daniel Arkushin

Detecting laughter in video is essential for affective computing and narrative understanding, yet existing approaches treat it as coarse clip-level classification, failing to capture precise temporal boundaries of brief, transient laughter events. We address this gap with two complementary contributions. First, we introduce UR-FUNNY-Temporal and SMILE-Temporal, fully annotated temporal laughter datasets extending two widely-used humor benchmarks. Our annotations cover over 11,053 videos (78.8 hours) and provide precise onset/offset boundaries for each laughter event, along with rich metadata distinguishing speaker vs. audience laughter, modality dominance (acoustic, visual, or both), and intensity levels. Second, we propose a lightweight weakly-supervised framework for temporal laughter localization. Our architecture combines fixed HuBERT and MAE encoders with temporal softmax pooling and adaptive modality gating, learning fine-grained temporal grounding from clip-level labels without requiring frame-level annotations during training. Experiments across three datasets demonstrate that our approach substantially outperforms multimodal foundation models including Gemini 3 Flash, achi

Search results: laughter

SMILE-Next: Teaching Large Language Models to Detect, Classify, and Reason about Laughter

MTLLFM: Multimodal-Temporal Laughter Localization: UR-FUNNY-Temporal and SMILE-Temporal Benchmarks with an Adaptive Multimodal Fusion Model

Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus

MultiLinguahah : A New Unsupervised Multilingual Acoustic Laughter Segmentation Method

A generative framework for conversational laughter: Its 'language model' and laughter sound synthesis

Timing In stand-up Comedy: Text, Audio, Laughter, Kinesics (TIC-TALK): Pipeline and Database for the Multimodal Study of Comedic Timing

Impact of annotation modality on label quality and model performance in the automatic assessment of laughter in-the-wild

Executive Voiced Laughter and Social Approval: An Explorative Machine Learning Study

MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation

Design and Development of Laughter Recognition System Based on Multimodal Fusion and Deep Learning

LaughTalk: Expressive 3D Talking Head Generation with Laughter

SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models

Haha-Pod: An Attempt for Laughter-based Non-Verbal Speaker Verification

A New Perspective on Smiling and Laughter Detection: Intensity Levels Matter

Analysis of Co-Laughter Gesture Relationship on RGB videos in Dyadic Conversation Contex

A Corpus-based Analysis of Attitudinal Changes in Lin Yutang's Self-translation of Between Tears and Laughter

Laugh Betrays You? Learning Robust Speaker Representation From Speech Containing Non-Verbal Fragments

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

Interpreting and Steering a Text-to-Speech Language Model with Sparse Autoencoders

Beyond Words: Towards Effective Modeling of Non-Verbal Vocalizations in ASR