The metaverse refers to a digital environment that enables real-time user interaction through immersive technologies. Recent advances in deep learning have substantially strengthened Natural Language Processing (NLP) and Large Language Models (LLMs), making human-computer interactions in the metaverse more natural and responsive, particularly those between users and Non-Player Characters (NPCs). Many virtual platforms rely on LLM-powered Application Programming Interfaces (APIs) to facilitate these interactions, but such APIs often produce long, semantically irrelevant responses that disrupt immersion and degrade the user experience. Unlike open-domain conversational agents, dialogue systems for metaverse-based NPCs operate under strict real-time and contextual constraints and must generate concise, task-oriented, and context-aware responses. Although recent advances in LLMs have improved dialogue generation, most existing studies focus on open-ended conversation or general-purpose question answering. This study addresses that gap by systematically investigating fine-tuning and Retrieval-Augmented Generation (RAG) strategies within a metaverse-focused dialogue domain. We present a comparative evaluation of systems built with fine-tuning and RAG across decoder-only models (GPT-2, LLaMA, Qwen) and encoder-decoder models (mBART, mT5). The trained models were evaluated using a combination of standard evaluation metrics and semantics-based criteria, and all scores were normalized with the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) to ensure objective comparability between models.
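As a minimal sketch of the TOPSIS aggregation step, the following illustrates how per-metric scores for several models can be combined into a single closeness score in [0, 1]. The metric values, weights, and the equal-weighting choice below are illustrative assumptions, not the paper's actual data or configuration.

```python
import math

def topsis(matrix, weights, benefit):
    """Rank alternatives (rows) over criteria (columns) with TOPSIS.

    matrix:  list of rows, one per model, one column per metric
    weights: per-metric weights
    benefit: per-metric flag, True if higher is better
    Returns one closeness-to-ideal score in [0, 1] per row.
    """
    n_cols = len(matrix[0])
    # Vector-normalize each column, then apply weights
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    v = [[row[j] / norms[j] * weights[j] for j in range(n_cols)] for row in matrix]
    # Ideal best/worst per column depend on benefit vs. cost direction
    best = [max(col) if benefit[j] else min(col) for j, col in enumerate(zip(*v))]
    worst = [min(col) if benefit[j] else max(col) for j, col in enumerate(zip(*v))]

    def dist(row, ref):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(row, ref)))

    return [dist(r, worst) / (dist(r, best) + dist(r, worst)) for r in v]

# Illustrative values only: two models scored on two higher-is-better metrics
scores = topsis([[0.9, 0.8], [0.3, 0.2]], weights=[0.5, 0.5], benefit=[True, True])
print(scores)  # → [1.0, 0.0]: the first row dominates on every metric
```

Because TOPSIS maps heterogeneous metric scales onto a single relative scale, it allows decoder-only and encoder-decoder models evaluated with different metrics to be compared on one axis.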
The findings indicate that RAG yields more balanced performance, particularly with encoder-decoder models such as mBART (~0.652) and mT5 (~0.555), even when trained on relatively small datasets. Additionally, this paper presents a speech-based interaction framework for personalized, real-time communication in metaverse environments, structured as Speech-to-Text (STT) → LLM → Text-to-Speech (TTS). This architecture improves interaction quality by enabling coherent and realistic speech-based communication.
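The STT → LLM → TTS framework can be sketched as three chained stages. All function bodies below are hypothetical placeholders standing in for real ASR, LLM, and speech-synthesis components; none of these names or behaviors come from the paper.

```python
def speech_to_text(audio: bytes) -> str:
    """Placeholder STT stage; a real system would run an ASR model on the audio."""
    return "where is the quest marker?"

def generate_npc_reply(user_text: str, npc_context: str) -> str:
    """Placeholder LLM stage; constrained to a short, task-oriented NPC reply."""
    return f"[{npc_context}] Head north past the fountain."

def text_to_speech(reply: str) -> bytes:
    """Placeholder TTS stage; a real system would synthesize speech audio."""
    return reply.encode("utf-8")

def npc_dialogue_turn(audio: bytes, npc_context: str) -> bytes:
    """One full interaction turn: user speech in, NPC speech out."""
    text = speech_to_text(audio)
    reply = generate_npc_reply(text, npc_context)
    return text_to_speech(reply)
```

Keeping the LLM stage constrained to short, context-conditioned replies is what allows the full loop to stay within the real-time budget the abstract describes.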
PubMed · 2026-05-03