Academic Paper

ConSinger: Efficient High-Fidelity Singing Voice Generation with Minimal Steps

Source: arXiv · Published: 2024-10-20 · Authors: Yulin Song, Guorui Sang, Jing Yu, Chuangbai Xiao

Abstract

A singing voice synthesis (SVS) system is expected to generate high-fidelity singing voice from a given music score (lyrics, duration, and pitch). Recently, diffusion models have performed well in this field. However, they trade inference speed for sample quality, which limits their application scenarios. To obtain high-quality synthetic singing voice more efficiently, we propose ConSinger, a singing voice synthesis method based on the consistency model, which achieves high-fidelity synthesis with minimal steps. The model is trained by applying a consistency constraint, and generation quality is greatly improved at the expense of a small amount of inference speed. Our experiments show that ConSinger is highly competitive with the baseline model in terms of generation speed and quality. Audio samples are available at https://keylxiao.github.io/consinger.
