ResearchTracker: A Platform for Tracking Research and Industry Developments

Search results: piano

20 results found

Source: All

PianoBind: A Multimodal Joint Embedding Model for Pop-piano Music

arXiv | 2025-09-04 | Authors: Hayeon Bang, Eunjin Choi, Seungheon Doh

Solo piano music, despite being a single-instrument medium, possesses significant expressive capabilities, conveying rich semantic information across genres, moods, and styles. However, current general-purpose music representation models, predominantly trained on large-scale datasets, often struggle to capture subtle semantic distinctions within homogeneous solo piano music. Furthermore, existing piano-specific representation models are typically unimodal, failing to capture the inherently multimodal nature of piano music, expressed through audio, symbolic, and textual modalities. To address these limitations, we propose PianoBind, a piano-specific multimodal joint embedding model. We systematically investigate strategies for multi-source training and modality utilization within a joint embedding framework optimized for capturing fine-grained semantic distinctions in (1) small-scale and (2) homogeneous piano datasets. Our experimental results demonstrate that PianoBind learns multimodal representations that effectively capture subtle nuances of piano music, achieving superior text-to-music retrieval performance on in-domain and out-of-domain piano datasets compared to general-purp


PiCoGen2: Piano cover generation with transfer learning approach and weakly aligned data

arXiv

D3PIA: A Discrete Denoising Diffusion Model for Piano Accompaniment Generation From Lead sheet

arXiv

Towards Video to Piano Music Generation with Chain-of-Perform Support Benchmarks

arXiv

Learning to Play Piano in the Real World

arXiv

Extended Admissible Dissections of Marked Surfaces and Piano Algebras

arXiv | 2026-03-16 | Authors: Marina Godinho, Dave Murphy

PIANO: Physics Informed Autoregressive Network

arXiv | 2025-08-22 | Authors: Mayank Nagda, Jephte Abijuru, Phil Ostheimer

PianoVAM: A Multimodal Piano Performance Dataset

arXiv | 2025-09-10 | Authors: Yonghyun Kim, Junhyung Park, Joonhyung Bae

PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance

arXiv

Fine-Tuning MIDI-to-Audio Alignment using a Neural Network on Piano Roll and CQT Representations

arXiv | 2025-06-27 | Authors: Sebastian Murgul, Moritz Reiser, Michael Heizmann

In this paper, we present a neural network approach for synchronizing audio recordings of human piano performances with their corresponding loosely aligned MIDI files. The task is addressed using a Convolutional Recurrent Neural Network (CRNN) architecture, which effectively captures spectral and temporal features by processing an unaligned piano roll and a spectrogram as inputs to estimate the aligned piano roll. To train the network, we create a dataset of piano pieces with augmented MIDI files that simulate common human timing errors. The proposed model achieves up to 20% higher alignment accuracy than the industry-standard Dynamic Time Warping (DTW) method across various tolerance windows. Furthermore, integrating DTW with the CRNN yields additional improvements, offering enhanced robustness and consistency. These findings demonstrate the potential of neural networks in advancing state-of-the-art MIDI-to-audio alignment.

PIAST: A Multimodal Piano Dataset with Audio, Symbolic and Text

arXiv | 2024-11-04 | Authors: Hayeon Bang, Eunjin Choi, Megan Finch

While piano music has become a significant area of study in Music Information Retrieval (MIR), there is a notable lack of datasets for piano solo music with text labels. To address this gap, we present PIAST (PIano dataset with Audio, Symbolic, and Text), a piano music dataset. Utilizing a piano-specific taxonomy of semantic tags, we collected 9,673 tracks from YouTube and added human annotations for 2,023 tracks by music experts, resulting in two subsets: PIAST-YT and PIAST-AT. Both include audio, text, tag annotations, and transcribed MIDI utilizing state-of-the-art piano transcription and beat tracking models. Among many possible tasks with the multi-modal dataset, we conduct music tagging and retrieval using both audio and MIDI data and report baseline performances to demonstrate its potential as a valuable resource for MIR research.

RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands

arXiv | 2024-08-20 | Authors: Yi Zhao, Le Chen, Jan Schneider

It has been a long-standing research goal to endow robot hands with human-level dexterity. Bi-manual robot piano playing constitutes a task that combines challenges from dynamic tasks, such as generating fast yet precise motions, with slower but contact-rich manipulation problems. Although reinforcement learning based approaches have shown promising results in single-task performance, these methods struggle in a multi-song setting. Our work aims to close this gap and, thereby, enable imitation learning approaches for robot piano playing at scale. To this end, we introduce the Robot Piano 1 Million (RP1M) dataset, containing bi-manual robot piano playing motion data of more than one million trajectories. We formulate finger placements as an optimal transport problem, thus enabling automatic annotation of vast amounts of unlabeled songs. Benchmarking existing imitation learning approaches shows that such approaches reach state-of-the-art robot piano playing performance by leveraging RP1M.

Pop2Piano : Pop Audio-based Piano Cover Generation

arXiv | 2022-11-02 | Authors: Jongho Choi, Kyogu Lee

Piano covers of pop music are enjoyed by many people. However, the task of automatically generating piano covers of pop music is still understudied. This is partly due to the lack of synchronized {Pop, Piano Cover} data pairs, which made it challenging to apply the latest data-intensive deep learning-based methods. To leverage the power of the data-driven approach, we make a large amount of paired and synchronized {Pop, Piano Cover} data using an automated pipeline. In this paper, we present Pop2Piano, a Transformer network that generates piano covers given waveforms of pop music. To the best of our knowledge, this is the first model to generate a piano cover directly from pop audio without using melody and chord extraction modules. We show that Pop2Piano, trained with our dataset, is capable of producing plausible piano covers.

Teach Me How to ImproVISe: Co-Designing an Augmented Piano Training System for Improvisation

arXiv | 2024-02-05 | Authors: Jordan Aiko Deja, Sandi Štor, Ilonka Pucihar

Improvisation is a vital but often neglected aspect of traditional piano teaching. Challenges such as difficulty in assessment and subjectivity have hindered its effective instruction. Technological approaches, including augmentation, aim to enhance piano instruction, but the specific application of digital augmentation for piano improvisation is under-explored. This paper outlines a co-design process developing an Augmented Reality (AR) Piano Improvisation Training System, ImproVISe, involving improvisation teachers. The prototype, featuring basic improvisation concepts, was created and refined through expert interaction. Their insights guided the identification of objectives, tools, interaction metaphors, and software features. The findings offer design guidelines and recommendations to address challenges in assessing piano improvisation in a learning context.

Dexterous Robotic Piano Playing at Scale

arXiv

Efficient Transformer-Based Piano Transcription With Sparse Attention Mechanisms

arXiv

Pay Attention to the Keys: Visual Piano Transcription Using Transformers

arXiv | 2024-11-13 | Authors: Uros Zivanovic, Ivan Pilkov, Carlos Eduardo Cancino-Chacón

Visual piano transcription (VPT) is the task of obtaining a symbolic representation of a piano performance from visual information alone (e.g., from a top-down video of the piano keyboard). In this work we propose a VPT system based on the vision transformer (ViT), which surpasses previous methods based on convolutional neural networks (CNNs). Our system is trained on the newly introduced R3 dataset, consisting of ca. 31 hours of synchronized video and MIDI recordings of piano performances. We additionally introduce an approach to predict note offsets, which has not been previously explored in this context. We show that our system outperforms the state-of-the-art on the PianoYT dataset for onset prediction and on the R3 dataset for both onsets and offsets.

Cosmic Piano: analysing the sound of the Universe

arXiv

AMT-APC: Automatic Piano Cover by Fine-Tuning an Automatic Music Transcription Model

arXiv | 2024-09-21 | Authors: Kazuma Komiya, Yoshihisa Fukuhara

There have been several studies on automatically generating piano covers, and recent advancements in deep learning have enabled the creation of more sophisticated covers. However, existing automatic piano cover models still have room for improvement in terms of expressiveness and fidelity to the original. To address these issues, we propose a learning algorithm called AMT-APC, which leverages the capabilities of automatic music transcription models. By utilizing the strengths of well-established automatic music transcription models, we aim to improve the accuracy of piano cover generation. Our experiments demonstrate that the AMT-APC model reproduces original tracks more accurately than any existing models.

GiantMIDI-Piano: A large-scale MIDI dataset for classical piano music

arXiv