Engril.com


Automating Polyphonic Piano Transcription With a Note Based Music Language Model

Automatic music transcription (AMT) converts a performance audio signal into a symbolic representation of the musical score. It is a core research field in computational musicology and has a variety of applications in related fields such as music information retrieval (MIR) and music education. However, the ambiguity of polyphonic piano music, resulting from overlapping spectral features and inharmonic overtones, makes it particularly challenging: the onsets and offsets of multiple concurrent notes must be detected, as well as their respective pitches. The aim of this paper is to propose a method that performs frame-wise and note-level transcription of piano music using a note-based music language model.
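To make the two output granularities concrete, a frame-wise target can be rendered from note-level events as a binary piano roll. The sketch below is illustrative only; the function name and the 100 frames-per-second rate are assumptions, not details from the article.

```python
import numpy as np

def notes_to_piano_roll(notes, n_frames, fps=100):
    """Render note-level events (onset_sec, offset_sec, midi_pitch) into a
    frame-wise binary piano roll of shape (n_frames, 88).
    Piano MIDI pitches span 21 (A0) to 108 (C8)."""
    roll = np.zeros((n_frames, 88), dtype=np.int8)
    for onset, offset, pitch in notes:
        start = int(round(onset * fps))  # frame rate is an assumed 100 fps
        end = int(round(offset * fps))
        roll[start:end, pitch - 21] = 1  # mark the note active in these frames
    return roll

# A C major triad (C4, E4, G4) held from 0.0 s to 0.5 s.
notes = [(0.0, 0.5, 60), (0.0, 0.5, 64), (0.0, 0.5, 67)]
roll = notes_to_piano_roll(notes, n_frames=100)
```

Note-level transcription recovers the event list; frame-wise transcription predicts this matrix one row per analysis frame.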

Our approach combines a deep hierarchical model with a novel loss function to improve performance on a large-scale piano music dataset. It starts with an acoustic model trained on the MAPS dataset, which contains performance audio recordings and aligned MIDI files of 30 pieces of piano music.

The output of this acoustic model is then used as the input to a language model for piano music transcription. The language model consists of a graph convolutional network and a recurrent network with 88 outputs, each of which corresponds to one of the 88 possible piano pitches.
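The shape of such a network can be sketched in miniature. The toy model below is an assumption-laden illustration, not the paper's architecture: weights are randomly initialized, the hidden size and the octave-based pitch adjacency are invented for the example, and a single graph-convolution step over the 88 pitches feeds a vanilla recurrent layer with 88 sigmoid outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
N_PITCHES, HIDDEN = 88, 32

# Pitch graph: link each key to its octave neighbours, then row-normalize so
# one graph-convolution step averages each pitch with its related pitches.
A = np.eye(N_PITCHES)
for i in range(N_PITCHES - 12):
    A[i, i + 12] = A[i + 12, i] = 1.0
A /= A.sum(axis=1, keepdims=True)

# Randomly initialized weights stand in for trained parameters.
W_x = rng.normal(scale=0.1, size=(HIDDEN, N_PITCHES))
W_h = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
W_o = rng.normal(scale=0.1, size=(N_PITCHES, HIDDEN))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def language_model(acoustic_posteriors):
    """acoustic_posteriors: (n_frames, 88) frame-wise pitch probabilities from
    the acoustic model. Returns refined (n_frames, 88) probabilities."""
    h = np.zeros(HIDDEN)
    out = []
    for x in acoustic_posteriors:
        g = np.tanh(A @ x)              # graph convolution over the pitch graph
        h = np.tanh(W_x @ g + W_h @ h)  # recurrent state update across frames
        out.append(sigmoid(W_o @ h))    # 88 sigmoid outputs, one per piano key
    return np.array(out)

probs = language_model(rng.random((10, N_PITCHES)))
```

The recurrence lets each frame's pitch estimates depend on the preceding musical context, which is what a language model adds on top of the purely acoustic posteriors.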

Automating Polyphonic Piano Transcription With a Note Based Music Language Model

We also use a novel loss function, a variant of the MAP loss that is optimized to minimize the MLR in the context of AMT. Evaluated on the MAPS dataset, this loss outperforms other losses in terms of both recall and F-measure. The acoustic model and the language model are integrated into a hierarchical generative model in which the acoustic model learns the base spectra of the instrument while the language model captures the probabilities of typical note co-occurrences and infers the chord labels.

The recurrent network acts as a reduction function to model the relation between the onset and offset of a note and its pitch. It is followed by an onset detection algorithm that detects the occurrence of a note event; the 88 binary probabilities, one for each possible pitch at the detected onset, are then collected into a list. This list is fed into a min-priority queue, which retains only the top w probabilities in each frame and yields the highest-ranked candidate notes at each onset, thereby improving recall.
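The top-w selection with a min-priority queue can be sketched as follows; the helper name, the value of w, and the example probabilities are illustrative assumptions, not the paper's code.

```python
import heapq

def top_w_candidates(pitch_probs, w=5):
    """Keep the w most probable pitch candidates at a detected onset.

    A min-priority queue of size w holds the current best candidates; its
    smallest element is evicted whenever a more probable pitch arrives."""
    heap = []  # (probability, midi_pitch); heap[0] is the weakest kept entry
    for index, p in enumerate(pitch_probs):
        midi_pitch = index + 21  # the 88 piano keys span MIDI 21..108
        if len(heap) < w:
            heapq.heappush(heap, (p, midi_pitch))
        elif p > heap[0][0]:
            heapq.heapreplace(heap, (p, midi_pitch))
    return sorted(heap, reverse=True)  # most probable candidate first

# 88 per-pitch probabilities with a C major triad standing out.
probs = [0.0] * 88
for index, p in [(39, 0.9), (43, 0.8), (46, 0.7), (10, 0.3), (20, 0.2)]:
    probs[index] = p
best = top_w_candidates(probs, w=3)  # [(0.9, 60), (0.8, 64), (0.7, 67)]
```

Because the queue never grows past w entries, each frame is processed in O(88 log w) time while weaker candidates are discarded early.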

By combining the onset detector, the recurrent network for multi-pitch estimation, and the loss function, we obtain an AMT system that achieves high frame-level F1 scores and relatively good performance at the note level. The system also recognizes the corresponding piano notes with sufficient precision in terms of both their onset and their pitch. This is a significant step forward in AMT for piano music. Moreover, the method has the potential to be extended to other musical genres such as flamenco singing or orchestral music.
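The frame-level F1 score cited above is a standard AMT metric; a minimal reference implementation (not the paper's evaluation code) is:

```python
def frame_f1(reference, prediction):
    """Frame-level precision, recall, and F1 over binary piano rolls,
    given as equal-length lists of 88-element 0/1 frames."""
    tp = fp = fn = 0
    for ref_frame, pred_frame in zip(reference, prediction):
        for r, p in zip(ref_frame, pred_frame):
            tp += 1 if r and p else 0        # active in both
            fp += 1 if p and not r else 0    # predicted but not in reference
            fn += 1 if r and not p else 0    # in reference but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# One frame: two reference pitches; the prediction hits one and adds a spurious one.
precision, recall, f1 = frame_f1([[1, 1, 0, 0]], [[1, 0, 1, 0]])
```

Note-level scores are stricter: a predicted note counts as correct only if its pitch matches and its onset (and, in some variants, offset) falls within a tolerance of the reference.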
