Introduction 🎶

Welcome to the official page for RenderBox: Expressive Performance Rendering with Text Control. This project presents a unified framework for generating expressive music performances from symbolic scores and text descriptions. Applicable across multiple instruments, RenderBox introduces a diffusion transformer-based architecture that takes both natural-language prompts and MIDI scores as input, enabling coarse as well as fine-grained control.

Our curriculum-based paradigm spans a wide spectrum of tasks, from plain synthesis to expressive rendering, offering control over factors like speed, mistakes, and style diversity. This approach bridges symbolic scores and natural language, making performance rendering more explainable and accessible.
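To make the idea of text control over several factors more concrete, here is a hypothetical sketch of how a prompt combining instrument, task, performer style, composer, mistakes and relative speed might be assembled. The phrase vocabulary ("expressive performance", "with mistakes", "in the style of", "considerably slower", "a bit faster", and so on) is taken from the prompts shown on this page; the helper names `tempo_phrase` and `build_prompt` and the tempo-ratio thresholds are our own illustrative assumptions, not RenderBox's actual interface or the bins used in training.

```python
from typing import Optional

# Hypothetical sketch: helper names and tempo thresholds are illustrative
# assumptions; only the phrase vocabulary comes from the prompts on this page.


def tempo_phrase(ratio: float) -> str:
    """Map a performed-to-score tempo ratio onto a relative-speed phrase.

    The bin edges are assumed for illustration, not taken from RenderBox.
    """
    if ratio < 0.8:
        return "considerably slower"
    if ratio < 0.95:
        return "a bit slower"
    if ratio <= 1.05:
        return "at the same tempo"
    if ratio <= 1.2:
        return "a bit faster"
    return "much faster"


def build_prompt(
    instrument: Optional[str] = None,
    task: str = "expressive performance",
    performer: Optional[str] = None,
    composer: Optional[str] = None,
    tempo_ratio: float = 1.0,
    with_mistakes: bool = False,
) -> str:
    """Join the optional control tags into one comma-separated prompt string."""
    if with_mistakes:
        task = f"{task} with mistakes"
    if performer:
        task = f"{task} in the style of {performer}"
    # Drop any tag that was not supplied, then append the speed phrase last.
    tags = [t for t in (instrument, task, composer) if t]
    tags.append(tempo_phrase(tempo_ratio))
    return ", ".join(tags)


if __name__ == "__main__":
    print(build_prompt("Piano", performer="Dinu Lipatti",
                       composer="Bach", tempo_ratio=0.7))
    # Piano, expressive performance in the style of Dinu Lipatti, Bach, considerably slower
```

Keeping speed as a coarse verbal bin rather than a number mirrors how the prompts on this page describe tempo relative to the score.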

RenderBox

Outputs 🎵

Synthesis

Input Score MIDI | Text Prompt 1 | Output 1

Bach BWV 876 Prelude
Synthesis

Valse - Manuel Ponce
Synthesis

Bach Violin Partita BWV 1002, Mvt.1
Synthesis

Synthesis with Speed Augmentation

Input Score MIDI | Text Prompt 1 | Output 1 | Text Prompt 2 | Output 2

Haydn, Op.64 No.1
Prompt 1: Direct synthesis, a bit faster | Prompt 2: slightly slower, synthesis

Légende - Ernest Shand
Prompt 1: A bit faster than score, direct synthesis | Prompt 2: Synthesis, much faster

Ravel, Miroirs, Une Barque
Prompt 1: Direct synthesis, a bit faster | Prompt 2: Synthesis, a bit slower

Expressive Performance

Input Score MIDI | Text Prompt 1 | Output 1 | Text Prompt 2 | Output 2

Chopin Piano Etude Op.25 No.12
Prompt 1: faithfully matching tempo, Chopin, Etudes_op_25, 12, expressive performance | Prompt 2: considerably slower, expressive performance, Chopin

Malagueña - Isaac Albéniz
Prompt 1: Guitar, Expressive performance, in line with the score's tempo | Prompt 2: guitar expressive performance, a bit faster than score

Beethoven Piano Sonata No.16 Mvt.1
Prompt 1: Beethoven, faithfully matching, Piano, expressive performance | Prompt 2: expressive, a bit faster

Sidewinder (Sax funk)
Prompt 1: at the same tempo, Expressive playings | Prompt 2: Saxophone, no expression, at the same tempo

Bach Violin Sonata BWV1003, Mvt.2
Prompt 1: bwv1003, just under the score’s tempo, Expressive performance | Prompt 2: Violin, expressive performance, faster than score

Haydn String Quartet No.53 in D major
Prompt 1: Expressive performance, at the original speed | Prompt 2: String Quartet, Expressive performance, much slower, Haydn

David Russell: Rondeña - Regino Sainz de la Maza
Prompt 1: Expressive performance, in line with score's tempo | Prompt 2: expressive performance, slower than score, guitar

No More Blues (Sax Bossa Nova)
Prompt 1: Performance with no expression, in line with the score's tempo, Saxophone | Prompt 2: saxophone, expressive playing, a bit faster

Performance with Mistakes

Input Score MIDI | Text Prompt 1 | Output 1

Burgmüller No.1
Play like a student, at the same tempo

Beethoven Piano Sonata No.18 Mvt.1
Expressive performance with mistakes, same speed

Performance with Directions

Input Score MIDI | Text Prompt 1 | Output 1 | Text Prompt 2 | Output 2

Beethoven Piano Sonata No.2 in A, Op.2 No.2, 4. Rondo (Grazioso)
Prompt 1: In the style of Glenn Gould | Prompt 2: In the style of Pierre-Laurent Aimard

Maurice Ravel, Valses nobles et sentimentales, M.61
Prompt 1: In the style of Krystian Zimerman | Prompt 2: In the style of Alicia de Larrocha

Beethoven Piano Sonata No.27 in E Minor, Op.90, 2. Nicht zu geschwind und sehr singbar vorgetragen
Prompt 1: In the style of Vladimir Ashkenazy | Prompt 2: In the style of Vladimir Horowitz

Johann Sebastian Bach, Das Wohltemperierte Klavier, Book 2, BWV 870-893: Fugue in E-flat major BWV 876
Prompt 1: In the style of Sviatoslav Richter | Prompt 2: In the style of Walter Gieseking

Robert Schumann, Kreisleriana, Op.16, 2. Sehr innig und nicht zu rasch
Prompt 1: In the style of András Schiff | Prompt 2: In the style of Van Cliburn

Schubert Piano Sonata No.17 in D, D.850, 3. Scherzo: Allegro vivace
Prompt 1: Brisk, lively, light-hearted | Prompt 2: Slow, melancholic, with gravity

Beethoven Piano Sonata No.31 in A-Flat Major, Op.110: I. Moderato cantabile molto espressivo (note how the prompt changes the balance of the accompaniment)
Prompt 1: Dreamy, flowing, serene | Prompt 2: Mechanical, staccato, rigid

Claude Debussy, Préludes, Book 2, L.123, No.6: Général Lavine - eccentric
Prompt 1: Fast, energetic, dynamic | Prompt 2: Slow, sad, clear-phrasing

David Russell: Rondeña - Regino Sainz de la Maza
Prompt 1: Flowing, Calm, Well-Phrased | Prompt 2: Jerky, Uneven, Unpredictable

Bach BWV1003 Mvt.2
Prompt 1: Brisk, Lively, Light-Hearted | Prompt 2: Slow, Melancholic, With Gravity

David Russell: Rondeña - Regino Sainz de la Maza
Prompt 1: Flat, Emotionless, Robotic | Prompt 2: Romantic, Expressive, Lush

Bach BWV1006 Mvt.7
Prompt 1: Hurried, Rushed, Chaotic | Prompt 2: Steady, Deliberate, Introspective

Hunting for Dinu Lipatti’s Lost Recordings

I recently came across this post detailing the challenges of uncovering Dinu Lipatti’s lost recordings. The limited scope of Lipatti’s recorded repertoire, just over three hours of playing time, is a poignant reminder of the brevity of his life and career, tragically cut short when he died at the age of 33.

While no technology can fully recreate Lipatti’s unparalleled interpretations, RenderBox offers a humble opportunity to imagine and revive his sound. By learning from the stylistic nuances of his existing recordings, this project aims to honor his legacy and provide listeners with a glimpse of what might have been.

In this section we present some rough full-piece renderings prompted with his name, produced by simply concatenating the generated segments, without segment-specific prompting or mastering. To produce more refined full-piece output that could help revive the sound of pianists of past generations, future versions of the model will need context conditioning (for smoother transitions between segments), better tempo inference, and mastering, none of which the prototypes below include.

 

Text Prompt | Output
Piano, expressive performance in the style of Dinu Lipatti, Bach, considerably slower
(Bach BWV 868, prelude)
Piano, expressive performance in the style of Dinu Lipatti, Beethoven sonata, Allegretto, a bit faster
(Beethoven piano sonata No.16 Mvt.3)
Piano, expressive performance in the style of Dinu Lipatti, Schumann, Kreisleriana, at the same tempo
(Schumann Kreisleriana, 3. Sehr aufgeregt)

Who is similar to whom, when playing what?

In the last section of the paper we visualized the performer-composer embedding space of the generated latents. Here we further show some similarity maps that examine, [within the same composition], how close different performers’ interpretations are as learned by the model. To recap, these are the final-step denoised latents computed on a 380-piece test subset, each conditioned on one of 10 pianists; the subset contains pieces by 10 composers (note that our dataset is unbalanced and skewed towards Beethoven). Each heatmap shows, within one composer’s subset, the cosine similarity between the latents of each pair of performers.
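A minimal sketch of how one such heatmap could be computed, assuming each latent is flattened into a plain vector and that, for each performer pair, the cosine similarity is averaged over all pieces by the composer (the exact aggregation is not stated here, so treat it as an assumption). The function `performer_similarity`, the per-piece dictionary layout and the toy vectors are hypothetical; real latents are far larger.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def performer_similarity(
    pieces: List[Dict[str, Sequence[float]]], performers: List[str]
) -> List[List[float]]:
    """Build one composer's performer-by-performer similarity heatmap.

    `pieces` holds one dict per piece, mapping each performer to the flattened
    latent generated under that performer's conditioning (assumed layout).
    Entry [i][j] is the cosine similarity between performers i and j on the
    same piece, averaged over all pieces (assumed aggregation).
    """
    n = len(performers)
    heat = [[0.0] * n for _ in range(n)]
    for latents in pieces:
        for i, p in enumerate(performers):
            for j, q in enumerate(performers):
                heat[i][j] += cosine(latents[p], latents[q])
    return [[total / len(pieces) for total in row] for row in heat]


# Toy usage: 2 pieces, 3 performers, 3-dimensional "latents" (made-up numbers).
toy_pieces = [
    {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0], "C": [1.0, 1.0, 0.0]},
    {"A": [0.0, 0.0, 1.0], "B": [0.0, 1.0, 0.0], "C": [0.0, 1.0, 1.0]},
]
heatmap = performer_similarity(toy_pieces, ["A", "B", "C"])
```

Running this once per composer subset yields one symmetric matrix per composer, with ones on the diagonal since each performer's latent is compared with itself.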

Even without tracing pianistic lineages to verify the relationships between the pianists, the patterns in these heatmaps fit our understanding of the interpretation space across stylistic periods strikingly well.

Debussy and Ravel, as expected, show very similar proximity patterns, since their styles are close.

The Classical-to-Romantic piano sonatas by Mozart, Beethoven and Schubert (sorry, Haydn!) also show visually similar patterns. Notably, they also have higher values overall, meaning higher similarity and smaller variance across the latents generated under different performer conditioning. This matches our impression that Classical sonatas are constrained by form, and possibly by interpretive convention, so that performers tend to converge towards an ‘average’ performance.

Bach, Rachmaninoff and Schumann have the lowest values, meaning everyone plays them quite differently! While I can understand how Bach’s complexity leads to very personal interpretations, the diversity of Rachmaninoff interpretations does surprise me.
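The comparison between composers above ("higher values in general", "lowest in value") can be made quantitative by reducing each heatmap to the mean and spread of its off-diagonal entries, since the diagonal only compares a performer with themselves. This summary is our own hypothetical sketch, using the population standard deviation of pairwise similarities as a rough proxy for the variance mentioned above; `rank_composers` is an illustrative name, and the matrices in the usage lines are toy values, not results from the paper.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def off_diagonal(matrix: Matrix) -> List[float]:
    """Entries comparing two different performers (the diagonal is self-similarity)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]


def rank_composers(heatmaps: Dict[str, Matrix]) -> List[Tuple[str, float, float]]:
    """Return (composer, mean, spread) of off-diagonal similarity, highest mean first.

    A high mean with a small spread suggests performers converge on similar
    interpretations; a low mean suggests they diverge.
    """
    rows = []
    for composer, matrix in heatmaps.items():
        values = off_diagonal(matrix)
        rows.append((composer, mean(values), pstdev(values)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Toy 3-performer heatmaps for two placeholder composers (made-up values).
toy = {
    "Y": [[1.0, 0.3, 0.5], [0.3, 1.0, 0.4], [0.5, 0.4, 1.0]],
    "X": [[1.0, 0.9, 0.8], [0.9, 1.0, 0.85], [0.8, 0.85, 1.0]],
}
for composer, avg, spread in rank_composers(toy):
    print(f"{composer}: mean={avg:.3f}, spread={spread:.3f}")
```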

These similarity maps further support the view that the model learns not only the ‘acoustic cues’ of each performer’s recordings, but also how each performer’s interpretive choices shift across different musical contexts.

Figure: cosine-similarity heatmaps between performers’ latents, one panel per composer: Debussy, Ravel, Liszt, Scriabin, Schubert, Beethoven, Mozart, Bach, Rachmaninoff, Schumann.


Cite Us 📚


@article{RenderBox2024,
  author       = {Author Name},
  title        = {RenderBox: Expressive Performance Rendering with Text Control},
  journal      = {Proceedings of the International Conference},
  year         = {2024},
  url          = {https://example-link-to-your-manuscript.pdf}
}


The RenderBox project and dataset contain copyrighted material and are distributed under the following terms:

  • RenderBox may only be used for non-commercial research purposes by the individual or organization agreeing to these terms.
  • The dataset or parts of it may not be sold, leased, published, or distributed without prior written permission.
  • All publications or derivative works using RenderBox must clearly credit and cite the project.