Introduction 🎶
Welcome to the official page for RenderBox: Expressive Performance Rendering with Text Control. This project presents a unified framework for generating expressive music performances from symbolic scores and text descriptions. Applicable across multiple instruments, RenderBox introduces a diffusion transformer-based architecture conditioned on both natural language prompts and MIDI scores, enabling both coarse and fine-grained control.
Our curriculum-based paradigm spans a wide spectrum of tasks, from plain synthesis to expressive rendering, offering control over factors like speed, mistakes, and style diversity. This approach bridges symbolic scores and natural language, making performance rendering more explainable and accessible.

Outputs 🎵
Synthesis
Input Score MIDI | Text Prompt 1 | Output 1 |
---|---|---|
Bach BWV 876 Prelude | Synthesis | |
Valse - Manuel Ponce | Synthesis | |
Bach Violin Partita BWV 1002 Mvt.1 | Synthesis | |
Synthesis with Speed Augmentation
Input Score MIDI | Text Prompt 1 | Output 1 | Text Prompt 2 | Output 2 |
---|---|---|---|---|
Haydn, Op.64 No.1 | Direct synthesis, a bit faster | | slightly slower, synthesis | |
Légende - Ernest Shand | A bit faster than score, direct synthesis | | Synthesis, much faster | |
Ravel, Miroirs, Une Barque | Direct synthesis, a bit faster | | Synthesis, a bit slower | |
Expressive Performance
Input Score MIDI | Text Prompt 1 | Output 1 | Text Prompt 2 | Output 2 |
---|---|---|---|---|
Chopin Piano Etude Op.25 No.12 | faithfully matching tempo, Chopin, Etudes_op_25, 12, expressive performance | | considerably slower, expressive performance, Chopin | |
Malagueña - Isaac Albéniz | Guitar, Expressive performance, in line with the score's tempo | | guitar expressive performance, a bit faster than score | |
Beethoven Piano Sonata No.16 Mvt.1 | Beethoven, faithfully matching, Piano, expressive performance | | expressive, a bit faster | |
Sidewinder (Sax funk) | at the same tempo, Expressive playings | | Saxophone, no expression, at the same tempo | |
Bach Violin Sonata BWV1003, Mvt.2 | bwv1003, just under the score’s tempo, Expressive performance | | Violin, expressive performance, faster than score | |
Haydn String Quartet No.53 in D major | Expressive performance, at the original speed | | String Quartet, Expressive performance, much slower, Haydn | |
David Russell: Rondeña - Regino Sainz de la Maza | Expressive performance, in line with score's tempo | | expressive performance, slower than score, guitar | |
No more blues (Sax Bossa Nova) | Performance with no expression, in line with the score's tempo, Saxophone | | saxophone, expressive playing, a bit faster | |
Performance with Mistakes
Input Score MIDI | Text Prompt 1 | Output 1 |
---|---|---|
Burgmüller No.1 | Play like a student, at the same tempo | |
Beethoven Piano Sonata No.18 Mvt.1 | Expressive performance with mistakes, same speed | |
Performance with Directions
Input Score MIDI | Text Prompt 1 | Output 1 | Text Prompt 2 | Output 2 |
---|---|---|---|---|
Beethoven Piano Sonata No.2 in A, Op.2 No.2, 4. Rondo (Grazioso) | In the style of Glenn Gould | | In the style of Pierre-Laurent Aimard | |
Maurice Ravel, Valses nobles et sentimentales, M.61 | In the style of Krystian Zimerman | | In the style of Alicia de Larrocha | |
Beethoven Piano Sonata No.27 in E Minor, Op.90, 2. Nicht zu geschwind und sehr singbar vorgetragen | In the style of Vladimir Ashkenazy | | In the style of Vladimir Horowitz | |
Johann Sebastian Bach, Das Wohltemperierte Klavier, Book 2, BWV 870-893: Fugue in E-flat major BWV 876 | In the style of Sviatoslav Richter | | In the style of Walter Gieseking | |
Robert Schumann, Kreisleriana, Op.16, 2. Sehr innig und nicht zu rasch | In the style of András Schiff | | In the style of Van Cliburn | |
Schubert Piano Sonata No.17 in D, D.850/3. Scherzo, Allegro vivace | Brisk, lively, light-hearted | | Slow, melancholic, with gravity | |
Beethoven Piano Sonata No.31 in A-Flat Major, Op.110: I. Moderato cantabile molto espressivo (notice how the command changes the accompaniment balance) | Dreamy, flowing, serene | | Mechanical, staccato, rigid | |
Claude Debussy, Préludes, Book 2, L.123/No. 6, Général Lavine - Eccentric | Fast, energetic, dynamic | | Slow, sad, clear-phrasing | |
David Russell: Rondeña - Regino Sainz de la Maza | Flowing, Calm, Well-Phrased | | Jerky, Uneven, Unpredictable | |
Bach BWV1003 Mvt.2 | Brisk, Lively, Light-Hearted | | Slow, Melancholic, With Gravity | |
David Russell: Rondeña - Regino Sainz de la Maza | Flat, Emotionless, Robotic | | Romantic, Expressive, Lush | |
Bach BWV1006 Mvt.7 | Hurried, Rushed, Chaotic | | Steady, Deliberate, Introspective | |
Hunting for Dinu Lipatti’s Lost Recordings
I recently came across this post detailing the challenges in uncovering Dinu Lipatti’s lost recordings. The limited scope of Lipatti’s repertoire on record—just over three hours of playing time—is a poignant reminder of the brevity of his life and career, tragically cut short when he passed away at the age of 33.
While no technology can fully recreate Lipatti’s unparalleled interpretations, RenderBox offers a humble opportunity to imagine and revive his sound. By learning from the stylistic nuances of his existing recordings, this project aims to honor his legacy and provide listeners with a glimpse of what might have been.
In this section, we attempted some rough full-piece renderings prompted with his name, produced by simply concatenating the generated segments, without segment-wise prompting or mastering. To produce more refined full-piece outputs that can help revive the sound of past generations of pianists, future versions of the model will need context conditioning (for smoother transitions between segments), better tempo inference, and some mastering, compared to the prototypes below.
Text Prompt | Output |
---|---|
Piano, expressive performance in the style of Dinu Lipatti, Bach, considerably slower | (Bach BWV 868, prelude) |
Piano, expressive performance in the style of Dinu Lipatti, Beethoven sonata, Allegretto, a bit faster | (Beethoven piano sonata No.16 Mvt.3) |
Piano, expressive performance in the style of Dinu Lipatti, Schumann, Kreisleriana, at the same tempo | (Schumann Kreisleriana, 3. Sehr aufgeregt) |
Who is similar to whom, when playing what?
In the last section of the paper, we visualized the performer-composer embedding space of the generated latents. Here we further showcase similarity maps that measure, within the same composition, how close different performers' interpretations are as learned by the model. To recap, these are the last-step denoised latents computed on a 380-piece test subset and conditioned on each of the 10 pianists; the subset contains pieces by 10 composers (note that our dataset is unbalanced and skewed towards Beethoven). Within each composer's subset, the heatmap shows the cosine similarity between the latents of every pair of performers.
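The similarity computation described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the project's analysis code: it assumes the latents for one composer are stacked into an array of shape (performers, pieces, dim), that each performer pair is compared on the same piece before averaging over the composer's pieces, and the function names `performer_similarity` and `mean_off_diagonal` are hypothetical.

```python
import numpy as np


def performer_similarity(latents: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Mean pairwise cosine similarity between performer-conditioned latents.

    Assumed layout (hypothetical): latents has shape (n_performers, n_pieces, dim),
    where latents[p, k] is the flattened last-step denoised latent for piece k
    rendered with performer p's conditioning.
    Returns an (n_performers, n_performers) matrix averaged over the pieces.
    """
    # L2-normalise every latent so that dot products are cosine similarities.
    norms = np.linalg.norm(latents, axis=-1, keepdims=True)
    unit = latents / np.maximum(norms, eps)
    # Compare performers on the same piece: sim[k, p, q] = <unit[p, k], unit[q, k]>.
    sim = np.einsum("pkd,qkd->kpq", unit, unit)
    # Average the per-piece matrices over this composer's subset.
    return sim.mean(axis=0)


def mean_off_diagonal(sim: np.ndarray) -> float:
    """Average similarity over distinct performer pairs (diagonal excluded)."""
    mask = ~np.eye(sim.shape[0], dtype=bool)
    return float(sim[mask].mean())


# Toy example with hypothetical random latents: 10 performers, 5 pieces, 64 dims.
rng = np.random.default_rng(0)
toy_latents = rng.normal(size=(10, 5, 64))
heatmap = performer_similarity(toy_latents)
print(heatmap.shape)  # (10, 10)
print(round(mean_off_diagonal(heatmap), 3))
```

Reducing each heatmap to one scalar, such as the mean off-diagonal similarity, is one simple way to compare which composers' works the model renders most uniformly across performer conditionings.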
Even without tracing the pianistic genealogy to verify the relationships between the pianists, the patterns in these heatmaps fit remarkably well with our understanding of how interpretation varies across stylistic periods.
Debussy and Ravel, undoubtedly close in style, show very similar proximity patterns.
The Classical-Romantic piano sonatas by Mozart, Beethoven, and Schubert (sorry, Haydn!) also show visually similar patterns. Notably, they also have higher values overall, meaning higher similarity and smaller variance across the latents generated under different performer conditioning. This fits our impression that Classical sonatas are constrained by form, and possibly by interpretive convention, so that performers converge towards an 'average' performance.
Bach, Rachmaninoff, and Schumann have the lowest values, meaning everyone plays them quite differently! While I can understand that Bach's complexity leads to a very personal interpretive space, the diversity of Rachmaninoff interpretations does surprise me.
These similarity maps further suggest that the model learns more than the 'acoustic cues' in individual performers' recordings: it also captures how each performer's interpretation shifts across different musical contexts.

Performer cosine-similarity heatmaps, one per composer: Debussy, Ravel, Liszt, Scriabin, Schubert, Beethoven, Mozart, Bach, Rachmaninoff, Schumann.
Cite Us 📚
@article{RenderBox2024,
author = {Author Name},
title = {RenderBox: Expressive Performance Rendering with Text Control},
journal = {Proceedings of the International Conference},
year = {2024},
url = {https://example-link-to-your-manuscript.pdf}
}
License & Copyright 🔒
The RenderBox project and dataset contain copyrighted material and are distributed under the following terms:
- RenderBox may only be used for non-commercial research purposes by the individual or organization agreeing to these terms.
- The dataset or parts of it may not be sold, leased, published, or distributed without prior written permission.
- All publications or derivative works using RenderBox must clearly credit and cite the project.