JUNE 22, 2023

Data-driven workflows within Sibelius


We are excited to share the news about our first feature that integrates AI-powered data-driven processes within Sibelius. In our initial release, the goal was to enhance the existing workflow of entering chord symbols into Sibelius with a fast, easy-to-use, and transparent data-driven workflow. In this short technical blog, we will unpack how our new AI Assisted Chord Symbols work and share a few details about the AI model itself, the underlying training data, and a few limitations of the system. We will also provide some general insight into what themes will be important for refining this new class of Sibelius features in future releases.

Let’s dig in and look at how our first patent-pending “creator in the loop” workflow comes to life within Sibelius.


We designed the new AI Assisted Chord Symbols to be fast and easy to use. Existing users will find that the workflow for entering chord symbols into Sibelius is unchanged. However, just as when you type an email or a text message, automatic suggestions are now provided as soon as you trigger the “Chord Symbol” command. If the AI-suggested chord symbol is what you intended, simply hit Escape, Space, or Tab to insert the suggestion into your project. If the suggestion is not what you wanted, you can: 1) as always, simply type in the chord symbol that you want, or 2) trigger the “Chord Symbol” command again (Control + K on Windows, Command + K on Mac) to see any additional suggestions from the model. The new feature lets you find the most probable chord symbols for a passage and quickly cycle through them. In some cases, several of the suggestions may be equally valid and interesting analyses.

Figure 1. Workflow diagram for adding chord symbols to a project within Sibelius (legacy and new).


One unique facet of our AI Assisted Chord Symbols is the ability to view the confidence values returned by the model. These confidence values make the underlying analysis more transparent. For example, several suggestions with a similar confidence value (e.g., medium) might indicate an ambiguous passage that can be interpreted in several similar ways. In contrast, a high-confidence suggestion followed by a low-confidence alternative might indicate that the second suggestion is a less common or less obvious analysis. This information can help the user decide how much to rely on the chord symbols suggested for the current location in the score. Whenever the suggestions and confidence values of the model differ widely from the expectations of the user, this might indicate some type of bias in the model, or a passage that is not well represented in the training data of the model. See our Limitations section for more on this.

There are user preferences which allow the user to turn confidence values “On” or “Off” (they are “On” by default), as well as to customize the presentation as qualitative (default) or quantitative. If you are really into learning about the underlying data model, we recommend using the quantitative (i.e., percentage) representation of confidence values. For those who are using the qualitative confidence values, the category “High” represents confidence values above 66%, “Medium” represents values between 33% and 66%, and “Low” represents values below 33%.
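The mapping from quantitative to qualitative confidence values can be sketched in a few lines of Python. This is an illustrative sketch rather than the Sibelius implementation: the function names are hypothetical, and the handling of values that fall exactly on 33% or 66% is an assumption, since the text above only gives the ranges.

```python
def confidence_label(probability: float) -> str:
    """Map a probability in [0, 1] to a qualitative confidence label.

    Assumption: values exactly at 0.33 or 0.66 are treated as "Medium".
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability > 0.66:
        return "High"
    if probability >= 0.33:
        return "Medium"
    return "Low"


def format_confidence(probability: float, quantitative: bool = False) -> str:
    """Render a confidence value as a percentage or as a qualitative label."""
    if quantitative:
        return f"{probability:.0%}"
    return confidence_label(probability)
```

For example, `format_confidence(0.5)` yields the qualitative label, while `format_confidence(0.5, quantitative=True)` yields the percentage form.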


The new feature in Sibelius is based on the Convolutional Recurrent Neural Network AugmentedNet. The network features six 1-dimensional convolutional layers, two dense layers, and a bidirectional GRU layer. It adds up to roughly 100,000 trainable parameters, which makes it a modest-to-small-sized neural network, considering how quickly deep learning models evolve nowadays.

Nevertheless, the small size of the model also comes with some advantages. The model is small enough to fit in 700 KB and is embedded in the Sibelius application. With the help of the ONNX inference engine, it can process an analysis window (i.e., about 20 bars of music) in less than 200 ms. The model estimates and analyzes the harmony and key (at the level of modulations and tonicizations) of the current location in the music. Considering all the tonal information, the model then suggests plausible chord symbols, consisting of chord roots and qualities, for the current location.


The AI Assisted Chord Symbols work by dividing the score into analysis windows, each of which is approximately 20 bars long. These windows are divided from the beginning to the end of the score. For example, a score with 60 bars would fit into approximately three analysis windows. Whenever a new chord symbol is created, Sibelius will find the corresponding analysis window and dispatch it to the model. To improve the speed of the chord symbol predictions, Sibelius saves the returned analysis of the window containing the last chord symbol that was entered. By doing so, users will find that the AI Assisted Chord Symbol suggestions appear very fast. The caching process is described in the diagram below.
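As a rough illustration of the windowing arithmetic, the following Python sketch maps a bar number to its analysis window. It assumes fixed windows of exactly 20 bars, whereas the actual windows are only approximately that long; the names `WINDOW_BARS`, `window_index`, and `window_count` are hypothetical.

```python
import math

WINDOW_BARS = 20  # assumption: fixed length; the post says "approximately 20 bars"


def window_index(bar_number: int, window_bars: int = WINDOW_BARS) -> int:
    """Return the zero-based analysis window that contains a 1-based bar number."""
    if bar_number < 1:
        raise ValueError("bar numbers start at 1")
    return (bar_number - 1) // window_bars


def window_count(total_bars: int, window_bars: int = WINDOW_BARS) -> int:
    """Return how many analysis windows are needed to cover the whole score."""
    return math.ceil(total_bars / window_bars)
```

Under these assumptions, a 60-bar score gives `window_count(60) == 3`, and a chord symbol created in bar 21 falls in the second window (`window_index(21) == 1`).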

Figure 2. Flow chart describing the caching system in the AI Assisted Chord Symbols implementation

Creating a chord symbol that lies in a different (i.e., non-cached) analysis window will result in retrieving a new analysis from the AI model. Similarly, the AI model is sensitive to changes and edits within the music. If the pitch or duration of any note changes within the currently cached window, a new analysis will be requested.
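The caching behavior described here can be sketched as a single-entry cache that is keyed on the window and its notes. This is a minimal sketch under stated assumptions, not the Sibelius code: the `WindowCache` class, the `(pitch, duration)` note encoding, and the stand-in `fake_model` function are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Note = Tuple[int, float]          # hypothetical encoding: (MIDI pitch, duration in beats)
Suggestion = Tuple[str, float]    # hypothetical encoding: (chord symbol, probability)


class WindowCache:
    """Keep the analysis of the most recently used window only."""

    def __init__(self, analyze: Callable[[Sequence[Note]], List[Suggestion]]):
        self._analyze = analyze
        self._key: Optional[tuple] = None
        self._result: List[Suggestion] = []
        self.model_calls = 0  # counts how often the model was actually run

    def suggestions(self, window_index: int, notes: Sequence[Note]) -> List[Suggestion]:
        # The key changes when the window changes or any pitch/duration is edited.
        key = (window_index, tuple(notes))
        if key != self._key:
            self._result = self._analyze(notes)
            self._key = key
            self.model_calls += 1
        return self._result


def fake_model(notes: Sequence[Note]) -> List[Suggestion]:
    """Stand-in for the real model; returns fixed suggestions."""
    return [("C", 0.8), ("Am", 0.15)]


cache = WindowCache(fake_model)
notes = [(60, 1.0), (64, 1.0), (67, 2.0)]
cache.suggestions(0, notes)                                # first request: model runs
cache.suggestions(0, notes)                                # unchanged window: cache hit
cache.suggestions(0, [(60, 1.0), (63, 1.0), (67, 2.0)])    # pitch edited: model runs again
cache.suggestions(1, notes)                                # different window: model runs again
```

In this run the stand-in model is called three times for four requests, because only the second request matches the cached window and notes.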

Local AI Processing

One of the most important qualities of our AI model is that it is a local model, meaning that it runs inside Sibelius, without a network connection. Among other things, it is designed to be unintrusive. For example, when the auto-completion feature is turned off through the user preferences, it not only turns off the feature, but also the AI model itself. In this state, the model will not receive any data from the current project, nor have any performance impact on your work. The model analyzes music on a need-to-know basis, stores a buffer of suggestions for the current analysis window (as explained above), and destroys the information when it is not needed anymore (as well as when Sibelius is closed). While this is perhaps an unusual deployment for modern AI technology, we are proud of it and committed to using ethical AI practices to respect creators’ data.


As with any AI application, it is important to understand the relationship between the training data and the results obtained when using the feature. A model trained with classical music performs best on classical music; a model trained with chorales performs best on other chorales, etc. Thus, to help users understand the limitations inherent in the first iteration of our AI Assisted Chord Symbols, we will discuss the specifics of our training data, which consisted of harmonic annotations of public domain music.

More specifically, our training data consisted of the following datasets:

  • “The Bach Chorales Melody-Harmony Corpus” by David Sears and Jonathan Verbeten (CC BY 4.0): A set of 100 annotated Bach chorales.
  • “The Beethoven Piano Sonatas Dataset” by Tsung-Ping Chen and Li Su (GPL v3.0): A dataset with chord annotations for every first movement in the 32 Piano Sonatas by Beethoven
  • “The Haydn Sun String Quartets Dataset” by Néstor Nápoles López and Rafael Caro Repetto (Apache License 2.0): A dataset with chord annotations for all 24 movements in Haydn’s Op.20 “Sun” String Quartets.
  • “The Key Modulations and Tonicizations Dataset” by Néstor Nápoles López and Laurent Feisthauer (MIT License): A corpus of harmony annotations from various music theory textbooks, including books by Reger, Rimsky-Korsakov and Tchaikovsky.
  • “The OpenScore Lieder Corpus” by Mark Gotham and Peter Jonas (CC0): A corpus of songs by 19th century composers. In particular, a subset of the corpus with harmonic analysis annotations.
  • “The Theme and Variation Encodings with Roman Numerals (TAVERN)” by Johanna Devaney, Claire Arthur, and Nathaniel Condit-Schultz (CC BY-SA 4.0): A set of 27 piano themes and variations by Mozart and Beethoven with chord annotations.
  • “The ‘When in Rome’ dataset” by Mark Gotham, Dmitri Tymoczko, and Michael Cuthbert (CC BY-SA 4.0; with permission): A set of annotations comprising the preludes of Bach’s Well-Tempered Clavier (Book I), as well as various other pieces from the common-practice period of Western music.

This provided us with 59,451 chord symbol annotations spanning the music of several classical composers and distributed across 7 datasets. Certainly, this is a dataset that only represents a small fraction of the types of music that are typically entered into Sibelius. We are already hard at work expanding our training data to cover additional types of music. When trying to collect autosuggestions for pieces of music that are vastly different than the music within the training set, the model may provide unreliable suggestions, and sometimes no suggestions at all.

Counterintuitively, the model will likely provide better suggestions on complete musical examples rather than chords in isolation or chords with little tonal context. Once again, the reason is that the underlying training data did not contain isolated pedagogical examples (e.g., block chords outside a clear tonal context). Another situation where we have found less confident results is within passages that contain mostly rests, as these passages were not well represented in the training data, either.

It is also important to note the types of chords that can be recognized by our initial model. Since some chord types were not included in the training data, they will not appear among the suggestions. Here is the list of supported chord qualities for the initial model (note: we expect this list to quickly expand in future iterations):

  • major triad
  • minor triad
  • augmented triad
  • diminished triad
  • dominant seventh
  • major seventh
  • minor seventh
  • half-diminished seventh
  • fully diminished seventh
  • augmented sixth
  • augmented major seventh
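To make the vocabulary restriction concrete, the list above can be treated as a closed set that any candidate chord quality is checked against. This Python sketch is purely illustrative: the quality strings are copied from the list, while the `SUPPORTED_QUALITIES` constant and the `is_supported` helper are hypothetical names.

```python
# Chord qualities listed for the initial model, as named in the list above.
SUPPORTED_QUALITIES = frozenset({
    "major triad",
    "minor triad",
    "augmented triad",
    "diminished triad",
    "dominant seventh",
    "major seventh",
    "minor seventh",
    "half-diminished seventh",
    "fully diminished seventh",
    "augmented sixth",
    "augmented major seventh",
})


def is_supported(quality: str) -> bool:
    """Return True if the chord quality is in the initial model's vocabulary."""
    return quality.strip().lower() in SUPPORTED_QUALITIES
```

A quality outside this set, such as a suspended or added-note chord, would never appear among the suggestions, no matter how clearly the notes spell it.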

One final limitation that deserves mention pertains to the caching strategy used in the initial release. We have optimized the UX design for a workflow that proceeds chronologically when inputting chord symbols. This means that the ideal workflow involves adding all chord symbols after the notes have been entered in the project, working from the beginning of the piece towards the end. This ensures that chord symbol suggestions have chronological context, and allows the model to provide some insight into not only the chord at any given moment, but also the surrounding chords in the same analysis window.


Limitations aside, we hope that all existing users will try out the feature and let us know how it fits within their workflow and in their music projects. The more information we have about under-represented styles and musical examples, the faster we will be able to expand our training data to meet the diverse needs of our users. In the future, we will also consider enhancing our local model with a cloud-based model and, along with that change, allowing interested customers to share analytical information to improve future data models. As we do so, we will continue to be transparent about the origin of our data and to ensure that anyone who opts in to contribute to the model is fully aware of how their data will be used.

For the last 30 years or so, when we have made improvements to Sibelius, we have done so via code. However, with our 2023.6 release of Sibelius, we have finally changed this paradigm. We can now also improve Sibelius by collecting and training high-quality musical data. Our team is excited about these next generation “data-driven workflows” and the implications that they will have for the future of music notation and music-making in general.

Joe Plazak (PhD) & Néstor Nápoles López (PhD)
Montreal, QC


-Why don't I always get a suggestion?

The most likely reason is that the music being sent to the model does not resemble the training data (i.e., in musical genre or texture). In some instances, the model might struggle to recognize a certain location as “the location where a chord symbol should go.” We call that part of the model the “harmonic rhythm” predictor, and it is expected to improve over time. See the information about our training data above and note that our training data will continue to expand in future releases.

-Why is the most obvious suggestion not one of the choices?

It may be that: 1) what seems to be the “most obvious suggestion” to the user is still not part of the current chord vocabulary of the model (e.g., “sus2” and “sus4” chords are not currently included); 2) the passage is ambiguous; 3) certain features within the surrounding context make it hard for the model to recognize the harmony (e.g., isolated rests, polyphonic textures, etc.)

-Why can't the model recognize my add9 chords (and other types of harmonies)?

Certain chords are not in the vocabulary of the model. For a list of the chord qualities that can be identified by the first iteration of the model, see the section on Limitations in the blog post above. We expect to overcome these limitations in future releases.

-Why is the model suggesting a really odd chord?

When this occurs, it is often helpful to look at the confidence values provided alongside the suggestion. Note that the display of confidence values can be turned “Off” in the User Preferences, so you may have to turn them “On.” When the model suggests something odd, you might also see it reflected in a low confidence value. Another reason for an odd suggestion could be something unique about the context of the chord, such as being surrounded by rests or sitting in a passage without a strong tonal center (since the model is trained on tonal music).

-What do (“High”, “Medium”, and “Low”) confidence values represent, and how should I use this information?

The confidence values returned by the model represent the probability assigned by the model to each suggestion provided. Usually, there will be a “high confidence” suggestion followed by less probable ones. In some ambiguous cases (e.g., “is that major seventh part of the chord or a passing tone?”), the model might come up with various “medium confidence” suggestions instead. The confidence values are there to help you navigate the options offered by the model, and to provide some insight into its internal analysis. One difference between “human in the loop” workflows and “creator in the loop” workflows is that we aim to help the user explore a range of probable suggestions for a given musical moment. Sometimes, there is not necessarily “one right answer”.

-Will accuracy and generalizability improve, or is this as good as it’s going to get?

We have done our best in this blog to be as transparent as possible about the types of projects that currently will benefit the most from the new AI Assisted Chord Symbols. As we continue to improve our training data, projects across a wider array of musical styles and contexts can expect better (and more) suggestions.

-Can I contribute my own annotations to make the predictions better?

At this time, no. In the future, however, users who are interested in contributing analytical data to improve the model, thus making AI Assisted Chord Symbols better for all users, will have a way to opt in and contribute. Stay tuned and thank you for your interest!


Chen, Tsung-Ping, and Li Su. “Functional Harmony Recognition of Symbolic Music Data with Multi-Task Recurrent Neural Networks.” In Proceedings of the 19th International Society for Music Information Retrieval Conference, 90–97. Paris, France: ISMIR, 2018. https://doi.org/10.5281/zenodo.1492351.

Devaney, Johanna, Claire Arthur, Nathaniel Condit-Schultz, and Kirsten Nisula. “Theme and Variation Encodings with Roman Numerals (TAVERN): A New Data Set for Symbolic Music Analysis.” In Proceedings of the 16th International Society for Music Information Retrieval Conference, 728–34. Málaga, Spain: ISMIR, 2015. https://doi.org/10.5281/zenodo.1417497.

Gotham, Mark, and Peter Jonas. “The OpenScore Lieder Corpus.” In Proceedings of the Music Encoding Conference, 2022.

Gotham, Mark, Dmitri Tymoczko, and Michael Cuthbert. “The RomanText Format: A Flexible and Standard Method for Representing Roman Numeral Analyses.” In Proceedings of the 20th International Society for Music Information Retrieval Conference, 123–29. Delft, The Netherlands: ISMIR, 2019. https://doi.org/10.5281/zenodo.3527756.

Nápoles López, Néstor. “Automatic Harmonic Analysis of Classical String Quartets from Symbolic Score.” Masters Thesis, Universitat Pompeu Fabra, 2017. https://doi.org/10.5281/zenodo.1095617.

Nápoles López, Néstor, Laurent Feisthauer, Florence Levé, and Ichiro Fujinaga. “On Local Keys, Modulations, and Tonicizations: A Dataset and Methodology for Evaluating Changes of Key.” In Proceedings of the 7th International Conference on Digital Libraries for Musicology, 18–26. New York, NY: Association for Computing Machinery, 2020. https://doi.org/10.1145/3424911.3425515.

Nápoles López, Néstor, Mark Gotham, and Ichiro Fujinaga. “AugmentedNet: A Roman Numeral Analysis Network with Synthetic Training Examples and Additional Tonal Tasks.” In Proceedings of the 22nd International Society for Music Information Retrieval Conference, 404–11, 2021. https://doi.org/10.5281/zenodo.5624533.

Sears, David, Jonathan Verbeten, and Hannah Percival. “Does Order Matter? Harmonic Priming Effects for Scrambled Tonal Chord Sequences.” Journal of Experimental Psychology: Human Perception and Performance, 2023. https://doi.org/10.1037/xhp0001103.

  • Joe and Nestor

    Scoring the Sibelius hat trick, Joe works as a developer, designer, and product owner for the Sibelius team, while Néstor crunches big data, makes music AI models, and integrates next-generation workflows within Sibelius.
