I have always been enthusiastic about exploring Computer Vision; equally, music has been a big part of my life. Combined, they make for a great research problem! Four months ago, I started a four-year Ph.D. program. So far, it has been an incredible journey, both in how much I have learned and in how much more I am eager to learn.
The research problem I am undertaking is Optical Music Recognition (OMR); more specifically, I am investigating whether Deep Learning can help improve the performance of current methods.
To help you understand the problem a little better, I will explain what OMR is, the conventional methods used to tackle it, and the main issues that still need to be addressed.
What is OMR?
Most of us have probably used Google Translate and its camera translation feature by now. By simply taking a picture of a piece of text, we get a translation instantly, with no need to learn Chinese or any other language. Now consider how this feature would apply to music. Musicians still write music by hand on staff paper or blank sheets. However, if they want to share their music, they have to transcribe it into a computer, since a computer-readable music file is far more accessible.
Therefore, the motivation behind this research is the possibility of allowing composers and musicians not only to transcribe and edit music by taking a picture of the sheet music, but ultimately to share and play their pieces. OMR would also enable statistical analysis of music and make notation searchable, similar to searching for text.
Calvo-Zaragoza et al. provide a clear definition of OMR:
Optical Music Recognition is a field of research that investigates how to computationally read music notation in documents.
The Standard Pipeline
The research field was established at MIT in the late 1960s, working with scanned printed music sheets. Based on the studies conducted since then, a standard pipeline emerged that reflects the typical approach to the problem (see Figure 1). It is commonly divided into four stages: image preprocessing, music object (symbol) detection, notation reconstruction, and encoding of the result into an output format.
The usual inputs to this pipeline are scans or photographs of printed or handwritten music sheets. In the preprocessing stage, these images are cleaned up with techniques such as binarization, blurring, and deskewing to reduce noise.
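To give a concrete flavor of this stage, here is a minimal preprocessing sketch in Python using OpenCV. Everything here is an assumption for illustration: the file name is a placeholder, and the blur kernel and deskewing heuristic are common defaults rather than choices from any particular OMR system.

```python
# Minimal sheet-image preprocessing sketch (OpenCV + NumPy).
import cv2
import numpy as np

# Load a scanned page in grayscale; "sheet.png" is a placeholder name.
image = cv2.imread("sheet.png", cv2.IMREAD_GRAYSCALE)

# A light Gaussian blur suppresses high-frequency scanner noise.
blurred = cv2.GaussianBlur(image, (3, 3), 0)

# Otsu's method picks a global threshold automatically, separating
# ink (foreground) from paper (background).
_, binary = cv2.threshold(blurred, 0, 255,
                          cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Deskew: estimate the page rotation from the minimum-area rectangle
# around all ink pixels, then rotate the image to straighten it.
coords = np.column_stack(np.where(binary > 0)).astype(np.float32)
angle = cv2.minAreaRect(coords)[-1]
# OpenCV's angle convention changed across versions; verify the sign
# of the correction on your own scans.
if angle < -45:
    angle = -(90 + angle)
else:
    angle = -angle
h, w = binary.shape
matrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
deskewed = cv2.warpAffine(binary, matrix, (w, h),
                          flags=cv2.INTER_NEAREST)
```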
Deep Learning for OMR
We aim to explore new ways of performing the OMR steps using Deep Learning (DL). Most DL models are artificial neural networks, loosely inspired by biological neural networks. These networks consist of multiple layers of nodes (neurons): an input layer, one or more hidden layers, and an output layer.
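To make that layer structure concrete, here is a minimal feed-forward network in PyTorch. It is only a sketch: the layer sizes and the ten output classes are arbitrary placeholders, not the model we use in our experiments.

```python
# A tiny feed-forward network: input layer -> hidden layers -> output layer.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 128),  # input layer (784 features) to first hidden layer
    nn.ReLU(),            # non-linear activation at each hidden neuron
    nn.Linear(128, 64),   # second hidden layer
    nn.ReLU(),
    nn.Linear(64, 10),    # output layer, e.g. scores for 10 classes
)

x = torch.randn(1, 784)   # one dummy input vector
logits = model(x)         # forward pass through all layers
print(logits.shape)       # torch.Size([1, 10])
```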
Our initial approach is to apply DL to the second stage of the pipeline: music object detection. This requires a large dataset of music sheet images with corresponding ground-truth annotations for training the model. The dataset must also include test data unseen by the model, so that its generalization performance can be evaluated.
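As a sketch of what this could look like, the snippet below adapts a pre-trained Faster R-CNN from torchvision to a new set of object classes; detectors of this family are among the baselines evaluated by Pacha et al. The class count, image size, and the single dummy annotation are placeholders, and this is one common fine-tuning pattern rather than our final setup.

```python
# Repurposing a pre-trained detector for music object detection (sketch).
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

num_classes = 1 + 32  # background + 32 hypothetical music symbol classes

# Start from a detector pre-trained on natural images (COCO)...
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# ...and replace its classification head so it predicts our symbol classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Training consumes images plus ground-truth boxes and labels per image.
images = [torch.rand(3, 600, 800)]  # one dummy score page
targets = [{
    "boxes": torch.tensor([[100.0, 150.0, 140.0, 210.0]]),  # [x1, y1, x2, y2]
    "labels": torch.tensor([1]),  # e.g. a hypothetical "notehead" class
}]

model.train()
loss_dict = model(images, targets)  # dict of detection losses
loss = sum(loss_dict.values())
loss.backward()                     # backpropagate for one training step
```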
The research also proposes standardizing input/output formats and evaluation criteria, so that results are comparable across studies.
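As one example of the kind of evaluation building block that would benefit from standardization, object detectors are typically scored by how much their predicted bounding boxes overlap with the ground truth, measured as intersection over union (IoU). The function below is a minimal illustration with made-up boxes, not a proposed standard.

```python
# Intersection over union (IoU) of two boxes in [x1, y1, x2, y2] format.
def iou(box_a, box_b):
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou([0, 0, 10, 10], [5, 5, 15, 15]))  # 25 / 175 ≈ 0.14
```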
References
- A. Rebelo, I. Fujinaga, F. Paszkiewicz, A. R. S. Marcal, C. Guedes, and J. S. Cardoso, “Optical music recognition: state-of-the-art and open issues,” Int J Multimed Info Retr, vol. 1, no. 3, pp. 173–190, Oct. 2012. Read online.
- J. Calvo-Zaragoza, J. Hajic Jr., and A. Pacha, “Understanding Optical Music Recognition,” arXiv:1908.03608 [cs, eess], Aug. 2019. Read online.
- A. Pacha, J. Hajič Jr., and J. Calvo-Zaragoza, “A baseline for general music object detection with deep learning,” Applied Sciences, vol. 8, no. 9, p. 1488, 2018.