Review paper summary on OMR — paradigm shift and possible directions
Recently, I got my very first paper accepted to the International Conference on Technologies for Music Notation and Representation (TENOR) 2020. This experience has been insightful, serving as a guide for my future publishing endeavors.
The paper summarizes prior work and takes a position on how Optical Music Recognition (OMR) should progress. It highlights the paradigm shift from conventional computer vision methods to end-to-end deep learning techniques.
Overview of OMR Pipeline
The traditional OMR pipeline consists of four stages: image preprocessing, musical object detection, musical symbol reconstruction, and encoding into a machine-readable format. Each stage has seen advancements over the years:
- Image Preprocessing: Techniques like binarization, blurring, deskewing, and noise removal have evolved from traditional image-processing methods to neural networks such as selectional auto-encoders (see the preprocessing sketch after this list).
- Musical Object Detection: This stage benefits directly from advances in general computer vision. Detectors such as Fast R-CNN and Single Shot Detectors (SSD) are applied by fine-tuning pre-trained models on datasets like MUSCIMA++ (see the detection sketch after this list).
- Musical Symbol Reconstruction: Structural and semantic relationships between symbols are reconstructed using heuristics and rules. Recent research explores deep learning for this stage, though challenges remain in capturing music's spatial and temporal dependencies.
- Encoding: Outputs are stored in formats like MIDI and MusicXML, supporting various levels of replayability and structural representation.
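To make the preprocessing stage concrete, here is a minimal sketch using OpenCV. It is my own illustration rather than code from the paper, and the file names are placeholders: it removes light scanning noise and binarizes a page with Otsu's method, the kind of cleanup that learned approaches such as selectional auto-encoders now perform in a single step.

```python
import cv2

# Load a scanned score as grayscale ("score.png" is a placeholder path).
img = cv2.imread("score.png", cv2.IMREAD_GRAYSCALE)

# Noise removal: a small median filter suppresses salt-and-pepper scanning noise.
denoised = cv2.medianBlur(img, 3)

# Binarization: Otsu's method picks a global threshold separating ink from paper.
_, binary = cv2.threshold(denoised, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

cv2.imwrite("score_binarized.png", binary)
```

Deskewing and cleanup of degraded or handwritten pages follow the same pattern but are harder to solve with fixed heuristics, which is one motivation for the learned approaches mentioned above.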
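For the detection stage, a common recipe is to take a detector pre-trained on natural images and fine-tune it on a notation dataset such as MUSCIMA++. The sketch below uses torchvision's Faster R-CNN (a close relative of the detectors mentioned above); the class count and the single dummy training example are placeholders, so this shows the fine-tuning setup rather than the experiments in the paper.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Start from a detector pre-trained on COCO (downloads weights on first use)
# and swap its classification head for music-notation classes. The class count
# is a placeholder; MUSCIMA++ defines its own symbol vocabulary.
num_classes = 1 + 20  # background + 20 illustrative symbol classes
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# In training mode the model takes a list of images plus box/label targets
# and returns a dictionary of losses to backpropagate.
images = [torch.rand(3, 512, 1024)]  # stand-in for a cropped score page
targets = [{"boxes": torch.tensor([[100.0, 150.0, 130.0, 210.0]]),
            "labels": torch.tensor([1])}]
loss_dict = model(images, targets)
print({name: float(value) for name, value in loss_dict.items()})
```

Real fine-tuning would iterate this forward pass over MUSCIMA++ annotations with an optimizer; the point here is how little changes relative to a generic object-detection setup.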
Challenges in OMR
Despite progress, several challenges persist:
- The lack of large labeled datasets for training.
- Accurately detecting musical objects and staff lines.
- Reconstructing semantic relationships between symbols.
- Standardizing output formats, evaluation metrics, and representations.
Paradigm Shift to End-to-End Learning
The field is moving towards end-to-end deep learning models that fold steps such as symbol detection and semantic reconstruction into a single trainable system. These methods promise simpler and more efficient pipelines, but they still depend on solving the representation and standardization challenges listed above.
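To illustrate what such an end-to-end system can look like, here is a minimal PyTorch sketch of a convolutional-recurrent network trained with CTC, a setup commonly used for staff-level recognition. The layer sizes, vocabulary size, and input dimensions are placeholders, not the architecture proposed in the paper.

```python
import torch
import torch.nn as nn

class StaffCRNN(nn.Module):
    """Sketch of a CRNN for end-to-end staff-level OMR, trained with CTC.

    Input: a fixed-height grayscale staff image, shape (batch, 1, H, W).
    Output: per-frame log-probabilities over a symbol vocabulary plus a CTC blank.
    """

    def __init__(self, vocab_size: int, height: int = 128):
        super().__init__()
        # Convolutional feature extractor; two 2x2 poolings halve H and W twice.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        feat_height = height // 4
        # Recurrent layers read the feature map column by column, left to right.
        self.rnn = nn.LSTM(64 * feat_height, 256, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * 256, vocab_size + 1)  # +1 for the CTC blank

    def forward(self, x):                               # x: (batch, 1, H, W)
        f = self.conv(x)                                # (batch, 64, H/4, W/4)
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # one feature vector per column
        out, _ = self.rnn(f)
        return self.fc(out).log_softmax(-1)             # (batch, W/4, vocab+1)

# Two dummy staff images, 128 pixels high and 800 wide.
model = StaffCRNN(vocab_size=100)
logits = model(torch.rand(2, 1, 128, 800))
print(logits.shape)  # torch.Size([2, 200, 101])
```

Training would pair these per-frame outputs with torch.nn.CTCLoss, so the model can be supervised with symbol sequences alone, without symbol-level bounding boxes or alignments.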
Conclusion
Optical Music Recognition is a fascinating field bridging music and AI. My paper highlights its evolution, challenges, and future directions, paving the way for innovative research. The full preprint is available on arXiv (arXiv:2006.07885).
References
- I. Fujinaga, “Optical music recognition using projections,” PhD dissertation, McGill University, Montreal, Canada, 1988.
- B. Couasnon et al., “Using logic programming languages for optical music recognition,” in Proceedings of the Third International Conference on the Practical Application of Prolog, 1995.
- A. Fornes et al., “Writer identification in old handwritten music scores,” in Proceedings of the Eighth IAPR International Workshop on Document Analysis Systems (DAS), IEEE, 2008, pp. 347–353.
- A. Pacha et al., “Handwritten Music Object Detection: Open Issues and Baseline Results,” in Proceedings of the 13th IAPR International Workshop on Document Analysis Systems (DAS), Vienna: IEEE, 2018.
- E. Shatri and G. Fazekas, “Optical Music Recognition: State of the Art and Major Challenges”, arXiv preprint arXiv:2006.07885, 2020.