Review paper summary on OMR — paradigm shift and possible directions
Recently, I got my very first paper accepted to the International Conference on Technologies for Music Notation and Representation (TENOR) 2020. This experience has been insightful, serving as a guide for my future publishing endeavors.
The paper summarizes prior work and takes a position on how Optical Music Recognition (OMR) should progress. It highlights the paradigm shift from conventional computer vision methods to end-to-end deep learning techniques.
Overview of OMR Pipeline
The traditional OMR pipeline consists of four stages: image preprocessing, musical object detection, musical symbol reconstruction, and encoding into a machine-readable format. Each stage has seen advancements over the years:
- Image Preprocessing: Techniques like binarization, blurring, deskewing, and noise removal have evolved from traditional image-processing methods to neural networks such as selectional auto-encoders (see the preprocessing sketch after this list).
- Musical Object Detection: This stage benefits directly from advances in general computer vision. Detectors such as Fast R-CNN and Single Shot Detectors (SSD) are applied by fine-tuning pre-trained models on datasets like MUSCIMA++ (see the detection sketch after this list).
- Musical Symbol Reconstruction: Structural and semantic relationships between symbols are reconstructed using heuristics and rules. Recent research explores deep learning for this stage, though challenges remain in capturing music's spatial and temporal dependencies.
- Encoding: Outputs are stored in formats like MIDI and MusicXML, supporting various levels of replayability and structural representation.
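To make the preprocessing stage concrete, here is a minimal sketch using OpenCV. It is my own illustration rather than code from the paper, and the file names are placeholders: it removes light scanning noise and binarizes a page with Otsu's method, the kind of cleanup that learned approaches such as selectional auto-encoders now perform in a single step.

```python
import cv2

# Load a scanned score as grayscale ("score.png" is a placeholder path).
img = cv2.imread("score.png", cv2.IMREAD_GRAYSCALE)

# Noise removal: a small median filter suppresses salt-and-pepper scanning noise.
denoised = cv2.medianBlur(img, 3)

# Binarization: Otsu's method picks a global threshold separating ink from paper.
_, binary = cv2.threshold(denoised, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

cv2.imwrite("score_binarized.png", binary)
```

Deskewing and cleanup of degraded or handwritten pages follow the same pattern but are harder to solve with fixed heuristics, which is one motivation for the learned approaches mentioned above.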
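For the detection stage, a common recipe is to take a detector pre-trained on natural images and fine-tune it on a notation dataset such as MUSCIMA++. The sketch below uses torchvision's Faster R-CNN (a close relative of the detectors mentioned above); the class count and the single dummy training example are placeholders, so this shows the fine-tuning setup rather than the experiments in the paper.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Start from a detector pre-trained on COCO (downloads weights on first use)
# and swap its classification head for music-notation classes. The class count
# is a placeholder; MUSCIMA++ defines its own symbol vocabulary.
num_classes = 1 + 20  # background + 20 illustrative symbol classes
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# In training mode the model takes a list of images plus box/label targets
# and returns a dictionary of losses to backpropagate.
images = [torch.rand(3, 512, 1024)]  # stand-in for a cropped score page
targets = [{"boxes": torch.tensor([[100.0, 150.0, 130.0, 210.0]]),
            "labels": torch.tensor([1])}]
loss_dict = model(images, targets)
print({name: float(value) for name, value in loss_dict.items()})
```

Real fine-tuning would iterate this forward pass over MUSCIMA++ annotations with an optimizer; the point here is how little changes relative to a generic object-detection setup.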
Challenges in OMR
Despite progress, several challenges persist:
- The lack of large labeled datasets for training.
- Accurately detecting musical objects and staff lines.
- Reconstructing semantic relationships between symbols.
- Standardizing output formats, evaluation metrics, and representations.
Paradigm Shift to End-to-End Learning
The field is moving towards end-to-end deep learning models that fold steps such as symbol detection and semantic reconstruction into a single trainable system. These methods promise simpler and more efficient pipelines, but they still depend on solving the representation and standardization challenges listed above.
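To illustrate what such an end-to-end system can look like, here is a minimal PyTorch sketch of a convolutional-recurrent network trained with CTC, a setup commonly used for staff-level recognition. The layer sizes, vocabulary size, and input dimensions are placeholders, not the architecture proposed in the paper.

```python
import torch
import torch.nn as nn

class StaffCRNN(nn.Module):
    """Sketch of a CRNN for end-to-end staff-level OMR, trained with CTC.

    Input: a fixed-height grayscale staff image, shape (batch, 1, H, W).
    Output: per-frame log-probabilities over a symbol vocabulary plus a CTC blank.
    """

    def __init__(self, vocab_size: int, height: int = 128):
        super().__init__()
        # Convolutional feature extractor; two 2x2 poolings halve H and W twice.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        feat_height = height // 4
        # Recurrent layers read the feature map column by column, left to right.
        self.rnn = nn.LSTM(64 * feat_height, 256, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * 256, vocab_size + 1)  # +1 for the CTC blank

    def forward(self, x):                               # x: (batch, 1, H, W)
        f = self.conv(x)                                # (batch, 64, H/4, W/4)
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # one feature vector per column
        out, _ = self.rnn(f)
        return self.fc(out).log_softmax(-1)             # (batch, W/4, vocab+1)

# Two dummy staff images, 128 pixels high and 800 wide.
model = StaffCRNN(vocab_size=100)
logits = model(torch.rand(2, 1, 128, 800))
print(logits.shape)  # torch.Size([2, 200, 101])
```

Training would pair these per-frame outputs with torch.nn.CTCLoss, so the model can be supervised with symbol sequences alone, without symbol-level bounding boxes or alignments.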
Conclusion
Optical Music Recognition is a fascinating field bridging music and AI. My paper highlights its evolution, challenges, and future directions, paving the way for innovative research. The full preprint is available on arXiv (arXiv:2006.07885).
References
- I. Fujinaga, “Optical music recognition using projections,” PhD dissertation, McGill University, Montreal, Canada, 1988.
- B. Couasnon et al., “Using logic programming languages for optical music recognition,” in Proceedings of the Third International Conference on the Practical Application of Prolog, 1995.
- A. Fornes et al., “Writer identification in old handwritten music scores,” in Proceedings of the Eighth IAPR International Workshop on Document Analysis Systems (DAS), IEEE, 2008, pp. 347–353.
- A. Pacha et al., “Handwritten Music Object Detection: Open Issues and Baseline Results,” in Proceedings of the 13th IAPR International Workshop on Document Analysis Systems (DAS), Vienna: IEEE, 2018.
- E. Shatri and G. Fazekas, “Optical Music Recognition: State of the Art and Major Challenges”, arXiv preprint arXiv:2006.07885, 2020.