I had mixed feelings this ISMIR: on the one hand, I was disappointed to attend yet another virtual ISMIR – buuuuuut, on the other hand, it was nice to meet you all! ISMIR is such a vibrant and enthusiastic community that it is always great to meet each other – even if only virtually! Still… I guess we all agree that ISMIR was much better when we had the possibility to jam on a boat! 🙂
This year, I contributed to ISMIR with the following presentations:
- Industry presentation. I went through a retrospective of our recent work at Dolby to answer some questions that I often get: What's your research at Dolby? Do you publish? Can interns publish? Who's working with you at Dolby? Here are the slides.
- MDX21 (music demixing) workshop presentation: we shared the main takeaways of our recent research on upsampling layers for music source separation. More info on this demo website or on YouTube.
Music technology trends
Significant advances in music source separation were presented. Here are a couple of them (with code)!
- Decoupling magnitude and phase estimation with deep ResUNet for music source separation
- Demucs update (v3)
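For context on what these systems estimate: many spectrogram-based separators predict a soft time-frequency mask that is multiplied with the mixture spectrogram to extract a source (the ResUNet paper above additionally decouples magnitude and phase estimation). Here is a minimal NumPy sketch of that masking step – the random mask is just a stand-in for a network's output, and all shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixture magnitude spectrogram: 257 frequency bins x 100 frames.
mix_mag = np.abs(rng.standard_normal((257, 100)))

# A separation network would output a soft mask in [0, 1] per source;
# here we fake it with a sigmoid over random values.
mask = 1.0 / (1.0 + np.exp(-rng.standard_normal(mix_mag.shape)))

# Element-wise masking: the estimated source magnitude spectrogram.
source_mag = mask * mix_mag
```

The estimated magnitude is then combined with a phase estimate (often just the mixture phase, or a refined one as in the ResUNet work) before inverting the STFT back to a waveform.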
Research on ethics for music technology is here to stay. Here's a paper I enjoyed:
- De-centering the West: East Asian philosophies and the ethics of applying artificial intelligence to music
Automatic music generation (either in the symbolic or in the waveform domain) was also a big topic:
- Keynote by Christine McLeavey: Jukebox and MuseNet – Generating raw audio and MIDI music
- Tutorial on Designing generative models for interactive co-creation
On new perspectives for neural audio synthesis:
Using semantic spaces (like emotion) for music retrieval:
Machine learning trends
Some ISMIR folks have been fairly excited about transformers; here are some papers:
- SpecTNT: a time-frequency transformer for music audio
- Sequence-to-sequence piano transcription with transformers
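At the core of both models is self-attention over (time-frequency) tokens. A minimal NumPy sketch of scaled dot-product self-attention over spectrogram frames – shapes and values are illustrative and not taken from either paper, and for brevity Q = K = V with no learned projections:

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x
    (a real transformer uses learned Q/K/V projections and multiple heads)."""
    scores = x @ x.T / np.sqrt(x.shape[-1])          # pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: rows sum to 1
    return weights @ x                               # mix frames by attention

rng = np.random.default_rng(0)
spec = rng.standard_normal((8, 16))  # toy spectrogram: 8 frames x 16 bins
out = self_attention(spec)           # each frame attends to all frames
```

SpecTNT's twist is to run attention along both the time and the frequency axes; the sketch above only attends across time.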
ISMIR is starting to explore diffusion probabilistic models (aka score-based generative models):
- CRASH: raw audio score-based generative modeling for controllable high-resolution drum sound synthesis
- Symbolic music generation with diffusion models
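As a rough intuition for what these models learn: DDPM-style training corrupts the data with a fixed noise schedule and trains a network to predict the added noise. A toy NumPy sketch of the forward (noising) process, with an illustrative linear schedule – the denoiser network itself is omitted:

```python
import numpy as np

# Illustrative linear beta schedule (as in DDPM-style setups).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)  # decreasing: signal fades, noise grows

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0): scaled clean signal plus scaled noise."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal(2048)        # toy "drum sound" waveform
xt, eps = q_sample(x0, t=500, rng=rng)
# A denoiser would be trained to predict eps from (xt, t), minimizing
# ||eps - eps_hat||^2; sampling then reverses the chain step by step.
```

Generation runs this process in reverse, starting from pure noise and repeatedly denoising – which is what makes controllable synthesis (as in CRASH) possible.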
ISMIR is also exploring the opportunities of knowledge distillation:
- DarkGAN: exploiting knowledge distillation for comprehensible audio synthesis with GANs
- Semi-supervised music tagging transformer
We also saw graph neural networks for MIR!
Also, on how to integrate hierarchical priors into neural networks: