Several improvements have been proposed to Conv-TasNet, but they mostly focus on the separator, leaving the encoder/decoder as a (shallow) linear operator. We propose a (deep) non-linear variant of it, based on a deep stack of small filters. With this change, we improve SI-SNRi by 0.6–0.9 dB.
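The core idea can be illustrated with a toy single-channel sketch (this is only an illustration, not the paper's architecture: the layer count, filter lengths, and choice of ReLU here are my own assumptions). A single long filter acts as a shallow linear encoder; a stack of small filters with non-linearities in between covers a comparable receptive field while being deep and non-linear:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w):
    # valid 1-D convolution (cross-correlation) of signal x with filter w
    n = len(x) - len(w) + 1
    return np.array([x[i:i + len(w)] @ w for i in range(n)])

x = rng.standard_normal(64)

# Shallow linear encoder: one long filter (length 16, a typical
# Conv-TasNet window size) applied once, with no non-linearity.
w_long = rng.standard_normal(16)
shallow = conv1d(x, w_long)            # output length 64 - 16 + 1 = 49

# Deep non-linear encoder: a stack of small length-3 filters with a
# ReLU after each one; 7 such layers give a receptive field of
# 1 + 7 * (3 - 1) = 15 samples, comparable to the long filter above.
deep = x
for _ in range(7):
    w_small = rng.standard_normal(3)
    deep = np.maximum(conv1d(deep, w_small), 0.0)   # ReLU non-linearity

print(shallow.shape, deep.shape)
```

The point of the sketch is only the receptive-field arithmetic: depth with small filters can match the context of one wide filter while adding non-linear capacity.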
Although I’m now a researcher at Dolby Laboratories, I’m still collaborating with some universities in Barcelona — where I’ll keep teaching deep learning for music and audio. In this context, and given the importance of the vanishing/exploding gradient problem in deep neural networks, this week I’ll be teaching recurrent neural networks to the Master in Sound and Music Computing students at Universitat Pompeu Fabra.
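The vanishing/exploding gradient problem mentioned above can be shown with a one-line recurrence (a toy scalar example, not a real RNN): backpropagating through T steps of h_t = w · h_{t−1} multiplies the gradient by w at every step, so it either shrinks to nothing or blows up exponentially.

```python
# dh_T/dh_0 for the scalar recurrence h_t = w * h_{t-1} is w**T:
# one chain-rule factor of w per time step.
T = 50
grads = {}
for w in (0.5, 1.0, 1.5):
    grad = 1.0
    for _ in range(T):
        grad *= w          # accumulate one factor per unrolled step
    grads[w] = grad
    print(f"w = {w}: dh_T/dh_0 = {grad:.3e}")
```

With w = 0.5 the gradient vanishes (≈ 10⁻¹⁶ after 50 steps), with w = 1.5 it explodes (≈ 10⁸); only w near 1 keeps it stable, which is exactly why training vanilla RNNs over long sequences is hard.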
This year’s ISMIR was in Delft, the Netherlands. It seems the community is starting to realise that the technologies it develops can have an impact on society — because they are starting to work! During the first days of the conference, many conversations focused on exploring ways to positively impact society. On the technical side, we saw (i) many people studying how to use musical domain knowledge to disentangle/structure/learn useful neural representations for many music applications, and (ii) many attention-based neural architectures.
This last month, I submitted my doctoral thesis, entitled “Deep Neural Networks for Music and Audio Tagging”. I’ll be defending it on November 15th, and I’m excited to announce that my jury will be composed of Geoffroy Peeters, Perfecto Herrera, and Juhan Nam.