I’m happy to share the highlights of my first paper with Dolby! We will be presenting this work at ICASSP 2020, in Barcelona.
Several improvements to Conv-TasNet have been proposed, but they mostly focus on the separator, leaving its encoder/decoder as a (shallow) linear operator. We propose a (deep) non-linear variant of the encoder/decoder, based on a deep stack of small filters. With this change, we improve SI-SNRi by 0.6-0.9 dB.
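For readers unfamiliar with the metric, here is a minimal sketch of how SI-SNR and its improvement (SI-SNRi) are commonly computed. It is not the evaluation code from the paper: the function names `si_snr` and `si_snr_improvement`, the `eps` constant, and the use of plain Python lists are assumptions made for illustration.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length signals (lists of floats)."""
    n = len(target)
    # Remove the mean of both signals, as is usual for SI-SNR.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]

    # Project the estimate onto the target: this part counts as "signal".
    dot = sum(a * b for a, b in zip(e, t))
    t_energy = sum(b * b for b in t) + eps
    scale = dot / t_energy
    projection = [scale * b for b in t]

    # Whatever is left after the projection counts as "noise".
    noise = [a - p for a, p in zip(e, projection)]

    p_energy = sum(p * p for p in projection)
    n_energy = sum(x * x for x in noise)
    return 10.0 * math.log10((p_energy + eps) / (n_energy + eps))


def si_snr_improvement(estimate, mixture, target):
    """SI-SNRi: how much better the estimate is than the unprocessed mixture."""
    return si_snr(estimate, target) - si_snr(mixture, target)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why the metric is called scale-invariant.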
Our encoder/decoder relies on a deep stack of SMALL filters, and concurrent works have also found that an encoder/decoder based on SMALL filters works best:
We also challenge the generalisation capabilities of Conv-TasNet. We report a LARGE performance drop under cross-dataset evaluation. This result is important because it showcases the limitations of the current evaluation setup.
USE CROSS-DATASET EVALUATION IN YOUR PAPERS 🙂