In this series of posts I have written a couple of articles discussing the pros & cons of spectrogram-based VGG architectures, to think about what role computer vision deep learning architectures play in the audio field. Now it is time to discuss what’s up with waveform-based VGGs!
- Post I: Why do spectrogram-based VGGs suck?
- Post II: Why do spectrogram-based VGGs rock?
- Post III: What’s up with waveform-based VGGs? [this post]