NoiseBandNet: Controllable Time-Varying Neural Synthesis of Sound Effects Using Filterbanks
[Paper] | [Code]


Sound examples

Time and frequency resolution comparison

Reconstructed audio comparison between multiple configurations of the DDSP time-varying FIR noise synthesiser [1] and NoiseBandNet. The top row shows the waveform of the entire sound, the middle row its log-magnitude spectrogram, and the bottom row a detail of the transient. The transient position is marked with a vertical dashed line in the first and third rows. The left column shows the original training sample: a short metal impact. The middle columns show the reconstructions from five configurations of the DDSP time-varying FIR noise synthesiser with 128, 512, 1024, 4096 and 8192 taps respectively, all with a hop size of 32 samples. Note the trade-off between time and frequency resolution: as the number of taps increases, the frequency resolution improves while the time resolution degrades, and vice versa. The right column shows the NoiseBandNet reconstruction using 2048 filters and a synthesis window of 32 samples, which maintains both good time and frequency resolution.

Original (training data) | DDSP128taps | DDSP512taps | DDSP1024taps | DDSP4096taps | DDSP8192taps | NoiseBandNet (ours)
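
The trade-off described in the caption can be put in rough numbers. The sketch below is not taken from the paper's code. It assumes a 44.1 kHz sample rate (the page does not state one) and uses the rule of thumb that an N-tap FIR filter has frequency points spaced about sample_rate / N apart and an impulse response lasting N / sample_rate seconds. The names `fir_resolution` and `summarise` are hypothetical helpers introduced only for illustration.

```python
# Rough time/frequency resolution figures for an N-tap FIR noise filter.
# ASSUMPTION: 44.1 kHz sample rate; the page does not state the actual rate.
SAMPLE_RATE = 44100  # Hz (assumed)
HOP_SIZE = 32        # samples, as stated in the caption


def fir_resolution(num_taps, sample_rate=SAMPLE_RATE):
    """Return (frequency spacing in Hz, impulse-response length in ms).

    Rule of thumb only: more taps give finer frequency spacing but a
    longer impulse response, which smears transients in time.
    """
    freq_spacing_hz = sample_rate / num_taps
    duration_ms = 1000.0 * num_taps / sample_rate
    return freq_spacing_hz, duration_ms


def summarise(tap_counts=(128, 512, 1024, 4096, 8192)):
    """Return one (taps, spacing_hz, duration_ms) tuple per configuration."""
    return [(taps, *fir_resolution(taps)) for taps in tap_counts]


if __name__ == "__main__":
    hop_ms = 1000.0 * HOP_SIZE / SAMPLE_RATE
    print(f"hop size: {HOP_SIZE} samples = {hop_ms:.2f} ms")
    for taps, spacing_hz, duration_ms in summarise():
        print(f"{taps:5d} taps: ~{spacing_hz:7.2f} Hz spacing, "
              f"~{duration_ms:6.2f} ms impulse response")
```

At the assumed sample rate, 128 taps span about 2.9 ms with frequency points roughly 345 Hz apart, whereas 8192 taps span about 186 ms with points roughly 5.4 Hz apart. The 32-sample hop is about 0.73 ms, so the longer filters extend well beyond a single hop.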

Reconstruction

We formally evaluate NoiseBandNet by comparing its reconstruction capabilities against four configurations of the original DDSP time-varying FIR noise synthesiser [1], with 256 (DDSP256taps), 512 (DDSP512taps), 1024 (DDSP1024taps) and 4096 (DDSP4096taps) FIR filter taps. The R code and loss data used to perform the statistical analysis in Section IV-B of the paper can be found here.

Original (training data) | DDSP256taps | DDSP512taps | DDSP1024taps | DDSP4096taps | NoiseBandNet (ours)
Footsteps
Thunderstorm
Pottery
Knocking
Metal

Creative experiments

Amplitude randomisation

Corresponding to Section V-A of the paper.

Resynthesis (no randomisation) | Stereo generation | Top-k randomisation I | Top-k randomisation II | Frequency shift randomisation I | Frequency shift randomisation II | Both randomisations I | Both randomisations II

Loudness transfer

Corresponding to Section V-B of the paper. Caution: loud.
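
As an illustration of what a "target loudness" curve might look like, the sketch below extracts a frame-wise loudness envelope from a signal so that it could drive a model trained on a different sound. It is not the paper's implementation: it assumes a plain RMS level in dB computed over non-overlapping 32-sample frames and min-max normalisation, and the names `frame_loudness_db` and `normalise` are hypothetical.

```python
import numpy as np


def frame_loudness_db(signal, frame_size=32, hop_size=32, eps=1e-7):
    """Frame-wise RMS level in dB of a non-empty mono signal.

    ASSUMPTION: RMS in dB stands in for loudness here; the paper may use
    a different (for example, perceptually weighted) measure.
    """
    signal = np.asarray(signal, dtype=np.float64)
    n_frames = 1 + max(0, (len(signal) - frame_size) // hop_size)
    loudness = np.empty(n_frames)
    for i in range(n_frames):
        start = i * hop_size
        frame = signal[start:start + frame_size]
        rms = np.sqrt(np.mean(frame ** 2))
        loudness[i] = 20.0 * np.log10(rms + eps)  # eps avoids log(0)
    return loudness


def normalise(curve):
    """Min-max scale a curve to [0, 1]; a constant curve maps to zeros."""
    curve = np.asarray(curve, dtype=np.float64)
    lo, hi = curve.min(), curve.max()
    if hi == lo:
        return np.zeros_like(curve)
    return (curve - lo) / (hi - lo)


if __name__ == "__main__":
    # Toy target: a quiet segment followed by a loud one.
    target = np.concatenate([np.full(64, 0.01), np.full(64, 1.0)])
    control = normalise(frame_loudness_db(target))
    print(control)
```

In this toy case the control curve is low for the quiet frames and high for the loud ones. In a transfer setting, such a curve taken from one sound (for example, the beatbox recording) would be fed as the control signal to a model trained on another (for example, the metal impact).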

Metal impact (training data)
Wilhelm scream (training data)
Electric drill (training data)
Beatbox (target loudness)
Beatbox to metal

Mix
Beatbox to scream

Mix
Beatbox to drill

Mix
Scribbling (target loudness)
Scribbling to metal

Mix
Scribbling to scream

Mix
Scribbling to drill

Mix
Squeaky toy (target loudness)
Squeaky toy to metal

Mix
Squeaky toy to scream

Mix
Squeaky toy to drill

Mix

Training using user-defined control parameters

Corresponding to Section V-C of the paper. Caution: loud.

Metal impact (training data)
Metal impact control I
Metal impact control II
Metal impact control III
Wilhelm scream (training data)
Wilhelm scream control I
Wilhelm scream control II
Wilhelm scream control III
Electric drill (training data)
Electric drill control I
Electric drill control II
Electric drill control III

Training sounds attribution
Footsteps on metal sounds by: Freesound user "Eelke", licensed under CC BY 4.0: https://freesound.org/people/Eelke/sounds/462599/
Thunderstorm sounds by: Freesound user "InspectorJ", licensed under CC BY 4.0: https://freesound.org/people/InspectorJ/sounds/360328/
Pottery sounds by: Freesound user "Tumbleweed3288", licensed under CC0 1.0: https://freesound.org/people/Tumbleweed3288/sounds/381638/ and https://freesound.org/people/Tumbleweed3288/sounds/381548/
Knocking sounds by: Adrián Barahona-Ríos & Sandra Pauletto [2], licensed under CC BY 4.0: https://zenodo.org/record/3668503
Metal sounds by: Freesound user "gokalp_gonen", licensed under CC0 1.0: https://freesound.org/people/gokalp_gonen/sounds/640517/ and https://freesound.org/people/gokalp_gonen/sounds/640518/
Metal impact sounds by: Freesound user "jorickhoofd", licensed under CC BY 4.0: https://freesound.org/people/jorickhoofd/sounds/160045/
Wilhelm scream by: Freesound user "SweetNeo85", licensed under CC Sampling Plus 1.0: https://freesound.org/people/SweetNeo85/sounds/13797/
Drill sounds by: Freesound user "aharri6", licensed under CC Sampling Plus 1.0: https://freesound.org/people/aharri6/sounds/71079/
Beatbox sounds by: Freesound user "VocalPercussion", licensed under CC0 1.0: https://freesound.org/people/VocalPercussion/sounds/245324/
Scribbling sounds by: Freesound user "InspectorJ", licensed under CC BY 4.0: https://freesound.org/people/InspectorJ/sounds/398271/
Squeaky toy sounds by: Freesound user "metrostock99", licensed under CC BY 4.0: https://freesound.org/people/metrostock99/sounds/514701/

References
[1] Engel, Jesse, et al. "DDSP: Differentiable Digital Signal Processing." arXiv preprint arXiv:2001.04643 (2020).
[2] Barahona-Ríos, Adrián and Sandra Pauletto. "Synthesising Knocking Sound Effects Using Conditional WaveGAN." In: Proceedings of the 17th Sound & Music Computing Conference, pp. 450-456, Torino, Italy. 2020.