
NoiseBandNet: Controllable Time-Varying Neural Synthesis of Sound Effects Using Filterbanks
[Paper] | [Code]

Sound examples

Time and frequency resolution comparison

Reconstructed audio comparison between several configurations of the DDSP time-varying FIR noise synthesiser [1] and NoiseBandNet. The top row shows the waveform of the entire sound, the middle row its log-magnitude spectrogram, and the bottom row a close-up of the transient. The position of the transient is marked with a vertical dashed line in the first and third rows. The left column shows the original training sample: a short metal impact. The middle columns show reconstructions from five configurations of the DDSP time-varying FIR noise synthesiser, with 128, 512, 1024, 4096 and 8192 taps respectively, all using a hop size of 32 samples. They illustrate its time-frequency trade-off: as the number of taps increases, frequency resolution improves while time resolution degrades, and vice versa. The right column shows the NoiseBandNet reconstruction using 2048 filters and a synthesis window of 32 samples, which maintains good time and frequency resolution at the same time.

Original (training data)	DDSP_128taps	DDSP_512taps	DDSP_1024taps	DDSP_4096taps	DDSP_8192taps	NoiseBandNet (ours)
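To make the contrast with FIR filtering concrete, the following is a minimal NumPy sketch of filterbank noise synthesis in the spirit of NoiseBandNet: a bank of band-limited noise signals is precomputed, and each band is scaled by a time-varying amplitude given once per 32-sample synthesis window and linearly interpolated to audio rate. It is not the authors' implementation. The log-spaced band edges, brick-wall FFT masking, peak normalisation, looping of the bands and linear interpolation are all assumptions, and `make_noise_bands` and `synthesize` are hypothetical names. The amplitudes here are random; in a trained model they would come from the network. The example uses 64 bands instead of 2048 so that every band covers at least one FFT bin of a one-second signal.

```python
import numpy as np


def make_noise_bands(n_bands, length, sr, f_min=20.0, seed=0):
    """Hypothetical helper: return an (n_bands, length) array of band-limited
    white noise, one row per band, using log-spaced band edges (assumption)."""
    rng = np.random.default_rng(seed)
    edges = np.geomspace(f_min, sr / 2, n_bands + 1)
    freqs = np.fft.rfftfreq(length, d=1.0 / sr)
    bands = np.zeros((n_bands, length))
    for b in range(n_bands):
        # Brick-wall band-pass: keep only the FFT bins inside [edges[b], edges[b+1]).
        spectrum = np.fft.rfft(rng.standard_normal(length))
        mask = (freqs >= edges[b]) & (freqs < edges[b + 1])
        band = np.fft.irfft(spectrum * mask, n=length)
        peak = np.max(np.abs(band))
        if peak > 0:
            bands[b] = band / peak  # peak-normalise each non-empty band
    return bands


def synthesize(bands, amplitudes, hop):
    """Hypothetical synthesiser: `amplitudes` has shape (n_bands, n_frames),
    one gain per band every `hop` samples. Gains are linearly interpolated to
    audio rate, applied to the (looped) noise bands and summed."""
    n_frames = amplitudes.shape[1]
    n_samples = n_frames * hop
    frame_pos = np.arange(n_frames) * hop
    sample_pos = np.arange(n_samples)
    # Loop the precomputed bands if the output is longer than the bank.
    reps = -(-n_samples // bands.shape[1])  # ceiling division
    looped = np.tile(bands, (1, reps))[:, :n_samples]
    envelopes = np.stack([np.interp(sample_pos, frame_pos, a) for a in amplitudes])
    return np.sum(looped * envelopes, axis=0)


sr, hop, n_bands = 44100, 32, 64
bands = make_noise_bands(n_bands, length=sr, sr=sr)
amps = np.random.default_rng(1).uniform(0.0, 1.0, size=(n_bands, 100))
out = synthesize(bands, amps, hop)  # 100 frames x 32 samples = 3200 samples
```

One way to read the caption's claim through this sketch: the band-limiting is done once, offline, so the bands can be made as narrow as desired, while the amplitude envelopes can still change every 32 samples.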

Reconstruction

We formally evaluate NoiseBandNet by comparing its reconstruction capabilities against four configurations of the original DDSP time-varying FIR noise synthesiser [1], with 256 (DDSP_256taps), 512 (DDSP_512taps), 1024 (DDSP_1024taps) and 4096 (DDSP_4096taps) FIR filter taps. The R code and loss data used for the statistical analysis in Section IV-B of the paper can be found here.

	Original (training data)	DDSP_256taps	DDSP_512taps	DDSP_1024taps	DDSP_4096taps	NoiseBandNet (ours)
Footsteps
Thunderstorm
Pottery
Knocking
Metal
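The tap counts in this comparison can be translated into time and frequency terms with standard FIR arithmetic: an N-tap filter at sample rate fs has an impulse response N/fs seconds long, and its frequency response is specified on a grid spaced fs/N Hz apart. Their product is constant, which is the trade-off described in the time and frequency resolution comparison. The short sketch below assumes a 44.1 kHz sample rate and a 32-sample hop (the page states the hop only for the figure configurations and does not state the sample rate); `fir_resolution` is a hypothetical helper.

```python
SAMPLE_RATE = 44100  # assumption: the page does not state the sample rate
HOP = 32  # assumption: same hop as the figure configurations


def fir_resolution(n_taps, sr=SAMPLE_RATE):
    """Return (impulse-response length in ms, frequency-grid spacing in Hz)
    for an FIR filter with n_taps taps at sample rate sr."""
    duration_ms = 1000.0 * n_taps / sr
    bin_spacing_hz = sr / n_taps
    return duration_ms, bin_spacing_hz


for taps in (256, 512, 1024, 4096):
    ms, hz = fir_resolution(taps)
    # taps // HOP: how many hops a single impulse response spans
    print(f"{taps:5d} taps: {ms:6.2f} ms long ({taps // HOP} hops), {hz:7.2f} Hz spacing")
```

Under these assumptions, even the 256-tap filter spans 8 hops, so a sharp transient is smeared over several frames, while the 4096-tap filter resolves roughly 10.8 Hz but spans 128 hops.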

Creative experiments

Amplitude randomisation

Corresponding to Section V-A of the paper.

Resynthesis (no randomisation)	Stereo generation	Top-k randomisation I	Top-k randomisation II	Frequency shift randomisation I	Frequency shift randomisation II	Both randomisations I	Both randomisations II

Loudness transfer

Corresponding to Section V-B of the paper. Caution: loud.

	Metal impact (training data)	Wilhelm scream (training data)	Electric drill (training data)
Beatbox (target loudness)	Beatbox to metal Mix	Beatbox to scream Mix	Beatbox to drill Mix
Scribbling (target loudness)	Scribbling to metal Mix	Scribbling to scream Mix	Scribbling to drill Mix
Squeaky toy (target loudness)	Squeaky toy to metal Mix	Squeaky toy to scream Mix	Squeaky toy to drill Mix
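As the table labels suggest, loudness transfer drives a model trained on one sound (metal impact, scream, drill) with a loudness curve extracted from another recording (beatbox, scribbling, squeaky toy). The page does not say how loudness is computed, so the NumPy sketch below assumes a simple proxy: the RMS level in dB of Hann-windowed frames, computed every 32 samples to match the synthesis window mentioned in the time and frequency resolution comparison. `loudness_curve`, the 1024-sample frame and the `eps` floor are hypothetical choices; a perceptually weighted measure (e.g. A-weighting) may be used in practice.

```python
import numpy as np


def loudness_curve(x, frame_size=1024, hop=32, eps=1e-7):
    """Hypothetical loudness proxy: RMS level in dB of Hann-windowed frames,
    one value every `hop` samples (frames centred by zero-padding)."""
    x = np.asarray(x, dtype=float)
    pad = frame_size // 2
    x = np.pad(x, (pad, pad))
    n_frames = 1 + (len(x) - frame_size) // hop
    window = np.hanning(frame_size)
    frames = np.stack(
        [x[i * hop : i * hop + frame_size] * window for i in range(n_frames)]
    )
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return 20.0 * np.log10(rms + eps)  # eps avoids log10(0) on silence


# Stand-in for a target recording: a one-second 220 Hz tone fading in.
sr = 44100
t = np.arange(sr) / sr
target = np.sin(2 * np.pi * 220.0 * t) * np.linspace(0.0, 1.0, sr)
curve = loudness_curve(target)  # one loudness value per 32-sample frame
```

In this reading, such a curve would replace the training sound's own loudness curve as the conditioning input; any rescaling to the loudness range seen during training is left out of the sketch.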

Training using user-defined control parameters

Corresponding to Section V-C of the paper. Caution: loud.

Metal impact (training data)	Metal impact control I	Metal impact control II	Metal impact control III
Wilhelm scream (training data)	Wilhelm scream control I	Wilhelm scream control II	Wilhelm scream control III
Electric drill (training data)	Electric drill control I	Electric drill control II	Electric drill control III

Training sounds attribution
Footsteps on metal sounds by: Freesound user "Eelke", licensed under CC BY 4.0: https://freesound.org/people/Eelke/sounds/462599/
Thunderstorm sounds by: Freesound user "InspectorJ", licensed under CC BY 4.0: https://freesound.org/people/InspectorJ/sounds/360328/
Pottery sounds by: Freesound user "Tumbleweed3288", licensed under CC0 1.0: https://freesound.org/people/Tumbleweed3288/sounds/381638/ and https://freesound.org/people/Tumbleweed3288/sounds/381548/
Knocking sounds by: Adrián Barahona-Ríos & Sandra Pauletto [2], licensed under CC BY 4.0: https://zenodo.org/record/3668503
Metal sounds by: Freesound user "gokalp_gonen", licensed under CC0 1.0: https://freesound.org/people/gokalp_gonen/sounds/640517/ and https://freesound.org/people/gokalp_gonen/sounds/640518/
Metal impact sounds by: Freesound user "jorickhoofd", licensed under CC BY 4.0: https://freesound.org/people/jorickhoofd/sounds/160045/
Wilhelm scream by: Freesound user "SweetNeo85", licensed under CC Sampling Plus 1.0: https://freesound.org/people/SweetNeo85/sounds/13797/
Drill sounds by: Freesound user "aharri6", licensed under CC Sampling Plus 1.0: https://freesound.org/people/aharri6/sounds/71079/
Beatbox sounds by: Freesound user "VocalPercussion", licensed under CC0 1.0: https://freesound.org/people/VocalPercussion/sounds/245324/
Scribbling sounds by: Freesound user "InspectorJ", licensed under CC BY 4.0: https://freesound.org/people/InspectorJ/sounds/398271/
Squeaky toy sounds by: Freesound user "metrostock99", licensed under CC BY 4.0: https://freesound.org/people/metrostock99/sounds/514701/

References
[1] Engel, Jesse, et al. "DDSP: Differentiable Digital Signal Processing." arXiv preprint arXiv:2001.04643, 2020.
[2] Barahona-Ríos, Adrián, and Sandra Pauletto. "Synthesising Knocking Sound Effects Using Conditional WaveGAN." Proceedings of the 17th Sound & Music Computing Conference, Torino, Italy, pp. 450-456, 2020.