Barahona-Ríos, Adrián, and Tom Collins. "NoiseBandNet: Controllable Time-Varying Neural Synthesis of Sound Effects Using Filterbanks." arXiv preprint arXiv:2307.08007, 2023 [Paper]
Controllable neural audio synthesis of sound effects is a challenging task due to the potential scarcity and spectro-temporal variance of the data. Differentiable digital signal processing (DDSP) synthesisers have been successfully employed to model and control musical and harmonic signals using relatively limited data and computational resources. Here we propose NoiseBandNet, an architecture capable of synthesising and controlling sound effects by filtering white noise through a filterbank, thus going further than previous systems that make assumptions about the harmonic nature of sounds. We evaluate our approach via a series of experiments, modelling footstep, thunderstorm, pottery, knocking, and metal sound effects. Comparing NoiseBandNet's audio reconstruction capabilities to four variants of the DDSP filtered-noise synthesiser, NoiseBandNet scores higher in nine out of ten evaluation categories, establishing a flexible DDSP method for generating time-varying, inharmonic sound effects of arbitrary length with good time and frequency resolution. Finally, we introduce some potential creative uses of NoiseBandNet: generating variations, performing loudness transfer, and training it on user-defined control curves.
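The core idea, shaping white noise with a bank of band-pass filters whose time-varying amplitudes act as the control signal, can be sketched in a few lines. This is a minimal stdlib-only illustration of filtered-noise synthesis in general, not the NoiseBandNet architecture itself: the band centres, Q value, and amplitude envelopes below are arbitrary stand-ins for the values a trained network would predict.

```python
import math, random

def biquad_bandpass(x, fs, f0, q):
    """RBJ cookbook band-pass filter (constant 0 dB peak gain)."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b0, b2 = alpha, -alpha                       # b1 is zero for this filter
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = (b0 * xn + b2 * x2 - a1 * y1 - a2 * y2) / a0
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y

fs, n = 16000, 8000
random.seed(0)
noise = [random.uniform(-1.0, 1.0) for _ in range(n)]  # white-noise source
centres = [200.0, 800.0, 3200.0]                 # hypothetical band centres, Hz
bands = [biquad_bandpass(noise, fs, f0, q=5.0) for f0 in centres]

# mix the bands with toy time-varying amplitude envelopes
out = []
for i in range(n):
    t = i / n
    gains = [1 - t, t, math.sin(math.pi * t)]
    out.append(sum(g * b[i] for g, b in zip(gains, bands)))
```

A real system would use many more, narrower bands and learn the per-band envelopes from data; the point here is only the signal path: noise in, filterbank, time-varying gains, sum.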

Barahona-Ríos, Adrián and Tom Collins. "SpecSinGAN: Sound Effect Variation Synthesis Using Single-Image GANs." In: Proceedings of the 19th Sound & Music Computing Conference, Saint-Étienne, France, 2022 [Paper] [Video]
Single-image generative adversarial networks learn from the internal distribution of a single training example to generate variations of it, removing the need for a large dataset. In this paper we introduce SpecSinGAN, an unconditional generative architecture that takes a single one-shot sound effect (e.g., a footstep; a character jump) and produces novel variations of it, as if they were different takes from the same recording session. We explore the use of multi-channel spectrograms to train the model on the various layers that comprise a single sound effect. A listening study comparing our model to real recordings and to digital signal processing procedural audio models in terms of sound plausibility and variation revealed that SpecSinGAN is more plausible and varied than the procedural audio models considered, when using multi-channel spectrograms. Sound examples can be found at the project website.
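As a rough illustration of what a multi-channel spectrogram input might look like, the sketch below computes one magnitude spectrogram per hypothetical layer of a sound effect and stacks them as channels. The layer signals, frame sizes, and naive DFT are illustrative assumptions for a short stdlib-only example, not the paper's actual preprocessing.

```python
import math, cmath

def spectrogram(x, n_fft=64, hop=32):
    """Magnitude spectrogram via a naive windowed DFT (fine for a sketch)."""
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        frame = x[start:start + n_fft]
        # Hann window reduces spectral leakage
        win = [v * 0.5 * (1 - math.cos(2 * math.pi * i / n_fft))
               for i, v in enumerate(frame)]
        spectrum = [abs(sum(w * cmath.exp(-2j * math.pi * k * i / n_fft)
                            for i, w in enumerate(win)))
                    for k in range(n_fft // 2 + 1)]
        frames.append(spectrum)
    return frames  # time x frequency

# hypothetical per-layer recordings of one footstep (heel thump + sole scrape),
# here just two sinusoids standing in for real audio
layers = {
    "thump":  [math.sin(2 * math.pi * 4 * i / 64) for i in range(256)],
    "scrape": [math.sin(2 * math.pi * 20 * i / 64) for i in range(256)],
}
# stack one spectrogram per layer into a multi-channel "image" for the GAN
multi = [spectrogram(x) for x in layers.values()]
```

The model then treats the stack the way an image GAN treats RGB channels, so each generated sample carries all layers of the effect at once.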

Barahona-Ríos, Adrián and Sandra Pauletto. "Synthesising Knocking Sound Effects Using Conditional WaveGAN." In: Proceedings of the 17th Sound & Music Computing Conference, pp. 450-456, 2020 [Paper] [Video]
In this paper we explore the synthesis of sound effects using conditional generative adversarial networks (cGANs). We commissioned Foley artist Ulf Olausson to record a dataset of knocking sound effects with different emotions and trained a cGAN on it. We analysed the resulting synthesised sound effects by comparing their temporal acoustic features to the original dataset and by performing an online listening test. Results show that the acoustic features of the synthesised sounds are similar to those of the recorded dataset. Additionally, the listening test results show that the synthesised sounds can be identified by people with experience in sound design, but the model is not far from fooling non-experts. Moreover, on average most emotions can be recognised correctly in both recorded and synthesised sounds. Given that the temporal acoustic features of the two datasets are highly similar, we hypothesise that they strongly contribute to the perception of the intended emotions in the recorded and synthesised knocking sounds.

Barahona, Adrián and Sandra Pauletto. "Perceptual Evaluation of Modal Synthesis for Impact-Based Sounds." In: Proceedings of the 16th Sound & Music Computing Conference, pp. 34-38, 2019 [Paper]
The use of real-time sound synthesis for sound effects can improve the sound design of interactive experiences such as video games. However, synthesized sound effects can often be perceived as synthetic, which hampers their adoption. This paper aims to determine whether sounds synthesized using filter-based modal synthesis are perceptually comparable to directly recorded sounds. Modes are the individual sinusoidal frequencies at which objects vibrate when excited. Sounds from 4 different materials that showed clear modes were recorded and synthesized using filter-based modal synthesis. A listening test was conducted where participants were asked to identify, in isolation, whether a sample was recorded or synthesized. Results show that recorded and synthesized samples are indistinguishable from each other. The study outcome shows that, for the analysed materials, filter-based modal synthesis is a suitable technique for synthesizing hit sounds in real time without perceptual compromises.
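Filter-based modal synthesis excites a bank of resonant filters, each tuned to one measured mode; since the impulse response of such a resonator is an exponentially decaying sinusoid, a hit sound can be sketched directly as a sum of damped sinusoids. The mode frequencies, amplitudes, and decay rates below are invented for illustration, not taken from the paper's measurements.

```python
import math

def modal_impact(modes, fs=16000, dur=0.5):
    """Render an impact as a sum of exponentially decaying sinusoids,
    one per mode. modes: list of (frequency_hz, amplitude, decay_per_s)."""
    n = int(fs * dur)
    out = []
    for i in range(n):
        t = i / fs
        out.append(sum(a * math.exp(-d * t) * math.sin(2 * math.pi * f * t)
                       for f, a, d in modes))
    return out

# hypothetical modes, as if measured from a struck ceramic plate:
# higher partials are quieter and die out faster
plate = modal_impact([(523.0, 1.0, 8.0),
                      (1187.0, 0.6, 14.0),
                      (2390.0, 0.3, 25.0)])
```

In a real-time engine the same result would come from running resonant filters on an excitation signal, which also lets the excitation (e.g. strike hardness) vary per hit.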

Sandra Pauletto, Adrián Barahona-Ríos, Vincenzo Madaghiele and Yann Seznec. "Sonifying Energy Consumption Using SpecSinGAN." In: Proceedings of the 20th Sound & Music Computing Conference, Stockholm, Sweden, 2023 [Paper]
In this paper we present a system for the sonification of the electricity drawn by different household appliances. The system uses SpecSinGAN as the basis for the sound design: an unconditional generative architecture that takes a single one-shot sound effect (e.g., a fire crackle) and produces novel variations of it. SpecSinGAN is based on single-image generative adversarial networks, which learn from the internal distribution of a single training example (in this case the spectrogram of the sound file) to generate novel variations of it, removing the need for a large dataset. In our system, a Python script on a Raspberry Pi receives the data on the electricity drawn by an appliance via a smart plug. The data is then sent to a Pure Data patch via Open Sound Control. The electricity drawn is mapped to the sound of fire, generated in real time in Pure Data by mixing different variations of four fire sounds - a fire crackle, a low-end fire rumble, a mid-level rumble, and hiss - which were synthesised offline by SpecSinGAN. The result is a dynamic fire sound that is never the same and that grows in intensity with the electricity data: the density of the crackles and the level of the rumbles increase as more electricity is drawn. Our testing of the system in two households with different appliances confirms that the sonification works well and intuitively in increasing awareness of the energy consumed by different appliances. This sonification is particularly useful in drawing attention to "invisible" energy consumption. Finally, we discuss results and future work.
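The data path described above, power readings mapped to synthesis parameters and sent to Pure Data over OSC, might be sketched as follows. The mapping curves, OSC addresses, and port are illustrative assumptions, not the paper's exact values; the message framing follows the OSC 1.0 format so no third-party library is needed.

```python
import socket, struct

def osc_message(address, value):
    """Encode a minimal OSC 1.0 message with a single float argument."""
    def pad(b):
        return b + b"\x00" * (4 - len(b) % 4)  # null-terminate, align to 4 bytes
    return pad(address.encode()) + pad(b",f") + struct.pack(">f", value)

def watts_to_params(watts, max_watts=2500.0):
    """Toy mapping (not the paper's exact curves): normalise the power draw,
    then derive crackle density and layer gains from it."""
    x = max(0.0, min(1.0, watts / max_watts))
    return {"crackle_density": 2 + 40 * x,   # crackle events per second
            "low_rumble": x ** 2,            # low end swells at high draw
            "mid_rumble": x,
            "hiss": 0.2 + 0.3 * x}

# send the mapped parameters to a Pure Data patch listening on UDP port 9000
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for name, value in watts_to_params(1800.0).items():
    sock.sendto(osc_message("/fire/" + name, value), ("127.0.0.1", 9000))
```

On the Pure Data side, `[netreceive -u -b 9000]` into `[oscparse]` (or an equivalent OSC object) would route each address to the corresponding mixer control.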

Balla, Martin, et al. "Illuminating Game Space Using MAP-Elites for Assisting Video Game Design." In: 11th AISB Symposium on AI & Games (AI&G), 2021 [Paper]
In this paper we demonstrate the use of Multidimensional Archive of Phenotypic Elites (MAP-Elites), a divergent search algorithm, as a game design assisting tool. The MAP-Elites algorithm allows illumination of the game space instead of just determining a single game setting via objective-based optimization. We show how the game space can be explored by generating a diverse set of game settings, allowing designers to explore what range of behaviours is possible in their games. The proposed method was applied to the 2D game Cave Swing. We discovered different settings of the game where a Rolling Horizon Evolutionary Algorithm (RHEA) agent behaved differently depending on the selected game parameters. The agent's performance was plotted against its behaviour for further exploration, which allowed visualizing how the agent performed with selected behaviour traits.
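A minimal MAP-Elites loop, reduced here to a one-dimensional toy "game setting", might look like the following. The genome, fitness function, and behaviour descriptor are placeholders for illustration; the paper's actual setup uses Cave Swing parameters and the RHEA agent's behaviour statistics.

```python
import random

def map_elites(evaluate, mutate, random_genome, bins=10, iters=2000, seed=0):
    """Minimal MAP-Elites: keep the fittest genome found in each cell of a
    discretised behaviour space, so the archive illuminates the whole space
    rather than converging on a single optimum.
    evaluate(genome) -> (fitness, behaviour in [0, 1])."""
    rng = random.Random(seed)
    archive = {}  # cell index -> (fitness, genome)
    for _ in range(iters):
        if archive and rng.random() < 0.9:
            # usually mutate a random elite; occasionally inject a random genome
            genome = mutate(rng.choice(list(archive.values()))[1], rng)
        else:
            genome = random_genome(rng)
        fitness, behaviour = evaluate(genome)
        cell = min(int(behaviour * bins), bins - 1)
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, genome)
    return archive

# toy "game setting": one parameter g in [0, 1]; fitness peaks at g = 0.7,
# and the behaviour descriptor is g itself, so every cell can be filled
archive = map_elites(
    evaluate=lambda g: (1 - abs(g - 0.7), g),
    mutate=lambda g, rng: min(1.0, max(0.0, g + rng.gauss(0, 0.1))),
    random_genome=lambda rng: rng.random(),
)
```

Plotting each elite's fitness against its cell reproduces, in miniature, the "illumination" view the paper uses: one best-known setting per region of behaviour space, not just the global optimum.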

Houel, Malcolm, et al. "Perception of Emotions in Knocking Sounds: An Evaluation Study." In: Proceedings of the 17th Sound & Music Computing Conference, pp. 419-425, 2020 [Paper]
Knocking sounds are highly meaningful everyday sounds. There exist many ways of knocking, expressing important information about the state of the person knocking and their relationship with the other side of the door. In media production, knocking sounds are important storytelling devices: they allow transitions to new scenes and create expectations in the audience. Despite this important role, knocking sounds have rarely been the focus of research. In this study, we create a data set of knocking actions performed with different emotional intentions. We then verify, through a listening test, whether these emotional intentions are perceived through listening to sound alone. Finally, we perform an acoustic analysis of the experimental data set to identify whether emotion-specific acoustic patterns emerge. The results show that emotional intentions are correctly perceived for some emotions. Additionally, the emerging emotion-specific acoustic patterns confirm, at least in part, findings from previous research in speech and music performance.