Publications

Barahona-Ríos, Adrián and Tom Collins. "SpecSinGAN: Sound Effect Variation Synthesis Using Single-Image GANs." arXiv preprint arXiv:2110.07311, 2021 [Paper] [Video]
Single-image generative adversarial networks learn from the internal distribution of a single training example to generate variations of it, removing the need for a large dataset. In this paper we introduce SpecSinGAN, an unconditional generative architecture that takes a single one-shot sound effect (e.g., a footstep; a character jump) and produces novel variations of it, as if they were different takes from the same recording session. We explore the use of multi-channel spectrograms to train the model on the various layers that comprise a single sound effect. A listening study comparing our model to real recordings and to digital signal processing procedural audio models in terms of sound plausibility and variation revealed that, when using multi-channel spectrograms, SpecSinGAN is more plausible and varied than the procedural audio models considered. Sound examples can be found on the project website.
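
For illustration only, a minimal sketch of the multi-channel spectrogram idea: each layer of a one-shot effect is stacked as a separate spectrogram channel. It assumes the layers are available as individual audio files and uses librosa; the file names, FFT settings, and the layering itself are assumptions, not the paper's exact pipeline.

```python
# Hypothetical sketch: stack per-layer log-magnitude spectrograms into one
# multi-channel array that a single-image GAN could train on.
import numpy as np
import librosa

def layered_spectrogram(layer_paths, sr=44100, n_fft=1024, hop_length=256):
    """Return an array of shape (n_layers, freq_bins, frames)."""
    channels = []
    for path in layer_paths:
        audio, _ = librosa.load(path, sr=sr, mono=True)
        mag = np.abs(librosa.stft(audio, n_fft=n_fft, hop_length=hop_length))
        channels.append(np.log1p(mag))
    # Pad every layer to the longest frame count so the channels align.
    frames = max(c.shape[1] for c in channels)
    channels = [np.pad(c, ((0, 0), (0, frames - c.shape[1]))) for c in channels]
    return np.stack(channels)

# File names below are made up for the example (e.g., a two-layer footstep).
spec = layered_spectrogram(["footstep_heel.wav", "footstep_ball.wav"])
```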

Barahona-Ríos, Adrián and Sandra Pauletto. "Synthesising Knocking Sound Effects Using Conditional WaveGAN." In: Proceedings of the 17th Sound & Music Computing Conference, pp. 450-456, 2020 [Paper] [Video]
In this paper we explore the synthesis of sound effects using conditional generative adversarial networks (cGANs). We commissioned Foley artist Ulf Olausson to record a dataset of knocking sound effects with different emotions and trained a cGAN on it. We analysed the resulting synthesised sound effects by comparing their temporal acoustic features to those of the original dataset and by performing an online listening test. Results show that the acoustic features of the synthesised sounds are similar to those of the recorded dataset. Additionally, the listening test results show that the synthesised sounds can be identified by people with experience in sound design, but the model is not far from fooling non-experts. Moreover, on average, most emotions can be recognised correctly in both recorded and synthesised sounds. Given that the temporal acoustic features of the two datasets are highly similar, we hypothesise that these features strongly contribute to the perception of the intended emotions in the recorded and synthesised knocking sounds.
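
As a rough illustration of how label conditioning can be wired into a WaveGAN-style generator, here is a minimal PyTorch sketch. The layer sizes, emotion count, and embedding scheme are assumptions for the example, not the architecture used in the paper.

```python
# Hypothetical sketch: a conditional 1D generator where an emotion label
# embedding is concatenated with the noise vector before upsampling.
import torch
import torch.nn as nn

class ConditionalWaveGenerator(nn.Module):
    def __init__(self, n_classes=5, z_dim=100, embed_dim=16):
        super().__init__()
        self.embed = nn.Embedding(n_classes, embed_dim)  # one label per emotion
        self.fc = nn.Linear(z_dim + embed_dim, 256 * 16)
        self.net = nn.Sequential(
            nn.ConvTranspose1d(256, 128, 25, stride=4, padding=11, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(128, 64, 25, stride=4, padding=11, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(64, 1, 25, stride=4, padding=11, output_padding=1),
            nn.Tanh(),  # waveform samples in [-1, 1]
        )

    def forward(self, z, labels):
        # Concatenating noise with the label embedding is what makes
        # the GAN "conditional" on the emotion class.
        h = torch.cat([z, self.embed(labels)], dim=1)
        h = self.fc(h).view(-1, 256, 16)
        return self.net(h)

g = ConditionalWaveGenerator()
wave = g(torch.randn(2, 100), torch.tensor([0, 3]))  # two emotion labels
```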

Barahona, Adrián and Sandra Pauletto. "Perceptual Evaluation of Modal Synthesis for Impact-Based Sounds." In: Proceedings of the 16th Sound & Music Computing Conference, pp. 34-38, 2019 [Paper]
The use of real-time sound synthesis for sound effects can improve the sound design of interactive experiences such as video games. However, synthesized sound effects can often be perceived as synthetic, which hampers their adoption. This paper aims to determine whether or not sounds synthesized using filter-based modal synthesis are perceptually comparable to directly recorded sounds. Modes are the individual sinusoidal frequencies at which objects vibrate when excited. Sounds from four different materials that showed clear modes were recorded and synthesized using filter-based modal synthesis. A listening test was conducted in which participants were asked to identify, in isolation, whether a sample was recorded or synthesized. Results show that recorded and synthesized samples are indistinguishable from each other. This outcome shows that, for the analysed materials, filter-based modal synthesis is a suitable technique for synthesizing hit sounds in real time without perceptual compromises.
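
As a concrete illustration of filter-based modal synthesis, here is a minimal NumPy/SciPy sketch in which each mode is a two-pole resonant filter excited by a unit impulse. The mode frequencies, gains, and decay times are made-up values for the example, not measurements from the paper.

```python
# Hypothetical sketch: sum one resonant filter per mode to synthesize an
# impact sound. The pole radius is derived from the mode's T60 decay time.
import numpy as np
from scipy.signal import lfilter

def modal_impact(modes, sr=44100, duration=1.0):
    """modes: list of (frequency_hz, gain, t60_seconds) triples."""
    n = int(sr * duration)
    excitation = np.zeros(n)
    excitation[0] = 1.0  # a unit impulse models the hit
    out = np.zeros(n)
    for freq, gain, t60 in modes:
        r = 10 ** (-3.0 / (t60 * sr))   # pole radius: -60 dB after t60 seconds
        w0 = 2 * np.pi * freq / sr      # mode frequency in radians per sample
        b = [gain]
        a = [1.0, -2.0 * r * np.cos(w0), r * r]
        out += lfilter(b, a, excitation)  # one two-pole resonator per mode
    return out / np.max(np.abs(out))

# Example mode triples (illustrative values only).
sound = modal_impact([(420.0, 1.0, 0.8), (1130.0, 0.5, 0.4), (2350.0, 0.3, 0.2)])
```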

Co-author

Balla, Martin, et al. "Illuminating Game Space Using MAP-Elites for Assisting Video Game Design." In: 11th AISB Symposium on AI & Games (AI&G), 2021 [Paper]
In this paper we demonstrate the use of Multidimensional Archive of Phenotypic Elites (MAP-Elites), a divergent search algorithm, as a game-design assistance tool. Rather than converging on a single game configuration via objective-based optimization, MAP-Elites illuminates the game space. We show how the game space can be explored by generating a diverse set of game settings, allowing designers to explore the range of behaviours possible in their games. We applied the proposed method to the 2D game Cave Swing and discovered different settings of the game in which a Rolling Horizon Evolutionary Algorithm (RHEA) agent behaved differently depending on the selected game parameters. Plotting the agent's performance against its behaviour allowed us to visualize, and further explore, how the agent performed with selected behaviour traits.
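
For illustration, a minimal MAP-Elites loop in Python: an archive keeps one elite per cell of a discretized behaviour space, and new candidates are produced by mutating random elites. The evaluation function, parameter ranges, and grid resolution are placeholders, not the Cave Swing setup from the paper.

```python
# Hypothetical sketch of MAP-Elites over a game-parameter vector.
import random

GRID = 10       # cells per behaviour dimension
N_PARAMS = 6    # number of game parameters being evolved (illustrative)

def evaluate(params):
    # Placeholder: run the agent in the game and return a fitness score
    # plus a behaviour descriptor, both normalised to [0, 1].
    fitness = -sum((p - 0.5) ** 2 for p in params)
    behaviour = (params[0], params[1])
    return fitness, behaviour

def to_cell(behaviour):
    return tuple(min(int(b * GRID), GRID - 1) for b in behaviour)

archive = {}  # cell -> (fitness, params): one elite per cell
for _ in range(10000):
    if archive and random.random() < 0.9:
        # Mutate a uniformly chosen elite from the archive.
        _, parent = random.choice(list(archive.values()))
        child = [min(max(p + random.gauss(0, 0.1), 0.0), 1.0) for p in parent]
    else:
        child = [random.random() for _ in range(N_PARAMS)]
    fitness, behaviour = evaluate(child)
    cell = to_cell(behaviour)
    # Keep the child only if its cell is empty or it beats the elite there.
    if cell not in archive or fitness > archive[cell][0]:
        archive[cell] = (fitness, child)
```

The filled archive is what "illuminates" the game space: each occupied cell is the best-performing game setting found for that combination of behaviour traits.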

Houel, Malcolm, et al. "Perception of Emotions in Knocking Sounds: An Evaluation Study." In: Proceedings of the 17th Sound & Music Computing Conference, pp. 419-425, 2020 [Paper]
Knocking sounds are highly meaningful everyday sounds. There are many ways of knocking, conveying important information about the state of the person knocking and their relationship with the person on the other side of the door. In media production, knocking sounds are important storytelling devices: they allow transitions to new scenes and create expectations in the audience. Despite this important role, knocking sounds have rarely been the focus of research. In this study, we create a dataset of knocking actions performed with different emotional intentions. We then verify, through a listening test, whether these emotional intentions are perceived through listening to sound alone. Finally, we perform an acoustic analysis of the experimental dataset to identify whether emotion-specific acoustic patterns emerge. The results show that emotional intentions are correctly perceived for some emotions. Additionally, the emerging emotion-specific acoustic patterns confirm, at least in part, findings from previous research in speech and music performance.