Publications

Download the paper.

Abstract:

In this paper we explore the synthesis of sound effects using conditional generative adversarial networks (cGANs). We commissioned Foley artist Ulf Olausson to record a dataset of knocking sound effects with different emotions and trained a cGAN on it. We analysed the resulting synthesised sound effects by comparing their temporal acoustic features to the original dataset and by performing an online listening test. Results show that the acoustic features of the synthesised sounds are similar to those of the recorded dataset. Additionally, the listening test results show that the synthesised sounds can be identified by people with experience in sound design, but the model is not far from fooling non-experts. Moreover, on average most emotions can be recognised correctly in both recorded and synthesised sounds. Given that the temporal acoustic features of the two datasets are highly similar, we hypothesise that they strongly contribute to the perception of the intended emotions in the recorded and synthesised knocking sounds.

Download the paper.

Abstract:

Knocking sounds are highly meaningful everyday sounds. There are many ways of knocking, each expressing important information about the state of the person knocking and their relationship with the other side of the door. In media production, knocking sounds are important storytelling devices: they allow transitions to new scenes and create expectations in the audience. Despite this important role, knocking sounds have rarely been the focus of research. In this study, we create a data set of knocking actions performed with different emotional intentions. We then verify, through a listening test, whether these emotional intentions are perceived through listening to sound alone. Finally, we perform an acoustic analysis of the experimental data set to identify whether emotion-specific acoustic patterns emerge. The results show that emotional intentions are correctly perceived for some emotions. Additionally, the emerging emotion-specific acoustic patterns confirm, at least in part, findings from previous research in speech and music performance.

Download the paper.

Abstract:

The use of real-time sound synthesis for sound effects can improve the sound design of interactive experiences such as video games. However, synthesized sound effects are often perceived as synthetic, which hampers their adoption. This paper aims to determine whether sounds synthesized using filter-based modal synthesis are perceptually comparable to directly recorded sounds. Modes are the individual sinusoidal frequencies at which objects vibrate when excited. Sounds from four different materials that showed clear modes were recorded and synthesized using filter-based modal synthesis. A listening test was conducted in which participants were asked to identify, in isolation, whether a sample was recorded or synthesized. Results show that recorded and synthesized samples are indistinguishable from each other. The study outcome shows that, for the analysed materials, filter-based modal synthesis is a suitable technique to synthesize hit sounds in real time without perceptual compromises.
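
As a rough illustration of the general idea behind filter-based modal synthesis, the sketch below excites a bank of two-pole resonant filters with a single impulse, one filter per mode, and sums their outputs. It is a minimal sketch under stated assumptions, not the implementation used in the paper: the function names, the use of a 60 dB decay time (T60) to set each filter's damping, and the mode values in `EXAMPLE_MODES` are all hypothetical and chosen only for illustration.

```python
import math


def resonator_coeffs(freq_hz, decay_s, sample_rate):
    """Coefficients of a two-pole resonator for one mode.

    decay_s is assumed to be the time for the mode's amplitude to
    fall by 60 dB (T60); this parameterisation is an assumption.
    """
    # Pole radius chosen so that r ** (decay_s * sample_rate) == 10 ** -3.
    r = 10.0 ** (-3.0 / (decay_s * sample_rate))
    w = 2.0 * math.pi * freq_hz / sample_rate  # pole angle in radians/sample
    # b0 = sin(w) makes the impulse response r**n * sin((n + 1) * w).
    return 2.0 * r * math.cos(w), -r * r, math.sin(w)


def synthesize_hit(modes, duration_s=0.5, sample_rate=44100):
    """Sum the impulse responses of one resonator per (freq, T60, gain) mode."""
    n = int(duration_s * sample_rate)
    out = [0.0] * n
    for freq_hz, decay_s, gain in modes:
        a1, a2, b0 = resonator_coeffs(freq_hz, decay_s, sample_rate)
        y1 = y2 = 0.0  # previous two output samples of this filter
        for i in range(n):
            x = gain if i == 0 else 0.0  # impulse excitation scaled by gain
            y = b0 * x + a1 * y1 + a2 * y2
            out[i] += y
            y2, y1 = y1, y
    return out


# Hypothetical modes (frequency in Hz, T60 in seconds, gain).
# These values are illustrative and are not taken from the paper.
EXAMPLE_MODES = [(440.0, 0.4, 1.0), (1130.0, 0.25, 0.5), (2210.0, 0.15, 0.25)]

samples = synthesize_hit(EXAMPLE_MODES, duration_s=0.5, sample_rate=44100)
```

Each filter's impulse response is an exponentially decaying sinusoid at its mode frequency, so the summed output approximates a struck object as a set of damped partials. In practice the mode frequencies, decay times, and gains would be estimated from recordings, and the excitation could be something richer than a single impulse.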