Not I – A Creative Informatics Small Research Grant Report

Report from Creative Informatics Small Research Grant recipients.

NEWS | 22 MAY 2023

Not I is a research project and short film produced by the creative research studio Unit Test. The film exposes the epistemic limits of deep learning through an investigation into, and attack on, a machine learning model called Speech2Face. The film was first screened as part of the Edinburgh Futures Institute’s Love Machine season of events and funded through a Creative Informatics Small Research Grant.

Deep learning is a form of machine learning that uses artificial neural networks to find patterns in large amounts of data. Unlike other kinds of machine learning, the features the system uses to analyse the data are not determined, or even known, by the researchers. Despite this, its use as a research tool has grown rapidly over the last few years.
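
A minimal sketch (ours, not the project's) makes this concrete: a small neural network in PyTorch. Nothing in the code specifies which features of the input matter; whatever "features" emerge from training are encoded in the weights and are not directly legible to the researcher.

```python
# A toy deep learning model: no hand-crafted features anywhere.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(64, 32),   # raw input mapped through learned weights
    nn.ReLU(),
    nn.Linear(32, 2),    # down to a prediction
)

x = torch.randn(8, 64)              # a batch of raw inputs
labels = torch.randint(0, 2, (8,))  # toy labels
loss = nn.functional.cross_entropy(model(x), labels)
loss.backward()  # training adjusts the weights; the "features" the model
                 # comes to rely on are never stated and remain opaque
```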

Speech2Face is the output of one such research project. It is a self-supervised deep learning model that attempts to generate an image of a speaker’s face based solely on the information carried by their voice. Speech2Face is the target of this project because we consider this kind of work to be fundamentally flawed, resting on myriad assumptions about the representation of race and gender as well as vocal character.
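
At a high level, the published system feeds a spectrogram of the voice through an encoder that predicts a face-feature vector, which a pretrained decoder then renders as a canonical, front-facing image. The toy sketch below is our assumption-laden stand-in, not the authors' code; it shows only that overall shape of the pipeline.

```python
# Toy sketch of the Speech2Face pipeline shape: voice -> feature -> face.
import torch
import torch.nn as nn

voice_encoder = nn.Sequential(   # maps a spectrogram to a face-feature vector
    nn.Flatten(),
    nn.Linear(128 * 100, 4096),  # the paper predicts a 4096-d face feature;
)                                # other sizes here are placeholders
face_decoder = nn.Sequential(    # renders a canonical face from that vector
    nn.Linear(4096, 64 * 64 * 3),
    nn.Sigmoid(),                # pixel values in [0, 1]
)

spectrogram = torch.randn(1, 128, 100)           # stand-in time-frequency input
face = face_decoder(voice_encoder(spectrogram))  # (1, 64*64*3) "face" pixels
```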

Following this, the narrator introduces us to the Speech2Face model. Their disembodied voice has guided us through the film’s historical introduction, but it is suddenly given a figurative human form: the voice is run through the model, and the face it outputs is animated and lip-synced to the voiceover.

Human speech perception is an audiovisual process. Once heard speech can be matched to the visible movement of lips and face, it takes a great deal of cognitive effort to keep the two dissociated. Because of this, the audience quickly assumes that this figure is indeed the narrator.

Next, our newly embodied presenter conducts a didactic deconstruction of the model’s architecture. As we progress through the technical elements of the model (at points literally flying through them), many of the assumptions the authors have built into it become apparent.

Critical attention is paid to how the model draws the modalities of sight and sound onto the same representational plane. It is here that the authors encode their assumption that faces that look alike should sound alike, and vice versa. It is through this cross-modal process that the model attempts to establish a correspondence between faces and voices and to translate between them.
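
Continuing the toy sketch above: as we understand the paper, the voice encoder is trained to reproduce the embedding that a pretrained face-recognition network extracts from a photo of the speaker, collapsing the two modalities into a single feature space. The bare L1 loss below is an illustrative stand-in for the full published objective.

```python
# Cross-modal alignment: push the voice embedding onto the face embedding.
import torch
import torch.nn as nn

face_feature = torch.randn(1, 4096)         # stand-in for a frozen face network's output
voice_feature = voice_encoder(spectrogram)  # trainable voice encoder (defined above)

loss = nn.functional.l1_loss(voice_feature, face_feature)
loss.backward()  # minimising this distance is, literally, the assumption that
                 # faces that look alike should sound alike, and vice versa
```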

Effects like this are only possible through creative, practice-based methods. The larger research project that surrounds Not I considers exactly how these methods can serve a “poietic function,” as Joanna Zylinska puts it. The experiential nature of film, and art in general, reifies ideas in ways that language cannot.

It was important for us to show the attack, though not because we propose it as “the solution” to the problems these models raise; it is neither practical nor accessible to anyone without specialised knowledge. Rather, its inclusion is meant to illustrate the frailty of these systems, demonstrating how narrow a representation of a phenomenon they can hold.

It was also essential for us to look beyond where much of the critical discourse on AI ends – with theoretical critique. The adversarial attack opens up a counter-data-science design space in which we and others can consider what can be done to resist these systems, to mitigate their effects, and to fundamentally improve our understanding of them through material-computational intervention.
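
To give a sense of what such an intervention can look like, here is a single-step, gradient-based perturbation in the style of the fast gradient sign method, applied to the toy pipeline above. It is a generic stand-in, not Unit Test’s actual attack.

```python
# A minimal adversarial step against the toy pipeline above (FGSM-style).
import torch

spectrogram = torch.randn(1, 128, 100, requires_grad=True)  # attacker's input
distance = torch.nn.functional.l1_loss(voice_encoder(spectrogram), face_feature)
distance.backward()  # gradient of the voice-to-face mismatch w.r.t. the audio

epsilon = 1e-2  # a perturbation small enough to leave the voice sounding unchanged
adversarial = spectrogram + epsilon * spectrogram.grad.sign()  # one ascent step
# The model now maps the (humanly identical) voice to a different face,
# a measure of how narrow its representation of the phenomenon is.
```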

The project that became Not I was initiated in response to an open call for research projects addressing the theme of “Deep Authentic” from the experimental music festival Unsound. As part of their 2021 Discourse Programme, Murad Khan and I presented our initial research into the histories of vocal profiling technologies and the Speech2Face system, as well as a work-in-progress attack on the algorithm.

The Creative Informatics Small Research Grant enabled us to turn this initial research into the film. The grant funded the technical work necessary to complete the adversarial attack, but more importantly, allowed us to hire Vlad Afanasiev, an architect and designer, and Błażej Kotowski, a composer and sound designer, to help us realise it.

Unit Test was co-founded by Martin Disley and Murad Khan. It is a vehicle for collaborative artistic inquiry into computational systems. Not I will be screened as part of the Sound of Deep Fake exhibition at Edinburgh Art Festival, at Inspace Gallery throughout the month of August. For inquiries regarding future screenings of Not I, please contact info@unittest.studio.
