Synthetic Participants – how to mitigate the impact of bots in qualitative research

A Report from Creative Informatics

RESOURCE | SUZANNE BLACK 25 AUGUST 2023

On 10 August 2023, Creative Informatics held a workshop via Zoom designed to understand how best to invest resources to support innovation in creative AI / machine learning in Scotland. We were thrilled when registrations started flying in and we quickly hit the maximum number of participants we could accommodate. We were less thrilled when, during the workshop, some of those participants were not who they seemed.

From their interactions, we suspected some of the participants were using text and audio chat bots to simulate participation in the workshop.

We noticed odd contributions to the chat, including a series of participants listing pronouns unprompted, and participants leaving and rejoining the meeting very frequently. The issue became more apparent when we moved into breakout rooms, where participants responded to the facilitators’ prompts with almost-plausible audio and chat contributions.

It appeared that some of the participants were using chat bots (probably ChatGPT) to generate responses to our questions, introduce themselves or say they had connection problems. They participated in standard Zoom interactions and, weirdly, the busiest activity was everyone saying goodbye!

[Image Description: Four text bubbles. First says “I really learned and enjoyed this session”. Second says “Thank you soo much I really enjoyed”. Third says “I really enjoyed everything may God bless the organizer of this project”. Fourth says “It’s was great for us to meet nice session”]

We had implemented the usual security measures: the Zoom link was only shared with registered participants and entering the Zoom meeting required a password. Registration was handled through Eventbrite, which implements security checks such as making sure registration emails and Zoom accounts match.

Our downfall was offering a voucher for participation, a measure we implemented to make sure participants did not lose out financially by giving up their time. Drawn by the lure of the voucher, people who wanted the reward without attending the workshop sent along their text and audio chat bots to simulate participation.

[Image Description: Two text bubbles. First says “Bye please any incentive ?”. Second says “How about the voucher”]

The result was a very confusing and frustrating experience for the project team, and tainted data: “Without removing all of the bot-generated responses, the data set cannot be used to gain insight into the research question at hand” (Simone, 2019).

We realise that creative AI is a hot-button issue for many, especially creative practitioners whose livelihoods may be threatened, and so we expected the subject of our survey and workshops to draw some negative attention, but we didn’t anticipate synthetic participants.

BOTS ON ZOOM

Bots are now being used across a multitude of use cases, from the legitimate to the dubious. There are bots that will attend meetings and transcribe the discussion or summarise actions, and bots that can nominate who will speak next in a meeting. There are bots that let managers delegate tasks like IT support and HR queries, and even bots that monitor the behaviour of Zoom attendees.

Zoom has its own suite of bots that can be employed during meetings. The company recently introduced Zoom IQ, surveillance software that promises real-time analysis of emotions and engagement during a virtual meeting, despite the many ethical and technological flaws of current emotion-tracking techniques (Agarwal, 2022).

The increased prevalence of bots in Zoom meetings comes with a whole host of legal and ethical issues, not least around privacy. As well as skewing the data collected during a data-gathering workshop, any unauthorised recording of meetings by unknown participants is problematic. The bots in our workshop were relatively benign compared to the horror stories of Zoom-bombing from the beginning of the Covid-19 pandemic, when people interrupted Zoom meetings with often offensive material (Greenberg, 2021).

This also comes at a time when Zoom had planned to use in-meeting data to train its AI technologies, a plan that was rolled back after much public outcry (Peters, 2023).

BOT RIGHTS

We have been careful to label participants as bot users only when they exhibited multiple suspicious behaviours, since there is a danger that participants with poor internet connections, those who are not confident participating vocally, or those who struggle with English will be incorrectly identified as bots.

Participating in online activities that are deemed outside the norm has led to some communities being read as bots – being dehumanised – by the digital platforms they use. Andrea Acosta has researched how fans of the K-pop group BTS, who organise campaigns online to, for example, download enough copies of the group’s singles to get them to the top of the charts, have been framed by American media outlets as “a foreign and orientalized collective invading Western spaces of authenticity” such as the Billboard charts and the Grammy Awards (Acosta, 2023).

For our study, the participants who used bots to generate responses to our prompts and questions did not meet the parameters for a human participant who had given informed consent to take part.

However, future studies may make provision for such synthetic participants! Synthetic data is not new and is becoming more popular, although it is usually created in a more controlled manner, as data modelled on an existing dataset:

“A computer, using a machine-learning algorithm or a neural network, analyses a real data set and learns about the statistical relationships within it. It then creates a new data set containing different data points than the original, but retaining the same relationships.” (Savage, 2023)
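To make that description concrete, here is a minimal sketch in Python, assuming scikit-learn and an invented two-variable dataset. It illustrates the principle Savage describes, not how any particular synthetic-data product works.

```python
# Minimal sketch: fit a model to "real" data, then sample new points
# that preserve its statistical relationships. The dataset is invented
# purely for illustration.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical real data: two correlated variables.
real = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=500)

model = GaussianMixture(n_components=3, random_state=0).fit(real)
synthetic, _ = model.sample(500)  # different data points, same relationships

# The correlation in the synthetic set should closely match the original.
print(np.corrcoef(real.T)[0, 1], np.corrcoef(synthetic.T)[0, 1])
```

The synthetic points are all new, but statistics such as the correlation between the two variables carry over from the original data.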

UK Research and Innovation are currently investing in the potential for synthetic data with a dedicated funding call.

SURVEY

We have also encountered participants submitting AI-generated responses to our survey. This is less of a problem, as the non-human responses are easy to spot in the answers to open-ended questions. If a participant fails to answer several such questions in a plausible manner, we can be confident they are not participating in good faith. And some of the answers state outright that they are being written by a generative AI tool!
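As a rough illustration of this kind of screening, here is a sketch in Python; the telltale phrases and the failure threshold are assumptions for illustration, not the exact rules we applied.

```python
# Rough sketch of screening open-ended survey answers for bot-like
# responses. The phrases and threshold are illustrative assumptions.
TELLTALE_PHRASES = [
    "as an ai language model",      # some answers state this outright
    "i really enjoyed this session",
]

def looks_bot_like(answers, max_failures=2):
    """Flag a respondent whose open-ended answers repeatedly fail
    simple plausibility checks."""
    failures = 0
    for text in answers:
        t = text.strip().lower()
        if not t:                                    # left blank
            failures += 1
        elif any(p in t for p in TELLTALE_PHRASES):  # telltale phrasing
            failures += 1
        elif len(set(t.split())) < 4:                # near-empty or repetitive
            failures += 1
    return failures >= max_failures

print(looks_bot_like(["As an AI language model, I cannot...", ""]))  # True
```

In practice, a flagged respondent should be reviewed by a person before being excluded from the dataset.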

The reasons why someone might deploy bots to answer a survey are less obvious to us. There was no automatic reward offered beyond entry into a prize draw. Melissa Simone, who writes about her own survey attracting large numbers of bots, suggests that other motivations include skewing results or training bots for surveys that offer larger financial rewards. We also find that the topic of AI inspires responses that can be mischievous or playful.

Simone offers advice on how to ward off bots from surveys offering financial incentives (Simone, 2019).

HOW CAN WE GUARD AGAINST THIS?

What can you do to guard against bots taking the places of legitimate participants and distorting data collection in Zoom-based workshops? To keep your attendees and your data safe, here are some suggestions:

Require signed consent forms prior to the event.

During the event, have a moderator who can watch the chat for unusual or disruptive contributions and remove those attendees; a sketch of what automated help for this might look like appears after this list.

Require an introduction from each participant and remove those who don’t participate.

Be aware that these are not foolproof tests for bots; some participants may have unstable internet connections, for example.

The issue of financial incentives remains difficult, but think carefully about the language you use when advertising such events.
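As one illustration, a moderator’s helper script might flag the two behaviours we observed: generic copy-paste chat messages and frequent leaving and rejoining. The phrases, threshold and log format below are assumptions for illustration; as noted above, these are not foolproof tests, so a flag should prompt human review, not automatic removal.

```python
# Illustrative sketch of a moderator's helper: flag attendees who post
# generic copy-paste chat messages or who leave and rejoin unusually
# often. Phrases, threshold and log format are assumptions; treat any
# flag as a prompt for human review, not proof of a bot.
from collections import Counter

GENERIC_PHRASES = [
    "i really enjoyed this session",
    "nice session",
]

def flag_suspicious(chat_log, join_log, max_joins=3):
    """chat_log: iterable of (name, message) tuples.
    join_log: iterable of names, one entry per join event."""
    flagged = set()
    for name, message in chat_log:
        if any(p in message.lower() for p in GENERIC_PHRASES):
            flagged.add(name)
    joins = Counter(join_log)
    flagged.update(name for name, n in joins.items() if n > max_joins)
    return flagged

chat = [("A", "It was great for us to meet nice session"), ("B", "Thanks!")]
joins = ["A", "B", "C", "C", "C", "C"]
print(flag_suspicious(chat, joins))  # flags A and C
```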

If you have any other ideas, please do let us know.

REFERENCES

Acosta, A. (2023, February 23). Bots and Binaries: On the Failure of Human Verification. Post45. https://post45.org/2023/02/bots-and-binaries-on-the-failure-of-human-verification/

Agarwal, P. (2022, December 31). Emotional AI Is No Substitute for Empathy. Wired UK. https://www.wired.co.uk/article/empathy-artificial-intelligence

Greenberg, A. (2021, February 3). Why Insider ‘Zoom Bombs’ Are So Hard to Stop. Wired. https://www.wired.com/story/zoombomb-inside-jobs/

Peters, J. (2023, August 11). Zoom rewrites its policies to make clear that your videos aren’t used to train AI tools. The Verge. https://www.theverge.com/2023/8/11/23828649/zoom-communications-like-data-train-ai-artificial-intelligence-models

Savage, N. (2023). Synthetic data could be better than real data. Nature. https://doi.org/10.1038/d41586-023-01445-8

Simone, M. (2019, November 25). How to Battle the Bots Wrecking Your Online Study. Behavioral Scientist. https://behavioralscientist.org/how-to-battle-the-bots-wrecking-your-online-study/
