Report from Creative Informatics Small Research Grant recipients
NEWS | 22 MAY 2023
Our project aims to build the first ever tools for Gàidhlig chatbots, thereby inspiring research in low-resource languages and reaching populations that would otherwise lack language technologies. Our initial focus was on the creation of a novel dataset specifically designed to train AI models. To this end, we set out to develop a new interface for streamlining the collection of this data, recruit proficient speakers of Scottish Gaelic to provide the required conversational data, and prototype the first Gàidhlig chatbots.
We began by adapting slurk, a chat interface developed by researchers at the University of Potsdam for experiments involving collaborative conversational games. Over the course of about six months we created a new kind of experiment in which pairs of participants chat about a museum exhibit that both of them can see. The twist was that each participant had access to different information and played a different role: one had detailed information about the exhibit and pretended to be a museum guide, while the other had only a set of definitions for certain keywords and asked questions to learn more about the exhibit.
For our experiments, we worked with Anna Groundwater at the National Museum of Scotland to select appropriate exhibits to present to our participants and to craft the detailed informational guides for our pretend museum guides. We chose a variety of exhibits from Scottish history and started recruitment. It was in this phase that we encountered the biggest challenges of our project. Despite my (Dave Howcroft's) extensive experience with corpus creation and psycholinguistic experimentation, data collection proved far more challenging because of the project's low-resource setting. There are only about 60 thousand speakers of Scottish Gaelic in Scotland, and fewer than that outside of Scotland. Advertising to and reaching Gaelic speakers was a formidable obstacle. We wrote blog posts, sent tweets, later posted on Mastodon, and, with the help of the Napier social media team, used boosted Facebook posts as well.
Eventually, over the course of another six months, we managed to recruit about 20 of the 100 participants we had hoped for. Aside from the difficulty of reaching people and persuading them to join a two-hour experiment, we faced technical and scheduling challenges. The first several pairs of participants ran into new bugs we hadn't found during our testing, forcing us to update our interface to improve its reliability. The bigger hurdle, however, was finding a time that suited both participants in a pair and getting them both signed up and prepared in time. We will take these lessons forward and plan to improve our software and scheduling procedures in future studies.
At the end of the project, we had collected dozens of conversations in Gaelic about a dozen museum exhibits. It's not the 1000 conversations we set out to collect, but it's a solid start, and it will enable us to begin experimenting with chatbot development. Our next steps are to understand the patterns in the conversations we collected and to implement rule- and AI-based systems for generating similar conversations. This will help us figure out how much more data we need to collect in future expansions of this study, which is important because we've begun working on the next grant proposal to make the first Scottish Gaelic chatbots a reality!
Author & PI: David M. Howcroft (Edinburgh Napier University)
Co-Investigators: Dimitra Gkatzia (ENU) and Will Lamb (U. of Edinburgh)
Collaborating: Anna Groundwater (National Museum of Scotland)