Arkaitz Zubiaga: “We will be exploring how to join forces between automated tools and humans to push the boundaries of online harm mitigation”

By Maria Campins, Newtral

The spread of fake news and hate speech online has become one of the main threats to modern democracies. Arkaitz Zubiaga is engaged in research aimed at tackling disinformation, and is contributing to the HYBRIDS project with the same objective.

Under the HYBRIDS initiative, whose acronym stands for Hybrid Intelligence to monitor, promote and analyse transformations in good democracy practices, 14 leading institutions will collaborate over the coming years to develop different approaches on the basis of an exhaustive analysis of public discourse about crucial global issues, such as health, the climate crisis, European scepticism and immigration, taking into account both traditional media and content published on social networks.

Zubiaga, a senior lecturer at Queen Mary University of London, co-leads the Social Data Science Lab. He has carried out research projects on fighting fake news, claim detection, and sexism detection on social networks, among others. He is interested in linking online data with events in the real world, in order to track problematic issues on the Web and social media that can have a damaging effect on individuals or society, such as hate speech, misinformation and inequality. Within HYBRIDS, Zubiaga will supervise a Ph.D. project on cross-lingual claim detection for fact-checking.

How can artificial intelligence and human-in-the-loop approaches fight misinformation, toxicity, and polarization?

There has been substantial progress in recent years in developing AI tools to tackle misinformation and offensive language online, which is helping to mitigate this kind of harmful online content. However, we have not yet achieved a fully effective solution to these problems.

To achieve this, we argue that human input can be beneficial to further improve these tools and make them more effective against online harm. One of the key challenges is making these models generalizable, that is, applicable to new kinds of online data sources that the tools have not seen before. Where this is proving challenging to date, we will be exploring how to join forces between automated tools and humans to push the boundaries of online harm mitigation tools.

One of your interests is linking online data with events in the real world. How can this come closer to solving problems such as hate speech or inequality?

To tackle the problematic issues that we encounter online, I argue that it is crucial to understand how our society functions offline. Understanding how events develop offline can in turn help us tackle online events. Consequently, if we can build a link between offline and online events, we will have a better understanding of how these events develop, as is the case, for example, with online harm.

What are some of the challenges you face in your research?

The biggest challenges are often informed by datasets. A close look at actual data retrieved from, say, online sources, shows you some of the unexpected challenges that you will have to deal with, and hence informs the challenges that you need to tackle when building your AI tools. 

For example, you can build a tool that works reasonably well at automatically fact-checking claims about health, but when you apply that tool to political claims, its performance drops significantly. Achieving generalisability, to enable applicability across different domains and scenarios, is therefore key to tackling some of the major challenges in building AI tools.

Which questions do you want to solve through the Ph.D. project you will supervise?

The Ph.D. student that I will be supervising will be working on claim detection for fact-checking. That is, before we run a claim verification tool that determines whether a claim is accurate by comparing it to certain pieces of evidence, we need to know which claims we need to fact-check.

If we take social media platforms as examples, we see lots of content on those platforms, some of which is check-worthy (should be fact-checked), but much of which is just personal updates or unimportant information that we shouldn’t spend time fact-checking. The Ph.D. student will be working on developing automated tools to enable this selection of claims from online sources, while also making it possible to select these check-worthy claims in different languages. Hence, the generalisability of models will again play an important role; if we build a claim detection tool for one language, how can we make sure that it also works effectively for other languages?

What do you aim to achieve with HYBRIDS?

HYBRIDS is an excellent opportunity for networking with other leading researchers across Europe around a common research topic, while also training the next generation of researchers. The key benefit of HYBRIDS should come from maximizing collaborations across the network, joining forces to train research students together.