The challenge of developing a system to detect hyperpartisan political news: interview with Pablo Gamallo

By Maria Campins, Newtral

Leading a European project is a challenge, even more so when the aim is to advance research on how to use hybrid intelligence against an ocean of disinformation that spreads across borders. This is the role Pablo Gamallo has just assumed as leader of HYBRIDS, a European project to advance research on deep learning techniques for tackling hate speech and hoaxes.

Under the HYBRIDS initiative, whose name stands for Hybrid Intelligence to monitor, promote and analyse transformations in good democracy practices, 14 leading institutions will collaborate over the coming years to promote different approaches based on an exhaustive analysis of public discourse on crucial global issues such as health, the climate crisis, Euroscepticism and immigration. The analysis will take into account both traditional media and content published on social networks.

Gamallo is an expert in computational linguistics at the lead institution of the initiative, CiTIUS – Research Centre on Intelligent Technologies, part of the Universidade de Santiago de Compostela (USC). He is a researcher in Natural Language Processing and Corpus Linguistics, and he has worked as a postdoctoral researcher at the Artificial Intelligence Center of the Universidade Nova de Lisboa.

One of Gamallo’s interests is detecting and fighting threats against democracy by developing strategies for detecting fake news and for identifying, classifying and analyzing political bots. He is currently working on projects on semantic relation extraction and on the design of hybrid language models applied to sentiment analysis and disinformation, among other NLP tasks.

What is HYBRIDS and what do you aim to achieve with the project?

I would like us to be able to train a new generation of researchers in the detection of threats against democracy, such as the dissemination of fake news in the media and the proliferation of hate speech and harassment of minorities on social networks.

In addition to this general objective, I aim to help researchers develop hybrid methodologies that introduce symbolic information into neural architectures to improve the automatic detection of disinformation. I would like all the participants in the project, young and senior researchers alike, to learn together to develop, evaluate and use those machine/deep/symbolic learning methods in which artificial and human intelligence meet.

Which problems do you want to solve with this project?

I would like both the studies carried out during the project and the systems we develop to be useful to professionals in the communication sector. More specifically, I would like our research to help minimize one of the most dangerous social problems we suffer from today: the fact that disinformation spreads much faster than truthful, verified information.

This problem is related to the cognitive biases identified and defined by psychologist Dan Ariely, who said that humans are “predictably irrational” and tend to systematically accept fake news. One of these biases is precisely the tendency to seek and interpret information in a way that confirms our prejudices and assumptions. And disinformation spreads so fast precisely because it always strikes a sensitive chord in our belief system.

In addition to this problem, which is of a social nature, the project will also attempt to provide answers to some of the methodological problems of artificial intelligence and linguistic technologies, namely those related to large language models. These models, trained on sophisticated neural architectures with billions of parameters and fed with texts containing billions of words extracted from the Web, have great difficulties in solving questions related to factuality and truthfulness. Like humans, they also have their biases and can transmit misinformation.

We believe that it is necessary to propose learning architectures that can be fed not only with a large amount of unstructured text, but also with structured symbolic knowledge. We need to find architectures and models that give us more confidence in natural language processing systems based on artificial intelligence. As cognitive scientist Gary Marcus says, we need to build an artificial intelligence we can trust.

What would be the desired outcome for the PhD student under your supervision?

The objective of the PhD that I will co-supervise with Professor Gaël Dias of the Université de Caen, France, with the support of J.R. Pichel (Imaxin|Software) and E. Di Cesare (OPENPOLIS), is to develop a system for the detection of hyperpartisan political news by retrieving and processing the headlines of the digital press.

Hyperpartisan headlines are those that are extremely biased in favor of a particular ideology or political party. They do not transmit false statements, but they distort the facts to make positive ones appear negative, and vice versa. To deal with this issue, a thorough study of language and discourse plays a crucial role.

An interesting example of this phenomenon: a Galician media outlet announced the latest minimum wage increase with this headline: “The minimum wage increase will cost 175 million to Galician companies”. The information is not false, but the outlet, through the verb “to cost”, focuses on an aspect that is negative for only one part of society: companies. By focusing on this aspect, it leaves out other central elements of the news, specifically the amount of money that the workers will receive.
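As a rough illustration of how a single verb can be treated as a symbolic framing cue, the sketch below looks up headline words in a small hand-written lexicon. The lexicon entries, the `framing_cues` function and the perspective/polarity labels are all hypothetical and chosen only for this example; they do not describe the system the project intends to build.

```python
import re

# Hypothetical, hand-written lexicon: framing cue -> (perspective, polarity).
# These entries are illustrative assumptions, not a resource from the project.
FRAMING_LEXICON = {
    "cost": ("payer", "negative"),
    "burden": ("payer", "negative"),
    "benefit": ("recipient", "positive"),
    "raise": ("recipient", "positive"),
}


def framing_cues(headline):
    """Return (word, perspective, polarity) for each lexicon cue in the headline."""
    tokens = re.findall(r"[a-z]+", headline.lower())
    return [(t, *FRAMING_LEXICON[t]) for t in tokens if t in FRAMING_LEXICON]


headline = "The minimum wage increase will cost 175 million to Galician companies"
print(framing_cues(headline))  # [('cost', 'payer', 'negative')]
```

A toy lookup like this only surfaces the cue word. Deciding whether a headline is actually one-sided would still require the study of language and discourse described above, as well as expert judgment.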

Although we know that it is not possible to be completely objective, I think that people should consume media that do not manipulate reality with their headlines. So, in this PhD, we intend to define strategies that allow us to cover a good part of the digital press and identify the most hyperpartisan outlets as well as those that are least manipulative in their use of language. By the end of the thesis, we would like to create a press monitoring tool, so that the hyperpartisan media we have identified start reducing their tendency to manipulate the information in their headlines.

How can hybrid intelligence systems overcome the shortcomings of existing artificial intelligence methods?

We will try to design and develop natural language processing tools with open source technology, where the human factor is key to improving the automatic detection of disinformation. Our objective is not to fully automate the disinformation detection process, but rather to help experts in communication and social sciences improve and optimize the protocols and processes involved in this task.

I think that the identification of disinformation is too complex a task to be solved by fully automatic systems. Another important facet of the project is to deepen the study and analysis of discourse in the public space, focusing on the argumentative and rhetorical schemes that build toxic and persuasive narratives. In short, we want to introduce explicit linguistic knowledge into current artificial intelligence technology to help remedy its deficiencies, particularly its difficulty in identifying truthful information.

What do you think are the biggest challenges hybrid intelligence faces?

One of the main challenges is to organize fully effective multidisciplinary teams whose members complement each other well. It is not an easy task to ensure that linguists, computer engineers and experts in political and communication sciences can work together efficiently.

The other great challenge is being able to open the opaque and inscrutable black boxes with which large language models and artificial intelligence systems work. We want these systems to be more transparent and interpretable, by integrating structured knowledge provided by experts in linguistics and social sciences. This symbolic integration is not trivial and will be the great scientific and technological challenge of the project.

What are the most promising lines of research to solve those problems?

The line of research to be explored is precisely the development of neuro-symbolic systems: in essence, systems that combine the strengths of deep learning based on artificial neural networks with the human-like capabilities of symbolic knowledge and reasoning.
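One very simple way to picture such a combination is late fusion, where the raw score of a neural classifier is adjusted by expert-defined symbolic features before being turned into a probability. The sketch below assumes a hypothetical `hybrid_score` function, a made-up feature name and arbitrary weights. It is only a minimal illustration of the general idea, not the architecture HYBRIDS will develop, and real neuro-symbolic systems integrate knowledge in far richer ways.

```python
import math


def hybrid_score(neural_logit, symbolic_features, weights, bias=0.0):
    """Fuse a neural logit with weighted symbolic features through a sigmoid.

    neural_logit: raw score from some neural classifier (assumed given).
    symbolic_features: feature name -> value, e.g. produced by expert rules.
    weights: feature name -> weight (hypothetical; could be set by experts or learned).
    """
    z = neural_logit + bias
    for name, value in symbolic_features.items():
        z += weights[name] * value
    return 1.0 / (1.0 + math.exp(-z))


# The neural model alone is undecided (logit 0.0 -> probability 0.5);
# a hypothetical symbolic cue pushes the fused score above 0.5.
neural_only = hybrid_score(0.0, {}, {})
with_cue = hybrid_score(0.0, {"one_sided_framing": 1.0}, {"one_sided_framing": 1.5})
print(neural_only, with_cue > neural_only)
```

Because each symbolic feature has a named weight, its contribution to the final score can be inspected directly, which is one small example of the transparency that structured knowledge can bring.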