Results

Datasets

This section showcases datasets created within the HYBRIDS project. These datasets are designed to support research on disinformation, abusive language, and public discourse analysis. They are periodically updated and made available to the research community to foster collaboration and advance knowledge in the field.

MetaHate: A Unified Dataset for Hate Speech Detection

MetaHate compiles over 1.2 million social media posts from 36 datasets into a unified resource for hate speech detection research. Distributed in TSV format with binary labels, it supports computational linguistics and social media analysis. The dataset is accessible via Hugging Face: a subset is openly available, while the full collection requires a data-sharing agreement.
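As a rough illustration, the snippet below loads the TSV file with pandas once access has been granted. The file name and the "text"/"label" column names are assumptions made for this sketch; check the dataset card on Hugging Face for the actual schema.

```python
import pandas as pd

# Minimal sketch for inspecting the MetaHate TSV after access is granted.
# "metahate.tsv" and the "text"/"label" column names are assumptions;
# consult the dataset card for the real file layout.
df = pd.read_csv("metahate.tsv", sep="\t")

print(df.shape)                    # ~1.2M posts expected
print(df["label"].value_counts())  # binary labels (hate / no hate)
```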

Software and Models

Here, you will find software tools and computational models developed as part of the HYBRIDS project. These resources include AI-driven models for disinformation detection, NLP applications, and hybrid intelligence systems. The section will be continuously updated to provide access to the latest innovations and contributions from the project.

MetaHateBERT

MetaHateBERT is a fine-tuned BERT model specifically designed to detect hate speech in text. It is based on the bert-base-uncased architecture and has been trained for binary text classification, distinguishing between ‘no hate’ and ‘hate’.
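A minimal usage sketch with the Hugging Face transformers pipeline is shown below. The Hub identifier irlab-udc/MetaHateBERT and the exact label names are assumptions for this example; verify both against the model card.

```python
from transformers import pipeline

# Minimal sketch, assuming the model is published on the Hugging Face Hub
# under "irlab-udc/MetaHateBERT" (check the project page for the exact ID).
classifier = pipeline("text-classification", model="irlab-udc/MetaHateBERT")

result = classifier("I can't stand people like you.")
print(result)  # e.g. [{"label": "hate", "score": 0.97}] -- labels per model card
```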

CT-BERT-PRCT

A specialized BERT model fine-tuned to detect Population Replacement Conspiracy Theory (PRCT) content on social media. The model performs well at identifying both explicit and implicit PRCT narratives and generalizes reasonably across platforms and languages.
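The sketch below shows binary classification with the standard transformers API. The Hub identifier is a placeholder, not a confirmed ID; the actual identifier and label mapping are listed on the project's model page.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder Hub ID -- replace with the identifier from the model card.
model_id = "irlab-udc/CT-BERT-PRCT"  # assumption
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

inputs = tokenizer("Example post mentioning a replacement narrative.",
                   return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(dim=-1).item()  # 0/1; label mapping per the model card
print(model.config.id2label.get(pred, pred))
```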

Llama-3-8B-Distil-MetaHate

Llama-3-8B-Distil-MetaHate is a distilled version of the Llama 3 architecture, fine-tuned for hate speech detection and explanation. Developed by the Information Retrieval Lab at the University of A Coruña, the model uses Chain-of-Thought reasoning so that it not only classifies a post as hate speech but also explains the reasoning behind that classification.
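A minimal generation sketch follows. Both the Hub identifier and the prompt wording are assumptions for illustration; the model card documents the exact chat template and expected output format.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub ID and prompt format are assumptions -- consult the model card.
model_id = "irlab-udc/Llama-3-8B-Distil-MetaHate"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = ("Classify the following post as hate or no hate and explain "
          "your reasoning step by step.\nPost: You people ruin everything.")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```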