The fourth annual ELG conference



Together Towards Digital Language Equality

Brussels, Belgium
Hybrid conference


Project Expo

Project Profile

Project abbreviation: ParlaMint

Project name: ParlaMint: Towards Comparable Parliamentary Corpora

Project coordinator: Maciej Ogrodniczuk (PL-PAS), Petya Osenova (BG-BAS)

Project consortium:

  • Contributing partners: National CLARIN experts: Austria, Basque, Catalonia, Estonia, Finland, Greece, Norway, Portugal, Romania, Sweden, Serbia, Belgium, Croatia, Denmark, France, Hungary, Iceland, Italy, Latvia, Lithuania, Netherlands, Spain, Turkey

Funding: CLARIN ERIC and local in-kind contributions

  • ParlaMint I: 135k EUR
  • ParlaMint II: 163k EUR

Project duration:

  • ParlaMint I: 1 July 2020 - 30 May 2021
  • ParlaMint II: 1 Dec 2021 - 30 May 2023

Main keywords: Parliamentary corpora, Common TEI-based schema, Rich metadata, Open access

Background of the research topic: National parliamentary data is a verified communication channel between the elected political representatives and society members in any democracy. It needs to be made accessible and comprehensive, especially in times of a global crisis. With the recent advances of artificial intelligence, analytics over unstructured parliamentary data for many languages is rapidly becoming a prerequisite for reliable and trustworthy approaches to checking the veracity of information in contemporary society. One of the most important characteristics of new parliamentary data is its direct correspondence to the most recent events, including those with a global impact on human health, social life and economics, such as the current COVID-19 pandemic. By comparing the data synchronically and diachronically within a cross-lingual context, scientific and civil communities will be able to track pan-European discussion and be quickly updated on any emerging topic.

Goal of the project: The goal of the ParlaMint project is to turn the existing, diverse contemporary national parliamentary data into resources that are:

  • Comparable
  • Interpretable and
  • Highly communicative with respect to the society (researchers, journalists, NGOs, citizens, etc.).
The project provides data for focused observations on trends, opinions, decisions on lockdowns and restrictive measures as well as on the consequences with respect to health, medical care systems, employment, etc. in times of emergencies. For the ParlaMint project, the emergency case is the COVID-19 pandemic. However, the methodology is scalable to other events as well, such as economic crises, environmental issues, etc. Thus, the main tasks are:
  • Compiling a collection of parliamentary datasets (corpora) in a number of languages and in a harmonised format, covering both the current data and older, reference data
  • Processing the compiled corpora linguistically
  • Indexing the data with popular concordancers so that interested parties can search and extract the relevant comparable information
  • Showing through appropriate use cases that the ParlaMint corpora and related technologies as part of the CLARIN resource families serve a variety of societal needs.
Democratic processes, as reflected in parliamentary debates, are therefore observed through the following related strategies:
  • Speaker and party statistics (for instance, who spoke more and on which topic; who changed their mind on a certain topic; which party defends/opposes what proposals, etc.)
  • Topic modeling (which topics are most popular at what time; how topics change and interrelate, etc.)
  • Time and context-bound social tendencies (tendencies in policy making over time).
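
The first strategy, speaker statistics, can be illustrated with a minimal Python sketch. It assumes a TEI-style transcript in which each speech is a `u` element carrying a `who` attribute and its text sits in `seg` children; the sample document, the speaker identifiers and the function name `words_per_speaker` are invented for illustration and are not taken from the ParlaMint corpora or tooling.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# TEI namespace used when looking up elements.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical miniature transcript, invented for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><div>
    <u who="#SpeakerA"><seg>We must discuss the lockdown measures today.</seg></u>
    <u who="#SpeakerB"><seg>I disagree.</seg></u>
    <u who="#SpeakerA"><seg>The health system needs support.</seg><seg>Urgently.</seg></u>
  </div></body></text>
</TEI>"""


def words_per_speaker(tei_xml: str) -> Counter:
    """Count whitespace-separated tokens per speaker over all <u> elements."""
    root = ET.fromstring(tei_xml)
    counts = Counter()
    for utterance in root.iterfind(".//tei:u", TEI_NS):
        # Strip the leading '#' of the pointer to get a plain speaker id.
        speaker = utterance.get("who", "unknown").lstrip("#")
        # Gather all text inside the utterance, including nested segments.
        text = " ".join(utterance.itertext())
        counts[speaker] += len(text.split())
    return counts


if __name__ == "__main__":
    # List speakers from most to least words spoken.
    for speaker, n_words in words_per_speaker(SAMPLE).most_common():
        print(speaker, n_words)
```

Splitting on whitespace is a deliberately crude token count; with linguistically annotated corpora one would presumably count the annotated tokens instead, and join speakers to their party metadata to obtain party-level statistics.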


Project abstract:

  • Strategy and data availability: The project will establish a strategy for handling and processing parliamentary data in times of any emergency (COVID-19 is just a showcase). Thus, different reference corpora could be produced from parliamentary records of earlier periods of global crisis, e.g. the great economic recession, periods of floods in Europe, the Ebola outbreak, etc.
  • Standard development: The Parla-CLARIN encoding standard will be further developed to cover more detailed and specific metadata across languages and parliaments. The corpora will serve as a baseline for further updates. Such uniform updates across the corpora would strongly support various methods of comparative research.
  • From showcasing to real applications: The availability of comparable multilingual parliamentary data (also made visible through concordancers and Parlameter) will boost research in digital humanities, linguistics, political science, sociology and psychology, as well as in all related branches of science.


  • Corpora: Erjavec T. et al. (2021). Multilingual comparable corpora of parliamentary debates ParlaMint 2.1. Slovenian language resource repository CLARIN.SI.
  • Corpora: Erjavec T. et al. (2021). Linguistically annotated multilingual comparable corpora of parliamentary debates ParlaMint.ana 2.1. Slovenian language resource repository CLARIN.SI.
  • Thesis: Pieters M. (2021). A comparative analysis on the ParlaMint corpus. MSc thesis.
  • Article: Fitsilis, F., Mikros, G. (2021). Development and Validation of a Corpus of Written Parliamentary Questions in the Hellenic Parliament. Journal of Open Humanities Data, 7:18.
  • Article: Erjavec T. et al. (2021). ParlaMint: Comparable Corpora of European Parliamentary Data. Proceedings of CLARIN Annual Conference 2021.
  • Article: Tomaž Erjavec, Maciej Ogrodniczuk, Petya Osenova, Nikola Ljubešić, Kiril Simov, Andrej Pančur, Michał Rudolf, Matyáš Kopp, Starkaður Barkarson, Steinþór Steingrímsson, Çağrı Çöltekin, Jesse de Does, Katrien Depuydt, Tommaso Agnoloni, Giulia Venturi, María Calzada Pérez, Luciana D. de Macedo, Costanza Navarretta, Giancarlo Luxardo, Matthew Coole, Paul Rayson, Vaidas Morkevičius, Tomas Krilavičius, Roberts Darģis, Orsolya Ring, Ruben van Heusden, Maarten Marx, and Darja Fišer (2022). The ParlaMint corpora of parliamentary proceedings. Language Resources and Evaluation.