Project Expo
Project Profile
Project abbreviation: ParlaMint
Project name: ParlaMint: Towards Comparable Parliamentary Corpora
Project coordinators: Maciej Ogrodniczuk (PL-PAS), Petya Osenova (BG-BAS)
Project consortium:
- Main partners: PL-PAS, BG-BAS, SI-IJS, SI-INZ, CZ-UFAL, DE-TUB, UK-UCREL
- Contributing partners: National CLARIN experts: Austria, Basque, Catalonia, Estonia, Finland, Greece, Norway, Portugal, Romania, Sweden, Serbia, Belgium, Croatia, Denmark, France, Hungary, Iceland, Italy, Latvia, Lithuania, Netherlands, Spain, Turkey
Funding: CLARIN ERIC and local in-kind contributions
- ParlaMint I: 135k EUR
- ParlaMint II: 163k EUR
Project duration:
- ParlaMint I: 1 July 2020 - 30 May 2021
- ParlaMint II: 1 Dec 2021 - 30 May 2023
Main keywords: Parliamentary corpora, Common TEI-based schema, Rich metadata, Open access
Background of the research topic: National parliamentary data is a verified communication channel between elected political representatives and members of society in any democracy. It needs to be made accessible and comprehensive, especially in times of global crisis. With the recent advances in artificial intelligence, analytics over unstructured parliamentary data in many languages is rapidly becoming a prerequisite for reliable and trustworthy approaches to checking the veracity of information in contemporary society. One of the most important characteristics of new parliamentary data is its direct correspondence to the most recent events, including those with a global impact on human health, social life and economics, such as the current COVID-19 pandemic. By comparing the data synchronically and diachronically within a cross-lingual context, scientific and civil communities will be able to track pan-European discussion and stay up to date on any emerging topic.
Goal of the project: The goal of the ParlaMint project is to turn the existing, diverse contemporary national parliamentary data into resources that are:
- Comparable
- Interpretable and
- Highly communicative with respect to society (researchers, journalists, NGOs, citizens, etc.).
The work towards this goal consists of:
- Compiling a collection of parliamentary datasets (corpora) in a number of languages and in a harmonised format, covering both the current data and older, reference data
- Processing the compiled corpora linguistically
- Indexing the data with popular concordancers so that interested parties can search and extract the relevant comparable information
- Showing through appropriate use cases that the ParlaMint corpora and related technologies as part of the CLARIN resource families serve a variety of societal needs.
Examples of such use cases:
- Speaker and party statistics (for instance, who spoke more and on which topic; who changed their mind on a certain topic; which party defends or opposes which proposals, etc.)
- Topic modeling (which topics are most popular at what time; how topics change and interrelate, etc.)
- Time- and context-bound social tendencies (tendencies in policy making over time).
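As an illustration of the speaker statistics use case, the following is a minimal sketch in Python of counting words per speaker. It assumes that speeches are encoded as TEI `<u>` elements carrying a `who` attribute and containing `<seg>` elements, in the spirit of the common TEI-based schema; the sample document, the speaker identifiers and the function name `words_per_speaker` are hypothetical and not taken from the ParlaMint corpora.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# TEI namespace; the element layout below is an assumed, simplified example.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical miniature transcript, not real corpus data.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><div type="debateSection">
    <u who="#SpeakerA"><seg>We must act on the health budget now.</seg></u>
    <u who="#SpeakerB"><seg>I disagree.</seg></u>
    <u who="#SpeakerA"><seg>The data supports it.</seg><seg>Let us vote.</seg></u>
  </div></body></text>
</TEI>"""


def words_per_speaker(tei_xml: str) -> Counter:
    """Count whitespace-separated tokens per speaker over all <u> elements."""
    root = ET.fromstring(tei_xml)
    counts: Counter = Counter()
    for utterance in root.iter(f"{{{TEI_NS}}}u"):
        # The 'who' attribute is assumed to be a '#'-prefixed reference.
        speaker = utterance.get("who", "#unknown").lstrip("#")
        # Join text pieces with a space so adjacent segments do not fuse.
        text = " ".join(utterance.itertext())
        counts[speaker] += len(text.split())
    return counts


if __name__ == "__main__":
    for speaker, n_words in words_per_speaker(SAMPLE).most_common():
        print(f"{speaker}: {n_words} words")
```

Whitespace tokenisation is only a rough proxy for speech length; the linguistically annotated versions of the corpora would allow counting proper tokens instead.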
Project abstract:
- Strategy and data availability: The project will establish a strategy for handling and processing parliamentary data in times of emergency (COVID-19 is just a showcase). Reference corpora could thus be produced from parliamentary records of earlier periods of global crisis, e.g. the great economic recession, periods of floods in Europe, the Ebola outbreak, etc.
- Standard development: The Parla-CLARIN encoding standard will be further developed to cover more detailed and specific metadata across languages and parliaments. The corpora will serve as a baseline for further updates, and such uniform updates across the corpora would strongly support various methods of comparative research.
- From showcasing to real applications: The availability of comparable multilingual parliamentary data (also made visible through concordancers and Parlameter) will boost research in digital humanities, linguistics, political science, sociology, psychology and related disciplines.
Publications:
- Corpora: Erjavec T. et al. (2021). Multilingual comparable corpora of parliamentary debates ParlaMint 2.1. Slovenian language resource repository CLARIN.SI. http://hdl.handle.net/11356/1432
- Corpora: Erjavec T. et al. (2021). Linguistically annotated multilingual comparable corpora of parliamentary debates ParlaMint.ana 2.1. Slovenian language resource repository CLARIN.SI. http://hdl.handle.net/11356/1431
- Thesis: Pieters M. (2021). A comparative analysis on the ParlaMint corpus. MSc thesis.
- Article: Fitsilis, F., Mikros, G. (2021). Development and Validation of a Corpus of Written Parliamentary Questions in the Hellenic Parliament. Journal of Open Humanities Data, 7:18.
- Article: Erjavec T. et al. (2021). ParlaMint: Comparable Corpora of European Parliamentary Data. Proceedings of CLARIN Annual Conference 2021.
- Article: Tomaž Erjavec, Maciej Ogrodniczuk, Petya Osenova, Nikola Ljubešić, Kiril Simov, Andrej Pančur, Michał Rudolf, Matyáš Kopp, Starkaður Barkarson, Steinþór Steingrímsson, Çağrı Çöltekin, Jesse de Does, Katrien Depuydt, Tommaso Agnoloni, Giulia Venturi, María Calzada Pérez, Luciana D. de Macedo, Costanza Navarretta, Giancarlo Luxardo, Matthew Coole, Paul Rayson, Vaidas Morkevičius, Tomas Krilavičius, Roberts Darģis, Orsolya Ring, Ruben van Heusden, Maarten Marx, and Darja Fišer (2022). The ParlaMint corpora of parliamentary proceedings. Language Resources and Evaluation. https://doi.org/10.1007/s10579-021-09574-0
