Doubling the database: How research on Digital Language Equality led to 6,000 new resources for the European Language Grid

Over the course of a weekend in the middle of January 2022, the European Language Grid (ELG) doubled in size. More than 6,000 new data resources, tools and services for 87 different languages were added to the ELG platform, pushing the ELG much closer to one of its central objectives: developing into a joint European language technology platform in which ideally all relevant language resources and technologies are registered. With the update, we are now confident that the majority of resources available in Europe can be found in and through the ELG, whether they are corpora, tools, conceptual resources or models. How did that happen? A look at the beginnings of ELG’s sister project, European Language Equality (ELE), might help.

Prospering languages in a digital world

The ELE project’s main goal is to achieve Digital Language Equality in Europe by 2030. According to the preliminary definition, Digital Language Equality describes the state in which all languages have the technological support and situational context necessary for them to continue to exist and to prosper as living languages in the digital age. While this definition paints a clear and desirable picture of the future of multilingualism in Europe, the main work still lay ahead: developing a strategic research, innovation and implementation agenda and roadmap that leads towards this desired state.

One of the key parts of the strategic agenda is the DLE metric, a quantified index that makes it possible to compare the levels of digital readiness across Europe’s languages. For each language, the metric combines several factors, such as the number of its speakers and its recognition in the EU, but most importantly the level of technological support it currently receives. Suggesting how digital language equality can become a reality for all European languages requires detailed knowledge of the current state of technological support for each language. But how does one gather this amount of data for 87 different languages?

Creating the primary platform for European language technology

The task was part of the ELE investigation into the current LT support for Europe’s languages, in which 33 project partners from different countries described the status quo of their respective languages, based on empirical data and findings. In addition to these national institutions with expertise in language technology, several associations such as the European Language Equality Network (ELEN) and the European Civil Society Platform for Multilingualism (ECSPM) focussed on smaller languages within the European Union. Altogether, the ELE consortium gathered metadata from around 1,000 organisations such as LT companies, universities and research institutions, covering a total of 87 different languages. 4,147 new data resources and 2,216 new tools and services were identified and their metadata documented.

These tools and resources are new in the sense that the approximately 6,000 resources gathered by the ELE consortium had not yet been available in the European Language Grid, which already contained more than 5,000 resources from the European LT landscape. Including the additional 6,000 resources collected by ELE, the ELG platform now provides information about more than 11,000 language technology resources, either as ELG-compatible services that can be downloaded and used directly through the ELG, or in the form of metadata including links to the original hosting platform.
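For readers who want to check how the rounded figures fit together, the arithmetic can be sketched in a few lines of Python. The two counts of new records are taken from the text above; the pre-import catalogue size is only given as "more than 5,000", so the value used here is an assumed lower bound and the results are approximate. All variable names are illustrative.

```python
# Counts of newly documented records, as reported in the article.
new_data_resources = 4_147
new_tools_and_services = 2_216

# Assumption: the article only says "more than 5,000" resources existed
# before the import, so this is a lower bound, not an exact figure.
existing_resources_lower_bound = 5_000

# Total number of records contributed by the ELE collection effort.
new_total = new_data_resources + new_tools_and_services

# Approximate catalogue size and growth after the import.
catalogue_lower_bound = existing_resources_lower_bound + new_total
growth_factor = catalogue_lower_bound / existing_resources_lower_bound

print(f"New records: {new_total:,}")
print(f"Catalogue after import: more than {catalogue_lower_bound:,}")
print(f"Growth factor: about {growth_factor:.1f}x")
```

The sum of the two counts is 6,363, which is the "approximately 6,000" mentioned above, and adding it to the assumed lower bound gives a catalogue of more than 11,000 entries, roughly double its previous size.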

All data leads to Athens

The import itself was handled by the Institute for Language and Speech Processing (ILSP) of the “Athena” Research Center in Greece. The team in Athens, which forms part of both projects, coordinated the metadata collection effort, ensured compatibility with the ELG platform, and homogenised and curated all the metadata records to prepare them for the import. The new import includes both public and on-demand data and services, hosted directly by their providers or through platforms such as Hugging Face or GitHub.

The ELE resource import is a prime example of the effective collaboration between the two projects and the reason why we consider them sister projects: developing the strategic research, innovation and implementation agenda and roadmap for full digital language equality requires a comprehensive and empirical overview of the current technological support for Europe’s languages. While the European Language Grid provides exactly this kind of overview, the new data in return pushes it much closer to one of its central objectives: becoming the primary platform for European language technology.

The European Language Grid has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement № 825627. The European Language Equality project has received funding from the European Union under grant agreement № LC-01641480 – 101018166.

© 2025 ELG Consortium
