National Competence Centre Spain

The Languages of Spain

Spanish is the second most spoken language in the world by number of native speakers. It has around 480 million native speakers and 75 million speakers who learned it as a second language. Although Spanish developed in Spain from Vulgar Latin, most native speakers live in Latin America. Alongside Spanish, the official language, Basque, Catalan and Galician are co-official languages in some regions of Spain. Basque is spoken in a small region of the Western Pyrenees. With only approximately 750,000 speakers, it is classified as a “vulnerable” language in the UNESCO Atlas of the World’s Languages in Danger. Catalan is spoken by approx. ten million people in Catalonia, the Balearic Islands and Valencia. Furthermore, it is the only official language of Andorra. Galician is spoken by approx. 2.4 million native speakers in Galicia and in small communities in other regions of Spain, in some European countries and in the Americas. Spanish is a Romance language of the Ibero-Romance branch. There is considerable variation between the European and the Latin American varieties.

Features of Spanish:

  • The inflectional system is limited for the declension of nouns, adjectives and determiners, but the conjugation of verbs produces over 50 word forms per verb.
  • Spanish is an SVO language, but the word order sometimes deviates from this default.
  • Direct and indirect object pronouns are used frequently, even in situations where they are redundant.

Basque is the only surviving pre-Indo-European language of Western Europe. Its only known relative is the extinct Aquitanian language. Although Basque is spoken by a small community, it shows considerable dialectal variation. The six commonly accepted dialects differ at the lexical, phonetic, morphological and prosodic levels.

Features of Basque:

  • Basque is an agglutinating language. Grammatical and lexical morphemes, each with a single meaning, are attached to one another and form a long prosodic unit consisting of one root and several affixes, each of which can be clearly assigned to its function.
  • Moreover, it is an ergative-absolutive language. The ergative and absolutive cases mark the subject and the direct object. In sentences with intransitive verbs, the absolutive marks the subject. In sentences with transitive verbs, the absolutive marks the direct object and the ergative marks the subject.
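The case assignment rule described in the last bullet can be illustrated with a minimal Python sketch. It is an illustration under simplifying assumptions, not a morphological analyser: the function names are hypothetical, and the endings (-a for the absolutive, -ak for the ergative) cover only definite singular nouns whose stem ends in a consonant, such as gizon ("man") and mutil ("boy").

```python
ABSOLUTIVE = "absolutive"
ERGATIVE = "ergative"

# Assumption: definite singular endings for consonant-final stems only.
ENDINGS = {ABSOLUTIVE: "a", ERGATIVE: "ak"}


def assign_cases(transitive):
    """Return the case of each core argument for the given verb type."""
    if transitive:
        # Transitive verb: ergative subject, absolutive direct object.
        return {"subject": ERGATIVE, "direct_object": ABSOLUTIVE}
    # Intransitive verb: the subject takes the absolutive.
    return {"subject": ABSOLUTIVE}


def mark(stem, case):
    """Attach the simplified case ending to a consonant-final stem."""
    return stem + ENDINGS[case]


def mark_arguments(subject_stem, object_stem=None):
    """Case-mark the subject and, if present, the direct object."""
    cases = assign_cases(transitive=object_stem is not None)
    marked = {"subject": mark(subject_stem, cases["subject"])}
    if object_stem is not None:
        marked["direct_object"] = mark(object_stem, cases["direct_object"])
    return marked


# "Gizona etorri da" (the man has come): intransitive.
print(mark_arguments("gizon"))
# "Gizonak mutila ikusi du" (the man has seen the boy): transitive.
print(mark_arguments("gizon", "mutil"))
```

Note that the same noun, gizon, surfaces as gizona when it is the subject of an intransitive verb and as gizonak when it is the subject of a transitive verb, while the direct object mutila carries the same ending as the intransitive subject.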

Catalan belongs to the Romance language family. The most closely related languages are Italian and French. The regional varieties are classified into five main dialects, which differ in the pronunciation of vowels, in the function words used and in some vocabulary.

Features of Catalan:

  • Catalan is an SVO language, but the word order is sometimes changed by the use of clitic elements.
  • It is also a pro-drop language. Usually, the subject pronoun can be omitted, because the verb form already contains the information about the subject.
  • The verb and the auxiliary verb cannot be separated. They have to be adjacent in every sentence.
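The pro-drop property mentioned above can be sketched in Python: because each verb ending identifies person and number, the subject can be recovered from the verb form alone. The sketch below is only an illustration. It assumes the present indicative of the regular verb parlar ("to speak") in its standard written forms, and the function name and the stem argument are hypothetical.

```python
# Assumption: present indicative endings of the regular verb "parlar".
PRESENT_ENDINGS = {
    "o": ("1st", "singular"),   # parlo
    "es": ("2nd", "singular"),  # parles
    "a": ("3rd", "singular"),   # parla
    "em": ("1st", "plural"),    # parlem
    "eu": ("2nd", "plural"),    # parleu
    "en": ("3rd", "plural"),    # parlen
}


def recover_subject(verb_form, stem="parl"):
    """Return (person, number) encoded in the ending of a verb form."""
    if not verb_form.startswith(stem):
        raise ValueError(f"{verb_form!r} does not start with stem {stem!r}")
    ending = verb_form[len(stem):]
    if ending not in PRESENT_ENDINGS:
        raise ValueError(f"unknown ending {ending!r}")
    return PRESENT_ENDINGS[ending]


# "Parlem català" (we speak Catalan): no subject pronoun is needed.
print(recover_subject("parlem"))
```

Since all six endings are distinct, the lookup is unambiguous for this paradigm, which is what makes an explicit subject pronoun redundant in such sentences.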

Galician belongs to the Western Ibero-Romance branch of the Romance language family. It is closely related to Portuguese. Its dialects are grouped into three main blocks: Eastern, Central and Western Galician. They differ mostly at the phonological and morphological levels.

Features of Galician:

  • Syllable stress is a distinctive feature of words.
  • It is also an SVO language with clitic elements, which can change the sentence structure.
  • The passive voice is rarely used in everyday speech. Instead, speakers use an inverted word order, the active form with a third-person reflexive pronoun, or an impersonal construction without a subject, consisting of the pronoun “se” and a verb inflected in the third person singular.

NCC Lead Spain

Dr. Marta Villegas has been working for more than 25 years as a researcher in the field of Natural Language Processing, first at the Fundació Bosch i Gimpera (University of Barcelona), later at the Universitat Pompeu Fabra and the Universitat Autònoma de Barcelona and, more recently, at the Centro Nacional de Investigaciones Oncológicas (CNIO) and the Barcelona Supercomputing Center (BSC). She has been involved in more than 15 EU projects, such as OpenMinTeD, CLARIN, DASISH, META-NET and Panacea, among many others. Currently she co-leads the Text Mining Unit at the Barcelona Supercomputing Center and is involved in a national initiative, led by the Secretary of Digitalisation and Artificial Intelligence, to promote the use of AI and Language Technologies in Spain within the framework of Plan-TL. She also leads the BSC participation in IctusNet (an Interreg Sudoe programme), in the recently approved EU projects IntelCOMP (EU project 101004870) and ELE (European Language Equality), in the initiative promoted by the Catalan Government for the development of resource infrastructures for the Catalan language, and in the collaborative project with IBM.

Current National Initiatives

  • The Plan for the Promotion of Language Technologies (LT) was approved in 2015 to promote the development of NLP, automatic translation and conversational systems in Spanish and the co-official languages in areas such as health, justice and technology watch. It has focused on the production of resources and basic tools for Spanish and the other languages of Spain.

Wikipedia contributors. (2020, July 6). Indo-European languages. In Wikipedia, The Free Encyclopedia. Retrieved 17:00, July 6, 2020, from https://en.wikipedia.org/wiki/Indo-European_languages.

Wikipedia contributors. (2020, July 6). Spanish language. In Wikipedia, The Free Encyclopedia. Retrieved 17:00, July 6, 2020, from https://en.wikipedia.org/wiki/Spanish_language.

Wikipedia contributors. (2020, July 5). Basque language. In Wikipedia, The Free Encyclopedia. Retrieved 17:00, July 6, 2020, from https://en.wikipedia.org/wiki/Basque_language.

Wikipedia contributors. (2020, July 5). Catalan language. In Wikipedia, The Free Encyclopedia. Retrieved 17:00, July 6, 2020, from https://en.wikipedia.org/wiki/Catalan_language.

Wikipedia contributors. (2020, June 24). Galician language. In Wikipedia, The Free Encyclopedia. Retrieved 10:00, July 7, 2020, from https://en.wikipedia.org/wiki/Galician_language.

Events

2021
9th National ELG Workshop: Spain. National workshop, Spain, 23 September

META-NET White Paper on Spanish, Basque, Catalan and Galician

Maite Melero, Toni Badia, and Asunción Moreno. La lengua española en la era digital - The Spanish Language in the Digital Age. META-NET White Paper Series: Europe's Languages in the Digital Age. Springer, Heidelberg, New York, Dordrecht, London, September 2012. Georg Rehm and Hans Uszkoreit (series editors).
Full text of this META-NET White Paper (PDF)
Additional information on this META-NET White Paper

Inmaculada Hernáez, Eva Navas, Igor Odriozola, Kepa Sarasola, Arantza Diaz de Ilarraza, Igor Leturia, Araceli Diaz de Lezana, Beñat Oihartzabal, and Jasone Salaberria. Euskara Aro Digitalean - The Basque Language in the Digital Age. META-NET White Paper Series: Europe's Languages in the Digital Age. Springer, Heidelberg, New York, Dordrecht, London, September 2012. Georg Rehm and Hans Uszkoreit (series editors).
Full text of this META-NET White Paper (PDF)
Additional information on this META-NET White Paper

Asunción Moreno, Núria Bel, Eva Revilla, Emília Garcia, and Sisco Vallverdú. La llengua catalana a l'era digital - The Catalan Language in the Digital Age. META-NET White Paper Series: Europe's Languages in the Digital Age. Springer, Heidelberg, New York, Dordrecht, London, September 2012. Georg Rehm and Hans Uszkoreit (series editors).
Full text of this META-NET White Paper (PDF)
Additional information on this META-NET White Paper

Carmen García-Mateo and Montserrat Arza Rodríguez. O idioma galego na era dixital - The Galician Language in the Digital Age. META-NET White Paper Series: Europe's Languages in the Digital Age. Springer, Heidelberg, New York, Dordrecht, London, September 2012. Georg Rehm and Hans Uszkoreit (series editors).
Full text of this META-NET White Paper (PDF)
Additional information on this META-NET White Paper

Availability of Tools and Resources for Spanish, Basque, Catalan and Galician (as of 2012)

The following table illustrates the support of the Spanish language through speech technologies, machine translation, text analytics and language resources.

Speech technologies: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Machine translation: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Text analytics: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Language resources: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support

The following table illustrates the support of the Basque language through speech technologies, machine translation, text analytics and language resources.

Speech technologies: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Machine translation: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Text analytics: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Language resources: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support

The following table illustrates the support of the Catalan language through speech technologies, machine translation, text analytics and language resources.

Speech technologies: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Machine translation: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Text analytics: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Language resources: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support

The following table illustrates the support of the Galician language through speech technologies, machine translation, text analytics and language resources.

Speech technologies: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Machine translation: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Text analytics: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support
Language resources: Excellent support | Good support | Moderate support | Fragmentary support | Weak/no support