Language data and resources

Corpora, language descriptions and lexical/conceptual resources

Corpora

Structured collections of pieces of data (textual, audio, video, multimodal/multimedia, etc.), selected according to specific criteria external to the data, such as size, type of language, type of text producer or intended audience.
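To illustrate what "criteria external to the data" means, the sketch below assembles a small corpus by filtering documents on their metadata (language, text type, audience) rather than on their content. The `Document` class, its fields and the `build_corpus` helper are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    # Hypothetical record: the text itself plus metadata describing it
    text: str
    language: str   # e.g. an ISO 639-1 code
    text_type: str  # e.g. "news", "legal", "fiction"
    audience: str   # e.g. "general", "expert"


def build_corpus(documents, language, text_type, audience, max_size):
    """Select documents using only external (metadata) criteria, capped at max_size."""
    selected = [
        d for d in documents
        if d.language == language and d.text_type == text_type and d.audience == audience
    ]
    return selected[:max_size]


pool = [
    Document("Parliament passed the bill.", "en", "news", "general"),
    Document("Le parlement a adopté la loi.", "fr", "news", "general"),
    Document("Section 2(a) shall apply mutatis mutandis.", "en", "legal", "expert"),
    Document("Markets closed higher today.", "en", "news", "general"),
]

corpus = build_corpus(pool, language="en", text_type="news", audience="general", max_size=10)
print([d.text for d in corpus])
```

Note that the selection never inspects the text itself; the same filter would work unchanged for audio or video items carrying the same metadata.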

Models

Resources created through a training process in which an algorithm learns from training data; examples include translation models, speech models, transformer-based models and n-gram models.
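As a minimal illustration of a model as a trained resource, the sketch below trains a bigram (2-gram) model: the algorithm is maximum-likelihood counting, the training data is a toy set of tokenized sentences, and the resulting resource is a table of conditional probabilities P(next word | previous word). The function name, the `<s>`/`</s>` boundary markers and the absence of smoothing are simplifying assumptions; practical n-gram models normally apply smoothing.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Estimate P(next | prev) by maximum likelihood from tokenized sentences (no smoothing)."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        # Assumed sentence-boundary markers so first and last words are modelled too
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    # Normalize each row of counts into a probability distribution
    model = {}
    for prev, followers in counts.items():
        total = sum(followers.values())
        model[prev] = {word: c / total for word, c in followers.items()}
    return model


training_data = [
    "the cat sat".split(),
    "the dog sat".split(),
    "the cat ran".split(),
]
model = train_bigram_model(training_data)
print(model["the"])
print(model["cat"])
```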

Lexical/Conceptual resources

Resources such as terminological glossaries, word lists, semantic lexica and ontologies, organized on the basis of lexical or conceptual units (lexical items, terms, concepts, phrases, etc.) together with supplementary information about them (e.g., grammatical, semantic or statistical information).
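The sketch below shows one possible way such a resource can be organized around lexical units: each entry is keyed by a (lemma, part of speech) pair and carries grammatical, semantic and statistical information. The `LexicalEntry` layout and the `add_entry` helper are hypothetical, and the frequency figures are placeholder values, not real corpus statistics.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    # Hypothetical entry layout: one lexical unit plus its supplementary information
    lemma: str
    part_of_speech: str                              # grammatical information
    senses: list = field(default_factory=list)       # semantic information
    frequency_per_million: float = 0.0               # statistical information


lexicon = {}


def add_entry(entry):
    # Organize the resource by lexical unit: (lemma, part of speech) is the key
    lexicon[(entry.lemma, entry.part_of_speech)] = entry


# Placeholder frequencies for illustration only
add_entry(LexicalEntry("bank", "NOUN", ["financial institution", "side of a river"], 120.5))
add_entry(LexicalEntry("bank", "VERB", ["to deposit money"], 8.2))

for (lemma, pos), entry in sorted(lexicon.items()):
    print(lemma, pos, entry.senses, entry.frequency_per_million)
```

Keying by lemma and part of speech keeps homographs such as the noun and verb "bank" as separate entries.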

Computational grammars

Resources composed of rules representing the structure of a language.
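One common kind of computational grammar is a context-free grammar. The sketch below encodes a toy English fragment as rewrite rules and checks sentences with a naive backtracking recognizer. The grammar, its symbol names and the recognizer are illustrative assumptions, not a description of any particular grammar resource; this simple recognizer also cannot handle left-recursive rules.

```python
# A toy context-free grammar: each nonterminal maps to its alternative right-hand sides.
# Any symbol not listed as a key is treated as a terminal (a word).
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "N":   [["cat"], ["dog"]],
    "V":   [["sees"], ["sleeps"]],
}


def parse(symbol, tokens, pos):
    """Yield every position reachable after deriving `symbol` starting at tokens[pos]."""
    if symbol not in GRAMMAR:  # terminal: must match the next token exactly
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return
    for rhs in GRAMMAR[symbol]:  # try each rule for this nonterminal
        yield from parse_sequence(rhs, tokens, pos)


def parse_sequence(symbols, tokens, pos):
    """Yield end positions after deriving the given symbols in order."""
    if not symbols:
        yield pos
        return
    for mid in parse(symbols[0], tokens, pos):
        yield from parse_sequence(symbols[1:], tokens, mid)


def accepts(sentence):
    """A sentence is grammatical if S can derive exactly all of its tokens."""
    tokens = sentence.split()
    return any(end == len(tokens) for end in parse("S", tokens, 0))


print(accepts("the cat sees a dog"))
print(accepts("the dog sleeps"))
print(accepts("cat the sleeps"))
```

The rules carry the linguistic knowledge and the recognizer is generic, which is what allows the same grammar to be reused by different tools.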
