ELRA
  Catalogue of Language Resources

    The language resources available in this catalogue are divided into four categories: "Speech and Related Resources", "Written Resources", "Terminological Resources" and "Multimodal/Multimedia Resources".

    1/ Spoken LRs

    a - Telephone recordings
    The databases catalogued in this section were produced from speaker recordings made over the fixed or mobile telephone network, or through a microphone. They include speech resources recorded in various environments and covering a large number of European and non-European languages, for example the databases produced within the SpeechDat project.

    b - Desktop/Microphone recordings
    The databases catalogued in this section were produced from speaker recordings made with a microphone, for example the databases produced within the BABEL project.

    c - Broadcast Resources
    The databases catalogued in this section were produced from speech recorded from radio, television or Internet broadcasts, such as the Italian Broadcast News Corpus.

    d - Speech Related Resources
    This section contains pronunciation and phonetic lexicons, such as the BDLEX, PHONOLEX and MHATLEX databases.

    2/ Written LRs

    a - Corpora
    This section contains monolingual and multilingual corpora, parallel or not, which may also be annotated. Examples include the corpora developed within the MULTEXT project, the Multilingual and Parallel Corpora (MLCC), French scientific corpora and Arabic newspaper corpora.

    b - Monolingual lexicons
    This section contains various types of monolingual dictionaries, for example a dictionary of French verbs, the Japanese word dictionary and PAROLE lexicons in many languages.

    c - Multilingual lexicons
    Here you can find either bilingual or multilingual dictionaries and lexicons, such as the EuroWordNet databases.

    3/ Terminological LRs

    Monolingual, bilingual and multilingual terminological databases are available. They cover a large number of specialised domains (e.g. automobile engineering, insurance, linguistics and finance) in a wide variety of languages.

    4/ Multimodal/Multimedia LRs

    The resources in this section were produced using different modalities, including speech. One example is the database produced within the M2VTS project.


    LATEST UPDATES:

    New Resources
  • W0050 : The CINTIL Corpus – International Corpus of Portuguese
    CINTIL (Corpus Internacional do Português) is a linguistically interpreted corpus of written and spoken European Portuguese. It is composed of one million annotated tokens, each of which has been verified by expert human annotators. The annotation comprises information on part of speech, open-class lemma and inflection, multi-word expressions belonging to the class of adverbs and to the closed POS classes, and multi-word proper names (for named entity recognition). The corpus is built from raw textual materials of several types, 30% of which are spoken materials.

  • M0050 : The MWN.PT - MultiWordnet of Portuguese
    MWN.PT - MultiWordnet of Portuguese (version 1) spans over 17,200 manually validated concepts/synsets, linked by the semantic relations of hyponymy and hypernymy. These concepts comprise over 21,000 word senses/word forms and 16,000 lemmas from both the European and American variants of Portuguese. They are aligned with the translationally equivalent concepts of the English Princeton WordNet and, transitively, with those of the MultiWordNets of Italian, Spanish, Hebrew, Romanian and Latin.

  • M0049 : Basque WordNet
    The Basque WordNet models nouns, verbs and adjectives. Each sense is linked to a synset (30,281 synsets in total). Each synset encodes the synonymy relation between (possibly) several words (synonyms) that have a single meaning, belong to the same part of speech (specified in the POS tag value) and express the same lexical meaning. Each synset is linked to the corresponding synset in the English WordNet 1.6 through its identification number (ID), which includes the synset number and the POS tag. The only exceptions are newly created synsets that account for cultural concepts not present in WordNet 1.6.

  • S0300 : SIGNUM Database
    The SIGNUM Database contains both isolated and continuous utterances by various signers. The corpus was recorded on video; for quick random access to individual frames, each video clip is stored as a sequence of images. The vocabulary comprises 450 basic signs in German Sign Language (DGS) representing different word types. Based on this vocabulary, a total of 780 sentences were constructed, each ranging from two to eleven signs in length. The entire corpus was performed once by 25 native signers of different sexes and ages. One of them was chosen as the reference signer, and his performances were recorded three times.

  • M0048 : LatinWordNet
    LatinWordNet contains information about the following aspects of the Latin and English lexicon: lexical relations between words, semantic relations between lexical concepts, and correspondences between Latin and English lexical concepts. LatinWordNet covers nouns, verbs, adjectives and adverbs, and contains 8,978 synsets aligned with their English equivalents (and with all the MultiWordNet-based wordnets).

    (Last update: July 2009)

    Copyright © 2008 ELRA
    ELRACatalogue 0.8.0