ELRA
  • Catalogue of Language Resources

    The language resources available in this catalogue are divided into four categories: "Speech and Related Resources", "Written Resources", "Terminological Resources", and "Multimodal/Multimedia Resources".

    1/ Spoken LRs

    a - Telephone recordings
    The databases catalogued in this section were produced from speaker recordings made over the telephone network (fixed or mobile) or through a microphone. They cover various recording environments and a large number of European and non-European languages; examples include the databases produced in the framework of the SpeechDat project.

    b - Desktop/Microphone recordings
    The databases catalogued in this section were produced from speaker recordings made with a microphone, e.g. the databases produced in the framework of the BABEL project.

    c - Broadcast Resources
    The databases catalogued in this section were produced from speaker recordings broadcast over radio, television or the internet, such as the Italian Broadcast News Corpus.

    d - Speech Related Resources
    This section contains pronunciation and phonetic lexicons, such as the BDLEX, PHONOLEX, and MHATLEX databases.

    2/ Written LRs

    a - Corpora
    This section contains monolingual and multilingual corpora, parallel or not, which may also be annotated. Examples include the corpora developed in the framework of the MULTEXT project, the Multilingual and Parallel Corpora (MLCC), French scientific corpora, and newspaper corpora in Arabic.

    b - Monolingual lexicons
    This section contains various types of monolingual dictionaries, e.g. a dictionary of French verbs, the Japanese word dictionary, and PAROLE lexicons in many languages.

    c - Multilingual lexicons
    This section contains bilingual and multilingual dictionaries and lexicons, such as the EuroWordNet databases.

    3/ Terminological LRs

    Monolingual, bilingual and multilingual terminological databases are available. They cover a large number of specialised domains (e.g. automobile engineering, insurance, linguistics, finance) in a wide variety of languages.

    4/ Multimodal/Multimedia LRs

    The resources in this section were produced using different modalities, including speech. One example is the database produced in the framework of the M2VTS project.

    New Resources
  • S0277 : SpeechDat Galician Database for the Fixed Telephone Network
    The SpeechDat Galician Database for the
    Fixed Telephone Network contains the
    recordings of 653 speakers of Galician
    recorded over the fixed telephone
    network. Each speaker uttered around 44
    read and spontaneous items.

  • S0278 : SmartWeb Handheld Corpus (SHC)
    This corpus contains recordings of 156
    speakers in a human-machine query
    situation. Users were asked to solve
    several tasks with a spoken query system
    to the WWW, using a smart phone as a
    portable device in natural environments
    (office, hall, restaurant, street).
    Recorded channels are the Bluetooth
    headset over UMTS (telephone quality),
    and the Bluetooth headset and an
    additional collar microphone in high
    quality. See also ELRA-S0279 and
    ELRA-S0280.

  • S0279 : SmartWeb Motorbike Corpus (SMC)
    This corpus contains recordings of 36
    speakers in a human-machine query
    situation on a running motorcycle
    (BMW). Bikers were asked to solve
    several tasks with a spoken query system
    to the WWW, using an integrated system
    connected to a speech server via a UMTS
    connection. Recorded channels are the
    Bluetooth helmet microphone over UMTS
    (telephone quality), and - partly - the
    Bluetooth helmet microphone and an
    additional neck microphone in high
    quality. See also ELRA-S0278 and
    ELRA-S0280.

  • S0280 : SmartWeb Video Corpus (SVC)
    This multimodal corpus contains 99
    recordings, each containing a
    human-human-machine dialogue: one
    speaker (who is being recorded)
    interacts with a human partner as well
    as with a dialogue system via a smart
    phone (SmartWeb system). See also
    ELRA-S0278 and ELRA-S0279.

  • S0276 : Swedish EUROM1 (EUROM1_S)
    EUROM1 is the first truly multilingual
    speech database produced in Europe. Over
    60 speakers per language pronounced
    numbers, sentences, and isolated words
    using a close-talking microphone.

  • (last update: August 2008)

    Copyright © 2006 ELRA