Filter

Subjects

Content Types

Countries

AID systems

API

Certificates

Data access

Data access restrictions

Database access

Database access restrictions

Database licenses

Data licenses

Data upload

Data upload restrictions

Enhanced publication

Institution responsibility type

Institution type

Keywords

Metadata standards

PID systems

Provider types

Quality management

Repository languages

Software

Syndications

Repository types

Versioning

  • * at the end of a keyword allows wildcard searches
  • " quotes can be used for searching phrases
  • + represents an AND search (default)
  • | represents an OR search
  • - represents a NOT operation
  • ( and ) implies priority
  • ~N after a word specifies the desired edit distance (fuzziness)
  • ~N after a phrase specifies the desired slop amount
Found 64 result(s)
The Language Bank features text and speech corpora with different kinds of annotations in over 60 languages. There is also a selection of tools for working with them, from linguistic analyzers to programming environments. Corpora are also available via web interfaces, and users can be allowed to download some of them. The IP holders can monitor the use of their resources and view user statistics.
PORTULAN CLARIN Research Infrastructure for the Science and Technology of Language, belonging to the Portuguese National Roadmap of Research Infrastructures of Strategic Relevance, and part of the international research infrastructure CLARIN ERIC
The repository is part of the National Research Data Infrastructure initiative Text+, in which the University of Tübingen is a partner. It is housed at the Department of General and Computational Linguistics. The infrastructure is maintained in close cooperation with the Digital Humanities Centre, which is a core facility of the university, colaborating with the library and computing center of the university. Integration of the repository into the national CLARIN-D and international CLARIN infrastructures gives it wide exposure, increasing the likelihood that the resources will be used and further developed beyond the lifetime of the projects in which they were developed. Among the resources currently available in the Tübingen Center Repository, researchers can find widely used treebanks of German (e.g. TüBa-D/Z), the German wordnet (GermaNet), the first manually annotated digital treebank (Index Thomisticus), as well as descriptions of the tools used by the WebLicht ecosystem for natural language processing.
Iceland joined CLARIN ERIC on February 1st, 2020, after having been an observer since November 2018. The Ministry of Education, Science and Culture assigned The Árni Magnússon Institute for Icelandic Studies the role of leading partner in the Icelandic National Consortium and appointed Professor Emeritus Eiríkur Rögnvaldsson as National Coordinator, later replaced by Starkaður Barkarson, a project manager at The Árni Magnússon Institute. Most of the relevant institutions participate in the CLARIN-IS National Consortium. The Árni Magnússon Institute has already established a Metadata Providing Centre (CLARIN C-Centre) which hosts metadata for Icelandic language resources and makes them available through the Virtual Language Observatory. The aim is to establish a Service Providing Centre (CLARIN B-Centre) which will provide both service and access to resources and knowledge.
The focus of CLARIN INT Portal is on resources that are relevant to the lexicological study of the Dutch language and on resources relevant for research in and development of language and speech technology. For Example: lexicons, lexical databases, text corpora, speech corpora, language and speech technology tools, etc. The resources are: Cornetto-LMF (Lexicon Markup Framework), Corpus of Contemporary Dutch (Corpus Hedendaags Nederlands), Corpus Gysseling, Corpus VU-DNC (VU University Diachronic News text Corpus), Dictionary of the Frisian Language (Woordenboek der Friese Taal), DuELME-LMF (Lexicon Markup Framework), Language Portal (Taalportaal), Namescape, NERD (Named Entity Recognition and Disambiguation) and TICCLops (Text-Induced Corpus Clean-up online processing system).
The knowledge centre is an information service offering advice on the use of digital language resources and tools for Swedish and other languages in Sweden, as well as other parts of the intangible cultural heritage of Sweden.
CLAPOP is the portal of the Dutch CLARIN community. It brings together all relevant resources that were created within the CLARIN NL project and that now are part of the CLARIN NL infrastructure or that were created by other projects but are essential for the functioning of the CLARIN (NL) infrastructure. CLARIN-NL has closely cooperated with CLARIN Flanders in a number of projects. The common results of this cooperation and the results of this cooperation created by CLARIN Flanders are included here as well.
Polish CLARIN node – CLARIN-PL Language Technology Centre – is being built at Wrocław University of Technology. The LTC is addressed to scholars in the humanities and social sciences. Registered users are granted free access to digital language resources and advanced tools to explore them. They can also archive and share their own language data (in written, spoken, video or multimodal form).
The CLARIN Centre at the University of Copenhagen, Denmark, hosts and manages a data repository (CLARIN-DK-UCPH Repository), which is part of a research infrastructure for humanities and social sciences financed by the University of Copenhagen. The CLARIN-DK-UCPH Repository provides easy and sustainable access for scholars in the humanities and social sciences to digital language data (in written, spoken, video or multimodal form) and provides advanced tools for discovering, exploring, exploiting, annotating, and analyzing data. CLARIN-DK also shares knowledge on Danish language technology and resources and is the Danish node in the European Research Infrastructure Consortium, CLARIN ERIC.
Lithuania became a full member of CLARIN ERIC in January of 2015 and soon CLARIN-LT consortium was founded by three partner universities: Vytautas Magnus University, Kaunas Technology University and Vilnius University. The main goal of the consortium is to become a CLARIN B centre, which will be able to serve language users in Lithuania and Europe for storing and accessing language resources.
Content type(s)
The Berlin-Brandenburg Academy of Sciences and Humanities (BBAW) is a CLARIN partner institution and has been an officially certified CLARIN service center since June 20th, 2013. The CLARIN center at the BBAW focuses on historical text corpora (predominantly provided by the 'Deutsches Textarchiv'/German Text Archive, DTA) as well as on lexical resources (e.g. dictionaries provided by the 'Digitales Wörterbuch der Deutschen Sprache'/Digital Dictionary of the German Language, DWDS).
CLARIN-LV is a national node of Clarin ERIC (Common Language Resources and Technology Infrastructure). The mission of the repository is to ensure the availability and long­ term preservation of language resources. The data stored in the repository are being actively used and cited in scientific publications.
Content type(s)
>>>!!!<<<<ARCHE https://www.re3data.org/repository/r3d100012523 is the successor of a repository project established in 2014 as CLARIN Centre Vienna / Language Resources Portal (CCV/LRP). The mission of CCV/LRP was to provide depositing services and easy and sustainable access to digital language resources created in Austria. ARCHE replaces CCV/LRP and extends its mission by offering an advanced and reliable data management and depositing service open to a broader range of humanities fields in Austria. >>>!!!<<<
CLARIN is a European Research Infrastructure for the Humanities and Social Sciences, focusing on language resources (data and tools). It is being implemented and constantly improved at leading institutions in a large and growing number of European countries, aiming at improving Europe's multi-linguality competence. CLARIN provides several services, such as access to language data and tools to analyze data, and offers to deposit research data, as well as direct access to knowledge about relevant topics in relation to (research on and with) language resources. The main tool is the 'Virtual Language Observatory' providing metadata and access to the different national CLARIN centers and their data.
CLARIN-UK is a consortium of centres of expertise involved in research and resource creation involving digital language data and tools. The consortium includes the national library, and academic departments and university centres in linguistics, languages, literature and computer science.
Competence Centre IULA-UPF-CC CLARIN manages, disseminates and facilitates this catalogue, which provides access to reference information on the use of language technology projects and studies in different disciplines, especially with regard to Humanities and Social Sciences. The Catalog relates information that is organized by Áreas, (disciplines and research topics), Projects (of research that use or have used language technologies), Tasks (that make the tools), Tools (of language technology), Documentation (articles regarding the tools and how they are used) and resources such as Corpora (collections of annotated texts) and Lexica (collections of words for different uses).
ILC-CNR for CLARIN-IT repository is a library for linguistic data and tools. Including: Text Processing and Computational Philology; Natural Language Processing and Knowledge Extraction; Resources, Standards and Infrastructures; Computational Models of Language Usage. The studies carried out within each area are highly interdisciplinary and involve different professional skills and expertises that extend across the disciplines of Linguistics, Computational Linguistics, Computer Science and Bio-Engineering.
Currently, the IMS repository focuses on resources provided by the Institute for Natural Language Processing in Stuttgart (IMS) and other CLARIN-D related institutions such as the local Collaborative Research Centre 732 (SFB 732) as well as institutions and/or organizations that belong to the CLARIN-D extended scientific community. Comprehensive guidelines and workflows for submission by external contributors are being compiled based on the experiences in archiving such in-house resources.
The aim of the project is systematic mapping of Czech and other languages in comparison with Czech. CNC corpora are accessible to everybody interested in studying the language after free registration.
SWE-CLARIN is a national node in European Language and Technology Infrastructure (CLARIN) - an ESFRI initiative to build an infrastructure for e-science in the humanities and social sciences. SWE-CLARIN makes language-based materials available as research data using advanced processing tools and other resources. One basic idea is that the increasing amount of text and speech - contemporary and historical - as digital research material enables new forms of e-science and new ways to tackle old research issues.
Currently the institute has more than 700 collections consisting of (digital) research data, digitized material, archival collections, printed material, handwritten questionnaires, maps and pictures. The focus is on resources relevant for the study of function, meaning and coherence of cultural expressions and resources relevant for the structural, dialectological and sociolinguistic study of language variation within the Dutch language. An overview is here https://meertens.knaw.nl/en/datasets/
LINDAT/CLARIN is designed as a Czech “node” of Clarin ERIC (Common Language Resources and Technology Infrastructure). It also supports the goals of the META-NET language technology network. Both networks aim at collection, annotation, development and free sharing of language data and basic technologies between institutions and individuals both in science and in all types of research. The Clarin ERIC infrastructural project is more focused on humanities, while META-NET aims at the development of language technologies and applications. The data stored in the repository are already being used in scientific publications in the Czech Republic. In 2019 LINDAT/CLARIAH-CZ was established as a unification of two research infrastructures, LINDAT/CLARIN and DARIAH-CZ.
The Eurac Research CLARIN Centre (ERCC) is a dedicated repository for language data. It is hosted by the Institute for Applied Linguistics (IAL) at Eurac Research, a private research centre based in Bolzano, South Tyrol. The Centre is part of the Europe-wide CLARIN infrastructure, which means that it follows well-defined international standards for (meta)data and procedures and is well-embedded in the wider European Linguistics infrastructure. The repository hosts data collected at the IAL, but is also open for data deposits from external collaborators.
As a member of SWE-CLARIN, the Humanities Lab will provide tools and expertise related to language archiving, corpus and (meta)data management, with a continued emphasis on multimodal corpora, many of which contain Swedish resources, but also other (often endangered) languages, multilingual or learner corpora. As a CLARIN K-centre we provide advice on multimodal and sensor-based methods, including EEG, eye-tracking, articulography, virtual reality, motion capture, av-recording. Current work targets automatic data retrieval from multimodal data sets, as well as the linking of measurement data (e.g. EEG, fMRI) or geo-demographic data (GIS, GPS) to language data (audio, video, text, annotations). We also provide assistance with speech and language technology related matters to various projects. A primary resource in the Lab is The Humanities Lab corpus server, containing a varied set of multimodal language corpora with standardised metadata and linked layers of annotations and other resources.
The CLARIN-D Centre CEDIFOR provides a repository for long-term storage of resources and meta-data. Resources hosted in the repository stem from research of members as well as associated research projects of CEDIFOR. This includes software and web-services as well as corpora of text, lexicons, images and other data.