Filter
Reset all

Subjects

Content Types

Countries

AID systems

API

Certificates

Data access

Data access restrictions

Database access

Database access restrictions

Database licenses

Data licenses

Data upload

Data upload restrictions

Enhanced publication

Institution responsibility type

Institution type

Keywords

Metadata standards

PID systems

Provider types

Quality management

Repository languages

Software

Syndications

Repository types

Versioning

  • * at the end of a keyword allows wildcard searches
  • " quotes can be used for searching phrases
  • + represents an AND search (default)
  • | represents an OR search
  • - represents a NOT operation
  • ( and ) implies priority
  • ~N after a word specifies the desired edit distance (fuzziness)
  • ~N after a phrase specifies the desired slop amount
Found 68 result(s)
The repository of the Hamburg Centre for Speech Corpora is used for archiving, maintenance, distribution and development of spoken language corpora. These usually consist of audio and / or video recordings, transcriptions and other data and structured metadata. The corpora treat the focus on multilingualism and are generally freely available for research and teaching. Most of the measures maintained by the HZSK corpora were created in the years 2000-2011 in the framework of the SFB 538 "Multilingualism" at the University of Hamburg. The HZSK however also strives to take linguistic data from other projects or contexts, and to provide also the scientific community for research and teaching are available, provided that they are compatible with the current focus of HZSK, ie especially spoken language and multilingualism.
The project is set up in order to improve the infrastructure for text-based linguistic research and development by building a huge, automatically annotated German text corpus and the corresponding tools for corpus annotation and exploitation. DeReKo constitutes the largest linguistically motivated collection of contemporary German texts, contains fictional, scientific and newspaper texts, as well as several other text types, contains only licenced texts, is encoded with rich meta-textual information, is fully annotated morphosyntactically (three concurrent annotations), is continually expanded, with a focus on size and stratification of data, may be analyzed free of charge via the query system COSMAS II, serves as a 'primordial sample' from which users may draw specialized sub-samples (socalled 'virtual corpora') to represent the language domain they wish to investigate. !!! Access to data of Das Deutsche Referenzkorpus is also provided by: IDS Repository https://www.re3data.org/repository/r3d100010382 !!!
B2SAFE is a robust, safe and highly available service which allows community and departmental repositories to implement data management policies on their research data across multiple administrative domains in a trustworthy manner. A solution to: provide an abstraction layer which virtualizes large-scale data resources, guard against data loss in long-term archiving and preservation, optimize access for users from different regions, bring data closer to powerful computers for compute-intensive analysis
CLARIN.SI is the Slovenian node of the European CLARIN (Common Language Resources and Technology Infrastructure) Centers. The CLARIN.SI repository is hosted at the Jožef Stefan Institute and offers long-term preservation of deposited linguistic resources, along with their descriptive metadata. The integration of the repository with the CLARIN infrastructure gives the deposited resources wide exposure, so that they can be known, used and further developed beyond the lifetime of the projects in which they were produced. Among the resources currently available in the CLARIN.SI repository are the multilingual MULTEXT-East resources, the CC version of Slovenian reference corpus Gigafida, the morphological lexicon Sloleks, the IMP corpora and lexicons of historical Slovenian, as well as many other resources for a variety of languages. Furthermore, several REST-based web services are provided for different corpus-linguistic and NLP tasks.
CLARIN-UK is a consortium of centres of expertise involved in research and resource creation involving digital language data and tools. The consortium includes the national library, and academic departments and university centres in linguistics, languages, literature and computer science.
The focus of PolMine is on texts published by public institutions in Germany. Corpora of parliamentary protocols are at the heart of the project: Parliamentary proceedings are available for long stretches of time, cover a broad set of public policies and are in the public domain, making them a valuable text resource for political science. The project develops repositories of textual data in a sustainable fashion to suit the research needs of political science. Concerning data, the focus is on converting text issued by public institutions into a sustainable digital format (TEI/XML).
HELIX DATA is an integral component of the Hellenic Data Service "HELIX" supporting knowledge management and scholarly communication in Greece. HELIX DATA is the data catalogue and repository, with a dual role to store and preserve data that are self-deposited by researchers as well as to harvest data records from other national data sources and catalogues.
ZENODO builds and operates a simple and innovative service that enables researchers, scientists, EU projects and institutions to share and showcase multidisciplinary research results (data and publications) that are not part of the existing institutional or subject-based repositories of the research communities. ZENODO enables researchers, scientists, EU projects and institutions to: easily share the long tail of small research results in a wide variety of formats including text, spreadsheets, audio, video, and images across all fields of science. display their research results and get credited by making the research results citable and integrate them into existing reporting lines to funding agencies like the European Commission. easily access and reuse shared research results.
The DARIAH-DE repository is a digital long-term archive for human and cultural-scientific research data. Each object described and stored in the DARIAH-DE Repository has a unique and lasting Persistent Identifier (DOI), with which it is permanently referenced, cited, and kept available for the long term. In addition, the DARIAH-DE Repository enables the sustainable and secure archiving of data collections. The DARIAH-DE Repository is not only to DARIAH-DE associated research projects, but also to individual researchers as well as research projects that want to save their research data persistently, referenceable and long-term archived and make it available to third parties. The main focus is the simple and user-oriented access to long-term storage of research data. To ensure its long term sustainability, the DARIAH-DE Repository is operated by the Humanities Data Centre.
The repository is part of the National Research Data Infrastructure initiative Text+, in which the University of Tübingen is a partner. It is housed at the Department of General and Computational Linguistics. The infrastructure is maintained in close cooperation with the Digital Humanities Centre, which is a core facility of the university, colaborating with the library and computing center of the university. Integration of the repository into the national CLARIN-D and international CLARIN infrastructures gives it wide exposure, increasing the likelihood that the resources will be used and further developed beyond the lifetime of the projects in which they were developed. Among the resources currently available in the Tübingen Center Repository, researchers can find widely used treebanks of German (e.g. TüBa-D/Z), the German wordnet (GermaNet), the first manually annotated digital treebank (Index Thomisticus), as well as descriptions of the tools used by the WebLicht ecosystem for natural language processing.
CLAPOP is the portal of the Dutch CLARIN community. It brings together all relevant resources that were created within the CLARIN NL project and that now are part of the CLARIN NL infrastructure or that were created by other projects but are essential for the functioning of the CLARIN (NL) infrastructure. CLARIN-NL has closely cooperated with CLARIN Flanders in a number of projects. The common results of this cooperation and the results of this cooperation created by CLARIN Flanders are included here as well.
Currently the institute has more than 700 collections consisting of (digital) research data, digitized material, archival collections, printed material, handwritten questionnaires, maps and pictures. The focus is on resources relevant for the study of function, meaning and coherence of cultural expressions and resources relevant for the structural, dialectological and sociolinguistic study of language variation within the Dutch language. An overview is here https://meertens.knaw.nl/en/datasets/
This is the KONECT project, a project in the area of network science with the goal to collect network datasets, analyse them, and make available all analyses online. KONECT stands for Koblenz Network Collection, as the project has roots at the University of Koblenz–Landau in Germany. All source code is made available as Free Software, and includes a network analysis toolbox for GNU Octave, a network extraction library, as well as code to generate these web pages, including all statistics and plots. KONECT contains over a hundred network datasets of various types, including directed, undirected, bipartite, weighted, unweighted, signed and rating networks. The networks of KONECT are collected from many diverse areas such as social networks, hyperlink networks, authorship networks, physical networks, interaction networks and communication networks. The KONECT project has developed network analysis tools which are used to compute network statistics, to draw plots and to implement various link prediction algorithms. The result of these analyses are presented on these pages. Whenever we are allowed to do so, we provide a download of the networks.
The JRC Data Catalogue gives access to the multidisciplinary data produced and maintained by the Joint Research Centre, the European Commission's in-house science service providing independent scientific advice and support to policies of the European Union.
The knowledge centre is an information service offering advice on the use of digital language resources and tools for Swedish and other languages in Sweden, as well as other parts of the intangible cultural heritage of Sweden.
ORTOLANG is an EQUIPEX project accepted in February 2012 in the framework of investissements d’avenir. Its aim is to construct a network infrastructure including a repository of language data (corpora, lexicons, dictionaries etc.) and readily available, well-documented tools for its processing. Expected outcomes comprize: promoting research on analysis, modelling and automatic processing of our language to their highest international levels thanks to effective resource pooling; facilitating the use and transfer of resources and tools set up within public laboratories to industrial partners, notably SMEs which often cannot develop such resources and tools for language processing given the cost of investment; promoting French language and the regional languages of France by sharing expertise acquired by public laboratories. ORTOLANG is a service for the language, which is complementary to the service offered by Huma-Num (très grande infrastructure de recherche). Ortolang gives access to SLDR for speech, and CNRTL for text resources.