CLARIN-D Repository Workflows and Best Practices

The standard deposition workflow for metadata for this centre's repository consists of the following steps:

  1. Create metadata.
    • Metadata profiles are selected or created to match the type of research data to be archived. Metadata is filled out for each resource, either by the researcher or with the assistance of local staff. The results are CMDI metadata profiles and descriptions for the research data eligible for archiving.
  2. Review metadata.
    • In this step, the metadata provided is assessed against the guidelines set by best-practice criteria and by the infrastructure.

  3. Move metadata to persistent storage.
    • The metadata provided is moved to persistent storage in such a way that it is also accessible via the web.
  4. Assign persistent identifiers.
    • Persistent identifiers (PIDs) identify the metadata uniquely and independently of its location. This means that the metadata keeps the same identifier even after it has been migrated. Within the centre, the Handle System is used, and PIDs are issued in the form of URIs that can be used to access and view the metadata in a browser.
  5. Publish metadata via OAI-PMH.
    • The descriptive metadata are made available to third parties by a standard protocol (OAI-PMH). This ensures availability of the metadata sets to environments such as the CLARIN-EU Virtual Language Observatory (VLO).
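A harvester such as the VLO retrieves the published records with ordinary OAI-PMH requests. The following is a minimal sketch of how a `ListRecords` request URL is put together. The base URL is a placeholder, and the metadata prefix `cmdi` is an assumption; the prefixes an endpoint really offers can be listed with the `ListMetadataFormats` verb.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the centre's real OAI-PMH base URL is not given here.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix="cmdi", set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    `verb`, `metadataPrefix` and `set` are standard OAI-PMH arguments;
    the default prefix "cmdi" is an assumption about the endpoint.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec  # optional selective harvesting
    return f"{base_url}?{urlencode(params)}"


print(list_records_url(BASE_URL))
```

Fetching the resulting URL returns an XML response that contains one record per metadata description, possibly split over several pages linked by resumption tokens.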

The standard data deposition workflow consists of the following steps:

  1. Ingestion requirement checks:
    • Does the data content match the repository's mission?
      Is the data in one of the acceptable formats (non-proprietary, text-based) or can it be converted?

      Requests to deposit data that do not fall within the CLARIN mission of the IMS repository (as described above) have to be decided on a case-by-case basis, but the prospects will usually be negative. In such cases, the depositor may be referred to an appropriate CLARIN centre. Data that conform to our mission statement will be prioritized in any case. If the data cannot be provided in an acceptable format, it will not be stored in the repository. However, appropriate metadata with a link to the original data source can still be accepted.

  2. Depositor's agreement terms:
    • Are scientific and ethical norms considered?
      Are personal data contained in the data, and if so: is privacy protection ensured?
      Does the depositor hold all rights to publish the data?
      Which access license shall be granted to end users: public or academic?

      To sign the agreement, the depositor has to meet the above preconditions, which are contained in that contract. Access to the research data is determined by the license chosen by the depositor. Metadata always have to be publicly available.

  3. Metadata creation:
    • In close cooperation with the depositor, appropriate metadata are created and reviewed. A human reviewer checks the data submitted by external providers for basic compliance with the depositor's description.
  4. Signing of depositor's agreement.
    • In this step, the identity of the data depositor also has to be verified. Moreover, preference is given to data from depositors who can show that they have published a paper about their data or submitted one to a peer-reviewed journal.
  5. Persistent storage and assignment of persistent identifiers:
    • Metadata (and data) are stored in the repository. There is currently no formal curation policy on when to deprecate open, text-based data formats (the only formats the repository accepts) or on how to deal with data in deprecated formats.
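The ingestion requirement checks in step 1 amount to a small decision procedure. The sketch below only illustrates the rules stated above; the function, its parameters and the outcome names are hypothetical and not part of the repository software, and in practice the decision is made by staff.

```python
from enum import Enum


class Decision(Enum):
    """Possible outcomes of the ingestion requirement checks (illustrative names)."""
    ACCEPT = "store data and metadata in the repository"
    METADATA_ONLY = "store metadata with a link to the original data source"
    CASE_BY_CASE = "decide individually; the depositor may be referred to another centre"


def ingestion_decision(matches_mission, acceptable_format, convertible=False):
    """Apply the step 1 checks in the order given in the workflow."""
    if not matches_mission:
        # Outside the repository's mission: no fixed rule, usually negative.
        return Decision.CASE_BY_CASE
    if acceptable_format or convertible:
        # Non-proprietary, text-based format, or convertible into one.
        return Decision.ACCEPT
    # In scope, but no acceptable format: only the metadata can be accepted.
    return Decision.METADATA_ONLY


print(ingestion_decision(matches_mission=True, acceptable_format=False).value)
```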

The IMS Repository uses Fedora Repository as its base. Hence, our technical workflows are built on top of the batch utilities for ingest and the REST API interfaces for access and management that the system provides. An overview of the steps involved:

  1. packaging/updating of the resource,
  2. creation or transformation of the metadata (where necessary),
  3. quality check of the data and metadata (e.g. validation, where applicable),
  4. registering PIDs (Persistent Identifiers, handle system) and
  5. inserting them in the CMDI metadata records.
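The last step can be illustrated with a short sketch that writes a registered PID into the `MdSelfLink` element of a CMDI record header. The namespace shown is the one assumed for CMDI 1.1 (CMDI 1.2 records use `http://www.clarin.eu/cmd/1`), the sample record is heavily abbreviated, and the handle is a made-up placeholder; this is not the centre's actual tooling.

```python
import xml.etree.ElementTree as ET

# Assumption: CMDI 1.1 namespace; adjust for CMDI 1.2 records.
CMD_NS = "http://www.clarin.eu/cmd/"


def set_self_link(cmdi_xml, pid):
    """Return the CMDI record with Header/MdSelfLink set to the given PID."""
    ET.register_namespace("", CMD_NS)  # serialize without an "ns0:" prefix
    root = ET.fromstring(cmdi_xml)
    header = root.find(f"{{{CMD_NS}}}Header")
    if header is None:
        raise ValueError("CMDI record has no Header element")
    link = header.find(f"{{{CMD_NS}}}MdSelfLink")
    if link is None:
        # Appended at the end of Header for simplicity; a schema-valid
        # record may require the element at a specific position.
        link = ET.SubElement(header, f"{{{CMD_NS}}}MdSelfLink")
    link.text = pid
    return ET.tostring(root, encoding="unicode")


# Abbreviated sample record and a placeholder handle URI (both hypothetical).
sample = (
    f'<CMD xmlns="{CMD_NS}">'
    "<Header><MdSelfLink/></Header>"
    "<Resources/><Components/>"
    "</CMD>"
)
updated = set_self_link(sample, "http://hdl.handle.net/00000/example")
print(updated)
```

Validating the updated record against its profile schema afterwards corresponds to the quality check in step 3.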

See the list of staff members for contact persons.
The following resources provide further guidelines for the creation of CMDI metadata suitable for deposit in this centre's repository:


extern/CLARIN-D (last edited 2018-06-27 09:09:21 by GerhardKremer)