CMDI BestPractice for Stuttgart's resource descriptions

To keep metadata files consistent, list resource-independent decisions here. Order entries by 'Component'; entries that do not relate to a specific component can be listed afterwards.

Component

Profile

Recommendation

Comment

Examples

--

WebLichtWebService (clarin.eu:cr1:p_1320657629644)

Use this profile to describe WebLicht web services. This is a valid extension, and no further embedding is needed (the description is misleading, but cannot be changed because the profile has status 'public').

Recommendation from email discussion with TÜ.

Access (clarin.eu:cr1:c_1290431694501)

NaLiDa-Profiles

see below

Contact template

Access (clarin.eu:cr1:c_1290431694501)

NaLiDa-Profiles

If the resource was developed at the IMS, the CatalogueLink should lead to the IMS homepage.

http://www.ims.uni-stuttgart.de/forschung/ressourcen/index.html

Copyright (clarin.eu:cr1:c_1290431694531)

NaLiDa ToolProfile

Use only CopyrightLicence, not UsageLicence

UsageLicence to distinguish between installation for one workplace vs. for a working group.

Descriptions (clarin.eu:cr1:c_1290431694486)

NaLiDa-Profiles

If it is not the description section for the whole resource, but for a specific aspect, state this at the beginning of the description text.

Inserts some structure into the combined description section in the VLO.

'Input format description: Input format is one-token-per-line. Each sentence must be followed by an empty line. Tokens may contain blanks.', 'Prerequisite description: (Probabilistic or symbolic) context-free grammar'

Descriptions (clarin.eu:cr1:c_1290431694486)

NaLiDa-Profiles

A description should not start with a newline but should end with one (except for copied licence statements, etc.).

This will enhance readability in the VLO.
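The newline convention above can be checked mechanically. The following is a minimal sketch, not part of any CLARIN tooling: the function name `description_whitespace_problems` is hypothetical, and the caller is assumed to have already extracted the description text from the metadata file.

```python
def description_whitespace_problems(text):
    """Return the ways in which `text` breaks the newline convention.

    Convention (see above): a description should not start with a
    newline but should end with one. Copied licence statements are
    exempt and should simply not be passed to this check.
    """
    problems = []
    if text.startswith(("\n", "\r")):
        problems.append("starts with a newline")
    if not text.endswith("\n"):
        problems.append("does not end with a newline")
    return problems
```

An empty list means the text follows the convention.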

Distribution (clarin.eu:cr1:c_1290431694504)

NaLiDa-Profiles

DistributionFiles can also be used to state that additional scripts (e.g. for pre-processing) are part of the download. (Use FileName and Description elements.)

Recommendation from email discussion with TÜ.

Documentations (clarin.eu:cr1:c_1342181139642)

NaLiDa-Profiles

Use a new subcomponent 'Documentation' for each file/manual/website/etc.

Recommendation from email discussion with TÜ.

GeneralInfo (clarin.eu:cr1:c_1290431694495)

NaLiDa-Profiles

If there is no official 'long' ResourceTitle, state just the 'short' ResourceName.

GeneralInfo (clarin.eu:cr1:c_1290431694495)

NaLiDa-Profiles

Keeping track of LastUpdate may require many updates of the metadata file, so either use explicit versioning or make no statement.

GeneralInfo (clarin.eu:cr1:c_1290431694495)

NaLiDa-Profiles

Skip element 'Genre' for tools.

MimeTypes (clarin.eu:cr1:c_1290431694511)

NaLiDa-Profiles

Prefer application/xml over text/xml.

Current status of CLARIN-D MIME type discussion. See also: https://www.w3.org/TR/webarch/#xml-media-types

Project (clarin.eu:cr1:c_1290431694522)

NaLiDa-Profiles

For SFB projects, ProjectName is
SFB <SFB number> Projekt <project number>
ProjectTitle contains neither the SFB number nor the project number.
ProjectID is
SFB <SFB number> <project number>
and the Funder is written as
DFG (Deutsche Forschungsgemeinschaft)

consistent entries for SFB projects

<cmdp:ProjectName>
SFB 732 Projekt D12
</cmdp:ProjectName>
<cmdp:ProjectTitle>
Sense Discrimination and Regular Meaning Shifts of German Particle Verbs
</cmdp:ProjectTitle>
<cmdp:ProjectID>
SFB 732 D12
</cmdp:ProjectID>
<cmdp:Funder>
DFG (Deutsche Forschungsgemeinschaft)
</cmdp:Funder>
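A small helper can keep these SFB entries consistent across files. This is only a sketch of the conventions stated above: the function name `sfb_project_fields`, the returned dictionary layout, and the simple title check are assumptions made for illustration.

```python
FUNDER_DFG = "DFG (Deutsche Forschungsgemeinschaft)"


def sfb_project_fields(sfb_number, project_number, title):
    """Build the four Project values for an SFB project.

    The title check is deliberately naive: it only rejects titles that
    contain the literal "SFB <number>" or the project number as a word.
    """
    if f"SFB {sfb_number}" in title or project_number in title.split():
        raise ValueError("ProjectTitle must not contain the SFB or project number")
    return {
        "ProjectName": f"SFB {sfb_number} Projekt {project_number}",
        "ProjectTitle": title,
        "ProjectID": f"SFB {sfb_number} {project_number}",
        "Funder": FUNDER_DFG,
    }
```

Called with `732`, `"D12"` and the title from the example above, it yields the same four values as the XML snippet.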

--

--

For string-based CharacterEncoding values (such as http://www.isocat.org/rest/dc/2564) use 'ISO 8859-1' (instead of Latin-1), 'UTF-8', ...
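Such string values could be normalised before they are written to a metadata file. The sketch below is an illustration only: the mapping table and the function name `preferred_encoding_name` are hypothetical and cover just the two encodings named above. Unknown values are returned unchanged.

```python
import re

# Keys are spellings reduced to lowercase letters and digits only.
PREFERRED_NAMES = {
    "latin1": "ISO 8859-1",
    "iso88591": "ISO 8859-1",
    "utf8": "UTF-8",
}


def preferred_encoding_name(value):
    """Map common spellings of an encoding to the preferred string."""
    key = re.sub(r"[^a-z0-9]", "", value.lower())
    return PREFERRED_NAMES.get(key, value)
```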

/CMD/Header

--

MdProfile has to be filled.

Is used by VLO.

/CMD/Header

--

MdCollectionDisplayName is 'WebLicht Webservice Orchestrator' for WebServices to be harvested for WebLicht, and 'IMS, CLARIN-D Centre, University of Stuttgart' for our other resources.

Values known to VLO. Name was changed
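The two /CMD/Header rules above can be checked with a short script. This is a hedged sketch, not an official validator: the function names are hypothetical, elements are matched by local name so the check does not depend on a particular CMDI namespace, and the allowed display names are the two values given above.

```python
import xml.etree.ElementTree as ET

ALLOWED_DISPLAY_NAMES = {
    "WebLicht Webservice Orchestrator",
    "IMS, CLARIN-D Centre, University of Stuttgart",
}


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def check_header(cmdi_xml):
    """Return a list of problems found in the Header of a CMDI record."""
    root = ET.fromstring(cmdi_xml)
    header = next((c for c in root if local_name(c.tag) == "Header"), None)
    if header is None:
        return ["Header is missing"]
    values = {local_name(c.tag): (c.text or "").strip() for c in header}
    problems = []
    if not values.get("MdProfile"):
        problems.append("MdProfile is empty or missing")
    if values.get("MdCollectionDisplayName") not in ALLOWED_DISPLAY_NAMES:
        problems.append("unexpected MdCollectionDisplayName")
    return problems
```

An empty list means both header rules are satisfied.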

/CMD/

--

When an element is repeated for content in different languages, the xml:lang="en" entry should come first, except for address information.
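The ordering rule can be verified automatically. The sketch below rests on assumptions: the function name `english_not_first` is hypothetical, and treating everything inside a `Contact` element as "address information" is only a guess at how the exception might be applied.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def english_not_first(xml_text, exempt=("Contact",)):
    """List element names whose xml:lang="en" variant is not the first one."""
    violations = []

    def visit(parent):
        if local_name(parent.tag) in exempt:
            return  # assumed to hold address information
        langs_by_tag = {}
        for child in parent:
            lang = child.get(XML_LANG)
            if lang is not None:
                langs_by_tag.setdefault(child.tag, []).append(lang)
            visit(child)
        for tag, langs in langs_by_tag.items():
            if "en" in langs and langs[0] != "en":
                violations.append(local_name(tag))

    visit(ET.fromstring(xml_text))
    return violations
```

Elements that have no English variant at all are not reported, since the rule only concerns ordering.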

"long text" (for example in Description-fields)

--

Write the text as one long line (as far as possible) and break the line before the end tag.

Access -> Availability

NaLiDa-Profiles

If the licence is the GNU General Public License, version 3, the resource is free.

ResourceList -> PID

Resource-Bundle

Use the SelfLinks, not the IDs, of the "other" profiles.

ID = LandingPage -> redundant

RFTagger_ResourceBundel: <cmd:PID>http://hdl.handle.net/11022/1007-0000-0000-8E2C-0</cmd:PID> <!-- Tool -->

DocumentationLanguage

NaLiDa-Profiles

Only use the DocumentationLanguage component inside the Documentation component unless the other is needed to add more general information

This component is redundant because of a modification of the profiles.

LifeCycleStatus

NaLiDa-Profiles

The difference between 'published' and 'released' is intended as follows: 'published' resources are those that have been issued through distribution channels (publisher, conferences, workshops, ...); 'released' is somewhat weaker, e.g. a working group declares its material finished and makes it available for download. In theory, a resource could also be "published" although it has not yet been released (= not yet actually available, but already has a conference date and a publication with the publisher).

Usually "released" for computational linguistics resources; more likely "published" for psycholinguistic experimental data.

Contact template for Access-Component (clarin.eu:cr1:c_1290431694501)

<cmd:Contact>
    <cmd:Person>Clarin-D, Universität Stuttgart</cmd:Person>
    <cmd:Address>Pfaffenwaldring 5b, D-70569 Stuttgart, Deutschland</cmd:Address>
    <cmd:Email>clarin@ims.uni-stuttgart.de</cmd:Email>
    <cmd:Department xml:lang="de">Institut für Maschinelle Sprachverarbeitung</cmd:Department>
    <cmd:Department xml:lang="en">Institute for Natural Language Processing (IMS)</cmd:Department> 
    <cmd:Organisation xml:lang="de">Universität Stuttgart</cmd:Organisation>
    <cmd:Organisation xml:lang="en">University of Stuttgart</cmd:Organisation>                         
    <cmd:Url>http://www.ims.uni-stuttgart.de/forschung/projekte/ClarinD.html</cmd:Url>
</cmd:Contact>

<cmd:LegalOwner xml:lang="de">Institut für Maschinelle Sprachverarbeitung (IMS), Universität Stuttgart</cmd:LegalOwner> 
<cmd:LegalOwner xml:lang="en">Institute for Natural Language Processing (IMS), University of Stuttgart</cmd:LegalOwner>                                            
<cmd:Location>      
    <cmd:Address>Universität Stuttgart, Institut für Maschinelle Sprachverarbeitung, Pfaffenwaldring 5b, 70569 Stuttgart, Deutschland</cmd:Address>  
    <cmd:ContinentName xml:lang="en">Europe</cmd:ContinentName>       
    <cmd:ContinentName xml:lang="de">Europa</cmd:ContinentName>                       
    <cmd:Country>
        <cmd:CountryName xml:lang="en">Germany</cmd:CountryName>
        <cmd:CountryName xml:lang="de">Deutschland</cmd:CountryName>
        <cmd:CountryCoding>DE</cmd:CountryCoding>
    </cmd:Country>                       
</cmd:Location>


CategoryCLARIN-D

extern/CLARIN-D/CMDI BestPractice (last edited 2018-04-05 18:47:27 by MarkusGaertner)