CMDI BestPractice for Stuttgart's resource descriptions
To keep metadata files consistent, list resource-independent decisions here. Order entries by 'Component'; entries without a specific component relation can be listed afterwards.
Component |
Profile |
Recommendation |
Comment |
Examples |
-- |
WebLichtWebService (clarin.eu:cr1:p_1320657629644) |
Use this profile to describe WebLicht web services. This is a valid extension, and no further embedding is needed (the description is misleading, but cannot be changed because the profile has status 'public'). |
Recommendation from email discussion with TÜ. |
|
Access (clarin.eu:cr1:c_1290431694501) |
NaLiDa-Profiles |
see below |
Contact template |
|
Access (clarin.eu:cr1:c_1290431694501) |
NaLiDa-Profiles |
If the resource was developed at the IMS, the CatalogueLink should lead to the IMS homepage. |
|
http://www.ims.uni-stuttgart.de/forschung/ressourcen/index.html |
Copyright (clarin.eu:cr1:c_1290431694531) |
NaLiDa ToolProfile |
Use only CopyrightLicence, not UsageLicence |
UsageLicence is intended to distinguish between installation for one workplace and installation for a working group. |
|
Descriptions (clarin.eu:cr1:c_1290431694486) |
NaLiDa-Profiles |
If it is not the description section for the whole resource, but for a specific aspect, state this at the beginning of the description text. |
Inserts some structure into the combined description section in the VLO. |
'Input format description: Input format is one-token-per-line. Each sentence must be followed by an empty line. Tokens may contain blanks.', 'Prerequisite description: (Probabilistic or symbolic) context-free grammar' |
Descriptions (clarin.eu:cr1:c_1290431694486) |
NaLiDa-Profiles |
A description should not start with a newline but should end with one (except for copied licence statements, etc.). |
This will enhance readability in the VLO. |
|
Distribution (clarin.eu:cr1:c_1290431694504) |
NaLiDa-Profiles |
DistributionFiles can also be used to state that additional scripts (e.g. for pre-processing) are part of the download. (Use FileName and Description elements.) |
Recommendation from email discussion with TÜ. |
|
Documentations (clarin.eu:cr1:c_1342181139642) |
NaLiDa-Profiles |
Use a new subcomponent 'Documentation' for each file/manual/website/etc. |
Recommendation from email discussion with TÜ. |
|
GeneralInfo (clarin.eu:cr1:c_1290431694495) |
NaLiDa-Profiles |
If there is no official 'long' ResourceTitle, state just the 'short' ResourceName. |
|
|
GeneralInfo (clarin.eu:cr1:c_1290431694495) |
NaLiDa-Profiles |
Keeping track of LastUpdate might mean many updates of the MD file (either there is explicit versioning or no statement). |
|
|
GeneralInfo (clarin.eu:cr1:c_1290431694495) |
NaLiDa-Profiles |
Skip element 'Genre' for tools. |
|
|
MimeTypes (clarin.eu:cr1:c_1290431694511) |
NaLiDa-Profiles |
Prefer application/xml over text/xml. |
Current status of CLARIN-D MIME type discussion. See also: https://www.w3.org/TR/webarch/#xml-media-types |
|
Project (clarin.eu:cr1:c_1290431694522) |
NaLiDa-Profiles |
For SFB projects ProjectName is |
consistent entries for SFB projects |
<cmdp:ProjectName> |
-- |
-- |
For string-based CharacterEncoding values (such as http://www.isocat.org/rest/dc/2564) use 'ISO 8859-1' (instead of Latin-1), 'UTF-8', ... |
|
|
/CMD/Header |
-- |
MdProfile has to be filled. |
Is used by VLO. |
|
/CMD/Header |
-- |
MdCollectionDisplayName is 'WebLicht Webservice Orchestrator' for WebServices to be harvested for WebLicht, and 'IMS, CLARIN-D Centre, University of Stuttgart' for our other resources. |
Values known to VLO. Name was changed. |
|
/CMD/ |
-- |
When an element is repeated for content in different languages, xml:lang="en" should be the first entry, except for address information. |
|
|
"long text" (for example in Description-fields) |
-- |
Write it as one long line (as far as possible) and break the line before the end tag. |
|
|
Access -> Availability |
NaLiDa-Profiles |
If the licence is the GNU General Public License, version 3, the resource is free. |
|
|
ResourceList -> PID |
Resource-Bundle |
Use the SelfLinks, not the IDs, of the "other" profiles. |
ID = LandingPage -> redundant |
RFTagger_ResourceBundel: <cmd:PID>http://hdl.handle.net/11022/1007-0000-0000-8E2C-0</cmd:PID> <!-- Tool --> |
DocumentationLanguage |
NaLiDa-Profiles |
Use the DocumentationLanguage component only inside the Documentation component, unless the standalone one is needed to add more general information. |
This component is redundant because of a modification of the profiles. |
|
LifeCycleStatus |
NaLiDa-Profiles |
The difference between 'published' and 'released' is intended as follows: 'published' resources are those issued through distribution channels (publisher, conferences, workshops, ...); 'released' is somewhat weaker, e.g. a working group declares its material finished and makes it available for download. In theory, a resource could also be "published" although it has not yet been released (= not yet actually available, but it already has the conference date and the publication with the publisher). |
Computational-linguistic resources are normally "released"; psycholinguistic experimental data are more likely "published". |
|
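Some of the text-level recommendations in the table (Descriptions, MimeTypes, CharacterEncoding) could be checked automatically before a record is published. The following is a minimal Python sketch, not an existing IMS tool: the function names `check_description`, `check_mime_type` and `normalise_encoding` and the `is_copied_licence` flag are hypothetical and only illustrate the rules as worded above.

```python
def check_description(text, is_copied_licence=False):
    """Return a list of problems for one description string.

    Rule from the table: a description should not start with a newline
    but should end with one. Copied licence statements are exempt;
    the is_copied_licence flag is a hypothetical way to mark them.
    """
    problems = []
    if is_copied_licence:
        return problems
    if text.startswith("\n"):
        problems.append("starts with a newline")
    if not text.endswith("\n"):
        problems.append("does not end with a newline")
    return problems


def check_mime_type(value):
    """Flag text/xml, since application/xml is preferred."""
    if value.strip() == "text/xml":
        return ["prefer application/xml over text/xml"]
    return []


def normalise_encoding(value):
    """Map 'Latin-1' (any casing) to the recommended 'ISO 8859-1'.

    Other values, such as 'UTF-8', are returned unchanged.
    """
    if value.strip().lower() == "latin-1":
        return "ISO 8859-1"
    return value


if __name__ == "__main__":
    print(check_description("Input format description: one token per line.\n"))
    print(check_mime_type("text/xml"))
    print(normalise_encoding("Latin-1"))
```

The functions return lists of messages rather than raising errors, so that all problems of a record can be collected and reported together.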
Contact template for Access-Component (clarin.eu:cr1:c_1290431694501)
<cmd:Contact>
  <cmd:Person>Clarin-D, Universität Stuttgart</cmd:Person>
  <cmd:Address>Pfaffenwaldring 5b, D-70569 Stuttgart, Deutschland</cmd:Address>
  <cmd:Email>clarin@ims.uni-stuttgart.de</cmd:Email>
  <cmd:Department xml:lang="de">Institut für Maschinelle Sprachverarbeitung</cmd:Department>
  <cmd:Department xml:lang="en">Institute for Natural Language Processing (IMS)</cmd:Department>
  <cmd:Organisation xml:lang="de">Universität Stuttgart</cmd:Organisation>
  <cmd:Organisation xml:lang="en">University of Stuttgart</cmd:Organisation>
  <cmd:Url>http://www.ims.uni-stuttgart.de/forschung/projekte/ClarinD.html</cmd:Url>
</cmd:Contact>
Legal Owner template for GeneralInfo-Component (legal owner = IMS)
<cmd:LegalOwner xml:lang="de">Institut für Maschinelle Sprachverarbeitung (IMS), Universität Stuttgart</cmd:LegalOwner>
<cmd:LegalOwner xml:lang="en">Institute for Natural Language Processing (IMS), University of Stuttgart</cmd:LegalOwner>
<cmd:Location>
  <cmd:Address>Universität Stuttgart, Institut für Maschinelle Sprachverarbeitung, Pfaffenwaldring 5b, 70569 Stuttgart, Deutschland</cmd:Address>
  <cmd:ContinentName xml:lang="en">Europe</cmd:ContinentName>
  <cmd:ContinentName xml:lang="de">Europa</cmd:ContinentName>
  <cmd:Country>
    <cmd:CountryName xml:lang="en">Germany</cmd:CountryName>
    <cmd:CountryName xml:lang="de">Deutschland</cmd:CountryName>
    <cmd:CountryCoding>DE</cmd:CountryCoding>
  </cmd:Country>
</cmd:Location>
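The recommendation in the table that xml:lang="en" comes first among repeated elements can be checked mechanically. The sketch below uses Python's standard `xml.etree.ElementTree` on the ContinentName elements from the template above. It rests on assumptions: the function name `english_not_first` is hypothetical, the namespace URI bound to the `cmd` prefix is assumed to be the CMDI 1.1 one, and the exception for address information is not modelled, so such elements would have to be skipped by the caller.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the fixed XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def english_not_first(parent):
    """Return local names of repeated children whose first xml:lang is not 'en'."""
    first_lang = {}
    counts = {}
    for child in parent:
        lang = child.get(XML_LANG)
        if lang is None:
            continue  # elements without xml:lang are not covered by the rule
        name = child.tag.split("}")[-1]  # drop the '{namespace}' part
        counts[name] = counts.get(name, 0) + 1
        first_lang.setdefault(name, lang)
    return [name for name, lang in first_lang.items()
            if counts[name] > 1 and lang != "en"]


# Fragment taken from the template; the xmlns:cmd declaration is an assumption
# added so that the fragment parses on its own.
SAMPLE = """<cmd:Location xmlns:cmd="http://www.clarin.eu/cmd/">
  <cmd:ContinentName xml:lang="en">Europe</cmd:ContinentName>
  <cmd:ContinentName xml:lang="de">Europa</cmd:ContinentName>
</cmd:Location>"""

if __name__ == "__main__":
    print(english_not_first(ET.fromstring(SAMPLE)))
```

The check looks only at direct children of the element passed in, so it would have to be called for each component that contains language variants.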