ICARUS-Coreference-Perspective


Index:

  1. How to add a new document-set

  2. Coreference Menu

  3. Document Outline

  4. Allocation Format


I. How to add a new document-set:

  1. Click coref_add.png to add a new document-set.

  2. The coref_document-set.png document-set editor opens: attachment:coref_document-set-editor.png

    • Name: the name under which the document-set will show up in the Coref-Manager list.
    • Location: location of the document-set on the file system (local/network).
    • Reader: the reader for the new document-set, e.g. CONLL12.
    • Properties: (currently not used)
    • coref_document-set_example.png

      • coref_document-set-loaded.png = Document-set loaded (location correct)

      • coref_document-set-error.png = No location set

      • coref_document-set.png = Document-set not loaded yet

  3. New allocations can be added by clicking on coref_add-allocation.png, which opens the allocation editor: attachment:coref_editor-allocation.png

    • Name: the name of the allocation.
    • Location: location of the allocation on the file system (local/network).
    • Reader: the reader for the new allocation.
    • Properties: (currently not used)
    • coref_allocation_example.png The icons show the allocation status: error.png = No location set, loaded.png = Allocation loaded (location correct).

  4. To view the document-set with the specified allocation, click coref_inspect.png.

  5. The resulting document outline for the document-set highlights coreference (same color = same coreferent-set): attachment:coref_document-outline_example.png


II. Coreference Menu:

Coref-Manager

coref_manager.png

Coref-Manager Toolbar: coref_manager-tb.png


Coref-Explorer

coref_explorer.png coref_explorer-loading.png



III. Document Outline:

The document outline supports three different presentation styles for document-sets:

  1. Document Outline Text

  2. Document Outline Graph

  3. Document Outline Entity Grid

Document Outline Text:

attachment:coref_document-outline-text.png

Document Outline Text Toolbar: coref_document-outline-text-tb.png

Document Outline Graph:

Pairwise links output by an automatic coreference system can be treated as arcs in a directed graph. Linking the first mention of each cluster to an artificial root node creates a tree structure that encodes the entire clustering in a document.

In the example, solid nodes/arcs represent the predicted annotation, whereas dashed nodes/arcs represent the gold annotation. Discrepancies between predicted and gold annotation are marked with different colors that indicate the different types of errors.

attachment:coref_document-set-graph.png
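The tree construction described above can be sketched in a few lines of Python. This is only an illustration, not the code ICARUS uses: the function name `add_root_arcs` is hypothetical, mentions are represented by plain strings, and each mention is assumed to have at most one antecedent.

```python
from typing import List, Tuple

Arc = Tuple[str, str]  # (antecedent, anaphor)


def add_root_arcs(mentions: List[str], links: List[Arc]) -> List[Arc]:
    """Attach every mention without an antecedent to an artificial ROOT node.

    Assumption: `links` holds the pairwise links of a coreference system,
    and each mention appears at most once as an anaphor.
    """
    has_antecedent = {anaphor for _, anaphor in links}
    # The first mention of a cluster is exactly the one nobody points to.
    root_arcs = [("ROOT", m) for m in mentions if m not in has_antecedent]
    return root_arcs + list(links)
```

Singleton mentions end up as direct children of ROOT, so the resulting arcs encode the complete clustering of a document as one tree.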

Document Outline Graph Toolbar: coref_document-set-graph-tb.png

Document Outline Entity Grid:

Barzilay and Lapata [1] introduce the entity grid, a tabular view of the entities in a document. Specifically, the rows of the grid correspond to sentences and the columns to entities. The cells of the table indicate whether an entity is mentioned in the corresponding sentence. Entity grids provide a compact view of the distribution of mentions in a document and allow the user to see how the description of an entity changes from mention to mention.

attachment:coref_document-set-entity.png

[1] Regina Barzilay and Mirella Lapata. 2008. Modeling Local Coherence: An Entity-Based Approach. Computational Linguistics, 34(1):1–34.
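As a rough illustration of the entity grid described above, the following Python sketch builds a boolean sentence-by-entity table. The function name `entity_grid` and the input layout (a mapping from an entity label to its mentions) are assumptions made for this example; sentence indices are assumed to start at 0, and the cells only record whether an entity is mentioned, nothing more.

```python
from typing import Dict, List, Tuple

Mention = Tuple[int, int, int]  # (sentence, first token, last token)


def entity_grid(num_sentences: int,
                clusters: Dict[str, List[Mention]]) -> Tuple[List[str], List[List[bool]]]:
    """Return the column labels and a grid with one row per sentence."""
    entities = list(clusters)  # one column per entity, in insertion order
    grid = [[False] * len(entities) for _ in range(num_sentences)]
    for column, entity in enumerate(entities):
        for sentence, _begin, _end in clusters[entity]:
            grid[sentence][column] = True  # entity is mentioned in this sentence
    return entities, grid
```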

Document Outline Entity Toolbar: coref_document-set-entity-tb.png



Context Outline Toolbar: coref_document-set-graph_text-tb.png


IV. Allocation Format:

The coreference plugin provides a default format for representing standoff annotations in the form of allocation files.

Each allocation file may contain multiple document blocks. The following text shows an example of such a document block:

#begin document (bc/cctv/00/cctv_0000); part 000
#id eng-fo-opt.mdlGold
#begin nodes
ROOT    
0-2-5   Gender:Neut;HEAD:3;Number:Sin;Type:Common;
0-5-5   Gender:Unknown;HEAD:5;Number:Unknown;Type:Common;
0-22-26 Gender:Neut;HEAD:26;Number:Sin;Type:Common;
0-24-25 Gender:Neut;HEAD:25;Number:Sin;Type:Name;
1-2-4   Gender:Neut;HEAD:4;Number:Sin;Type:Common;
1-2-2   Gender:Unknown;HEAD:2;Number:Plu;Type:Pronoun;
1-6-11  Gender:Unknown;HEAD:11;Number:Plu;Type:Common;
#end nodes
#begin edges
ROOT>>0-2-5     Type:IDENT
ROOT>>0-5-5     Type:IDENT
ROOT>>0-24-25   Type:IDENT
ROOT>>1-2-4     Type:IDENT
ROOT>>1-2-2     Type:IDENT
1-2-2>>1-6-11   Type:IDENT
0-24-25>>1-15-16        Type:IDENT
#end edges
#end document

Each section (document, nodes, or edges) is enclosed by the respective "#begin <section>" and "#end <section>" lines. The document section is special in that it requires an identifier after the "#begin document" statement. This identifier ("(bc/cctv/00/cctv_0000); part 000" in the above example) has to match an identifier defined in the original document-set that the allocation file refers to; it is used to map the coreference structure of the standoff annotation to an actual document in the original data. In addition, the document section may contain special comment lines ("#id eng-fo-opt.mdlGold" in the example) that are parsed as properties of the document: the character sequence between the hash sign and the first space is the key, and the remainder of the line is the value. Note, however, that these properties are not used by the current visualizations and can therefore be ignored.
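The handling of the document identifier and the comment lines can be sketched as follows. This is a minimal Python illustration under the rules stated above, not the reader shipped with the plugin; the function name `parse_document_header` is hypothetical, and the input is assumed to be a list of lines without trailing line breaks.

```python
from typing import Dict, List, Tuple


def parse_document_header(lines: List[str]) -> Tuple[str, Dict[str, str]]:
    """Return the document identifier and the properties from comment lines."""
    prefix = "#begin document "
    if not lines or not lines[0].startswith(prefix):
        raise ValueError("expected a '#begin document <identifier>' line")
    document_id = lines[0][len(prefix):].strip()

    properties: Dict[str, str] = {}
    for line in lines[1:]:
        # Comment lines end as soon as the first nested section starts.
        if not line.startswith("#") or line.startswith(("#begin", "#end")):
            break
        # Key: text between '#' and the first space; value: rest of the line.
        key, _, value = line[1:].partition(" ")
        properties[key] = value
    return document_id, properties
```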

Inside the document section, the available nodes (mentions) have to be declared first, followed by all the edges (coreference links between mentions). The nodes section contains one mention per line in the form "<sentence-id>-<token-begin-index>-<token-end-index>", that is, the index of the sentence followed by the indices of the first and last token of the mention in that sentence.

The declaration of the artificial root node (ROOT in the example) is optional and can be omitted. Each mention other than the root node is allowed to have an arbitrary number of properties in the form of key-value pairs assigned to it. Properties are separated from the rest of the node declaration by a tab character (\t) and form a sequence of "<key>:<value>;" definitions. The semicolon after the last key-value pair of a properties sequence is optional. Both key and value may consist of arbitrary non-empty strings (all line-break related characters such as \n or \r will however break the sequence).
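A node declaration as described above could be parsed along these lines. The sketch is written in Python purely for illustration; the names `parse_properties` and `parse_node` are hypothetical, and the separator between span and properties is assumed to be a real tab character, as the text states.

```python
from typing import Dict, Tuple, Union

Span = Tuple[int, int, int]  # (sentence, token begin, token end)


def parse_properties(text: str) -> Dict[str, str]:
    """Parse a "<key>:<value>;" sequence; the final semicolon is optional."""
    properties: Dict[str, str] = {}
    for item in text.split(";"):
        if not item:
            continue  # empty piece after an optional trailing semicolon
        key, separator, value = item.partition(":")
        if not separator:
            raise ValueError(f"missing ':' in property {item!r}")
        properties[key] = value
    return properties


def parse_node(line: str) -> Tuple[Union[str, Span], Dict[str, str]]:
    """Parse one line of the nodes section."""
    span_text, _, rest = line.rstrip("\r\n").partition("\t")
    span_text = span_text.strip()
    if span_text == "ROOT":
        return "ROOT", {}  # the artificial root carries no properties
    sentence, begin, end = (int(part) for part in span_text.split("-"))
    return (sentence, begin, end), parse_properties(rest.strip())
```

All property values are kept as strings here, so a value such as the HEAD index would still have to be converted by the caller if a number is needed.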

The edges section uses a similar syntax to express links between previously defined mentions, again on a one-item-per-line basis. Each edge is defined by listing its two end points, separated by ">>", e.g. "ROOT>>0-2-5". Mentions that are not linked to a specific antecedent should be attached to the artificial root node. Properties for edges can be defined in exactly the same way as described for mentions above.
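An edge line can be split in the same spirit. The following Python sketch is an assumption-laden illustration (the function name `parse_edge` is hypothetical, and a tab is assumed between the link and its properties); it returns the two end points as strings together with the edge properties.

```python
from typing import Dict, Tuple


def parse_edge(line: str) -> Tuple[str, str, Dict[str, str]]:
    """Parse one line of the edges section, e.g. "ROOT>>0-2-5<TAB>Type:IDENT"."""
    link, _, rest = line.rstrip("\r\n").partition("\t")
    source, separator, target = link.strip().partition(">>")
    if not separator or not source or not target:
        raise ValueError(f"not a valid edge: {line!r}")
    # Properties use the same "<key>:<value>;" syntax as for mentions.
    properties = dict(
        item.split(":", 1) for item in rest.strip().split(";") if item
    )
    return source, target, properties
```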

EBNF of the allocation format:

<string> = ? any non-empty character sequence, not containing control characters or any of \t, \n or \r ? ;
<number> = ? any non-negative integer number starting from 0 or 1 (depending on the context) ? ;
<line-break> = "\n" | "\r" | "\r\n" ;
<blank> = " " ;
<tab> = "\t" ;
<property> = <string>, ":", <string> ;
<properties> = <property>, { ";", <property> }, [ ";" ] ;
<span> = <number>, "-", <number>, "-", <number> ;
<node> = ( "ROOT" | ( <span>, [ <tab>, <properties> ] ) ), <line-break> ;
<edge> = ( "ROOT" | <span> ), ">>", <span>, [ <tab>, <properties> ], <line-break> ;
<node-section> = "#begin nodes", <line-break>,
                 { <node> },
                 "#end nodes", <line-break> ;
<edge-section> = "#begin edges", <line-break>,
                 { <edge> },
                 "#end edges", <line-break> ;
<document-id> = <string> ;
<key> = ? a string as described in <string> that does not start with either "begin" or "end" and does not contain whitespaces ? ;
<comment> = "#", <key>, <blank>, <string>, <line-break> ;
<document-section> = "#begin document", <blank>, <document-id>, <line-break>,
                     { <comment> },
                     <node-section>,
                     <edge-section>,
                     "#end document" ;
<document-set> = <document-section>, { <line-break>, <document-section> } ;
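To show how the grammar above maps onto a file, here is a small Python sketch that splits the text of an allocation file into its document blocks. It is deliberately lenient and does not validate every rule of the grammar; the function name `read_documents` and the returned dictionary layout are assumptions made for this example.

```python
from typing import Dict, List


def read_documents(text: str) -> List[Dict[str, object]]:
    """Collect the id, node lines and edge lines of every document block."""
    documents: List[Dict[str, object]] = []
    current = None   # document block being read, if any
    section = None   # "nodes", "edges" or None
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines between document blocks
        if line.startswith("#begin document"):
            doc_id = line[len("#begin document"):].strip()
            current = {"id": doc_id, "nodes": [], "edges": []}
            section = None
        elif line.startswith("#end document"):
            if current is None:
                raise ValueError("'#end document' without '#begin document'")
            documents.append(current)
            current = None
        elif line.startswith("#begin "):
            section = line[len("#begin "):].strip()
        elif line.startswith("#end "):
            section = None
        elif line.startswith("#"):
            continue  # comment line carrying a document property
        elif current is not None and section in ("nodes", "edges"):
            current[section].append(line.strip())
    return documents
```

The node and edge lines are returned unparsed, so they can be handed to whatever line-level parsing the caller prefers.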


extern/ICARUS-Coreference-Perspective (last edited 2014-07-17 08:13:54 by MarkusGaertner)