= ICARUS-Coreference-Perspective =
[[ICARUS|{{attachment:navi_up.png|Back to ICARUS Main Page}}]][[extern/ICARUS|Back to ICARUS Main Page]]

<<Anchor(icarus-coref-index)>>
== Index: ==
 I. [[#how-to-add-coref|How to add a new document-set]]
 I. [[#coreference-menu|Coreference Menu]]
 I. [[#coreference-document-outline|Document Outline]]
 I. [[#coreference-allocation-format|Allocation Format]]

[[#icarus-coref-index|{{attachment:navi_up.png|Back to index}}]][[#icarus-coref-index|Back to index]]

<<Anchor(how-to-add-coref)>>
== I. How to add a new document-set: ==
 1. Click {{attachment:coref_add.png}} to add a new document-set.
 1. The {{attachment:coref_document-set.png}} document-set editor shows up: [[attachment:coref_document-set-editor.png|{{attachment:coref_document-set-editor.png||width=250}}]]
   * Name: name of the document-set as it will appear in the Coref-Manager list.
   * Location: document-set location on the file system (local/network).
   * Reader: reader for the new document-set, e.g. CONLL12.
   * Properties: (currently not used)
   * {{attachment:coref_document-set_example.png||align="top"}}
   * {{attachment:coref_document-set-loaded.png}} = Document-set loaded (location correct)
   * {{attachment:coref_document-set-error.png}} = No location set
   * {{attachment:coref_document-set.png}} = Document-set not loaded yet
 1. New allocations can be added by clicking {{attachment:coref_add-allocation.png}}, which opens the allocation editor: [[attachment:coref_editor-allocation.png|{{attachment:coref_editor-allocation.png||width=250}}]]
   * Name: name of the allocation.
   * Location: allocation location on the file system (local/network).
   * Reader: reader for the new allocation.
   * Properties: (currently not used)
   * {{attachment:coref_allocation_example.png||align="top"}} The icons show the allocation status: {{attachment:error.png}} = No location set, {{attachment:loaded.png}} = Allocation loaded (location correct).
 1. To view the document-set with the specified allocation, click {{attachment:coref_inspect.png}}.
 1. The resulting document outline for the document-set includes coreference highlighting (same color = same coreferent-set): [[attachment:coref_document-outline_example.png|{{attachment:coref_document-outline_example.png||width=250}}]]

[[#icarus-coref-index|{{attachment:navi_up.png|Back to index}}]][[#icarus-coref-index|Back to index]]

<<Anchor(coreference-menu)>>
== II. Coreference Menu: ==
=== Coref-Manager ===
{{attachment:coref_manager.png}}

'''Coref-Manager Toolbar: {{attachment:coref_manager-tb.png}}'''
 * {{attachment:coref_add.png}} = Add new {{attachment:coref_document-set.png}} document-set
 * {{attachment:coref_delete.png}} = Delete selected {{attachment:coref_document-set.png}} document-set
 * {{attachment:coref_edit.png}} = Edit selected {{attachment:coref_document-set.png}} document-set
 * {{attachment:coref_inspect.png}} = Inspect selected {{attachment:coref_document-set.png}} document-set. Note: {{attachment:coref_document-set-loaded.png}} indicates that the document-set was loaded successfully.
 * {{attachment:coref_add-allocation.png}} = Add new allocation to selected document-set
 * {{attachment:coref_delete-allocation.png}} = Delete selected allocation
 * {{attachment:coref_edit-allocation.png}} = Edit selected allocation
 * {{attachment:coref_show-property.png}} = Open the Property Info dialog
 * {{attachment:coref_collapse-all.png}} = Collapse all document-sets in the list. Note: a document-set may be opened/closed by clicking {{attachment:open.png}} / {{attachment:close.png}}

----
=== Coref-Explorer ===
{{attachment:coref_explorer.png}} {{attachment:coref_explorer-loading.png}}

[[#icarus-coref-index|{{attachment:navi_up.png|Back to index}}]][[#icarus-coref-index|Back to index]]

----
<<Anchor(coreference-document-outline)>>
== III. Document Outline: ==
The document outline supports three different presentation styles for document-sets:
 i. [[#coreference-document-outline-text|Document Outline Text]]
 i. [[#coreference-document-outline-graph|Document Outline Graph]]
 i. [[#coreference-document-outline-grid|Document Outline Grid]]

<<Anchor(coreference-document-outline-text)>>
=== Document Outline Text: ===
[[attachment:coref_document-outline-text.png|{{attachment:coref_document-outline-text.png||width=250}}]]

''' Document Outline Text Toolbar: ''' {{attachment:coref_document-outline-text-tb.png}}
 * {{attachment:coref_presenter-text.png}} = Selected presenter style (in this example text)
 * {{attachment:coref_selected_doc.png}} = Selected document
 * {{attachment:coref_show-property-outline.png}} = Show property outline
 * {{attachment:coref_show-context-outline-greyed.png}} = The text view is identical to the context outline, so the context outline button is greyed out.
 * {{attachment:coref_show-property.png}} = Open the Property Info dialog
 * {{attachment:refresh.png}} = Refresh View
 * {{attachment:clear_view.png}} = Clear View
 * {{attachment:preferences.png}} = Open the preferences
   * {{attachment:coref_hl-type.png}} = Five different types to highlight spans:
     * Background: {{attachment:coref_hl_background.png}}
     * Foreground: {{attachment:coref_hl_foreground.png}}
     * Underlined: {{attachment:coref_hl_underlined.png}}
     * Italic: {{attachment:coref_hl_italic.png}}
     * Bold: {{attachment:coref_hl_bold.png}}
   * {{attachment:coref_hl-spans.png}} = Turn the span highlight on/off
   * {{attachment:coref_force-lb.png}} = Every sentence starts on a new line. (default on)
   * {{attachment:coref_show-sindex.png}} = Show the sentence index (red border) {{attachment:coref_show-index_example.png||align="middle"}} (default on)
   * {{attachment:coref_show-dh.png}} = Display the current document header (red border) {{attachment:coref_document-header_example.png||align="middle"}} (default on)
   * {{attachment:coref_show-offset.png}} = Show token offset (red border) {{attachment:coref_show-offset_example.png}} (default off)
   * {{attachment:coref_show-cluster.png}} = Show cluster id (red border) {{attachment:coref_show-cluster_example.png}} (default off)
 * {{attachment:filter_apply.png}} = Filter selected span. Only words within the selected span are highlighted. The filter may be removed by clicking either {{attachment:filter_reset.png}} or {{attachment:refresh.png}} (note that this refreshes the whole view). Also accessible via the right-click context menu.
 * {{attachment:filter_reset.png}} = Remove the selected span filter. Also accessible via the right-click context menu.
 * {{attachment:coref_hits.png}} =
 * {{attachment:coref_mode.png}} = Four different modes:
   * default =
   * Gold =
   * False positives =
   * False negatives =
 * {{attachment:refresh.png}} = Refresh View
 * {{attachment:coref_filter-singleton.png}} = Filter out all singletons (default on)
 * {{attachment:coref_filter-nonhl.png}} = Filter out non-highlight
 * {{attachment:coref_hint_errortypes.png}} = ICARUS distinguishes between five error types (figure below)
   * {{attachment:coref_errortypes.png}}

<<Anchor(coreference-document-outline-graph)>>
=== Document Outline Graph: ===
Pairwise links output by an automatic coreference system can be treated as arcs in a directed graph. Linking the first mention of each cluster to an artificial root node creates a tree structure that encodes the entire clustering in a document. In the example, solid nodes/arcs represent the predicted annotation, whereas dashed nodes/arcs represent the gold annotation.
Discrepancies between predicted and gold are marked with different colors that indicate the different types of errors.

[[attachment:coref_document-set-graph.png|{{attachment:coref_document-set-graph.png||width=250}}]]

''' Document Outline Graph Toolbar ''' {{attachment:coref_document-set-graph-tb.png}}
 * {{attachment:coref_presenter-graph.png}} = Selected presenter style (in this example graph)
 * {{attachment:coref_selected_doc.png}} = Selected document
 * {{attachment:coref_show-property-outline.png}} = Show property outline
 * {{attachment:coref_show-context-outline.png}} = Show sentences (red border) below the tree [[attachment:coref_document-set-graph_text-on.png|{{attachment:coref_document-set-graph_text-on.png||height=150}}]]
   * [[#coreference-context-outline-tb|Context Outline Toolbar]] {{attachment:coref_document-set-graph_text-tb.png}}
 * {{attachment:coref_show-property.png}} = Open the Property Info dialog
 * {{attachment:refresh.png}} = Refresh View
 * {{attachment:clear_view.png}} = Clear View
 * {{attachment:preferences.png}} = Open the preferences
 * {{attachment:clear.png}} = Clear View
 * {{attachment:export.png}} = Save the current graph to an XML file (may be imported later)
 * {{attachment:import.png}} = Import graph from an XML file
 * {{attachment:print.png}} = Print current graph
 * {{attachment:delete.png}} = Delete selected node(s)/edge(s)
 * {{attachment:copy.png}} = Copy selected node(s)/edge(s)
 * {{attachment:refresh.png}} = Refresh View
 * {{attachment:undo.png}} = Undo last operation
 * {{attachment:redo.png}} = Redo last operation
 * {{attachment:zin.png}} = Increase zoom level
 * {{attachment:zdefault.png}} = Switch back to the default zoom level
 * {{attachment:zout.png}} = Decrease zoom level
 * {{attachment:hl_inedge.png}} = Highlight the incoming edge of the selected node (multiselection possible) (default off)
 * {{attachment:hl_outedge.png}} = Highlight the outgoing edge/edges of the selected node (multiselection possible) (default off)
 * {{attachment:zauto.png}} = Autofit zoom level to the current graph panel size (default off)
 * {{attachment:zcompress.png}} = Compress graph (right-left). Merge node/edge information into a node. (default off)
 * {{attachment:coref_show-sindex.png}} = Show the sentence index (red border) {{attachment:coref_show-sindex-graph.png||align="middle"}} (default on)
 * {{attachment:coref_mark-false-edge.png}} = Mark false edges (default on)
 * {{attachment:coref_mark-false-node.png}} = Mark false nodes (default on)
 * {{attachment:coref_include-gold-edge.png}} = Include gold edges (default off)
 * {{attachment:coref_include-gold-node.png}} = Include gold nodes (default off)
 * {{attachment:coref_hint_errortypes.png}} = ICARUS distinguishes between five error types (figure below)
   * {{attachment:coref_errortypes.png}}

<<Anchor(coreference-document-outline-grid)>>
=== Document Outline Entity Grid: ===
Barzilay and Lapata [1] introduce the entity grid, a tabular view of entities in a document. Specifically, rows of the grid correspond to sentences, and columns to entities. The cells of the table indicate that an entity is mentioned in the corresponding sentence. Entity grids provide a compact view of the distribution of mentions in a document and allow the user to see how the description of an entity changes from mention to mention.

[[attachment:coref_document-set-entity.png|{{attachment:coref_document-set-entity.png||width=250}}]]

[1] Regina Barzilay and Mirella Lapata. 2008. Modeling Local Coherence: An Entity-Based Approach. Computational Linguistics, 34(1):1–34.
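
The grid layout described above (rows = sentences, columns = entities, cells = mentions) can be sketched in a few lines of Python. This is only an illustration and not the ICARUS implementation; the function name `build_entity_grid`, the tuple layout of the input and the sample mentions are all assumptions made for this example.

```python
def build_entity_grid(mentions, num_sentences):
    """Arrange mentions in a sentence-by-entity table.

    `mentions` is a list of (sentence_index, entity_id, mention_text) tuples;
    this input layout is hypothetical and chosen only for the sketch.
    Returns (entity_ids, grid), where grid[s][e] lists the mentions of
    entity column e that occur in sentence s.
    """
    # One column per entity, in a stable (sorted) order.
    entity_ids = sorted({entity for _, entity, _ in mentions})
    column = {entity: i for i, entity in enumerate(entity_ids)}
    # One row per sentence; every cell starts out empty.
    grid = [[[] for _ in entity_ids] for _ in range(num_sentences)]
    for sentence, entity, text in mentions:
        grid[sentence][column[entity]].append(text)
    return entity_ids, grid


# Made-up sample data: two entities mentioned across three sentences.
mentions = [
    (0, "e1", "the committee"),
    (0, "e2", "Beijing"),
    (1, "e1", "it"),
    (2, "e2", "the city"),
    (2, "e1", "the committee"),
]
entity_ids, grid = build_entity_grid(mentions, num_sentences=3)

# Print the grid, one sentence per row; "-" marks an empty cell.
for row in grid:
    print(" | ".join(", ".join(cell) or "-" for cell in row))
```

Reading a column from top to bottom shows how the description of one entity changes from mention to mention, which is the use case the entity grid is meant to support.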
''' Document Outline Entity Toolbar: ''' {{attachment:coref_document-set-entity-tb.png}}
 * {{attachment:coref_presenter-entity.png}} = Selected presenter style (in this example entity)
 * {{attachment:coref_selected_doc.png}} = Selected document
 * {{attachment:coref_show-property-outline.png}} = Show property outline
 * {{attachment:coref_show-context-outline.png}} = Show sentences (red border) below the tree [[attachment:coref_document-set-graph_text-on.png|{{attachment:coref_document-set-graph_text-on.png||height=150}}]]
   * [[#coreference-context-outline-tb|Context Outline Toolbar]] {{attachment:coref_document-set-graph_text-tb.png}}
 * {{attachment:coref_show-property.png}} = Open the Property Info dialog
 * {{attachment:refresh.png}} = Refresh View
 * {{attachment:clear_view.png}} = Clear View
 * {{attachment:preferences.png}} = Open the preferences
   * {{attachment:coref_autoadjust.png}} = Enable column auto-adjust for the entity grid. (default on)
   * {{attachment:coref_show-pattern-grid.png}} = Show the specified pattern (example pattern ''$form$ - %Type% - %Number%'') in the entity grid (default on)
     * on = [[attachment:coref_show-pattern.png|{{attachment:coref_show-pattern.png||height=100}}]]
     * off = [[attachment:coref_show-pattern-off.png|{{attachment:coref_show-pattern-off.png||height=100}}]]
   * {{attachment:coref_document_example.png}} = Pattern for the entity table (note: to use patterns, {{attachment:coref_show-pattern-grid.png}} must be switched on). The list of valid pattern characters is shown below:
     * {{attachment:coref_pattern_characters.png}}
 * {{attachment:refresh.png}} = Refresh View
 * {{attachment:coref_hits.png}} =
 * {{attachment:coref_mark-false-node.png}} = Show false nodes (default on)
 * {{attachment:coref_include-gold-node.png}} = Show gold nodes (default off)
 * {{attachment:coref_filter-singleton.png}} = Filter out all singletons (default on)
 * {{attachment:coref_hint_errortypes.png}} = ICARUS distinguishes between five error types (figure below)
   * {{attachment:coref_errortypes.png}}

[[#icarus-coref-index|{{attachment:navi_up.png|Back to index}}]][[#icarus-coref-index|Back to index]]

----
<<Anchor(coreference-context-outline-tb)>>
'''Context Outline Toolbar:''' {{attachment:coref_document-set-graph_text-tb.png}}
 * {{attachment:preferences.png}} = Open the preferences
   * {{attachment:coref_hl-type.png}} = Five different types to highlight spans:
     * Background: {{attachment:coref_hl_background.png}}
     * Foreground: {{attachment:coref_hl_foreground.png}}
     * Underlined: {{attachment:coref_hl_underlined.png}}
     * Italic: {{attachment:coref_hl_italic.png}}
     * Bold: {{attachment:coref_hl_bold.png}}
   * {{attachment:coref_hl-spans.png}} = Turn the span highlight on/off
   * {{attachment:coref_force-lb.png}} = Every sentence starts on a new line. (default on)
   * {{attachment:coref_show-sindex.png}} = Show the sentence information ''SentenceNo-BeginIndex-EndIndex'' (red border). Example: SentenceNo=0, BeginIndex=24 and EndIndex=25 {{attachment:coref_show-index_example.png||align="middle"}}. (default on)
   * {{attachment:coref_show-dh.png}} = Display the current document header (red border) {{attachment:coref_document-header_example.png||align="middle"}} (default on)
   * {{attachment:coref_show-offset.png}} = Show token offset (red border) {{attachment:coref_show-offset_example.png}} (default off)
   * {{attachment:coref_show-cluster.png}} = Show cluster id (red border) {{attachment:coref_show-cluster_example.png}} (default off)
 * {{attachment:filter_apply.png}} = Filter selected span. Only words within the selected span are highlighted. The filter may be removed by clicking either {{attachment:filter_reset.png}} or {{attachment:refresh.png}} (note that this refreshes the whole view). Also accessible via the right-click context menu.
 * {{attachment:filter_reset.png}} = Remove the selected span filter. Also accessible via the right-click context menu.
 * {{attachment:coref_hits.png}} =
 * {{attachment:coref_scope}} = Current scope: how many surrounding sentences should be displayed (1...5). Example: assume the sentence with the mention is at index 3. By default, sentences 2 and 4 are shown too. At most, sentences 1, 2, '''3''', 4 ... 8 will be displayed (scope set to five).
 * {{attachment:refresh.png}} = Refresh View
 * {{attachment:coref_filter-singleton.png}} = Filter out all singletons (default on)
 * {{attachment:coref_filter-nonhl.png}} = Filter out non-highlight
 * {{attachment:coref_hint_errortypes.png}} = ICARUS distinguishes between five error types (figure below)
   * {{attachment:coref_errortypes.png}}

[[#icarus-coref-index|{{attachment:navi_up.png|Back to index}}]][[#icarus-coref-index|Back to index]]

<<Anchor(coreference-allocation-format)>>
== IV. Allocation Format: ==
The coreference plugin provides a default format for representing standoff annotations in the form of allocation files. Each allocation file may contain multiple document blocks.
The following text shows an example of such a document section:

{{{
#begin document (bc/cctv/00/cctv_0000); part 000
#id eng-fo-opt.mdlGold
#begin nodes
ROOT
0-2-5	Gender:Neut;HEAD:3;Number:Sin;Type:Common;
0-5-5	Gender:Unknown;HEAD:5;Number:Unknown;Type:Common;
0-22-26	Gender:Neut;HEAD:26;Number:Sin;Type:Common;
0-24-25	Gender:Neut;HEAD:25;Number:Sin;Type:Name;
1-2-4	Gender:Neut;HEAD:4;Number:Sin;Type:Common;
1-2-2	Gender:Unknown;HEAD:2;Number:Plu;Type:Pronoun;
1-6-11	Gender:Unknown;HEAD:11;Number:Plu;Type:Common;
#end nodes
#begin edges
ROOT>>0-2-5	Type:IDENT
ROOT>>0-5-5	Type:IDENT
ROOT>>0-24-25	Type:IDENT
ROOT>>1-2-4	Type:IDENT
ROOT>>1-2-2	Type:IDENT
1-2-2>>1-6-11	Type:IDENT
0-24-25>>1-15-16	Type:IDENT
#end edges
#end document
}}}

Each section (document, nodes or edges) is surrounded by the respective {{{"#begin"}}} and {{{"#end"}}} lines. The document section is special in that it requires an identifier to be specified after the {{{"#begin document"}}} statement. This identifier ({{{"(bc/cctv/00/cctv_0000); part 000"}}} in the above example) has to match a previously defined identifier in the original document set the allocation file refers to. It is used to map the coreference structure of the standoff annotation to an actual document in the original data. In addition, the document section may contain special comment lines ({{{"#id eng-fo-opt.mdlGold"}}} in the example) that are parsed as properties of the document: the character sequence between the hash sign and the first space is the key, and the remainder of that line is the value. Note, however, that those properties are not used by the current visualizations and can therefore be ignored.

Inside the document section, the available nodes (mentions) have to be declared first, followed by all the edges (coreference links between mentions). The nodes section contains one mention per line in the form {{{"SentenceNo-BeginIndex-EndIndex"}}}, with the following meaning of those fields:
 * '''SentenceNo''' Numerical id of the sentence the mention resides in, with the first sentence having the id 0.
 * '''BeginIndex''' The index of the first token of the mention within the sentence, starting at 1 for the first token in a sentence.
 * '''EndIndex''' The index of the last token of the mention within the sentence, starting at 1 for the first token in a sentence.

The declaration of the artificial root node ({{{ROOT}}} in the example) is optional and can be omitted. Each mention other than the root node may have an arbitrary number of properties assigned to it in the form of key-value pairs. Properties are separated from the rest of the node declaration by a tab character ({{{\t}}}) and form a sequence of {{{"key:value;"}}} definitions. The semicolon after the last key-value pair of a properties sequence is optional.
Both {{{key}}} and {{{value}}} may consist of arbitrary non-empty strings (however, any line-break character such as {{{\n}}} or {{{\r}}} will break the sequence).

The edges section uses a similar syntax to express links between previously defined mentions, again with one item per line. Each edge is defined by listing its two end points, separated by {{{">>"}}}, e.g. {{{"ROOT>>0-2-5"}}}. Mentions that are not linked to a specific antecedent should be attached to the artificial root node. Properties for edges are defined in exactly the same way as described for mentions above.

EBNF of the allocation format:

{{{
 = ? any non-empty character sequence, not containing control characters or any of \t, \n or \r ? ;
 = ? any non-negative integer number starting from 0 or 1 (depending on the context) ? ;
 = "\n" | "\r" | "\r\n" ;
 = " " ;
 = "\t" ;
 = , ":", ;
 = , { ";", }, [ ";" ] ;
 = , "-", , "-", ;
 = "ROOT" | ( , [ , ] ), ;
 = ( "ROOT" | ), ">>", , [ , ], ;
 = "#begin nodes", , { }, "#end nodes", ,
 = "#begin edges", , { }, "#end edges", ,
 = ;
 = ? a string as described in that does not start with either "begin" or "end" and does not contain whitespaces ? ;
 = "#", , , , ;
 = "#begin document", , , , { }, , "#end document" ;
 = , { , } ;
}}}

[[#icarus-coref-index|{{attachment:navi_up.png|Back to index}}]][[#icarus-coref-index|Back to index]]
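
The prose description of the allocation format can be turned into a small reader. The following Python sketch follows only the rules stated in this section (sections delimited by `#begin`/`#end` lines, comment lines as document properties, tab-separated `key:value;` properties, `>>` between edge end points). It is not the reader shipped with ICARUS; the function names `parse_properties`, `parse_span` and `parse_allocation` and the dictionary layout of the result are assumptions made for this example.

```python
def parse_properties(text):
    """Parse "key:value;key:value;" into a dict; the trailing ';' is optional."""
    properties = {}
    for pair in text.split(";"):
        if pair:  # skip the empty piece after a trailing ';'
            key, _, value = pair.partition(":")
            properties[key] = value
    return properties


def parse_span(token):
    """Turn "0-24-25" into (0, 24, 25); the artificial root stays "ROOT"."""
    if token == "ROOT":
        return "ROOT"
    sentence, begin, end = token.split("-")
    return (int(sentence), int(begin), int(end))


def parse_allocation(text):
    """Return one dict per document block (hypothetical result layout)."""
    documents, doc, section = [], None, None
    for raw in text.splitlines():
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        if line.startswith("#begin document"):
            doc = {"id": line[len("#begin document"):].strip(),
                   "properties": {}, "nodes": {}, "edges": []}
        elif line.startswith("#end document"):
            documents.append(doc)
            doc = None
        elif line.startswith("#begin "):
            section = line[len("#begin "):].strip()  # "nodes" or "edges"
        elif line.startswith("#end "):
            section = None
        elif line.startswith("#"):
            # Comment line: key runs up to the first space, the rest is the value.
            key, _, value = line[1:].partition(" ")
            doc["properties"][key] = value
        elif section == "nodes":
            span, _, props = line.partition("\t")
            doc["nodes"][parse_span(span)] = parse_properties(props)
        elif section == "edges":
            link, _, props = line.partition("\t")
            source, _, target = link.partition(">>")
            doc["edges"].append(
                (parse_span(source), parse_span(target), parse_properties(props)))
    return documents


# Shortened version of the example document from this section.
sample = (
    "#begin document (bc/cctv/00/cctv_0000); part 000\n"
    "#id eng-fo-opt.mdlGold\n"
    "#begin nodes\n"
    "ROOT\n"
    "0-24-25\tGender:Neut;HEAD:25;Number:Sin;Type:Name;\n"
    "1-2-2\tGender:Unknown;HEAD:2;Number:Plu;Type:Pronoun;\n"
    "1-6-11\tGender:Unknown;HEAD:11;Number:Plu;Type:Common;\n"
    "#end nodes\n"
    "#begin edges\n"
    "ROOT>>0-24-25\tType:IDENT\n"
    "ROOT>>1-2-2\tType:IDENT\n"
    "1-2-2>>1-6-11\tType:IDENT\n"
    "#end edges\n"
    "#end document\n"
)
docs = parse_allocation(sample)
print(docs[0]["id"], len(docs[0]["nodes"]), len(docs[0]["edges"]))
```

The sketch does no validation: it does not check that edge end points were declared in the nodes section or that the document identifier exists in the document-set, both of which a real reader would probably want to report.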