
ICARUS-Search-Perspective

The search_perspective.png perspective provides the following search types:

  • Dependency-Search
  • Coreference-Documents ¹
  • ErrorMining for Part-Of-Speech Tags ¹

  • ErrorMining for Dependency Structure ¹

    • ¹ Under development; will be available soon.

Back To ICARUS Main Page

Index:

  1. How to set up a new search

  2. Search Menu

  3. Result Outline

  4. Dependency-Search:
    1. Search Parameter (Dependency-Search)

    2. Graph Query Editor (Dependency-Search)

    3. Result Outline (Dependency-Search)

  5. Error Mining:
    1. Search Parameter (Error Mining)

    2. Error Mining Query Editor

    3. Result Outline (Error Mining)

  6. Tutorials:
    1. Tutorial Dependency Search (passive constructions) with one grouping operator

    2. Tutorial Dependency Search (passive constructions with overt logical subjects)

    3. Tutorial Dependency Search (passive constructions with overt logical subjects and object)

How to set up a new search:

  1. Click on search_new.png to create a new search.

  2. Configure the search (search_configuration.png):

    • Type: Select the desired search mode (dependency, error mining, coreference, ...)
    • Data-Set: Select the treebank/document
    • Query: Clicking search_query.png opens the query editor. The type of query editor depends on the search type.

    • Parameters: Search parameters depending on the search type.
  3. Execute the search using the search_execute.png button.

  4. View the result by double-clicking the search result or by using the inspect button search_inspect.png.

Back To Index

Search Menu:

search_manager_menu.png

  • search_preferences.png = Open the preferences

  • search_new.png = Create a new search

  • search_execute.png = Execute the search. Note: if no data set has been selected, the button is disabled search_execute_inactive.png

Search History Toolbar: search_history-tb.png. Every executed search is listed in the search history, which is available until you close your ICARUS session. The figure shows three search history items. During the search process the icons on the left may change:

  • search_runing+loading.png Search is active (first icon) but the target data-set is not loaded yet (second icon)

  • search_runing+loaded.png Search is active (first icon) and target data-set is loaded (second icon)

  • search_finished+loaded.png Search finished successfully (first icon) and target data-set is loaded (second icon)

  • search_icon_error.png Search was not successful.

  • search_clearhistory.png = Clear all search history items

  • search_remove1search.png = Remove the selected search result from the history

  • search_viewquery.png = Display the query of the selected search

  • search_inspect.png = Display the result of the selected search

  • search_cancel.png = Cancel the selected search

Back To Index

Result Outline:

attachment:search_result_1D.png

  • Aggregated result visualization depending on the number of grouping operators (dimensions) for up to three groups (3D)
  • Result highlighting for instances of query constraints
  • Fully customizable graph visualization
  • Easy navigation through results for up to three groups (3D)

Back To Index

Search Parameter (Dependency-Search):

  • Search-Mode: Non-Exhaustive (stop after the first hit), Exhaustive (add each sentence to the result at most once) and Exhaustive search with Grouping

  • Direction: Left-To-Right or Right-To-Left

  • Case-Sensitive: On/Off

  • Result Limit: limit the search result (number of hits)

Back To Index

Graph Query Editor (Dependency-Search):

search_query-editor-tab.png This tab is used to build a query. Graph Editor Toolbar: search_graph-tb.png

  • search_preferences.png = Open the preferences

  • search_arc-layout.png = Change the current graph layout. There are three different layout types available search_qe_layouts.png

    1. Arc layout search_qe_arc-layout.png

    2. No layout search_qe_no-layout.png

    3. Tree search_qe_tree-layout.png

  • search_clear.png = Clear the graph panel (all nodes/edges are deleted)

  • search_export.png = Save the current search graph to XML file (may be imported later)

  • search_import.png = Import a search graph XML file

  • search_print.png = Print the current graph

  • search_add-node.png = Add a new node to the current search graph

  • search_add-disjunction.png = Add a new disjunction to the current search graph

  • search_add-edge.png = Connects two nodes (two nodes must be selected before this action can be performed)

  • search_add-pedge.png = Connects two nodes with a precedence relation (two nodes must be selected before this action can be performed)

  • search_delete.png = Delete selected node/edge (multi selection possible)

  • search_edit.png = Open the edit node/edge dialog (instead of using this button, you may double-click a node/edge to open the edit dialog)

  • search_clone.png = Duplicate (copy and insert) the selected nodes/edges. This is a quick way to duplicate a graph. Note: edges are only copied when both their source and target nodes are selected.

  • search_copy_button.png = Copy the selected nodes/edges. Note: edges are only copied when both their source and target nodes are selected. (Ctrl+C)

  • search_paste_button.png = Paste previously copied nodes/edges. (Ctrl+P)

  • search_reorder-graph.png = Redraw the graph. This is useful because adding new nodes, edges or constraints may mess up the graph layout. Example (arc layout; left: nodes/edges unsorted, right: nodes/edges reordered): search_graph-redraw.png

Note: Copy and paste of nodes/edges can be used to transfer graphs from/into other perspectives (see e.g. Tutorial 1).

  • search_undo.png = Undo the last graph editor operation

  • search_redo.png = Redo the last graph editor operation

  • search_zin.png = Increase zoom level

  • search_zdefault.png = Switch back to the default zoom level

  • search_zout.png = Decrease zoom level

  • search_zauto.png = Autofit zoom level to the current graph panel size (default off)

  • search_zcompress.png = Compress the graph (right-left) by merging node/edge information into a node. Search annotation highlighting is never merged and always remains visible. (default off)

  • search_toggle_c-d.png = If the graph panel contains several unconnected graphs A and B, the search uses the disjunctive query (A v B).

Text Query Editor Toolbar: search_query-tb-text.png

  • search_undo.png = Undo the last text editor operation

  • search_redo.png = Redo the last text editor operation

  • search_copy_button.png = Copy the selected text. (Ctrl+C)

  • search_paste_button.png = Paste previously copied text. (Ctrl+P)

  • search_select-all.png = Select the entire query text (Ctrl+A)

  • search_clearhistory.png = Clear the text query panel.

  • search_save-graph-to-desc.png = Save the query graph to the currently selected search history item

  • search_sync-to-graph.png = Generate search graph from text query

  • search_sync-to-text.png = Generate text query from search graph

Back To Index

Result Outline (Dependency-Search):

search_result-tab.png Use this tab to browse the search results. Depending on the number of grouping operators (0 to 3), the visualization uses one of four presentation styles, which are described below.

Result Outline Toolbar: search_result-base-tb.png

  • search_preferences.png = Open the preferences

  • search_result-search-desc.png Short query description and number of matches (here 3 grouping operators and 10 matches)

  • search_reorder-graph.png = Refresh the result outline

  • search_export.png = Save the current search result to an XML file (may be imported later)

  • search_import.png = Import search result XML file

  • search_clearhistory.png = Close the result outline

  • search_result-grouping-desc.png Result information for the grouping operators search_grouping-operator.png: the color and the number of matches of each grouping operator (ICARUS supports up to three grouping operators). In this example there are 1. a lemma search_grouping-operator.png (red) with 8 matches, 2. a lemma search_grouping-operator.png (green) with 5 matches and 3. a pos search_grouping-operator.png with 4 matches.

0. No grouping operator search_grouping-operator.png is used.

  • Query: search_query-0D.png

  • Text Query: [lemma=be [relation=VC, pos=VBN]]
  • Result Toolbar: search_result-tb-0D.png

The result is presented as a list of sentences. Every occurrence that matches the query is colored blue. Results (0D) attachment:search_result_0D.png
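The text queries on this page use a nested bracket notation: each [...] is a node, its comma-separated constraints come first (key=value, or key<*>N for grouping operator N), and any nested brackets are its child nodes. Judging from Tutorial 2, a relation constraint on a child node describes the edge coming from its parent. The following Python sketch parses this notation into a tree. It is a hypothetical illustration inferred from the examples on this page, not the ICARUS parser; QueryNode, parse_query and grouping_operators are invented names.

```python
from dataclasses import dataclass, field


@dataclass
class QueryNode:
    """One node of a text query (hypothetical representation)."""
    constraints: dict = field(default_factory=dict)  # e.g. {"lemma": "be"}
    groups: dict = field(default_factory=dict)       # e.g. {"lemma": 1} for lemma<*>1
    children: list = field(default_factory=list)     # nested [...] nodes


def parse_query(text):
    """Parse a bracketed text query such as "[lemma=be [relation=VC, pos=VBN]]"."""
    pos = 0

    def skip_ws():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def read_token():
        # A token runs until a delimiter or whitespace.
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "[],=<" and not text[pos].isspace():
            pos += 1
        return text[start:pos]

    def parse_node():
        nonlocal pos
        skip_ws()
        if pos >= len(text) or text[pos] != "[":
            raise ValueError(f"expected '[' at position {pos}")
        pos += 1
        node = QueryNode()
        skip_ws()
        # Constraints come first, separated by commas.
        while pos < len(text) and text[pos] not in "[]":
            key = read_token()
            if not key:
                raise ValueError(f"expected a constraint key at position {pos}")
            if text.startswith("<*>", pos):
                pos += 3
                node.groups[key] = int(read_token())
            elif text.startswith("=", pos):
                pos += 1
                node.constraints[key] = read_token()
            else:
                raise ValueError(f"expected '=' or '<*>' at position {pos}")
            skip_ws()
            if pos < len(text) and text[pos] == ",":
                pos += 1
                skip_ws()
        # Any nested brackets are child nodes.
        while pos < len(text) and text[pos] == "[":
            node.children.append(parse_node())
            skip_ws()
        if pos >= len(text) or text[pos] != "]":
            raise ValueError(f"expected ']' at position {pos}")
        pos += 1
        return node

    root = parse_node()
    skip_ws()
    if pos != len(text):
        raise ValueError(f"unexpected trailing text at position {pos}")
    return root


def grouping_operators(node):
    """Collect (operator number, attribute) pairs from the whole query tree."""
    found = [(n, key) for key, n in node.groups.items()]
    for child in node.children:
        found.extend(grouping_operators(child))
    return sorted(found)
```

For the 3D query of Tutorial 3, grouping_operators returns one lemma operator for each of the numbers 1, 2 and 3.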

1. One grouping operator search_grouping-operator.png is used.

  • Query: search_query-1D.png

  • Text Query: [lemma=be [relation=VC, lemma<*>1, pos=VBN]]

  • Result Toolbar: search_result-tb-1D.png

All lemma types found are shown in the list (red) on the left. The user may select one lemma type to get all instances matching the query. Every occurrence that matches the query is colored blue and the "grouped" lemma is colored red. Results (1D) attachment:search_result_1D.png

  • Options: search_result-option-1D.png

    • search_numeric_switch.png = Switch between numeric/percentage result numbers (total)

    • search_sort.png = Sort by wordform or by occurrence (ascending/descending)

    • search_reset_sort.png = Reset list sorting
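The 1D list can be thought of as a frequency count over the values bound to grouping operator 1. The sketch below assumes that each match is available as a dictionary from grouping-operator number to the matched value (a hypothetical format, not the ICARUS data model) and mimics the numeric/percentage switch and the two sort orders.

```python
from collections import Counter


def group_counts(matches, operator=1):
    """Count matches per value of one grouping operator."""
    return Counter(match[operator] for match in matches)


def as_percentages(counts):
    """Express each count as a percentage of the total number of matches."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: 100.0 * n / total for value, n in counts.items()}


def sort_groups(counts, by="occurrence", descending=True):
    """Sort (value, count) pairs by word form or by number of occurrences."""
    if by == "occurrence":
        key = lambda item: item[1]
    elif by == "wordform":
        key = lambda item: item[0]
    else:
        raise ValueError(f"unknown sort order: {by!r}")
    return sorted(counts.items(), key=key, reverse=descending)


# Hypothetical matches: the lemma bound to grouping operator 1 (lemma<*>1).
matches = [{1: "kiss"}, {1: "give"}, {1: "kiss"}]
counts = group_counts(matches)
print(sort_groups(counts))                                   # [('kiss', 2), ('give', 1)]
print(sort_groups(counts, by="wordform", descending=False))  # [('give', 1), ('kiss', 2)]
```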

2. Two grouping operators search_grouping-operator.png are used.

  • Query: search_query-2D.png

  • Text Query: [lemma=be [relation=VC, lemma<*>1, pos=VBN [relation=LGS, form=by [relation=PMOD, lemma<*>2]]]]

  • Result Toolbar: search_result-tb-2D.png

The result is presented as a table. Grouping operator one (red) is on the y-axis and grouping operator two (green) on the x-axis (Note: The x-/y-axis may be flipped by clicking on search_flip-table.png). Every occurrence that matches the query is colored blue. Results (2D) attachment:search_result_2D-a.png attachment:search_result_2D-b.png

  • Options: search_result-option-2D.png

    • search_numeric_switch.png = Switch between numeric/percentage result numbers (total)

    • search_sort.png = Sort y-axis by wordform or by occurrence (ascending/descending)

    • search_sort_x-axis.png = Sort x-axis by wordform or by occurrence (ascending/descending)

    • search_flip-table.png = Swap the x-/y-axis (e.g.: (old) x-axis = (new) y-axis and vice versa)

    • search_reset_sort.png = Reset table sorting
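The 2D table is a count per (y value, x value) cell, and flipping swaps the two coordinates. The following sketch uses the same hypothetical match format as above (grouping-operator number mapped to the matched value); it is an illustration, not ICARUS code.

```python
from collections import Counter


def group_table(matches, y_operator=1, x_operator=2):
    """Count matches per (y value, x value) cell."""
    return Counter((m[y_operator], m[x_operator]) for m in matches)


def flip(table):
    """Swap the axes: the old x-axis becomes the new y-axis and vice versa."""
    return Counter({(x, y): n for (y, x), n in table.items()})


def rows(table):
    """Render the table as rows of strings, header row first."""
    ys = sorted({y for y, _ in table})
    xs = sorted({x for _, x in table})
    out = [[""] + xs]
    for y in ys:
        # Counter returns 0 for missing cells without inserting them.
        out.append([y] + [str(table[(y, x)]) for x in xs])
    return out


# Hypothetical matches: verb lemma (lemma<*>1) and logical-subject lemma (lemma<*>2).
matches = [
    {1: "kiss", 2: "boy"},
    {1: "kiss", 2: "girl"},
    {1: "give", 2: "boy"},
]
for row in rows(group_table(matches)):
    print(row)
```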

3. Three grouping operators search_grouping-operator.png are used.

  • Query: search_query-3D.png

  • Text Query: [lemma=be [relation=VC, lemma<*>1, pos=VBN [relation=LGS, form=by [relation=PMOD, lemma<*>2]][relation=OBJ, lemma<*>3]]]

  • Result Toolbar: search_result-tb-3D.png

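The result is presented as a list (one grouping operator) combined with a table (the other two grouping operators); the assignment can be changed with search_3D-reorder.png. Every occurrence that matches the query is colored blue. Results (3D) attachment:search_result_3D-a.png attachment:search_result_3D-b.png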

  • Options: search_result-option-3D.png

    • search_numeric_switch.png = Switch between numeric/percentage result numbers (total)

    • search_sort.png = Sort the list by wordform or by occurrence (ascending/descending)

    • search_reset_sort.png = Reset list sorting

    • search_sort.png = Sort y-axis by wordform or by occurrence (ascending/descending)

    • search_sort_x-axis.png = Sort x-axis by wordform or by occurrence (ascending/descending)

    • search_flip-table.png = Swap the x-/y-axis (e.g.: (old) x-axis = (new) y-axis and vice versa)

    • search_reset_sort.png = Reset table sorting

    • search_3D-reorder.png = Change the assignment of the grouping operators ([0] = list, [1] = table y-axis and [2] = table x-axis). In this example search_3D-reorder-dialog.png we have [0] = first search_grouping-operator.png (red), [1] = second search_grouping-operator.png (green) and [2] = third search_grouping-operator.png (brown)
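The reorder dialog assigns each of the three grouping operators to one role: [0] list, [1] table y-axis, [2] table x-axis. A minimal sketch of that assignment, again assuming the hypothetical match format used above (operator number mapped to value), groups matches by the list value and counts table cells for each list entry.

```python
from collections import Counter, defaultdict


def arrange_3d(matches, order=(1, 2, 3)):
    """Arrange 3D results.

    order[0] is the grouping operator shown in the list ([0]),
    order[1] the one on the table y-axis ([1]) and
    order[2] the one on the table x-axis ([2]).
    """
    if sorted(order) != [1, 2, 3]:
        raise ValueError("order must be a permutation of 1, 2, 3")
    list_op, y_op, x_op = order
    result = defaultdict(Counter)
    for m in matches:
        result[m[list_op]][(m[y_op], m[x_op])] += 1
    return dict(result)


# Hypothetical matches: verb (<*>1), logical subject (<*>2), object (<*>3).
matches = [
    {1: "give", 2: "boy", 3: "book"},
    {1: "give", 2: "girl", 3: "book"},
    {1: "send", 2: "boy", 3: "letter"},
]
print(arrange_3d(matches))                   # list entries: give, send
print(arrange_3d(matches, order=(3, 1, 2)))  # list entries: book, letter
```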

Below the graph panel is the text outline. Its list contains all search results of the selected instance; the selected sentence is shown in the graph panel.

Toolbar: text-tb.png

  • text_outline.png = Toggle a text panel from which the selected sentence can be copied (see below)

    • text_outline-on.png

  • item-first.png = First sentence

  • item-previous.png = Previous sentence

  • sentence-nr.png = Shows the currently selected sentence (first number) and the total number of sentences (last number). In the example figure, sentence 2 of 3 is selected. The user may navigate using the arrows to the left/right, or enter a sentence number in this field and press "Return" to display that sentence. Note that the sentence numbers refer to the internal index; the corpus index may differ, for example if a sentence number has been skipped.

  • item-next.png = Next sentence

  • item-last.png = Last sentence

Back To Index

Search Parameter (Error Mining):

  • Replace all Numbers by Special Token: When the number wildcard replacement filter is enabled, the algorithm checks every word form during the error mining process to see whether it is a number. This is done using a regular expression that flags all words whose first character is a digit (0-9). These words are replaced with a special NumberWildcard token. This allows the error mining algorithm to treat strings that contain different numbers as equal and to find variation within the non-number word forms.

  • Use Fringe Heuristic: The fringe heuristic is used to filter n-grams where the nucleus occurs at the start/end of the n-gram. This is useful because when the nucleus is surrounded by words the probability that we find an error is higher.

  • Maximum NGram Size (passes): Limit the maximum n-gram size (size = number of algorithm iterations). By default this parameter is zero, which is equivalent to ∞ (no limit).

  • Maximum Sentences for Input: Limits the number of sentences used for error mining, starting at sentence one until the specified value is reached. For example, with a limit of 10,000, at most the first 10,000 sentences of the specified corpus are used during the error mining process. Note: This option has a strong influence on the results and should be used carefully, because limiting the input data may hide the variation of a word. By default this value is "0" (zero) and the engine uses all sentences of the given corpus.

  • Show only NGrams with a size of: Even when the fringe heuristic is enabled, the results still contain uni- and bi-grams. The Show only NGrams with a size of option allows the user to filter the resulting n-grams so that only n-grams larger than the given value are listed. For example, if the value is set to "1", the resulting list contains 2-, 3-, ..., n-grams.

  • Create XML Output File: The Output to File option creates an XML-formatted file containing information about the word forms, tags, tag counts and highlighting. It is formatted in a human-readable way, so error detection is possible even without the graphical support of the error mining plug-in. (By default no output location is set in the search_preferences.png, and the user is asked for the desired file location when the error mining task is complete.)
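Three of these parameters are simple input/output transformations. The sketch below illustrates them under stated assumptions: the wildcard string NUMBER_WILDCARD is a placeholder (the actual token ICARUS uses may differ), and the function names are hypothetical. Treating "0" as "no filter" for the size option is also an assumption; for the sentence limit it follows the description above.

```python
import re

NUMBER_WILDCARD = "<NUMBER>"  # placeholder; the real NumberWildcard token may differ
STARTS_WITH_DIGIT = re.compile(r"[0-9]")  # re.match anchors at the first character


def replace_numbers(forms):
    """Replace every word form whose first character is a digit."""
    return [NUMBER_WILDCARD if STARTS_WITH_DIGIT.match(f) else f for f in forms]


def limit_sentences(sentences, max_sentences=0):
    """Use only the first max_sentences sentences; 0 means use all of them."""
    return sentences if max_sentences == 0 else sentences[:max_sentences]


def show_only_larger(ngrams, size=0):
    """Keep only n-grams longer than size (size=1 keeps 2-grams, 3-grams, ...)."""
    return [g for g in ngrams if len(g) > size]


print(replace_numbers(["sales", "rose", "12.5", "%", "in", "1999"]))
```

Note that a form like "x1" is left unchanged, because only the first character is checked.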

Back To Index

Error Mining Query Editor:

search_query-editor-tab.png This tab is used to build a query. A single query item consists of the following parts:

  1. Include Tag (boolean) = All tags that are ignored (Include Tag=true) are mapped onto a special "ignoredtag"-subclass. This option has priority over the new tag definition.

  2. Tagclass (string) = If the current tag matches the Tagclass, it may be included or assigned a new tag (if specified)

  3. new Tag (string) = The new tag for all tags that match a Tagclass within the query list (see item 2)

If the current tag is not found within the query list, it is neither ignored nor assigned a new tag, and the algorithm simply continues with the current tag. The benefit of this design is that there is no need to put the whole tag set into the query.
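The per-tag lookup described above can be sketched as follows. This is a hypothetical model, not the plug-in's code: the Include Tag option is represented as a boolean ignore field for tags that should be ignored, IGNORED_TAG is a placeholder for the special "ignoredtag" class, and a Tagclass is assumed to match by exact string comparison.

```python
from dataclasses import dataclass
from typing import Optional

IGNORED_TAG = "ignoredtag"  # placeholder for the special ignored-tag class


@dataclass
class QueryItem:
    """One hypothetical query item: Tagclass, ignore flag and optional new tag."""
    tag_class: str
    ignore: bool = False
    new_tag: Optional[str] = None


def map_tag(tag, items):
    """Return the tag the error mining step should use for this token."""
    for item in items:
        if item.tag_class == tag:
            if item.ignore:
                return IGNORED_TAG  # ignoring takes priority over a new tag
            if item.new_tag:
                return item.new_tag
            return tag
    return tag  # tags not listed in the query are used unchanged


items = [QueryItem("NNP", new_tag="NN"), QueryItem(",", ignore=True)]
print([map_tag(t, items) for t in ["NNP", ",", "VBZ"]])
```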

The Error Mining Query Editor provides the functionality to group tags together, rename tags or exclude tags from the search. It is organized in three parts attachment:search_qe-errormining-view.png. On the left side there are buttons to create, edit or delete a single query item:

  • search_qe-add.png = Add a new ngram query item

  • search_qe-edit.png = Edit selected ngram query item attachment:search_qe-edit-ngramtag.png

  • search_qe-delete.png = Delete selected ngram query item

In the middle, all specified query items are shown as a list. attachment:search_qe-errormining-list.png

Below are three buttons to manage the ngram query item list:

  • search_qe-load.png = Load ngram query xml file

  • search_qe-save.png = Save all ngram query items to xml

  • search_qe-reset.png = Remove all ngram query items from list

Saving a query to an XML file and loading it again later is useful when the user wants to reuse a query on different corpora. Reset deletes all specified query items.

Back To Index

Result Outline (Error Mining):

search_result-tab.png Use this tab to browse the error mining results. ICARUS provides two views for browsing potential errors. The search_variation-ngrams.png view shows a list of all variation n-grams found, whereas the second view search_label-distribution.png shows the label distribution over word forms.

Result Outline Toolbar: search_result-em-tb.png

  • search_preferences.png = Open the preferences

  • search_result-search-desc.png Short query description and number of matches (grouping is never used in error mining, so the grouping count is always "0" when viewing an error mining result)

  • search_reorder-graph.png = Refresh the result outline

  • search_export.png = Save the current search result to an XML file (may be imported later)

  • search_import.png = Import search result XML file

  • search_clearhistory.png = Close the result outline

Variation N-Gram View (Error Mining):

attachment:search_result_em-pos-variation.png

Variation N-Gram Toolbar search_em-variation-tb.png

  • search_preferences.png = Open the preferences

  • search_em_text-filter.png = Filter the variation n-gram list using the specified string

  • search_em_min-gram.png = Minimum n-gram size for items within the list

  • search_em_max-gram.png = Maximum n-gram size for items within the list

  • filter_apply.png = Apply variation n-gram filter

  • filter_reset.png = Reset variation n-gram list filter

  • sort_asc.png = Sort the n-gram list in ascending order of n-gram length

  • sort_desc.png = Sort the n-gram list in descending order of n-gram length

Each variation entry has the format "Listindex) n-gram-length Occurrence-Count ngram".

Example n-gram: search_em-single-result.png.

  • "1)" List Index
  • "1-gram" Length of the variation n-gram (here 1)
  • "100+" Variation n-gram occurrence count (100+ = more than 100 matches)
  • "'s" Every variation nucleus is colored purple
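A variation nucleus is a word form that occurs with more than one tag, as "'s" does in the example. The following Python sketch covers only the unigram case and formats entries in the list format shown above. It is a simplification: the input format and function names are hypothetical, the "100+" cap is taken from the example above, and the full algorithm, which extends the context around each nucleus over several passes (see Maximum NGram Size), is not reproduced.

```python
from collections import Counter, defaultdict


def variation_unigrams(tagged_sentences):
    """Map each word form that occurs with more than one tag to its tag counts."""
    tags_by_form = defaultdict(Counter)
    for sentence in tagged_sentences:
        for form, tag in sentence:
            tags_by_form[form][tag] += 1
    return {form: tags for form, tags in tags_by_form.items() if len(tags) > 1}


def format_entry(index, ngram, count, cap=100):
    """Format an entry as "Listindex) n-gram-length Occurrence-Count ngram"."""
    shown = f"{cap}+" if count > cap else str(count)
    return f"{index}) {len(ngram)}-gram {shown} {' '.join(ngram)}"


# Hypothetical POS-tagged input: (word form, tag) pairs per sentence.
sentences = [
    [("John", "NNP"), ("'s", "POS"), ("car", "NN")],
    [("It", "PRP"), ("'s", "VBZ"), ("red", "JJ")],
]
nuclei = variation_unigrams(sentences)
for i, (form, tags) in enumerate(sorted(nuclei.items()), start=1):
    print(format_entry(i, (form,), sum(tags.values())), dict(tags))
```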

When the user selects an n-gram, additional information about the nucleus (part-of-speech tags, tag count) is displayed below the list. To inspect the result, the user may double-click an entry in the variation n-gram list. In the example, double-clicking search_em-single-result.png would return all sentences with the nucleus "'s" (tags POS, VBZ and NNP).

If the user is only interested in instances where "'s" was tagged as VBZ, he first has to select the n-gram in the list and afterwards double-click the line in the lower part of the window search_em-single-tag.png that contains that particular combination of word form and part-of-speech tag. Each time the user double-clicks an n-gram, a new tab is created, which allows the user to jump back to previous results without having to recreate them (run the search again).

Label Distribution View (Error Mining):

attachment:search_result_em-pos-distribution.png

Variation Label Distribution Toolbar search_em-distribution-tb.png

  • search_preferences.png = Open the preferences

  • search_em_text-filter.png = Filter the label distribution list using the specified string

  • filter_apply.png = Apply label distribution filter

  • filter_reset.png = Reset label distribution filter

  • search_show_ngram.png = Show sentences for the n-gram

  • search_em-labelsize.png = Specify n-gram size for label distribution

  • search_em-distribution.png = Generate new label distribution for specified search_em-labelsize.png n-gram size

  • search_export.png = Export the bar chart as "portable network graphics" (.png) (export settings can be configured in the preferences search_preferences.png)

On the left, a list of unique label combinations is shown. Selecting one displays a list of word forms that occur with exactly these tags in the corpus; this list is shown below (search_result-em-label-dist-b.png). To the right, the frequencies of the different labels are shown in a bar chart. The left-most bar (here red) for each label always shows the total frequency. The user may select more word forms from the list to add bars to the chart showing the frequencies for each selected word form.
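The label distribution view groups word forms by the exact set of labels they occur with, and the bar chart shows per-label frequencies. The sketch below is a hypothetical illustration, not ICARUS code. In particular, it assumes that the "total" bar for a label is the sum over all word forms sharing the selected label combination.

```python
from collections import Counter, defaultdict


def label_distribution(tagged_tokens):
    """Return per-form tag counts and the forms grouped by label combination."""
    tags_by_form = defaultdict(Counter)
    for form, tag in tagged_tokens:
        tags_by_form[form][tag] += 1
    forms_by_combination = defaultdict(list)
    for form, tags in tags_by_form.items():
        # The combination is the sorted tuple of distinct labels for this form.
        forms_by_combination[tuple(sorted(tags))].append(form)
    return tags_by_form, dict(forms_by_combination)


def bar_values(combination, forms, selected, tags_by_form):
    """For each label: [total over all forms in the combination, one value per selected form]."""
    chart = {}
    for label in combination:
        total = sum(tags_by_form[f][label] for f in forms)
        chart[label] = [total] + [tags_by_form[f][label] for f in selected]
    return chart


# Hypothetical (word form, tag) tokens.
tokens = [("run", "VB"), ("run", "NN"), ("walk", "VB"),
          ("walk", "NN"), ("walk", "NN"), ("the", "DT")]
tags_by_form, combos = label_distribution(tokens)
forms = combos[("NN", "VB")]  # ['run', 'walk']
print(bar_values(("NN", "VB"), forms, ["walk"], tags_by_form))  # {'NN': [3, 2], 'VB': [2, 1]}
```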

Results Presentation:

attachment:search_result_em-pos-distribution.png

Back To Index

1) Tutorial Dependency Search (passive constructions) with one grouping operator:

If the user does not know exactly how passive constructions are annotated in a treebank, a parser such as mate-tools or WebLicht can be used to parse a sentence that contains a passive construction, and the structure can then be copied and pasted into the search graph.

  1. Parsed sentence "Mary was kissed by a boy." search_example_mt.png

  2. Select the passive construction search_example_mt_selected.png

  3. Copy the selected cells and edges search_copy.png and switch to the search_perspective.png

  4. Paste selected cells and edges into the search query editor window search_paste.png

  5. The resulting graph when using the arc-layout (recommended) search_arc-layout.png search_cp-graph-arc.png

  6. Next, generalize the search graph (query) by double-clicking the edges/nodes to open the edge/node editor.
    1. Node 1 properties search_edit-node.png changed to search_edit-node-b.png

    2. Edge properties search_example-edge.png changed to search_example-edge-b.png

    3. Node 2 properties search_example-node2.png changed to (added grouping operator <*>) search_example-node2-b.png search_example-node2-c.png

    4. These changes result in a new, more general version of the search graph (the textual query representation is shown below the graph) search_example_sg+text.png. This query matches passive constructions in English as annotated in the CoNLL08 Shared Task data set.

  7. Results (1D) attachment:search_result_1D.png

Back To Index

2) Tutorial Dependency Search (passive constructions with overt logical subjects):

We are interested in passive constructions with overt logical subjects, grouped by the lemma of the verb and the lemma of the logical subject. We may reuse the search graph for passive constructions or build the query completely manually (shown here).

  1. First of all clear the graph editor panel (if there is any remaining graph) using search_clear.png

  2. Add four new nodes search_add-node.png; you may reorder them automatically by clicking search_reorder-graph.png

  3. Your graph editor should look like search_t2_4nodes.png

  4. There are two ways of connecting nodes (adding edges):
    1. Select two nodes search_t2_addingedge-a.png and connect them by clicking on search_add-edge.png

    2. Place the cursor in the middle of the desired (source) node. A green border will show up search_hl-node.png. Hold the left mouse button and move to the (target) node. When you reach the target node, a green border shows up again. Release the left mouse button to draw an edge between the two nodes search_t2_addingedge-b.png

  5. Double click on the nodes/edges to specify the constraints. (Note: Adding constraints may mess up the graph layout. You may use search_reorder-graph.png to redraw the graph)

    1. Node 1: Lemma = be search_t2-n1.png

    2. Node 2: Lemma = <*> (red grouping operator); Part-Of-Speech = VBN search_t2-n2.png

    3. Node 3: Form = by search_t2-n3.png

    4. Node 4: Lemma = <*> (green grouping operator) search_t2-n4.png

    5. Edge 1: Relation = VC search_t2-e1.png

    6. Edge 2: Relation = LGS search_t2-e2.png

    7. Edge 3: Relation = PMOD search_t2-e3.png

  6. Once all nodes and edges are connected and the constraints above have been set without errors, the search graph should look like this: search_t2-sg.png

    • (Textual query: [lemma=be [relation=VC, lemma<*>1, pos=VBN [relation=LGS, form=by [relation=PMOD, lemma<*>2]]]])

  7. Results (2D) attachment:search_result_2D-a.png attachment:search_result_2D-b.png

Back To Index

3) Tutorial Dependency Search (passive constructions with overt logical subjects and object):

In tutorial 1) we showed how to create a query using a graph copied from the parser output. Tutorial 2) shows how to create a query from scratch. In tutorial 3) we extend the search graph used in 2) with an additional grouping operator.

  1. We start with the following search graph search_t2-sg.png

  2. Add one new node search_add-node.png; you may reorder the nodes automatically by clicking search_reorder-graph.png

  3. Your graph editor should look like search_t3-n5added.png

  4. Connect the "red" node with the new node using one of the following options:
    1. Select both nodes search_t3-addedge-c.png and connect them by clicking on search_add-edge.png

    2. Place the cursor in the middle of node 2. A green border will show up search_t3-addedge-a.png. Hold the left mouse button and move to the new node. When you reach the target node, a green border shows up again search_hl-node.png. Release the left mouse button to draw an edge between the two nodes search_t3-addedge-b.png

  5. Double click on the new node/edge to specify the constraints. (Note: Adding constraints may mess up the graph layout. You may use search_reorder-graph.png to redraw the graph)

    1. Node 5: Lemma = <*> (brown grouping operator) search_t3-n5.png

    2. Edge 4: Relation = OBJ search_t3-e4.png

  6. Once all nodes and edges are connected and the constraints above have been set without errors, the search graph should look like this: search_t3-sg.png

    • (Textual query: [lemma=be [relation=VC, lemma<*>1, pos=VBN [relation=LGS, form=by [relation=PMOD, lemma<*>2]][relation=OBJ, lemma<*>3]]])

  7. Results (3D) attachment:search_result_3D-a.png attachment:search_result_3D-b.png

Back To Index

extern/ICARUS-Search-Perspective (last edited 2014-04-25 12:09:32 by GregorThiele)