Resources of the StatNLP group


Tools

TreeTagger

The TreeTagger is a tool for automatic annotation of text corpora with part-of-speech and lemma information.

RFTagger

The RFTagger is a POS tagger for fine-grained POS tagsets.

SFST

SFST is a toolbox for the implementation of morphological analysers and other programs which are based on finite state transducers.

SMOR

SMOR is a German finite-state morphology implemented in the SFST programming language. An older version of SMOR with a few sample lexicon entries comes with the SFST tools (see above).

LoPar

LoPar is a parser for head-lexicalized probabilistic context-free grammars.

BitPar

BitPar is an efficient parser for Treebank grammars.

Trace Parser

The Trace Parser is a BitPar-based English parser that generates analyses with traces.

YAP

YAP is a fast parser for feature-based grammars.

VPF

VPF is a parse forest browser for feature-structure based grammars.

Text corpora

Corpus name | Description | CQP | Source | Contact
Reuters | About 810,000 English-language Reuters news stories, distributed on two CDs. The uncompressed files require about 2.5 GB of storage. | | Reuters Corpora @ NIST |
German Wikipedia | German Wikipedia articles | yes | http://www.de.wikipedia.org | LukasMichelbacher
English Wikipedia | English Wikipedia articles | yes | http://www.en.wikipedia.org | LukasMichelbacher
