This is an archival page of the former StatNLP group at IMS, University of Stuttgart.
StatNLP: Research Areas
Statistical parsing
The DFG project ISIC-D4 (SFB 732, Incremental Specification in Context) aims to improve the accuracy of statistical parsing with (i) better parameter estimation techniques that allow the parser to incorporate more local linguistic features and (ii) a predicate-argument clustering (PAC) model. The PI of this project is Helmut Schmid.
Machine translation
The DFG project MorphoSyntax builds on advances in the automatic linguistic analysis of syntax and morphology to improve statistical machine translation. It aims to model the dependencies between morphology, syntax and translation directly. The goal is to create translation models and search algorithms that substantially improve translation quality for morphologically rich languages.
The EU project TTC (Terminology Extraction, Translation Tools and Comparable Corpora) aims to enhance machine translation (MT) tools, computer-assisted translation (CAT) tools and multilingual content management tools by automatically generating bilingual terminologies from comparable corpora in five European languages (English, French, German, Spanish and one under-resourced language, Latvian) as well as in Chinese and Russian. The PIs of this project are Ulrich Heid and Alex Fraser.
The DFG project ISIC-D5 (Biased Learning for Disambiguation) investigated the use of monolingual and multilingual corpora to improve statistical parsing. The project ended on June 30, 2010.
Information retrieval
The DFG project SVA pursues new approaches to using StatNLP methods in information retrieval and employs visualization techniques as part of a rich user interface.
The Google project PiggyBack conducts research on search-result-based NLP, using search results as a rich source of contextual information.
Named entity recognition, coreference resolution, and sentiment analysis
The DFG project Nexus-E2 aims to develop semantic methods for integrating external data into the Nexus context model. External data either originate in applications whose models (containing geographic as well as textual elements) must be mapped to the context model, or consist of textual content, which we use to verify the consistency of the Nexus context model. We use machine learning methods to perform the mapping and an information extraction approach for text.
The Sukre project is concerned with semi-supervised coreference resolution. To overcome the shortage of training data that limits many NLP applications, the project proposes two remedies: cheap acquisition of new information and better exploitation of existing information. The focus is on coreference resolution, although the methods we aim to develop should be equally applicable to many other NLP tasks.
The DFG project ISIC-D7 views sentiment analysis as a highly context-dependent task: many linguistic units can express a completely different sentiment depending on the topic or situation in which they are used. The project aims to model this context dependency with a generative statistical model that can represent relationships between words, generalize these relationships and capture topical relations. In addition, interlingual contexts are modeled through graph-based representations.
Exemplar theory
The DFG project ISIC-A2 develops exemplar-theoretic models of a number of linguistic, phonetic and cognitive phenomena. The PIs of this project are Bernd Möbius and Hinrich Schütze.
Graph-theoretical methods for lexical acquisition
The goal of the DFG project WordGraph is to develop new graph-theoretical approaches to acquiring lexical information from text corpora. The PIs of the project are Ulrich Heid and Hinrich Schütze.
Computational linguistics resources
The BMBF project D-SPIN is the German contribution to the European CLARIN project (Common Language Resources and Technology Infrastructure). D-SPIN lays the foundation for a stable and sustainable infrastructure of language resources and language technologies, primarily serving empirical research in the humanities and social sciences. The PIs are Ulrich Heid, Helmut Schmid and Hinrich Schütze.