Probabilistic Grammars and Data-Oriented Parsing
Lecturers: Detlef Prescher, Remko Scha, and Khalil Sima'an
Date: Tuesday, 4–6 p.m.
Location: P.018
First Lecture: February 3, 2004
This course will be given in English!
Content: Probabilistic grammars start where formal language
theory stops. Ambiguity resolution, robustness, efficiency,
learning/adaptation, and estimation from data are the starting points for
probabilistic grammars. The subject matter for these mathematical
models is Natural Language Processing (as opposed to Formal Language
Processing, e.g. compilation). The course concentrates on syntactic
parsing, with ambiguity resolution as the core subject for
probabilistic grammars. We will look mainly at the following subjects
for Probabilistic Context-Free Grammars (PCFGs): (1) mathematical
aspects of probabilistic grammars, (2) estimation theory for
supervised and unsupervised learning, (3) algorithms for estimation of
PCFGs, (4) efficient parsing under PCFGs. Then we will consider
aspects of Data-Oriented Parsing and how it resembles and deviates from
PCFGs: parameter estimation, efficiency, and linguistic issues. The
course has a slot for student presentations of a paper from a list of
papers provided by the lecturers.
References
Besides Remko Scha's background
material on Data-Oriented Parsing (DOP), (parts of) the
following papers and books are especially interesting for the class:
Syllabus
FIRST PART (Lectures 1–7, by Detlef Prescher)
Keywords: Motivation for working with probabilistic grammars (the
need to deal with uncertainty); definition of probabilistic
context-free grammars (PCFGs); statistical
estimation theory and how it applies to PCFGs; formal notions of a
corpus and a probability model;
maximum-likelihood estimation (MLE); relation of MLE to
relative-frequency estimation; other kinds of estimators:
maximum-entropy estimation and the expectation-maximization (EM)
algorithm; treebank training for PCFGs and unsupervised training of
PCFGs using the EM algorithm; efficient parsing
under PCFGs: Viterbi, inside, and outside algorithms; the inside-outside
algorithm for unsupervised training of PCFGs and its relation to EM.
 Lecture 1 (February 3): Formal definition
of a context-free grammar (CFG); Chomsky's very first context-free
syntax tree; CFGs can be used as parsers and generators; sentence
generation with a CFG is a non-deterministic process; This leads to a
formal definition of PCFGs that is based on rule, tree, and sentence
probabilities.
Reading: The subsection "Background: Probabilistic
Modeling of CFGs" of Section 5 of Prescher (2003).
Recommended: Chapter 9 of the book of Jurafsky and Martin.
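As a concrete illustration of the rule, tree, and sentence probabilities introduced in Lecture 1, here is a minimal sketch (the toy grammar, tree encoding, and function names are our own, not taken from the course materials): a PCFG assigns each tree the product of the probabilities of the rules used in it, and the probability of a sentence is the sum over its trees.

```python
# Toy PCFG: rules mapping (lhs, rhs) -> probability.
# The probabilities of all rules sharing a left-hand side sum to 1.
pcfg = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("she",)): 0.6,
    ("NP", ("Det", "N")): 0.4,
    ("Det", ("the",)): 1.0,
    ("N", ("dog",)): 1.0,
    ("VP", ("V", "NP")): 1.0,
    ("V", ("sees",)): 1.0,
}

def tree_probability(tree):
    """P(tree) = product of the probabilities of the rules used in it.
    A tree is (label, children...); leaf children are plain strings."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = pcfg[(label, rhs)]
    for c in children:
        if not isinstance(c, str):
            p *= tree_probability(c)
    return p

t = ("S", ("NP", "she"),
          ("VP", ("V", "sees"), ("NP", ("Det", "the"), ("N", "dog"))))
print(tree_probability(t))  # ≈ 0.24 (= 0.6 * 0.4, all other rules have p = 1)
```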

Lecture 2 (February 10): Non-standard PCFGs; Resolving
ambiguities with PCFGs; Two examples: PP attachment and
conjunctions; PCFGs can deal with these, but a
transformation of the underlying CFG should be done beforehand; The
parent-encoding technique; Definition of treebank grammars.
Reading: The subsection "Background:
Resolving Ambiguities with PCFGs" of Section 5 of Prescher (2003).
Highly recommended: Charniak (1996), Mark Johnson (1999), and
Klein and Manning (2003).
For freaks: Sanchez and Benedi (1997), and Chi and Geman
(1998).
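The parent-encoding technique mentioned in Lecture 2 can be sketched as a simple tree transformation (a sketch with our own tree encoding, not the course's notation): each nonterminal is relabeled with its parent's category, so that, for instance, an NP under S and an NP under VP get distinct rules with distinct probabilities.

```python
def parent_annotate(tree, parent=None):
    """Parent encoding: append the parent's label to each nonterminal,
    so that e.g. an NP under S and an NP under VP are distinguished.
    A tree is (label, children...); leaf children are plain strings."""
    label, *children = tree
    new_label = f"{label}^{parent}" if parent is not None else label
    return (new_label,) + tuple(
        c if isinstance(c, str) else parent_annotate(c, label)
        for c in children
    )

t = ("S", ("NP", "she"),
          ("VP", ("V", "sees"), ("NP", ("Det", "the"), ("N", "dog"))))
print(parent_annotate(t))
# ('S', ('NP^S', 'she'), ('VP^S', ('V^VP', 'sees'),
#        ('NP^VP', ('Det^NP', 'the'), ('N^NP', 'dog'))))
```

Reading a treebank grammar off the transformed trees then yields, for example, separate probabilities for NP^S and NP^VP rules.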
 Lecture 3 (February 17): General
Estimation Theory: Definition of a corpus, the relative-frequency
estimate, and (restricted or unrestricted) probability models;
Definition of corpus probabilities and maximum-likelihood
estimation; Problems for MLE: existence, uniqueness, and efficiency;
MLE and relative-frequency estimation: The relative-frequency
estimate is an ML estimate for a (restricted or unrestricted)
probability model if and only if the relative-frequency estimate is a
model instance; The information inequality of information theory;
MLE and relative-entropy estimation: Maximizing the corpus
probability is equivalent to minimizing the relative entropy with
respect to the relative-frequency estimate.
Reading: Section 2 of Prescher (2003).
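The relative-frequency estimate from Lecture 3 is simple enough to state in code (a sketch with our own function names, not the course's notation): each corpus type gets probability proportional to its count, and for an unrestricted probability model this is exactly the maximum-likelihood estimate.

```python
from collections import Counter

def relative_frequency_estimate(corpus):
    """Relative-frequency estimate: p(x) = count(x) / corpus size.
    For an unrestricted model it coincides with the ML estimate,
    since it is then always a model instance."""
    counts = Counter(corpus)
    total = sum(counts.values())
    return {x: c / total for x, c in counts.items()}

corpus = ["a", "b", "a", "a", "c"]
print(relative_frequency_estimate(corpus))  # {'a': 0.6, 'b': 0.2, 'c': 0.2}
```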
 Lecture 4 (February 24): Review of the
general estimation theory; Estimation of PCFGs: Definition of a CFG's
probability model; A CFG's probability model is restricted if the
CFG is read off from a treebank and it generates a full parse tree
outside the treebank: The relative-frequency estimate on the treebank
is NOT an instance of this CFG's probability model! On the one
hand, this is good, as ML estimates for unrestricted models necessarily
overfit their training corpus; On the other hand, this is bad: Is
there an ML estimate for a CFG's probability model on a given
treebank?! Is it unique?! Can it be efficiently calculated?!
Reading: The beginning of the subsection "Maximum-Likelihood
Estimation of PCFGs" of Section 5 of Prescher (2003).
 Lecture 5 (March 2): Estimation of
PCFGs: The ML estimate for a CFG's probability model on a given
treebank is the treebank grammar!
Reading: The end of the subsection "Maximum-Likelihood
Estimation of PCFGs" of Section 5 of Prescher (2003).
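The treebank grammar of Lecture 5 can be read off mechanically (a sketch, reusing the toy tree encoding from earlier examples; names are our own): count every rule occurrence in the treebank and normalize the counts per left-hand side.

```python
from collections import Counter, defaultdict

def read_off_treebank_grammar(treebank):
    """Treebank grammar: relative frequencies of rules, normalized per
    left-hand side. This is the ML estimate for the CFG's probability
    model on the treebank. Trees are (label, children...); leaves are
    plain strings."""
    rule_counts = Counter()

    def collect(tree):
        label, *children = tree
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        rule_counts[(label, rhs)] += 1
        for c in children:
            if not isinstance(c, str):
                collect(c)

    for t in treebank:
        collect(t)
    lhs_totals = defaultdict(int)
    for (lhs, _), n in rule_counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in rule_counts.items()}

treebank = [
    ("S", ("NP", "she"), ("VP", ("V", "sleeps"))),
    ("S", ("NP", "he"), ("VP", ("V", "sleeps"))),
]
grammar = read_off_treebank_grammar(treebank)
print(grammar[("NP", ("she",))])  # 0.5 (NP -> she occurs in 1 of 2 NP rules)
```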
 Lecture 6 (March 9): Definition of the
input of the EM algorithm (a symbolic analyzer, an incomplete-data
corpus, a complete-data model, and a randomly generated starting
instance); Definition of the procedure of the EM algorithm (a loop
over E- and M-steps: an E-step generates a complete-data corpus
expected by the current model instance, and an M-step looks for a
maximum-likelihood estimate for the complete-data model on the
expected complete-data corpus); Theorem about the output of the EM
algorithm (a sequence of instances of the complete-data model,
so-called EM re-estimates, such that the sequence of probabilities
allocated to the incomplete-data corpus is monotonically increasing);
So the EM algorithm aims at doing MLE on incomplete data by
performing a series of MLE steps on complete data. Typically, the EM
algorithm will be applied in settings where incomplete-data MLE is
difficult but complete-data MLE is easy! Example: The EM
algorithm for PCFGs.
Reading: Section 3 and the subsection "EM-Training of PCFGs"
of Section 5 of Prescher (2003).
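The E-step/M-step structure of Lecture 6 can be illustrated on a toy incomplete-data problem that is much smaller than PCFG training (this example, a mixture of two biased coins, is our own illustration, not from the course materials): each trial is n flips of one of two coins, but which coin was used is hidden.

```python
def em_two_coins(data, n, theta=(0.6, 0.5), iters=20):
    """EM on incomplete data: data[i] = number of heads in n flips of a
    hidden coin. E-step: expected (fractional) coin assignments under the
    current parameters. M-step: relative-frequency (ML) re-estimate on
    the expected complete-data counts. The incomplete-data likelihood is
    guaranteed not to decrease from one re-estimate to the next."""
    a, b = theta
    for _ in range(iters):
        heads = [0.0, 0.0]
        tails = [0.0, 0.0]
        for h in data:
            # Likelihood of this trial under each coin.
            la = a ** h * (1 - a) ** (n - h)
            lb = b ** h * (1 - b) ** (n - h)
            wa = la / (la + lb)  # E-step: responsibility of coin A
            heads[0] += wa * h
            tails[0] += wa * (n - h)
            heads[1] += (1 - wa) * h
            tails[1] += (1 - wa) * (n - h)
        # M-step: complete-data MLE on the expected counts.
        a = heads[0] / (heads[0] + tails[0])
        b = heads[1] / (heads[1] + tails[1])
    return a, b

data = [5, 9, 8, 4, 7]  # heads out of n = 10 flips per trial
print(em_two_coins(data, 10))
```

For PCFGs the hidden variable is the parse tree of each corpus sentence, and the expected rule counts of the E-step are computed efficiently with the inside-outside algorithm rather than by enumeration.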
 Lecture 7 (March 16): The
semiring-parsing algorithm (comprising many standard parsing
algorithms like the parse-forest, CKY, count, inside, and
Viterbi algorithms).
Reading: Goodman (1999).
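One instance of the semiring-parsing family is the inside algorithm, sketched here in CKY style for a PCFG in Chomsky normal form (a sketch with our own names and a toy grammar; it is not Goodman's general formulation): replacing the sum by a max turns it into the Viterbi algorithm, which is precisely the semiring view.

```python
from collections import defaultdict

def inside(sentence, lexical, binary):
    """Inside algorithm for a PCFG in Chomsky normal form.
    chart[(i, j, A)] = total probability that A derives words i..j-1.
    lexical: word -> {lhs: prob}; binary: (lhs, B, C) -> prob."""
    n = len(sentence)
    chart = defaultdict(float)
    for i, w in enumerate(sentence):
        for lhs, p in lexical.get(w, {}).items():
            chart[(i, i + 1, lhs)] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (lhs, b, c), p in binary.items():
                    # Sum semiring; using max here gives the Viterbi score.
                    chart[(i, j, lhs)] += p * chart[(i, k, b)] * chart[(k, j, c)]
    return chart

lexical = {"she": {"NP": 0.5}, "him": {"NP": 0.5}, "sees": {"V": 1.0}}
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
chart = inside(["she", "sees", "him"], lexical, binary)
print(chart[(0, 3, "S")])  # 0.25
```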
MIDTERM PROJECT
(Three weeks: from March 16 to April 6)
The midterm project consists of an experiment and a report on this
experiment. It counts as the homework for about 7 lectures. [Note also
that there are no lectures in the week of March 22–26.] The submission deadline for the project report is April
13.
SECOND PART (Lectures 8–14, mainly by Remko Scha)
Keywords: Introduction to data-oriented parsing (DOP). This final part
will also consist of presentations given by the students (relevant
papers will be put on this page; the topics are shallow parsing,
parsing with PCFGs and unification grammars, data-oriented parsing,
etc.).
 Lecture 8 (March 30): Data-Oriented
Parsing I (ppt)
 Lecture 9 (April 6): Data-Oriented
Parsing II (ppt)
 Lecture 10 (April 13): Data-Oriented
Parsing and Estimation Theory
 Lecture 11 (April 20): Presentation (Klein and Manning, 2003).
 Lecture 12 (April 27): Presentation (Sabine Schulte im Walde, 2000)
 Lecture 13 (May 4): Presentation
(Bod and Kaplan, to appear)
 Lecture 14 (May 11): Presentation (Mike de Kreek, 2003)
FINAL PROJECT (Four weeks: from May 11
to June 8)
The final project is similar to the midterm project but will be somewhat
larger. The submission deadline for the project report is June 15.
The final mark consists of: 40% midterm project
+ 40% final project + 20% presentation.