Robust semantic role labeling
The natural language processing community has recently seen a growth of interest in domain-independent semantic role labeling. Semantic role labeling entails identifying all the predicates in a sentence and then identifying and classifying the word sequences that represent the arguments (or semantic roles) of each of these predicates. In other words, it is the process of assigning a WHO did WHAT to WHOM, WHEN, WHERE, WHY, HOW, etc. structure to plain text. Providing this layer of semantic structure on top of the syntactic structure that algorithms currently have access to can facilitate improvements in higher-level natural language processing tasks such as information extraction, question answering, summarization and machine translation. In recent years, there have been a few attempts at creating hand-tagged corpora that encode such information; two such corpora are FrameNet and PropBank. One idea behind creating these corpora was to enable the community at large to train supervised machine learning classifiers that can automatically tag vast amounts of unseen text with such shallow semantic information. There are various types of predicates, the most common being verb predicates and noun predicates. Most work prior to this thesis focused on the arguments of verb predicates. This thesis primarily addresses three issues: i) improving performance on the standard data sets on which others have previously reported results, by using a better machine learning strategy and by incorporating novel features; ii) extending this work to parse the arguments of nominal predicates, which also play an important role in conveying the semantics of a passage; and iii) investigating methods to improve the robustness of the classifier across different genres of text.
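To make the WHO did WHAT to WHOM structure concrete, the following is a minimal sketch of what a labeler's output for one verb predicate might look like. It assumes PropBank-style role labels (ARG0 for the giver, ARG1 for the thing given, ARG2 for the recipient, ARGM-TMP for a temporal modifier) and represents each argument as a token span; the class and function names (`Argument`, `PredicateFrame`, `render`) are hypothetical and illustrate only the output representation, not any classifier.

```python
from dataclasses import dataclass, field


@dataclass
class Argument:
    start: int  # index of the first token in the span (inclusive)
    end: int    # index one past the last token in the span (exclusive)
    label: str  # PropBank-style role label (assumed), e.g. "ARG0" or "ARGM-TMP"


@dataclass
class PredicateFrame:
    predicate_index: int  # token index of the predicate (here, a verb)
    arguments: list = field(default_factory=list)  # list of Argument spans


def render(tokens, frame):
    """Map each role label to the word sequence it covers (hypothetical helper)."""
    roles = {"V": tokens[frame.predicate_index]}
    for arg in frame.arguments:
        roles[arg.label] = " ".join(tokens[arg.start:arg.end])
    return roles


# Illustrative sentence and hand-written labels; not output from a real system.
tokens = "John gave Mary a book yesterday".split()
frame = PredicateFrame(
    predicate_index=1,
    arguments=[
        Argument(0, 1, "ARG0"),      # WHO: the giver
        Argument(2, 3, "ARG2"),      # to WHOM: the recipient
        Argument(3, 5, "ARG1"),      # WHAT: the thing given
        Argument(5, 6, "ARGM-TMP"),  # WHEN
    ],
)
print(render(tokens, frame))
```

Predicate identification corresponds to choosing `predicate_index`, while argument identification and classification correspond to producing the labeled spans in `arguments`; a nominal predicate such as "gift" could be represented with the same structure.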