Understanding the process of multi-document summarization: Content selection, rewriting and evaluation
Recent years have seen unprecedented interest in news aggregation and browsing, with dedicated corporate and research websites becoming increasingly popular. Generic multi-document summarization can enhance users' experience with such sites, and so the development and evaluation of automatic summarization systems has become not only a research challenge but a very practical one. In this thesis, we describe a general modular automatic summarizer that achieves state-of-the-art performance, present our experiments with rewriting generic noun phrases and references to people, and demonstrate how distinctions such as the familiarity and salience of entities mentioned in the input can be determined automatically. We also propose an intrinsic evaluation method for summarization that incorporates multiple models and allows a better study of human agreement in content selection. These investigations and experiments have helped us better understand the process of summarization and formulate tasks that we believe will lead to future improvements in automatic summarization.

It is well known that humans do not fully agree on what content should be included in a summary. Traditionally, this phenomenon has been studied at the level of sentences, but sentences are a rather coarse granularity for content analysis. Here, we introduce an annotation method for semantically driven comparison of several texts for similarities and differences at the subsentential level. When applied to human summaries of the same input, the method allows a closer examination of human agreement and also provides the basis for an evaluation method that incorporates the notion of the importance of a content unit in a summary. Given the variability of human choices, we next address the question of which features of the input are predictive of the inclusion of content in a summary.
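The idea of weighting content units by agreement across multiple model summaries can be illustrated with a minimal sketch. The function names, the model-count weighting, and the normalization against the best achievable weight are illustrative assumptions here, not the thesis's exact definitions:

```python
from collections import Counter

def unit_weights(model_summaries):
    """Weight each content unit by how many human model summaries express it.

    model_summaries: list of sets of content-unit identifiers, one set per
    human summary. Weighting by model count is an illustrative assumption.
    """
    weights = Counter()
    for units in model_summaries:
        for u in units:
            weights[u] += 1
    return weights

def summary_score(candidate_units, weights):
    """Score = weight captured by the candidate, normalized by the best
    weight achievable with the same number of content units."""
    achieved = sum(weights.get(u, 0) for u in candidate_units)
    best = sum(sorted(weights.values(), reverse=True)[:len(candidate_units)])
    return achieved / best if best else 0.0

# Three hypothetical model summaries expressing content units "a".."d":
models = [{"a", "b", "c"}, {"a", "b"}, {"a", "d"}]
w = unit_weights(models)              # a:3, b:2, c:1, d:1
print(summary_score({"a", "b"}, w))   # prints 1.0: captures the top-weighted units
print(summary_score({"c", "d"}, w))   # prints 0.4: same size, low-agreement units
```

A candidate that selects widely agreed-upon content scores higher than one of the same length built from idiosyncratic content, which is what lets the metric reward importance rather than mere overlap with a single model.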
We use a large collection of human-written summaries and their respective inputs to study the predictive effect of one feature that has been widely used in summarization: frequency of occurrence. We show that content units that are repeated frequently in the input tend to be included in at least some human summaries, and that human summarizers agree more on the inclusion of frequent content units. In addition, human summaries tend to have higher likelihood under a multinomial model estimated from the input than automatic summaries do. This empirical investigation leads us to propose an algorithm for a context-sensitive frequency-based summarizer. We show that context sensitivity and a good choice of composition function for estimating the weight of a sentence yield a summarizer that performs as well as the best supervised automatic summarizer.

We then turn to methods for summary rewriting; that is, techniques for automatically modifying the original author's wording of sentences that are included in a summary. The added flexibility of subsentential changes has potential benefits for improving content selection as well as summary readability. We show that human readers prefer summaries in which references to people have been rewritten to restore the fluency of the text. We further develop our work on references to people by presenting an approach to automatic classification of entity salience and familiarity, based on robustly derivable lexical, syntactic, and frequency features. Such information is necessary for generating appropriate referring expressions.
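The likelihood comparison can be made concrete with a short sketch. Tokenization by whitespace and add-one smoothing are illustrative assumptions; the point is only that a summary reusing frequent input words has higher log-likelihood under a word distribution estimated from the input:

```python
import math
from collections import Counter

def input_distribution(input_text, alpha=1.0):
    """Estimate a smoothed word distribution from the input documents.
    Add-alpha smoothing is an illustrative choice; returns the distribution
    and the probability assigned to an unseen word."""
    counts = Counter(input_text.lower().split())
    total = sum(counts.values())
    vocab = len(counts)
    dist = {w: (c + alpha) / (total + alpha * vocab) for w, c in counts.items()}
    unseen_p = alpha / (total + alpha * vocab)
    return dist, unseen_p

def log_likelihood(summary_text, dist, unseen_p):
    """Log-likelihood of the summary's words under the multinomial model.
    The multinomial coefficient is constant for a fixed length, so it is
    omitted from the comparison."""
    return sum(math.log(dist.get(w, unseen_p))
               for w in summary_text.lower().split())

dist, p0 = input_distribution("the economy grew the economy slowed markets watched")
# A summary built from frequent input words scores higher than one that
# drifts away from the input vocabulary:
print(log_likelihood("the economy grew", dist, p0)
      > log_likelihood("markets watched closely", dist, p0))  # prints True
```

Comparing same-length summaries this way sidesteps the length bias of raw likelihood, which shrinks as more words are multiplied in.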
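A minimal sketch of a summarizer in this frequency-based family follows. The average-word-probability composition function and the squaring update after each selection (which down-weights already-covered content, a simple form of context sensitivity) are illustrative choices, not necessarily the thesis's exact algorithm:

```python
from collections import Counter

def frequency_summarizer(sentences, n=2):
    """Greedy frequency-based extractive summarizer.

    Sentence weight = average probability of its words (one possible
    composition function). After a sentence is selected, the probabilities
    of its words are squared, so content already covered by the summary
    is down-weighted in later selections.
    """
    tokenized = [s.lower().split() for s in sentences]
    counts = Counter(w for toks in tokenized for w in toks)
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}

    summary = []
    remaining = list(range(len(sentences)))
    while remaining and len(summary) < n:
        best = max(remaining,
                   key=lambda i: sum(prob[w] for w in tokenized[i])
                                 / max(len(tokenized[i]), 1))
        summary.append(sentences[best])
        remaining.remove(best)
        for w in set(tokenized[best]):   # squaring shrinks covered words' weight
            prob[w] = prob[w] ** 2
    return summary

sents = ["the dog barked at the dog", "a cat slept", "the dog ran"]
print(frequency_summarizer(sents, n=2))
# The second pick avoids "the dog ran": its frequent words were already
# covered by the first sentence and have been down-weighted.
```

Without the re-weighting step, the greedy loop would keep choosing sentences dominated by the same frequent words, producing a redundant summary; the update is what makes the frequency signal context sensitive.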