TAC 2008 Update Summarization Task Guidelines
I. Overview
The goal of the TAC Summarization track is to foster research on systems
that produce summaries of documents. The focus is on systems that
can produce well-organized, fluent summaries of text.
Piloted in DUC
2007, the TAC 2008 Update Summarization task is to generate short
(~100 words) fluent multi-document summaries of
news articles under the assumption that the user has already read
a set of earlier articles. The purpose of each update summary will be
to inform the reader of new information about a particular topic. The
content of each submitted summary will be evaluated against multiple model
summaries based on Columbia University's Pyramid
method.
The test data for the update summarization task will be
available on the TAC 2008 Summarization home
page on July 1. Submissions are due at NIST on or before
July 11, 2008. Each team may submit up to three runs
(submissions) for the update summarization task, ranked by
priority. NIST will judge the first- and second-priority runs from each team and
(if resources allow) up to one additional run from each team. Runs must
be fully automatic.
Task scenario
In the scenario for the update summarization task, you will be writing
summaries to meet the information needs of various users. Assume that each
user is an educated, adult US native who is aware of current events as
they appear in the news. The user is interested in a particular news story and wants
to track it as it develops over time, so he subscribes to a
news feed that sends him relevant articles as they are submitted by
various news services. However, either there's so much news that he
can't keep up with it, or he has to leave for a while and then wants
to catch up. Whenever he checks up on the news, it bothers him that
most articles keep repeating the same information; he would like to
read summaries that only talk about what's new or different.
In this scenario, a user initially gives you a topic
statement (title and narrative) expressing his information need. News
articles about the story then arrive in batches over time, and
you are asked to write a summary for each batch of articles that
addresses the user's information need.
II. Test Data
The test dataset comprises approximately 48 topics. Each topic has a topic
statement (title and narrative) and 20 relevant documents which have
been divided into 2 sets: Document Set A and Document Set B. Each
document set has 10 documents, where all the documents in Set A
chronologically precede the documents in Set B. The documents will
come from the AQUAINT-2
collection of news articles.
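For illustration only, the data layout described above could be modeled as a simple structure like the following. The field names and topic/document IDs are hypothetical assumptions, not the official distribution format:

```python
from dataclasses import dataclass


@dataclass
class Topic:
    """One test topic: a topic statement plus two chronological document sets.

    All names here (topic_id, title, narrative, set_a, set_b) are
    illustrative assumptions, not the official TAC distribution schema.
    """
    topic_id: str
    title: str
    narrative: str
    set_a: list   # 10 earlier AQUAINT-2 article IDs
    set_b: list   # 10 later article IDs, all chronologically after set_a


topic = Topic(
    topic_id="D0801",
    title="(sample title)",
    narrative="(sample narrative expressing the information need)",
    set_a=[f"docA-{i}" for i in range(10)],
    set_b=[f"docB-{i}" for i in range(10)],
)
assert len(topic.set_a) == len(topic.set_b) == 10
```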
The topic statements and documents will be in the format given in the sample topic statements and sample documents below.
Test topic statements and document sets will be distributed by NIST via the
TAC 2008 Summarization web page. Only TAC
2008 participants who have completed both the Agreement Concerning
Dissemination of TAC Results and the AQUAINT-2 Organization forms will be
allowed access to the test topics and AQUAINT-2 documents.
III. Submission guidelines
Submission format
System task: For each topic, you will write 2 summaries (one for Set A
and one for Set B) that address the information need expressed in the
corresponding topic statement.
- The summary for Set A should be a straightforward query-focused summary.
- The update summary for Set B is also query-focused but should be written under the assumption that the user of the summary has already read the documents in Set A.
Each summary should be well-organized, in English, using complete
sentences. A blank line may be used to separate paragraphs, but
no other formatting is allowed (such as bulleted lists, tables,
bold-face type, etc.). Each summary can be
no longer than 100 words (whitespace-delimited tokens). Summaries over
the size limit will be truncated.
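Because over-length summaries are truncated, teams may want to enforce the 100-token limit before submitting. A minimal sketch (this is not NIST's truncation code, and note that it collapses all whitespace, including paragraph-separating blank lines):

```python
def truncate_summary(text: str, limit: int = 100) -> str:
    """Keep at most `limit` whitespace-delimited tokens, in order.

    Caution: joins tokens with single spaces, so blank lines between
    paragraphs are lost; restore any paragraph breaks afterwards.
    """
    tokens = text.split()
    return " ".join(tokens[:limit])


long_summary = "token " * 150          # 150 tokens, over the limit
assert len(truncate_summary(long_summary).split()) == 100
```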
Within a topic, the document sets must be processed in
chronological order; i.e., you cannot look at documents in Set
B when generating the summary for Set A. However, the documents
within a document set can be processed in any order.
A submission to the update summarization task will comprise exactly
one file per summary, where the name of each summary file is the ID of
its document set. Please include a file for each summary, even if the
file is empty. Each file will be read and assessed as a plain text
file, so no special characters or markups are allowed. The files must
be in a directory whose name should be the concatenation of the Team
ID and the priority of the run. (For example, if the Team ID is "SYSX"
then the directory name for the first-priority run should be "SYSX1".)
Please package the directory in a tarfile and gzip the tarfile before
submitting it to NIST.
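As a concrete sketch of this packaging convention, using the "SYSX" team ID from the example above (the summary file names below are hypothetical; the real names are the document-set IDs distributed with the test data):

```shell
# One plain-text file per summary, named by its document set ID,
# inside a directory named <TeamID><priority>.
mkdir -p SYSX1
printf 'Summary text for document set A...\n' > SYSX1/D0801-A
printf 'Summary text for document set B...\n' > SYSX1/D0801-B

# Package the run directory as a gzipped tarfile for submission.
tar -cf SYSX1.tar SYSX1
gzip SYSX1.tar          # produces SYSX1.tar.gz
```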
Submission procedure
Each team may submit up to three runs, ranked by priority (1-3).
NIST will evaluate the first- and second-priority runs from each team. If resources
allow, NIST will evaluate one additional run from each team.
NIST will post the test data on the TAC Summarization web site on July
1 and results will have to be submitted to NIST by 11:59 p.m. (EDT) on
July 11, 2008. Results are submitted to NIST using an
automatic submission procedure. Details about the submission
procedure will be emailed to the [email protected] mailing list before
the test data is released. At that time, NIST will release a routine
that checks for common errors in submission files including such
things as invalid ID, missing summaries, etc. Participants should
check their runs with this script before submitting them to NIST
because the automatic submission procedure will reject the submission
if the script detects any errors.
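NIST's actual checking routine will be released with the test data; the sketch below merely illustrates the kinds of checks involved (run-directory naming and missing or unexpected summary files), with hypothetical function and file names:

```python
import os
import re


def check_run(run_dir: str, team_id: str, expected_ids: set) -> list:
    """Illustrative pre-submission checks (not NIST's official script).

    Verifies that the directory is named <team_id><priority 1-3> and
    that exactly one file exists per expected document-set ID.
    """
    errors = []
    name = os.path.basename(run_dir.rstrip("/"))
    if not re.fullmatch(re.escape(team_id) + r"[123]", name):
        errors.append(f"bad run directory name: {name}")
    present = set(os.listdir(run_dir)) if os.path.isdir(run_dir) else set()
    for missing in sorted(expected_ids - present):
        errors.append(f"missing summary file: {missing}")
    for extra in sorted(present - expected_ids):
        errors.append(f"unexpected file: {extra}")
    return errors
```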
All processing of documents and generation of summaries must be
automatic. No changes can be made to any component of the
summarization system or any resource used by the system in response to
the current year's test data.
IV. Evaluation
All summaries will first be truncated to 100 words. Where sentences
need to be identified for automatic evaluation, NIST will then use a
simple Perl
script for sentence segmentation.
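NIST's Perl script is not reproduced here; for local testing, a comparably simple splitter might look like the following (a naive sketch that will mis-handle abbreviations such as "U.S."):

```python
import re


def split_sentences(text: str) -> list:
    """Naive sentence splitter: break after ., !, or ? when followed by
    whitespace and an uppercase letter. A rough stand-in for NIST's
    segmentation script, for local testing only.
    """
    parts = re.split(r'(?<=[.!?])\s+(?=[A-Z])', text.strip())
    return [p for p in parts if p]


assert split_sentences("First sentence. Second one! Third?") == \
    ["First sentence.", "Second one!", "Third?"]
```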
Model summaries: NIST will conduct a manual evaluation of
summary content based on the Pyramid
Method. Multiple model summaries will be used in the evaluation.
Each topic statement and its 2 document sets will be given to 4
different NIST assessors. For each document set, the assessor will
create a 100-word summary that addresses the information need
expressed in the topic statement. The assessors will be guided by the
following:
In addition to the Pyramid evaluation, the assessor will give an overall responsiveness score to each summary. The overall responsiveness score will reflect both content and readability and will be judged on the following scale:
- Very Poor
- Poor
- Barely Acceptable
- Good
- Very Good
V. Schedule
TAC 2008 Update Summarization Task Schedule

  July 1                 | Release of test data
  July 14, 6:00 AM (EDT) | EXTENDED deadline for participants' submissions
  late August            | Release of individual evaluated results