TAC 2010 Guided Summarization Task Guidelines

(Also see general TAC 2010 policies and guidelines at http://tac.nist.gov/2010/)

Overview

The Guided Summarization task aims to encourage summarization systems to make a deeper linguistic (semantic) analysis of the source documents, instead of relying only on document word frequencies to select important concepts. The task is to write a 100-word summary of a set of 10 newswire articles for a given topic, where the topic falls into a predefined category. There are five topic categories:

  1. Accidents and Natural Disasters
  2. Attacks
  3. Health and Safety
  4. Endangered Resources
  5. Investigations and Trials

Participants (and human summarizers) are given a list of important aspects for each category, and a summary must cover all these aspects (if the information can be found in the documents). The summaries may also contain other information relevant to the topic.

Additionally, an "update" component of the guided summarization task is to write a 100-word "update" summary of a subsequent 10 newswire articles for the topic, under the assumption that the user has already read the earlier articles. (The update summarization task was run in the Summarization track of TAC 2008 and TAC 2009.) The TAC 2010 update summarization component is based on the following scenario: A user is interested in a particular news story and wants to track it as it develops over time, so she subscribes to a news feed that sends her relevant articles as they are filed by various news services. However, either there is so much news that she can't read all the articles, or she reads some articles, leaves for a while, and then wants to catch up. Because many of the articles repeat the same information, she would like a summary of the important points of the articles that provides only information that is new relative to what she has already read.

The test data for the guided summarization task will be available on the TAC 2010 Summarization Track home page on June 22, 2010. Submissions are due at NIST on or before July 6. Each team may submit up to two runs (submissions), and all runs will be judged. Runs must be fully automatic.

Test Data

The test dataset is composed of approximately 44 topics, divided into five categories: Accidents and Natural Disasters, Attacks, Health and Safety, Endangered Resources, Investigations and Trials. Each topic has a topic ID, category, title, and 20 relevant documents which have been divided into 2 sets: Document Set A and Document Set B. Each document set has 10 documents, and all the documents in Set A chronologically precede the documents in Set B. Unlike in previous years, there is no topic narrative, because the category and its aspects already define what information the reader is looking for.

Test topics and document sets will be distributed by NIST via the TAC 2010 Summarization web page. Teams will need to use their TAC 2010 Team ID and Team Password to download data and submit results through the NIST web site. To activate the TAC 2010 team ID and password for the summarization track, teams must submit the following forms to NIST, even if these forms were already submitted in previous TAC cycles:

  1. Agreement Concerning Dissemination of TAC Results
  2. AQUAINT Organization form
  3. AQUAINT-2 Organization form

Forms are available at the TAC User Agreements web page. When submitting forms, please also include the TAC 2010 team ID, the email address of the main TAC 2010 contact person for the team, and a comment saying that the form is from a TAC 2010 registered participant.

Documents

The documents for summarization come from the AQUAINT and AQUAINT-2 collections of news articles. The AQUAINT corpus of English News Text consists of documents taken from the New York Times, the Associated Press, and the Xinhua News Agency newswires (LDC catalog number LDC2002T31). The collection spans the years 1999-2000 (1996-2000 for Xinhua documents). The AQUAINT-2 collection spans the time period of October 2004 - March 2006; articles are in English and come from a variety of sources including Agence France Presse, Central News Agency (Taiwan), Xinhua News Agency, Los Angeles Times-Washington Post News Service, New York Times, and the Associated Press.

Test Data Format

The topic statements and documents will be in a format similar to that of the TAC 2009 Update Summarization task, except that this year there is no topic narrative; instead, each topic's category ID is indicated in the topic tag. Sample topic statements and documents are available from the TAC 2010 Summarization web page.
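The sample files define the exact schema. As a rough illustration only, the following Python sketch reads a topics file under the assumption that it uses an XML layout similar to earlier TAC update-summarization data, with a "topic" element carrying "id" and "category" attributes and "docsetA"/"docsetB" elements listing "doc" IDs. These element and attribute names are assumptions; consult the released samples for the authoritative format.

    # Sketch only: element/attribute names are assumed, not official.
    import xml.etree.ElementTree as ET

    def load_topics(path):
        """Return (topic_id, category, title, docset_a_ids, docset_b_ids) tuples."""
        topics = []
        root = ET.parse(path).getroot()
        for topic in root.iter("topic"):
            topic_id = topic.get("id")
            category = topic.get("category")  # numerical category ID, e.g. "2"
            title = topic.findtext("title", default="").strip()
            # Assumes both document sets are present in every topic.
            docs_a = [d.get("id") for d in topic.find("docsetA").iter("doc")]
            docs_b = [d.get("id") for d in topic.find("docsetB").iter("doc")]
            topics.append((topic_id, category, title, docs_a, docs_b))
        return topics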

Submission guidelines

System task

Given a topic, the task is to write 2 summaries (one for Document Set A and one for Document Set B) that describe the event indicated in the topic title, according to the list of aspects given for the topic category.

  1. The summary for Document Set A should be a straightforward query-focused summary.
  2. The update summary for Document Set B is also query-focused but should be written under the assumption that the user of the summary has already read the documents in Document Set A.

Each summary should cover all the aspects relevant to its category, and it may contain other relevant information as well. The categories, their aspects, and their numerical IDs are listed below (a machine-readable sketch of the same list follows it):

1. Accidents and Natural Disasters:
1.1 WHAT: what happened
1.2 WHEN: date, time, other temporal placement markers
1.3 WHERE: physical location
1.4 WHY: reasons for accident/disaster
1.5 WHO_AFFECTED: casualties (death, injury), or individuals otherwise negatively affected by the accident/disaster
1.6 DAMAGES: damages caused by the accident/disaster
1.7 COUNTERMEASURES: countermeasures, rescue efforts, prevention efforts, other reactions to the accident/disaster

2. Attacks (Criminal/Terrorist):
2.1 WHAT: what happened
2.2 WHEN: date, time, other temporal placement markers
2.3 WHERE: physical location
2.4 PERPETRATORS: individuals or groups responsible for the attack
2.5 WHY: reasons for the attack
2.6 WHO_AFFECTED: casualties (death, injury), or individuals otherwise negatively affected by the attack
2.7 DAMAGES: damages caused by the attack
2.8 COUNTERMEASURES: countermeasures, rescue efforts, prevention efforts, other reactions to the attack (e.g. police investigations)

3. Health and Safety:
3.1 WHAT: what is the issue
3.2 WHO_AFFECTED: who is affected by the health/safety issue
3.3 HOW: how they are affected
3.4 WHY: why the health/safety issue occurs
3.5 COUNTERMEASURES: countermeasures, prevention efforts

4. Endangered Resources:
4.1 WHAT: description of resource
4.2 IMPORTANCE: importance of resource
4.3 THREATS: threats to the resource
4.4 COUNTERMEASURES: countermeasures, prevention efforts

5. Investigations and Trials (Criminal/Legal/Other):
5.1 WHO: who is a defendant or under investigation
5.2 WHO_INV: who is investigating, prosecuting, or judging
5.3 WHY: general reasons for the investigation/trial
5.4 CHARGES: specific charges against the defendant
5.5 PLEAD: defendant's reaction to charges, including admission of guilt, denial of charges, or explanations
5.6 SENTENCE: sentence or other consequences to defendant
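For systems that use the aspects directly, the list above reduces to a small lookup table. A minimal sketch in Python, with the aspect labels copied verbatim from the guidelines:

    # Topic categories and their required aspects, keyed by the
    # numerical category ID. Labels are copied verbatim from the list above.
    ASPECTS = {
        1: ["WHAT", "WHEN", "WHERE", "WHY", "WHO_AFFECTED",
            "DAMAGES", "COUNTERMEASURES"],              # Accidents and Natural Disasters
        2: ["WHAT", "WHEN", "WHERE", "PERPETRATORS", "WHY",
            "WHO_AFFECTED", "DAMAGES", "COUNTERMEASURES"],  # Attacks
        3: ["WHAT", "WHO_AFFECTED", "HOW", "WHY",
            "COUNTERMEASURES"],                         # Health and Safety
        4: ["WHAT", "IMPORTANCE", "THREATS",
            "COUNTERMEASURES"],                         # Endangered Resources
        5: ["WHO", "WHO_INV", "WHY", "CHARGES",
            "PLEAD", "SENTENCE"],                       # Investigations and Trials
    }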

The categories and aspects were developed based on model summaries from past DUC and TAC summarization tasks. Examples of model summaries from TAC 2008 and TAC 2009, annotated with the above aspects, can be downloaded from the TAC 2010 Summarization web page.

These examples are provided only to illustrate the possible quality and distribution of the aspects in a summary. Participants' summaries should not be annotated or tagged with the aspect labels.

Each summary should be well-organized and written in English, using complete sentences. A blank line may be used to separate paragraphs, but no other formatting is allowed (such as bulleted lists, tables, or bold-face type). Each summary can be no longer than 100 words (whitespace-delimited tokens); summaries over the size limit will be truncated.
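Because NIST truncates at exactly 100 whitespace-delimited tokens, systems may want to apply the same cut-off themselves rather than let truncation end a summary mid-sentence. A minimal sketch (NIST's own truncation procedure may differ in detail, e.g. in how blank lines between paragraphs are handled):

    def truncate_summary(text, limit=100):
        """Keep at most `limit` whitespace-delimited tokens. Note that
        joining on a single space collapses any blank paragraph separators."""
        return " ".join(text.split()[:limit])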

Within a topic, the document sets must be processed in chronological order; i.e., the summarizer cannot look at documents in Set B when generating the summary for Set A. However, the documents within a document set can be processed in any order.

All processing of documents and generation of summaries must be automatic. No changes can be made to any component of the summarization system, or to any resource used by the system, in response to the current year's test data. Participants may use the list of categories and aspects in their generation process, but this is not obligatory; participants who are unable or do not wish to use the provided categories should still be able to produce query-focused summaries much as in previous years.

Submission format

Each team may submit up to two runs. NIST will evaluate all submitted runs.

A run will comprise exactly one file per summary, where the name of each summary file is the ID of its document set. Please include a file for each summary, even if the file is empty. Each file will be read and assessed as plain text, so no special characters or markup are allowed. The files must be in a directory whose name is the concatenation of the team ID and the run number (1 or 2). (For example, if the team ID is "SYSX", then the directory name for the first run should be "SYSX1".) Please package the directory in a tarfile and gzip the tarfile before submitting it to NIST.
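A sketch of the packaging step in Python, using the hypothetical team ID "SYSX" from the example above; the file names and run-directory convention follow the rules just described:

    import os
    import tarfile

    def package_run(summaries, team_id="SYSX", run_number=1):
        """summaries: dict mapping document-set ID to summary text.
        Writes one plain-text file per summary (even if empty) into a
        directory named <TeamID><RunNumber>, then tars and gzips it."""
        run_dir = f"{team_id}{run_number}"
        os.makedirs(run_dir, exist_ok=True)
        for docset_id, text in summaries.items():
            # encoding="ascii" raises on special characters, which are
            # not allowed in submissions anyway.
            with open(os.path.join(run_dir, docset_id), "w", encoding="ascii") as f:
                f.write(text)
        with tarfile.open(run_dir + ".tar.gz", "w:gz") as tar:
            tar.add(run_dir)
        return run_dir + ".tar.gz"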

Submission procedure

NIST will post the test data on the TAC Summarization web site on June 22, 2010, and results must be submitted to NIST by 11:59 p.m. (EDT) on July 6, 2010. Results are submitted to NIST using an automatic submission procedure; details will be emailed to the duc_list@nist.gov mailing list before the test data is released. At that time, NIST will also release a routine that checks for common errors in submission files, such as an invalid ID or missing summaries. Participants are encouraged to check their runs with this script before submitting, because the automatic submission procedure will reject a submission if the script detects any errors.
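Pending release of the official script, teams can run a rough local sanity check of their own. The sketch below is hypothetical and covers only the error types named above (invalid run-directory name, missing summary files), not whatever else NIST's script verifies:

    import os

    def check_run(run_dir, team_id, expected_docset_ids):
        """Hypothetical pre-submission check; NOT NIST's official script."""
        errors = []
        # Directory name must be <TeamID><RunNumber> with run number 1 or 2.
        if not (run_dir.startswith(team_id) and run_dir[len(team_id):] in ("1", "2")):
            errors.append("invalid run directory name: " + run_dir)
        # One summary file per document set, even if empty.
        for docset_id in expected_docset_ids:
            if not os.path.isfile(os.path.join(run_dir, docset_id)):
                errors.append("missing summary file: " + docset_id)
        return errors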

Evaluation

All summaries will first be truncated to 100 words. NIST will then manually evaluate each submitted summary for:

  1. Content (based on Columbia University's Pyramid method)
  2. Readability/Fluency
  3. Overall responsiveness

Content: Multiple model summaries will be used in the Pyramid evaluation of summary content. Each topic statement and its 2 document sets will be given to 4 different NIST assessors. For each document set, the assessor will create a 100-word model summary covering all the aspects listed for the topic category (if such information can be found in the documents); the assessor can also include other information relevant to the topic. The assessors will work from a common set of written instructions.

In the Pyramid evaluation, the assessor will first extract Summary Content Units (SCUs) from the 4 model summaries for the document set, sorting the SCUs into aspect bins (one bin per aspect of the given category). Each SCU is assigned a weight equal to the number of model summaries in which it appears. Once all SCUs have been harvested from the model summaries, the assessor will determine which of these SCUs can be found in each of the peer summaries to be evaluated. Repetitive information is not rewarded: each SCU contained in the peer summary is counted only once. The final Pyramid score for a peer summary is the sum of the weights of the SCUs contained in the summary, divided by the maximum sum of SCU weights achievable by a summary of average length (where the average length is the mean SCU count of the model summaries for this document set). For additional details, see Nenkova and Passonneau's publications on the Pyramid method.
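The scoring rule just described can be written down directly. A sketch, assuming the assessor's annotations are available as a set of matched SCU IDs and a weight table (both hypothetical input formats):

    def pyramid_score(matched_scu_ids, scu_weights, avg_model_scu_count):
        """matched_scu_ids: SCUs the assessor found in the peer summary
        (a set, so repeated information counts only once).
        scu_weights: dict mapping SCU ID -> weight, i.e. the number of
        model summaries containing that SCU.
        avg_model_scu_count: mean number of SCUs per model summary for
        this document set."""
        observed = sum(scu_weights[scu] for scu in matched_scu_ids)
        # Best attainable sum for a summary of average length: the
        # highest-weight SCUs in the pyramid.
        n = int(round(avg_model_scu_count))
        ideal = sum(sorted(scu_weights.values(), reverse=True)[:n])
        return observed / ideal if ideal else 0.0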

The Pyramid evaluation will be adapted to provide detailed scores at the level of each category and each aspect, so that participants can see how well their systems extract the different types of information.

Readability/Fluency: The assessor will give a readability/fluency score to each summary. The score reflects the fluency and readability of the summary (independently of whether it contains any relevant information) and is based on factors such as the summary's grammaticality, non-redundancy, referential clarity, focus, and structure and coherence.

Overall Responsiveness: The assessor will give an overall responsiveness score to each summary. The overall responsiveness score is based on both content (coverage of all required aspects) and readability/fluency.

Readability and Overall Responsiveness will each be judged on the following 5-point scale:

    1 Very Poor
    2 Poor
    3 Barely Acceptable
    4 Good
    5 Very Good

TAC 2010 Workshop Presentations and Papers

Each team that submits runs for evaluation is requested to write a paper for the TAC 2010 proceedings that reports how the runs were produced (to the extent that intellectual property concerns allow) and any additional experiments or analysis conducted using TAC 2010 data. A draft version of the proceedings papers is distributed as a notebook to TAC 2010 workshop attendees. Participants who would like to give oral presentations of their papers at the workshop should submit a presentation proposal by September 26, 2010, and the TAC Advisory Committee will select the groups who will present at the workshop. Please see guidelines for papers and presentation proposals at http://tac.nist.gov/2010/reporting_guidelines.html.

Schedule

TAC 2010 Summarization Track Schedule
June 22        Release of test data (Guided task)
July 6         Deadline for participants' submissions (Guided task)
August 23      Release of test data (AESOP)
August 29      Deadline for participants' submissions (AESOP)
September 7    Release of individual evaluated results (Guided task, AESOP)
September 26   Deadline for TAC 2010 workshop presentation proposals
October 27     Deadline for systems' reports

