TAC 2009 Update Summarization Task Guidelines
(Also see general TAC 2009 policies and guidelines at http://tac.nist.gov/2009/)
Overview
The TAC 2009 update summarization task is based on the following scenario:
A user is interested in a particular news story and wants
to track it as it develops over time, so she subscribes to a
news feed that sends her relevant articles as they are submitted from
various news services. However, either there's so much news that she
can't keep up with it, or she has to leave for a while and then wants
to catch up. Whenever she checks up on the news, it bothers her that
most articles keep repeating the same information; she would like to
read summaries that only talk about what's new or different.
The TAC 2009 update summarization task is to generate short fluent
multi-document summaries of news articles. For each topic,
participants are given a topic statement expressing the information
need of a user, and two chronologically ordered batches of articles
about the topic. Participants are asked to generate a 100-word summary
for each batch of articles that addresses the information need of the
user. The summary of the second batch of articles should be written
under the assumption that the user has already read the earlier batch
of articles and should inform the user of new information about the
topic.
The 2009 task repeats the TAC 2008 update summarization task, with the following changes:
-
In 2008, many of the topics had documents that spanned a wide time
period. In 2009, NIST assessors have been more careful to select
relevant documents that are as close together in time as possible,
subject to the availability of relevant documents in the AQUAINT-2
document collection.
-
In 2009, overall responsiveness is being evaluated on a 10-point scale rather than a 5-point scale. The extended scale is intended to give this metric greater discriminative power. Values on the 10-point scale can be mapped to a 5-point scale to allow comparison with past years' evaluations, as sketched below.
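NIST does not prescribe the mapping in these guidelines; purely as a
hypothetical illustration, one natural choice is to pair adjacent
scores (the function name is an invention for this sketch):

    def to_five_point(score_10):
        # Hypothetical mapping (an assumption, not specified by NIST):
        # 1-2 -> 1, 3-4 -> 2, 5-6 -> 3, 7-8 -> 4, 9-10 -> 5.
        return (score_10 + 1) // 2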
The test data for the update summarization task will be available
on the TAC 2009 Summarization Track home page on
July 1, 2009. Submissions are due at NIST on or before July 15. Each
team may submit up to two runs (submissions), and all runs will be
judged. Runs must be fully automatic.
Test Data
The test dataset is composed of 44 topics. Each topic has a topic
statement (title and narrative) and 20 relevant documents which have
been divided into 2 sets: Document Set A and Document Set B. Each
document set has 10 documents, and all the documents in Set A
chronologically precede the documents in Set B.
Test topic statements and document sets will be distributed by NIST via the
TAC 2009 Summarization web page. Teams will need to use their TAC 2009 Team ID and Team Password to download data and submit results through the NIST web site. To activate the TAC 2009 team ID and password for the summarization track, teams must submit the following forms to NIST, even if these forms were already submitted in previous TAC cycles.
- Agreement Concerning Dissemination of TAC Results
- AQUAINT-2 Organization form
When submitting forms, please also include the TAC 2009 team ID, the email address of the main TAC 2009 contact person for the team, and a comment saying that the form is from a TAC 2009 registered participant.
Documents
The documents for summarization come from the AQUAINT-2
collection of news articles. The AQUAINT-2 collection is a subset of the LDC English Gigaword Third Edition (LDC catalog number LDC2007T07) and comprises approximately 2.5 GB of text (about 907K documents) spanning the time period of October 2004 - March 2006. Articles are in English and come from a variety of sources including Agence France Presse, Central News Agency (Taiwan), Xinhua News Agency, Los Angeles Times-Washington Post News Service, New York Times, and the Associated Press. Each document has an ID consisting of a source code, a date when the document was delivered to LDC, and 4 digits to differentiate documents that come from the same source on the same date; for example, document NYT_ENG_20050311.0029 was received from the New York Times on March 11, 2005.
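For illustration, document IDs of this form can be decomposed
mechanically; the following is a minimal sketch (parse_doc_id is a
hypothetical helper, not part of any TAC or LDC tooling):

    from datetime import date

    def parse_doc_id(doc_id):
        # "NYT_ENG_20050311.0029" -> ("NYT_ENG", date(2005, 3, 11), "0029")
        head, serial = doc_id.split(".")
        source, datestr = head.rsplit("_", 1)
        when = date(int(datestr[:4]), int(datestr[4:6]), int(datestr[6:8]))
        return source, when, serial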
Test Data Format
The topic statements and documents will be in the same format as the TAC 2008 Update Summarization topic statements and documents.
Submission guidelines
System task
Given a topic, the task is to write 2 summaries (one for Document Set A
and one for Document Set B) that address the information need expressed in the
corresponding topic statement.
- The summary for Document Set A should be a straightforward query-focused summary.
- The update summary for Document Set B is also query-focused but should be written under the assumption that the user of the summary has already read the documents in Document Set A.
Each summary should be well-organized and written in English, using
complete sentences. A blank line may be used to separate paragraphs,
but no other formatting is allowed (such as bulleted lists, tables,
bold-face type, etc.). Each summary can be
no longer than 100 words (whitespace-delimited tokens). Summaries over
the size limit will be truncated.
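Because over-length summaries are truncated, participants may want to
enforce the limit themselves before submitting. A minimal sketch of
the whitespace-delimited counting described above (truncate_summary is
a hypothetical helper; note that rejoining with single spaces discards
any paragraph breaks):

    def truncate_summary(text, limit=100):
        # Keep only the first `limit` whitespace-delimited tokens.
        tokens = text.split()
        return " ".join(tokens[:limit])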
Within a topic, the document sets must be processed in
chronological order; i.e., the summarizer cannot look at documents in Set
B when generating the summary for Set A. However, the documents
within a document set can be processed in any order.
All processing of documents and generation of summaries must be
automatic. No changes can be made to any component of the
summarization system or any resource used by the system in response to
the current year's test data.
Submission format
Each team may submit up to two runs. NIST will evaluate all submitted runs.
A run will comprise exactly
one file per summary, where the name of each summary file is the ID of
its document set. Please include a file for each summary, even if the
file is empty. Each file will be read and assessed as a plain text
file, so no special characters or markup are allowed. The files must
be in a directory whose name is the concatenation of the Team ID and
the run number (1 or 2). (For example, if the Team ID is "SYSX" then
the directory name for the first run should be "SYSX1".)
Please package the directory in a tarfile and gzip the tarfile before
submitting it to NIST.
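A minimal sketch of this packaging step using Python's standard
tarfile module (the directory name "SYSX1" follows the example above):

    import tarfile

    run_dir = "SYSX1"  # Team ID "SYSX", run number 1
    # Create a gzip-compressed tarfile containing the run directory.
    with tarfile.open(run_dir + ".tar.gz", "w:gz") as tar:
        tar.add(run_dir)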
Submission procedure
NIST will post the test data on the TAC Summarization web site on July
1, 2009, and results must be submitted to NIST by 11:59 p.m. (EDT) on
July 15, 2009. Results are submitted to NIST using an
automatic submission procedure. Details about the submission
procedure will be emailed to the [email protected] mailing list before
the test data is released. At that time, NIST will release a routine
that checks submission files for common errors such as invalid
IDs, missing summaries, etc. Participants may wish to
check their runs with this script before submitting them to NIST
because the automatic submission procedure will reject the submission
if the script detects any errors.
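NIST's checking routine is distributed separately; purely as a
hypothetical sketch of the kinds of checks mentioned above (the
function and its arguments are assumptions, not NIST's actual script):

    import os

    def check_run(run_dir, expected_ids):
        # Flag missing summary files and files whose names are not
        # valid document set IDs for this year's topics.
        errors = []
        present = set(os.listdir(run_dir))
        for doc_set_id in expected_ids:
            if doc_set_id not in present:
                errors.append("missing summary file: " + doc_set_id)
        for name in present - set(expected_ids):
            errors.append("invalid ID: " + name)
        return errors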
Evaluation
All summaries will first be truncated to 100 words. NIST will then manually evaluate each submitted summary for:
- Content (using Columbia University's Pyramid method)
- Readability/Fluency
- Overall responsiveness
Content:
Multiple model summaries will be used in the Pyramid evaluation of summary content.
Each topic statement and its 2 document sets will be given to 4
different NIST assessors. For each document set, the assessor will
create a 100-word model summary that addresses the information need
expressed in the topic statement.
In the Pyramid evaluation, the assessor will first extract Summary
Content Units (SCUs) from the 4 model summaries for the document set.
Each SCU is assigned a weight equal to the number of model summaries
in which it appears. Once all SCUs have been harvested from the model
summaries, the assessor will determine which of these SCUs can be
found in each of the peer summaries to be evaluated. Repetitive
information is not rewarded, as each SCU contained in the peer summary
is counted only once. The final Pyramid score for a peer summary is
the sum of the weights of the SCUs contained in the summary, divided
by the maximum sum of SCU weights possible for a summary of average
length (where the average length is determined by the mean SCU count
of the model summaries for this document set).
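As a minimal sketch of the scoring arithmetic just described (the data
structures are assumptions made for this illustration; official
scoring uses the Pyramid annotation tools):

    def pyramid_score(peer_scus, scu_weights, avg_scu_count):
        # peer_scus: SCU ids found in the peer summary; scu_weights: maps
        # each SCU id to its weight, i.e. the number of model summaries
        # (at most 4 here) that contain it.
        observed = sum(scu_weights[scu] for scu in set(peer_scus))
        # Maximum attainable score: the avg_scu_count highest-weight SCUs.
        top = sorted(scu_weights.values(), reverse=True)[:avg_scu_count]
        return observed / sum(top)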
Readability/Fluency: The assessor will give a readability/fluency score to each summary. The score reflects the fluency and readability of the summary (independently of whether it contains any information that responds to the topic statement) and is based on factors such as the summary's grammaticality, non-redundancy, referential clarity, focus, and structure and coherence.
Overall Responsiveness: The assessor will give an overall responsiveness score to each summary. The overall responsiveness score is based on both content and readability/fluency.
Readability and Overall Responsiveness will each be judged on the following 10-point scale:
1-2    Very Poor
3-4    Poor
5-6    Barely Acceptable
7-8    Good
9-10   Very Good
TAC 2009 Workshop Presentations and Papers
Each team that submits runs for evaluation is requested to write a paper for the TAC 2009 proceedings that reports how the runs were produced (to the extent that intellectual property concerns allow) and any additional experiments or analysis conducted using TAC 2009 data. A draft version of the proceedings papers is distributed as a notebook to TAC 2009 workshop attendees. Participants who would like to give oral presentations of their papers at the workshop should submit a presentation proposal by September 25, 2009, and the TAC Advisory Committee will select the groups who will present at the workshop. Please see guidelines for papers and presentation proposals at http://tac.nist.gov/2009/reporting_guidelines.html.
Schedule
TAC 2009 Update Summarization Task Schedule
July 1         Release of test data
July 15        Deadline for participants' submissions
September 4    Release of individual evaluated results
September 25   Deadline for TAC 2009 workshop presentation proposals
October 22     Deadline for systems' reports