TAC 2008 Update Summarization Task Guidelines
I. Overview
The goal of the TAC Summarization track is to foster research on systems
that produce summaries of documents. The focus is on systems that
can produce well-organized, fluent summaries of text.
Piloted in DUC
2007, the TAC 2008 Update Summarization task is to generate short
(~100 words) fluent multi-document summaries of
news articles under the assumption that the user has already read
a set of earlier articles. The purpose of each update summary will be
to inform the reader of new information about a particular topic. The
content of each submitted summary will be evaluated against multiple model
summaries based on Columbia University's Pyramid
method.
The test data for the update summarization task will be
available on the TAC 2008 Summarization home
page on July 1. Submissions are due at NIST on or before
July 11, 2008. Each team may submit up to three runs
(submissions) for the update summarization task, ranked by
priority. NIST will judge the first- and second-priority runs from each team and
(if resources allow) up to one additional run from each team. Runs must
be fully automatic.
Task scenario
In the scenario for the update summarization task, you will be writing
summaries to meet the information needs of various users. Assume that each
user is an educated, adult US native who is aware of current events as
they appear in the news. The user is interested in a particular news story and wants
to track it as it develops over time, so he subscribes to a
news feed that sends him relevant articles as they are submitted by
various news services. However, either there's so much news that he
can't keep up with it, or he has to leave for a while and then wants
to catch up. Whenever he checks up on the news, it bothers him that
most articles keep repeating the same information; he would like to
read summaries that only talk about what's new or different.
In this scenario, a user initially gives you a topic
statement (title and narrative) expressing his information need. News
articles about the story then arrive in batches over time, and
you are asked to write a summary for each batch of articles that
addresses the user's information need.
II. Test Data
The test dataset comprises approximately 48 topics. Each topic has a topic
statement (title and narrative) and 20 relevant documents which have
been divided into 2 sets: Document Set A and Document Set B. Each
document set has 10 documents, where all the documents in Set A
chronologically precede the documents in Set B. The documents will
come from the AQUAINT-2
collection of news articles.
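For illustration only, the data layout described above could be modeled as a simple structure like the following. The field names and topic/document IDs are hypothetical assumptions, not the official distribution format:

```python
from dataclasses import dataclass


@dataclass
class Topic:
    """One test topic: a topic statement plus two chronological document sets.

    All names here (topic_id, title, narrative, set_a, set_b) are
    illustrative assumptions, not the official TAC distribution schema.
    """
    topic_id: str
    title: str
    narrative: str
    set_a: list   # 10 earlier AQUAINT-2 article IDs
    set_b: list   # 10 later article IDs, all chronologically after set_a


topic = Topic(
    topic_id="D0801",
    title="(sample title)",
    narrative="(sample narrative expressing the information need)",
    set_a=[f"docA-{i}" for i in range(10)],
    set_b=[f"docB-{i}" for i in range(10)],
)
assert len(topic.set_a) == len(topic.set_b) == 10
```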
The topic statements and documents will be in the format given in the sample topic statements and sample documents below.
Test topic statements and document sets will be distributed by NIST via the
TAC 2008 Summarization web page. Only TAC
2008 participants who have completed both the Agreement Concerning
Dissemination of TAC Results and the AQUAINT-2 Organization forms will be
allowed access to the test topics and AQUAINT-2 documents.
III. Submission guidelines
Submission format
System task: For each topic, you will write 2 summaries (one for Set A
and one for Set B) that address the information need expressed in the
corresponding topic statement.
- The summary for Set A should be a straightforward query-focused summary.
- The update summary for Set B is also query-focused but should be written under the assumption that the user of the summary has already read the documents in Set A.
Each summary should be well-organized, in English, using complete
sentences. A blank line may be used to separate paragraphs, but
no other formatting is allowed (such as bulleted lists, tables,
bold-face type, etc.). Each summary can be
no longer than 100 words (whitespace-delimited tokens). Summaries over
the size limit will be truncated.
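Because over-length summaries are truncated, teams may want to enforce the 100-token limit before submitting. A minimal sketch (this is not NIST's truncation code, and note that it collapses all whitespace, including paragraph-separating blank lines):

```python
def truncate_summary(text: str, limit: int = 100) -> str:
    """Keep at most `limit` whitespace-delimited tokens, in order.

    Caution: joins tokens with single spaces, so blank lines between
    paragraphs are lost; restore any paragraph breaks afterwards.
    """
    tokens = text.split()
    return " ".join(tokens[:limit])


long_summary = "token " * 150          # 150 tokens, over the limit
assert len(truncate_summary(long_summary).split()) == 100
```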
Within a topic, the document sets must be processed in
chronological order; i.e., you cannot look at documents in Set
B when generating the summary for Set A. However, the documents
within a document set can be processed in any order.
A submission to the update summarization task will comprise exactly
one file per summary, where the name of each summary file is the ID of
its document set. Please include a file for each summary, even if the
file is empty. Each file will be read and assessed as a plain text
file, so no special characters or markups are allowed. The files must
be in a directory whose name should be the concatenation of the Team
ID and the priority of the run. (For example, if the Team ID is "SYSX"
then the directory name for the first-priority run should be "SYSX1".)
Please package the directory in a tarfile and gzip the tarfile before
submitting it to NIST.
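As a concrete sketch of this packaging convention, using the "SYSX" team ID from the example above (the summary file names below are hypothetical; the real names are the document-set IDs distributed with the test data):

```shell
# One plain-text file per summary, named by its document set ID,
# inside a directory named <TeamID><priority>.
mkdir -p SYSX1
printf 'Summary text for document set A...\n' > SYSX1/D0801-A
printf 'Summary text for document set B...\n' > SYSX1/D0801-B

# Package the run directory as a gzipped tarfile for submission.
tar -cf SYSX1.tar SYSX1
gzip SYSX1.tar          # produces SYSX1.tar.gz
```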
Submission procedure
Each team may submit up to three runs, ranked by priority (1-3).
NIST will evaluate the first- and second-priority runs from each team. If resources
allow, NIST will evaluate one additional run from each team.
NIST will post the test data on the TAC Summarization web site on July
1 and results will have to be submitted to NIST by 11:59 p.m. (EDT) on
July 11, 2008. Results are submitted to NIST using an
automatic submission procedure. Details about the submission
procedure will be emailed to the [email protected] mailing list before
the test data is released. At that time, NIST will release a routine
that checks for common errors in submission files including such
things as invalid ID, missing summaries, etc. Participants should
check their runs with this script before submitting them to NIST
because the automatic submission procedure will reject the submission
if the script detects any errors.
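NIST's actual checking routine will be released with the test data; the sketch below merely illustrates the kinds of checks involved (run-directory naming and missing or unexpected summary files), with hypothetical function and file names:

```python
import os
import re


def check_run(run_dir: str, team_id: str, expected_ids: set) -> list:
    """Illustrative pre-submission checks (not NIST's official script).

    Verifies that the directory is named <team_id><priority 1-3> and
    that exactly one file exists per expected document-set ID.
    """
    errors = []
    name = os.path.basename(run_dir.rstrip("/"))
    if not re.fullmatch(re.escape(team_id) + r"[123]", name):
        errors.append(f"bad run directory name: {name}")
    present = set(os.listdir(run_dir)) if os.path.isdir(run_dir) else set()
    for missing in sorted(expected_ids - present):
        errors.append(f"missing summary file: {missing}")
    for extra in sorted(present - expected_ids):
        errors.append(f"unexpected file: {extra}")
    return errors
```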
All processing of documents and generation of summaries must be
automatic. No changes can be made to any component of the
summarization system or any resource used by the system in response to
the current year's test data.
IV. Evaluation
All summaries will first be truncated to 100 words. Where sentences
need to be identified for automatic evaluation, NIST will then use a
simple Perl
script for sentence segmentation.
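NIST's Perl script is not reproduced here; for local testing, a comparably simple splitter might look like the following (a naive sketch that will mis-handle abbreviations such as "U.S."):

```python
import re


def split_sentences(text: str) -> list:
    """Naive sentence splitter: break after ., !, or ? when followed by
    whitespace and an uppercase letter. A rough stand-in for NIST's
    segmentation script, for local testing only.
    """
    parts = re.split(r'(?<=[.!?])\s+(?=[A-Z])', text.strip())
    return [p for p in parts if p]


assert split_sentences("First sentence. Second one! Third?") == \
    ["First sentence.", "Second one!", "Third?"]
```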
Model summaries: NIST will conduct a manual evaluation of
summary content based on the Pyramid
Method. Multiple model summaries will be used in the evaluation.
Each topic statement and its 2 document sets will be given to 4
different NIST assessors. For each document set, the assessor will
create a 100-word summary that addresses the information need
expressed in the topic statement. The assessors will be guided by the
following:
In addition to the Pyramid evaluation, the assessor will give an overall responsiveness score to each summary. The overall responsiveness score will reflect both content and readability and will be judged on the following scale:
- Very Poor
- Poor
- Barely Acceptable
- Good
- Very Good
V. Schedule
TAC 2008 Update Summarization Task Schedule

  July 1                 | Release of test data
  July 14, 6:00 AM (EDT) | EXTENDED deadline for participants' submissions
  late August            | Release of individual evaluated results