KBP Participant Annotation Guidelines

April 1, 2010

To allow for better system tuning for the KBP Slot-Filling Task than was possible last year,
the organizers are asking each site that wishes to participate in the Slot-Filling evaluation to
manually prepare responses for 6 entities -- 3 persons and 3 organizations.  This will complement
development data being prepared by LDC.  This may also be helpful to the overall evaluation in
raising questions regarding the guidelines.

Participants submitting their annotations by May 1, 2010 will get access to the annotations
prepared by all other participants.

To the extent possible, each entity will be assigned to two sites, which can -- after submitting
their initial annotations -- compare results and possibly submit a revised annotation.  This
can produce better annotations and a crude estimate of inter-annotator agreement.

Participants will be sent a list of 6 entities by the organizers.  When they have finished the
annotation for these entities, they should email their annotations to
Heng Ji <hengji@cs.qc.cuny.edu>; they will then be sent the ftp password, with which they can
upload their annotations and download those of other participants.  When submitting a revised
annotation, do not delete the file with your original annotations.

Along with the entities, the organizers are providing an effective name search tool
developed by Zheng Chen to assist annotation.  This tool does not disambiguate entities, so if the
query name is ambiguous, participants are responsible for disambiguating it and should return
answers for the most salient entity associated with the query.

Annotations should follow the Annotation Guidelines available through the KBP web site.  The
organizers have selected names which occur a few hundred times in the corpus, allowing all the
relevant documents for an entity to be scanned for potential slot fills in a few hours.  The
organizers have not checked for alternative name spellings for the entities.  If there are
multiple equivalent fills for a slot, you are only expected to provide one.

The annotations for the six entities should be prepared as a single file whose name is the same
as the submission id (field 3, below), with one line for each slot fill.  Each line will
consist of eleven tab-separated fields.  Annotations should conform to the following format,
which is also being used by LDC for their training data and their adjudication data.  Any questions
regarding the format should be sent to Ralph Grishman <grishman@cs.nyu.edu>.

For each field: number and name, explanation, and (where one is specified) the value for
participant annotations.

Field 1: filler id
    Explanation: unique ID of this filler for this file
    Value for participant annotations: 1-based monotonically increasing integer

Field 2: query id
    Explanation: entity id
    Value for participant annotations: provided by organizers

Field 3: submission id
    Explanation: a unique id for the submission, consisting of your site id followed by an
        integer, starting with 1 for the first submission of training data and incrementing
        thereafter if the site submits any revisions

Field 4: slot name
    Explanation: e.g., "per:title"

Field 5: doc id
    Explanation: id of document containing response, or "NIL" if the corpus contains no fill
        for this slot

Field 6: starting offset
    Explanation: 0-based character offset of start of un-normalized response in document.
        May be left as "0" if not using a tool that computes offsets.
    Value for participant annotations: 0

Field 7: ending offset
    Explanation: 0-based character offset of end of un-normalized response in document.
        May be left as "0" if not using a tool that computes offsets.
    Value for participant annotations: 0

Field 8: un-normalized response
    Explanation: a string from the document.  Any newlines, linefeeds, or tabs contained in the
        selection will be converted to a space character.  No other whitespace normalization
        will be done.

Field 9: normalized response
    Explanation: a normalized response as described in the annotation guidelines: a normalized
        date, or the nominal form of a proper adjective (for some slots).  If no normalization
        is required, a copy of the un-normalized response.

Field 10: equivalence class
    Explanation: provided for LDC adjudication files, to link different but equivalent responses
    Value for participant annotations: 0

Field 11: judgment
    Explanation: provided for LDC adjudication files (1 => correct)
    Value for participant annotations: 1

If field 5 is NIL (no fill for this slot), fields 6-9 should also be NIL.
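
As an illustration of the format above, the following Python sketch assembles one annotation line.
The function name format_fill and its parameter names are hypothetical and are not part of any
KBP tool.  The sketch assumes that fields 6, 7, 10 and 11 take the participant values listed
above (0, 0, 0 and 1) unless offsets are supplied, and that each newline, linefeed, or tab in the
response becomes one space, as described for field 8.

```python
# Hypothetical helper for illustration only; not part of any KBP tool.
# Maps each newline, carriage return, and tab to a single space (field 8).
_WS = str.maketrans({"\n": " ", "\r": " ", "\t": " "})


def format_fill(filler_id, query_id, submission_id, slot_name,
                doc_id="NIL", response=None, normalized=None,
                start=0, end=0):
    """Return one annotation line with eleven tab-separated fields."""
    if doc_id == "NIL":
        # No fill in the corpus: fields 6-9 are also NIL.
        span = ["NIL"] * 4
    else:
        text = response.translate(_WS)
        # Field 9 is a copy of field 8 when no normalization is required.
        norm = normalized if normalized is not None else text
        span = [str(start), str(end), text, norm]
    fields = [str(filler_id), query_id, submission_id, slot_name, doc_id]
    fields += span
    fields += ["0", "1"]  # fields 10 and 11 for participant annotations
    return "\t".join(fields)


# Usage: a filled slot, then a slot with no fill in the corpus.
print(format_fill(1, "SF11", "LDC1", "per:title",
                  "CNN828-7.940923.LDC98T25", "press secretary",
                  start=563, end=578))
print(format_fill(2, "SF11", "LDC1", "per:date_of_birth"))
```

Joining the fields with a tab, not a space, matters because a response such as
"press secretary" itself contains a space.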

Here are 4 sample lines, courtesy of LDC:
1	SF11	LDC1	per:title	CNN828-7.940923.LDC98T25	563	578	press secretary	press secretary	0	1
2	SF11	LDC1	per:date_of_birth	NIL	NIL	NIL	NIL	NIL	0	1
3	SF48	LDC1	per:date_of_death	AFP_ENG_20021211.0447.LDC2007T07	663	669	Monday	2002-12-09	0	1
4	SF48	LDC1	per:country_of_death	AFP_ENG_20021211.0447.LDC2007T07	673	678	Italy	Italy	0	1
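
Before uploading, a site may wish to check each line of its file against the format rules.  The
Python sketch below is one minimal way to do so.  The function name check_line and the wording
of its messages are hypothetical, and it tests only the field count, the 1-based filler id, and
the NIL rule for fields 5-9; it does not verify slot names, document ids, or offsets against
the corpus.

```python
# Hypothetical checker for illustration only; not an official KBP validator.
def check_line(line):
    """Return a list of problems found in one annotation line (empty if none)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 11:
        return ["expected 11 tab-separated fields, found %d" % len(fields)]
    problems = []
    filler_id, _query, _sub, _slot, doc_id, start, end, resp, norm, _eq, _judg = fields
    if not filler_id.isdigit() or int(filler_id) < 1:
        problems.append("field 1 should be a 1-based integer")
    if doc_id == "NIL":
        if [start, end, resp, norm] != ["NIL"] * 4:
            problems.append("fields 6-9 should be NIL when field 5 is NIL")
    elif not (start.isdigit() and end.isdigit()):
        problems.append("fields 6 and 7 should be character offsets (or 0)")
    return problems


# Usage: a well-formed NIL line, then one whose offsets were left as 0.
good = "2\tSF11\tLDC1\tper:date_of_birth\tNIL\tNIL\tNIL\tNIL\tNIL\t0\t1"
bad = "2\tSF11\tLDC1\tper:date_of_birth\tNIL\t0\t0\tNIL\tNIL\t0\t1"
print(check_line(good))
print(check_line(bad))
```

A line whose tabs were converted to spaces, for example by an editor or mail client, is reported
as having the wrong number of fields.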