KBP Participant Annotation Guidelines
April 1, 2010
To allow for better system tuning for the KBP Slot-Filling Task than
was possible last year, the organizers are asking each site that wishes
to participate in the Slot-Filling evaluation to manually prepare
responses for 6 entities: 3 persons and 3 organizations. These
annotations will complement the development data being prepared by LDC.
They may also help the overall evaluation by raising questions about
the guidelines.
Participants submitting their annotations by May 1, 2010 will get
access to the annotations prepared by all other participants.
To the extent possible, each entity will be assigned to two sites,
which can, after submitting their initial annotations, compare results
and possibly submit a revised annotation. This can produce better
annotations and a crude estimate of inter-annotator agreement.
Participants will be sent a list of 6 entities by the organizers.
When they have finished the annotation for these entities, they should
send an email with their annotations to Heng Ji <[email protected]>.
They will then be sent the ftp password, so that they can upload
their annotations and download those of other participants. When
submitting a revised annotation, do not delete the file with your
original annotations.
When distributing the entities, the organizers are also providing an
effective name search tool developed by Zheng Chen to assist
annotation. This tool does not disambiguate entities, so if the query
name is ambiguous, participants are responsible for disambiguating it
and returning answers for the most salient entity associated with the
query.
Annotations should follow the Annotation Guidelines available through
the KBP web site. The organizers have selected names which occur a few
hundred times in the corpus, allowing all the relevant documents for an
entity to be scanned for potential slot fills in a few hours. The
organizers have not checked for alternative name spellings for the
entities. If there are multiple equivalent fills for a slot, you
are only expected to provide one.
The annotations for the six entities should be prepared as a single
file whose name is the same as the submission id (field 3, below), with
one line for each slot fill. Each line will consist of eleven
tab-separated fields. Annotations should conform to the following
format, which is also being used by LDC for their training data and
their adjudication data. Any questions regarding the format should be
sent to Ralph Grishman <[email protected]>.
field | field name | explanation | value for participant annotations |
1 | filler id | unique ID of this filler for this file | 1-based monotonically increasing integer |
2 | query id | entity id | provided by organizers |
3 | submission id | a unique id for the submission, consisting of your site id followed by an integer, starting with 1 for the first submission of training data and incrementing thereafter if the site submits any revisions | |
4 | slot name | e.g., "per:title" | |
5 | doc id | id of document containing response, or "NIL" if the corpus contains no fill for this slot | |
6 | starting offset | 0-based character offset of start of un-normalized response in document. Can leave "0" if not using a tool which computes offsets. | 0 |
7 | ending offset | 0-based character offset of end of un-normalized response in document. Can leave "0" if not using a tool which computes offsets. | 0 |
8 | un-normalized response | a string from the document. Any newlines, linefeeds, or tabs contained in the selection will be converted to a space character. No other whitespace normalization will be done. | |
9 | normalized response | a normalized response as described in the annotation guidelines: a normalized date, or the nominal form of a proper adjective (for some slots). If no normalization is required, a copy of the un-normalized response. | |
10 | equivalence class | provided for LDC adjudication files, to link different but equivalent responses | 0 |
11 | judgment | provided for LDC adjudication files (1 => correct) | 1 |
If field 5 is NIL (no fill for this slot), fields 6-9 should also be
NIL.
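As an illustration of the field layout and the NIL rule, here is a minimal Python sketch that builds one annotation line. The names format_line and clean_response are hypothetical helpers, not part of any KBP tool. The sketch assumes that each newline, carriage return, or tab in the un-normalized response is replaced by one space, and it fills in the participant values from the table (0 for fields 6, 7 and 10 unless offsets are known, 1 for field 11).

```python
import re

NIL = "NIL"


def clean_response(text):
    """Replace each newline, carriage return, or tab with a single space.

    Assumption: one space per offending character; no other whitespace
    normalization is applied, as the field 8 description requires.
    """
    return re.sub(r"[\r\n\t]", " ", text)


def format_line(filler_id, query_id, submission_id, slot,
                doc_id=None, response=None, normalized=None,
                start=0, end=0):
    """Return one tab-separated annotation line with eleven fields.

    Hypothetical helper. If doc_id is None, the slot has no fill, so
    field 5 and fields 6-9 are all written as NIL.
    """
    if doc_id is None:
        middle = [NIL, NIL, NIL, NIL, NIL]
    else:
        raw = clean_response(response)
        # Field 9 is a copy of field 8 when no normalization is required.
        norm = raw if normalized is None else normalized
        middle = [doc_id, start, end, raw, norm]
    # Fields 10 and 11 take the fixed participant values 0 and 1.
    fields = [filler_id, query_id, submission_id, slot] + middle + [0, 1]
    return "\t".join(str(f) for f in fields)


if __name__ == "__main__":
    print(format_line(1, "SF11", "LDC1", "per:date_of_birth"))
    print(format_line(2, "SF48", "LDC1", "per:country_of_death",
                      doc_id="AFP_ENG_20021211.0447.LDC2007T07",
                      response="Italy", start=673, end=678))
```

Lines produced this way can be written, one per slot fill, to a file named after the submission id.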
Here are 4 sample lines, courtesy of LDC:
1	SF11	LDC1	per:title	CNN828-7.940923.LDC98T25	563	578	press secretary	press secretary	0	1
2	SF11	LDC1	per:date_of_birth	NIL	NIL	NIL	NIL	NIL	0	1
3	SF48	LDC1	per:date_of_death	AFP_ENG_20021211.0447.LDC2007T07	663	669	Monday	2002-12-09	0	1
4	SF48	LDC1	per:country_of_death	AFP_ENG_20021211.0447.LDC2007T07	673	678	Italy	Italy	0	1
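Before uploading, a site may want to check that every line in its file has the expected shape. The following is a rough Python sketch of such a check, not an official validator. The function name parse_line and the keys in FIELD_NAMES are made up for illustration. It only tests the rules stated above: eleven tab-separated fields, NIL in fields 6-9 whenever field 5 is NIL, and integer offsets otherwise.

```python
# Hypothetical short keys for the eleven fields, in file order.
FIELD_NAMES = (
    "filler_id", "query_id", "submission_id", "slot", "doc_id",
    "start", "end", "response", "normalized", "equiv_class", "judgment",
)


def parse_line(line):
    """Split one annotation line into a dict and check the basic rules.

    Raises ValueError if the line does not have eleven tab-separated
    fields, if a NIL doc id is not matched by NIL in fields 6-9, or if
    the offsets of a non-NIL fill are not integers.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != len(FIELD_NAMES):
        raise ValueError("expected 11 fields, found %d" % len(fields))
    record = dict(zip(FIELD_NAMES, fields))
    if record["doc_id"] == "NIL":
        for key in ("start", "end", "response", "normalized"):
            if record[key] != "NIL":
                raise ValueError("field %r must be NIL when doc id is NIL" % key)
    else:
        # int() raises ValueError on a non-numeric offset.
        int(record["start"])
        int(record["end"])
    return record


if __name__ == "__main__":
    sample = "\t".join([
        "3", "SF48", "LDC1", "per:date_of_death",
        "AFP_ENG_20021211.0447.LDC2007T07", "663", "669",
        "Monday", "2002-12-09", "0", "1",
    ])
    print(parse_line(sample)["normalized"])
```

The check deliberately does not compare offsets against the response length, since participants may leave both offsets as 0.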