Knowledge Base Population (KBP2011) Track
Entity Linking Scoring by Javier Artiles (CUNY, firstname.lastname@example.org)
KBP2011 adds a new requirement for entity linking: clustering NIL queries. Given a set of query names with source documents, an entity linking system is required to (1) judge whether each query can be linked to a KB node, and (2) group all queries that have no KB entry (NIL queries) into clusters. The system output can therefore be viewed as a collection of clusters, some of which are labeled with KB node IDs. The answer key can likewise be viewed as a different collection of clusters. We therefore apply a modified B-Cubed metric (called B-Cubed+) to evaluate these clusters.
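The clustering view above can be illustrated with a minimal sketch of a B-Cubed style score. This is not el_scorer.py. It assumes the standard per-query B-Cubed precision and recall, plus one extra condition standing in for the "+" modification: a pair only counts as correct when the KB link also agrees. The function names, the dictionary input format, and the convention that NIL cluster labels start with "NIL" are all assumptions made for illustration.

```python
from collections import defaultdict


def link_ok(gold_label, sys_label):
    # Assumption: NIL cluster names are arbitrary (any label starting with
    # "NIL"), so only KB node IDs have to match exactly.
    if gold_label.startswith("NIL") and sys_label.startswith("NIL"):
        return True
    return gold_label == sys_label


def b_cubed_plus(gold, system):
    """gold, system: dicts mapping query id -> cluster label (KB id or NIL id).

    Both dicts are assumed to cover the same set of query ids.
    Returns (precision, recall, f_measure), averaged over queries.
    """
    gold_clusters = defaultdict(set)
    sys_clusters = defaultdict(set)
    for query, label in gold.items():
        gold_clusters[label].add(query)
    for query, label in system.items():
        sys_clusters[label].add(query)

    def correct(q, other):
        # A pair is correct if both queries share a gold cluster, share a
        # system cluster, and the link for q agrees with the key.
        return (gold[q] == gold[other]
                and system[q] == system[other]
                and link_ok(gold[q], system[q]))

    precisions, recalls = [], []
    for q in gold:
        same_sys = sys_clusters[system[q]]
        same_gold = gold_clusters[gold[q]]
        precisions.append(sum(correct(q, o) for o in same_sys) / len(same_sys))
        recalls.append(sum(correct(q, o) for o in same_gold) / len(same_gold))

    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Hypothetical key and system output for four queries.
gold = {"q1": "E1", "q2": "E1", "q3": "NIL1", "q4": "NIL1"}
system = {"q1": "E1", "q2": "NIL7", "q3": "NIL7", "q4": "NIL8"}
print(b_cubed_plus(gold, system))  # (0.625, 0.375, 0.46875)
```

In this toy example q2 is wrongly sent to a NIL cluster, so it contributes zero precision and zero recall, and the two true NIL queries are split across clusters, which lowers recall.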
Entity Linking Scorer: el_scorer.py v0.6
Temporal Slot Filling Scoring
V0.3 Download: (revised by Qi Li, CUNY, email@example.com): handle V labels
Download (by Ralph
Download2 (revised by Qi Li, CUNY, firstname.lastname@example.org): added a detailed breakdown of scores for each slot type
In the 2011 evaluation, a uniform scoring metric will be used, based on
traditional measures of recall, precision, and F-measure, computed from
counts of correct, missing, and spurious responses.
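As a rough illustration of how the three counts turn into scores, the sketch below assumes the traditional definitions: precision is correct over correct plus spurious, recall is correct over correct plus missing, and F-measure is their harmonic mean. The function name `prf` is hypothetical, and SFScore.java may aggregate its counts differently.

```python
def prf(correct, missing, spurious):
    """Return (precision, recall, f_measure) from response counts.

    Assumes the traditional definitions; zero denominators yield 0.0.
    """
    attempted = correct + spurious   # non-NIL responses the system gave
    expected = correct + missing     # answers the key expected
    precision = correct / attempted if attempted else 0.0
    recall = correct / expected if expected else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical counts: 6 correct, 2 missing, 4 spurious.
print(prf(6, 2, 4))
```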
A non-NIL response is correct if it matches a
verified non-NIL entry in the key (the human assessment file);
other non-NIL responses are spurious. A
NIL response where the key has a verified non-NIL
response is considered missing. NIL system responses matching verified
NIL entries in the key are not counted. For
single-valued slots, only a single system response will be accepted. For list-valued slots, the verified non-NIL
responses will be grouped into equivalence classes.
Multiple responses to a query must come from
disjoint classes to be counted as correct; other
responses are counted as spurious.
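The rules for list-valued slots can be sketched as follows. This is an illustrative reading of the paragraph above, not the logic of SFScore.java. The function name, the use of sets of answer strings for equivalence classes, and the example values are assumptions. Counting every equivalence class that no response matched as one missing response is also an assumption, since the text only defines "missing" for NIL responses.

```python
def score_list_slot(responses, equivalence_classes):
    """Count (correct, missing, spurious) for one list-valued slot.

    responses: non-NIL answer strings from the system (empty list = NIL).
    equivalence_classes: list of sets of verified non-NIL answer strings.
    """
    matched = set()          # indices of classes already credited
    correct = spurious = 0
    for answer in responses:
        index = next(
            (i for i, cls in enumerate(equivalence_classes) if answer in cls),
            None,
        )
        if index is not None and index not in matched:
            matched.add(index)   # first response from this class is correct
            correct += 1
        else:
            spurious += 1        # not in the key, or a repeat of a class
    # Assumption: each class with no matching response counts as missing.
    missing = len(equivalence_classes) - len(matched)
    return correct, missing, spurious


# Hypothetical key with two equivalence classes.
classes = [{"IBM", "International Business Machines"}, {"Google"}]
print(score_list_slot(["IBM", "International Business Machines", "Acme"], classes))
# (1, 1, 2): one class matched, one class missed, one repeat plus one unknown
```

A NIL response against an empty key yields (0, 0, 0), which matches the statement that NIL responses matching verified NIL entries are not counted.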
To run, download SFScore.java (SFScore.java V1.2; SFScore.java V0.9):
    java SFScore response-file key-file [flags ...]
where the possible flags are
    trace -- print a line with the assessment of each system response
    anydoc -- judge response based only on answer string, ignoring doc id
    nocase -- ignore case in matching answer string
    slots=slotfile -- take list of pairs from slotfile (otherwise the list of pairs is taken from the system response)
The slotfile controls which slots are evaluated; if
you want your system evaluated on all slots for which you generate an
output, the "slots" parameter is not needed. In that case it is
important for your system to generate explicit NILs for slots it cannot fill.
As the key file, you can use one of
the newly produced annotation files, or you can run UpdateSFKey (see
the tools page) on the 2009 judgments file.
The anydoc and nocase flags are
designed to make the scorer more useful for development by supporting
soft match, but will not be used for official scoring.
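The effect of the two soft-match flags can be sketched as below. This is an assumption-laden illustration rather than the code in SFScore.java: the function name `matches`, the (answer string, doc id) tuple representation, and the example doc ids are hypothetical, and the strict default (both answer string and doc id must agree) is inferred from the flag descriptions.

```python
def matches(response, key_entry, anydoc=False, nocase=False):
    """Return True if a system response matches a key entry.

    response, key_entry: (answer_string, doc_id) tuples.
    anydoc: compare answer strings only, ignoring the doc id.
    nocase: ignore case when comparing answer strings.
    """
    response_answer, response_doc = response
    key_answer, key_doc = key_entry
    if nocase:
        response_answer = response_answer.lower()
        key_answer = key_answer.lower()
    if response_answer != key_answer:
        return False
    return anydoc or response_doc == key_doc


key = ("New York", "DOC1")
print(matches(("new york", "DOC2"), key))                            # False
print(matches(("new york", "DOC2"), key, anydoc=True, nocase=True))  # True
```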