TAC 2011 Knowledge Base Population (KBP2011) Track

Entity Linking Scoring by Javier Artiles (CUNY, javart@gmail.com)

In KBP2011, entity linking has a new requirement: clustering NIL queries. Given a set of query names with source documents, an entity linking system is required to (1) judge whether each query can be linked to a KB node, and (2) group all queries with NIL KB entries into clusters. The system output can therefore be viewed as a collection of clusters, some of which are labeled with KB node IDs. The answer key can be viewed as another such collection of clusters. We therefore apply a modified B-Cubed metric (called B-Cubed+) to evaluate these clusters.
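The clustering view above can be illustrated with a minimal sketch of a B-Cubed-style computation. This is not the code of el_scorer.py. It assumes that both inputs map the same query IDs to cluster labels, that labels starting with "NIL" denote NIL clusters whose names are arbitrary, and that any other label is a KB node ID that must match the key exactly. The function name bcubed_plus and the pairwise correctness rule are illustrative assumptions.

```python
from collections import defaultdict


def bcubed_plus(system, gold):
    """Return (precision, recall, F1) for two query-id -> label mappings.

    Assumption: `system` and `gold` cover exactly the same query ids.
    """
    def is_nil(label):
        # Assumption: NIL clusters are named "NIL..." and the names carry no meaning.
        return label.startswith("NIL")

    def group(assignment):
        clusters = defaultdict(set)
        for query, label in assignment.items():
            clusters[label].add(query)
        return clusters

    def correct(q1, q2):
        # The pair must share a cluster in both the system output and the key.
        if system[q1] != system[q2] or gold[q1] != gold[q2]:
            return False
        # A NIL cluster in the key must be NIL in the system output as well.
        if is_nil(gold[q1]):
            return is_nil(system[q1])
        # A KB-linked cluster must carry the same KB node id as the key.
        return system[q1] == gold[q1]

    def average(assignment):
        clusters = group(assignment)
        total = 0.0
        for query, label in assignment.items():
            peers = clusters[label]
            total += sum(correct(query, peer) for peer in peers) / len(peers)
        return total / len(assignment)

    precision = average(system)  # averaged over system clusters
    recall = average(gold)       # averaged over key clusters
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    key = {"q1": "E1", "q2": "E1", "q3": "NIL1", "q4": "NIL1"}
    out = {"q1": "E1", "q2": "E1", "q3": "NIL7", "q4": "NIL8"}
    print(bcubed_plus(out, key))
```

In this example the system links q1 and q2 correctly but splits the NIL cluster of q3 and q4 in two, so precision is unaffected while recall drops.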

Entity Linking Scorer: el_scorer.py v0.6

 

Temporal Slot Filling Scoring

V0.3 Download (revised by Qi Li, CUNY, liqiearth@gmail.com): handles V labels

Archived V0.2:

Download (by Ralph Grishman)

Download2 (revised by Qi Li, CUNY, liqiearth@gmail.com): added a detailed breakdown of scores for each slot type

Slot Filling Scoring (by Ralph Grishman)

In the 2011 evaluation, a uniform scoring metric will be used, based on the traditional measures of recall, precision, and F-measure, computed from counts of correct, missing, and spurious responses. A non-NIL response is correct if it matches a verified non-NIL entry in the key (the human assessment file); other non-NIL responses are spurious. A NIL response where the key has a verified non-NIL response is considered missing. NIL system responses that match verified NIL entries in the key are not counted. For single-valued slots, only a single system response will be accepted. For list-valued slots, the verified non-NIL responses will be grouped into equivalence classes. Multiple responses to a query must come from disjoint classes to be counted as correct; other responses are counted as spurious.
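As a rough illustration of how the three counts turn into scores, the sketch below uses the conventional definitions: precision = correct / (correct + spurious), recall = correct / (correct + missing), and F-measure as their harmonic mean. The exact denominators used by SFScore.java are not stated here, so these formulas and the function name precision_recall_f1 are assumptions.

```python
def precision_recall_f1(correct, missing, spurious):
    """Compute precision, recall, and F1 from response counts.

    Assumes the conventional definitions; SFScore.java may differ in detail.
    """
    returned = correct + spurious  # non-NIL responses the system produced
    expected = correct + missing   # non-NIL answers the key expects
    precision = correct / returned if returned else 0.0
    recall = correct / expected if expected else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # 6 correct, 4 missing, 2 spurious responses
    print(precision_recall_f1(6, 4, 2))
```

Note that NIL responses matching verified NIL key entries do not appear in any of the three counts, so they do not affect the result.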

To run the scorer, download SFScore.java (SFScore.java V1.2; SFScore.java V0.9), then compile and run it:

javac SFScore.java
java SFScore response-file key-file [flags ...]

where the possible flags are

trace  -- print a line with assessment of each system response
anydoc -- judge response based only on answer string, ignoring doc id
nocase -- ignore case in matching answer string
slots=slotfile -- take list of entityId:slot pairs from slotfile
                 (otherwise list of pairs is taken from system response)

The slotfile controls which slots are evaluated. If you want your system evaluated on all slots for which it generates output, the "slots" parameter is not needed. In that case it is important for your system to generate explicit NILs for slots it cannot fill.

As the key file, you can use one of the newly produced annotation files, or you can run UpdateSFKey (see the tools page) on the 2009 judgments file.

The anydoc and nocase flags are designed to make the scorer more useful during development by supporting soft matching, but they will not be used for official scoring.
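The effect of the two soft-match flags can be sketched as follows. This is an illustration of the behavior described above, not code taken from SFScore.java. It assumes each response and key entry is a pair of answer string and doc id, and the function name responses_match is hypothetical.

```python
def responses_match(response, key_entry, anydoc=False, nocase=False):
    """Compare an (answer string, doc id) response against a key entry.

    Hypothetical helper: strict matching by default, relaxed by the flags.
    """
    resp_answer, resp_doc = response
    key_answer, key_doc = key_entry
    if nocase:
        # nocase: ignore case when matching the answer string
        resp_answer, key_answer = resp_answer.lower(), key_answer.lower()
    if resp_answer != key_answer:
        return False
    # anydoc: judge on the answer string only, ignoring the doc id
    return anydoc or resp_doc == key_doc


if __name__ == "__main__":
    key_entry = ("New York", "DOC-001")
    print(responses_match(("new york", "DOC-002"), key_entry))
    print(responses_match(("new york", "DOC-002"), key_entry, anydoc=True, nocase=True))
```

With both flags off, the second response is rejected on case and on doc id. With both on, it is accepted, which is why the flags suit development runs rather than official scoring.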