TAC 2017 Cold Start KB Track

Overview

The Cold Start KB track builds a knowledge base from scratch, using a predefined KB schema and a collection of unstructured text. The KB schema for Cold Start 2017 consists of:

Entities: entities and entity mentions, as defined in the main task of the EDL track
SF Relations: entity attributes ("slots"), as defined in the SF track
Events: events (hoppers) and event nuggets, as defined in the EN track
Event Arguments: event arguments, as defined in the EAL track
Sentiment: sentiment from a source entity toward a target entity, as defined in the BeSt track

The submitted Cold Start KBs are evaluated by both a composite query-based evaluation and a set of component evaluations. The composite KB evaluation applies a set of Cold Start evaluation queries to each KB and assesses the correctness of the events, sentiment sources and targets, and SF slot fillers found.

The component evaluations are implemented by projecting out the individual components from the submitted KB, and evaluating each component output file as though it had been submitted directly to the standalone track for that component.

The following component files are projected from each submitted Cold Start KB:

EDL: An EDL file consisting of name and nominal mentions and links for PER, ORG, GPE, FAC, and LOC entities from the "core" documents. Links can be to either a node in the reference KB (TAC KBP Knowledge Base II - BaseKB) or (if the entity does not exist in the reference KB) a NIL node corresponding to an entity node in the submitted KB.
Slot Filling: An SF file consisting of slot fillers and justifications found in the KB by applying Cold Start evaluation queries that involve only SF predicates.
Event Nugget Detection and Coreference: An EN file consisting of event mentions and within-document coreference from the "core" documents.
Event Argument and Linking: A set of "arguments" files, each file consisting of event argument assertions (including justifications) from a "core" document; a set of "linking" files, each file consisting of coreference of assertions in the corresponding "arguments" file.
Sentiment: A set of predicted ERE xml files, each file consisting of name, nominal, and pronominal mentions and coreference for PER, ORG, GPE, FAC, and LOC entities from a "core" document; a set of BeSt xml files, each file consisting of sentiment (including provenance) from a source towards a target entity in the corresponding predicted ERE file.

The standalone EDL, EN, EAL, and BeSt tasks are evaluated using gold standard annotations on a common set of approximately 500 "core" documents, and are described fully on their respective track home pages. Below, we focus on the composite Cold Start KB Construction task and the component SF task, which are evaluated using post-submission assessment of responses to Cold Start evaluation queries.

Tasks

The Cold Start KB schema contains typed entity, event and string nodes; various kinds of mentions for each node; and SF predicates, sentiment predicates, and event predicates that can connect nodes in the KB.

Given a collection of approximately 90K English, Chinese, and Spanish documents, the Cold Start KB Construction system must find all entities, SF relations, events, event arguments, and sentiment (towards entities) that conform to the Cold Start KB schema. It must output a KB file consisting of one assertion per line, where each assertion is a subject-predicate-object triple augmented with provenance and a confidence value. If a KB includes multiple assertions involving the same subject-predicate-object triple (but with different provenance), the assertion with the highest confidence value will be assessed, and additional assertions with lower confidence values will be assessed as resources permit. It is expected that approximately 3 assertions (each with different justifications) will be assessed for each subject-predicate-object triple involving SF, sentiment, or event predicates.
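As a rough illustration of how assertions for the same triple might be ordered for assessment, the sketch below groups assertions by triple and keeps the highest-confidence ones. The tab-separated column layout, the node IDs, the provenance strings, and the fixed cutoff of 3 are all assumptions made for this example; the official KB file format is defined in the task guidelines, not here.

```python
from collections import defaultdict

# Hypothetical KB lines: subject, predicate, object, provenance, confidence.
# The tab-separated column layout is an assumption made for this sketch.
KB_LINES = [
    ":Entity_1\tper:siblings\t:Entity_2\tDOC_42:10-25\t0.9",
    ":Entity_1\tper:siblings\t:Entity_2\tDOC_07:3-18\t0.6",
    ":Entity_1\tper:siblings\t:Entity_2\tDOC_11:40-55\t0.8",
    ":Entity_1\tper:siblings\t:Entity_2\tDOC_19:5-20\t0.2",
    ":Entity_2\tper:age\t:String_1\tDOC_42:30-32\t0.7",
]


def parse_assertion(line):
    """Split one KB line into its triple, provenance, and confidence."""
    subject, predicate, obj, provenance, confidence = line.split("\t")
    return (subject, predicate, obj), provenance, float(confidence)


def assessment_order(lines, max_per_triple=3):
    """Group assertions by triple, highest confidence first, and truncate."""
    by_triple = defaultdict(list)
    for line in lines:
        triple, provenance, confidence = parse_assertion(line)
        by_triple[triple].append((confidence, provenance))
    return {
        triple: sorted(items, reverse=True)[:max_per_triple]
        for triple, items in by_triple.items()
    }


selected = assessment_order(KB_LINES)
```

Here the fourth `per:siblings` assertion (confidence 0.2) falls outside the assumed cutoff; in the actual evaluation, how many lower-confidence assertions are assessed depends on available resources.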

Each Cold Start KB undergoes a composite KB evaluation, in which a set of Cold Start evaluation queries is applied to the KB and the responses are assessed. A Cold Start evaluation query contains a name mention of an entity in the document collection (an "entry point"), and a sequence of one or more SF, sentiment, or event predicates (e.g., "per:date_of_birth", "org:is_liked_by", "per:conflict.attack_attacker"). The entry point selects a single corresponding entity node in the KB, and the sequence of predicates is followed to arrive at a set of terminal objects at the end of the sequence. The terminal objects are then assessed and scored as in the traditional English slot filling task. For example, a typical query may ask "What are the ages of the siblings of the Bart Simpson mentioned in Document 42?" or "What attack events have an attacker who is disliked by the Marge Simpson mentioned in Document 9?" Such "two-hop" queries will verify that the knowledge base is well-formed in a way that goes beyond the component tasks of entity discovery and linking, slot filling, event nugget detection and coreference, event argument extraction and linking, and sentiment detection.

Each evaluation query may have multiple entry points (i.e., multiple mentions of the same entity), in order to mitigate cascaded errors caused by submitted KBs that are not able to link every name mention to a KB entity node.
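The query mechanics described above can be sketched on a toy KB. Everything in this example is hypothetical: the node IDs, the mention index, and the edges are invented, and taking the first entry point that matches a KB node is only one possible policy, not necessarily the one used by the official evaluation software.

```python
# Toy KB; node IDs, mention index, and edges are invented for illustration.
MENTIONS = {("DOC_42", "Bart Simpson"): ":Entity_1"}
EDGES = {
    (":Entity_1", "per:siblings"): {":Entity_2", ":Entity_3"},
    (":Entity_2", "per:age"): {":String_8"},
    (":Entity_3", "per:age"): {":String_1"},
}


def resolve_entry_points(entry_points):
    """Return the first KB node matched by any entry point, else None."""
    for doc_id, name in entry_points:
        node = MENTIONS.get((doc_id, name))
        if node is not None:
            return node
    return None


def answer_query(entry_points, predicates):
    """Follow the predicate sequence from the entry node to terminal objects."""
    start = resolve_entry_points(entry_points)
    frontier = set() if start is None else {start}
    for predicate in predicates:
        frontier = {
            obj
            for node in frontier
            for obj in EDGES.get((node, predicate), set())
        }
    return frontier


# Two entry points for the same entity; the KB only linked the second one.
terminals = answer_query(
    [("DOC_99", "Bart Simpson"), ("DOC_42", "Bart Simpson")],
    ["per:siblings", "per:age"],
)
```

In this sketch the first entry point is not linked to any node, so the query would return nothing if it were the only one; the second entry point recovers the entity and the two-hop traversal proceeds.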

Systems participating in the Cold Start KB Construction task also undergo a set of component evaluations, along the following dimensions:

Entity Discovery and Linking: Cold Start KB systems and Trilingual Entity Discovery and Linking systems are evaluated on the EDL dimension using the EDL annotations in the same set of "core" documents.
Slot Filling: Cold Start KB systems and SF systems are evaluated on the slot filling dimension using the same set of slot filling queries (which is the subset of Cold Start evaluation queries that involve only SF predicates).
Event Nugget Detection and Coreference: Cold Start KB systems and Event Nugget Detection and Coreference systems are evaluated on the event nugget dimension using the EN Detection and Coreference annotations in the same set of "core" documents. Only within-document coreference will be evaluated.
Event Argument Extraction and Linking: Cold Start KB systems and Event Arguments systems are evaluated on the event argument dimension using the EAL annotations in the same set of "core" documents. Only event argument and within-document linking will be evaluated.
Sentiment: Cold Start KB systems and BeSt systems are evaluated on the sentiment dimension using the BeSt annotations in the same set of "core" documents. For the purposes of comparing Cold Start systems with standalone BeSt systems, only sentiment from entities towards entities will be considered.

The Cold Start Slot Filling (SF) task removes the requirement that an entire text collection must be processed. Instead, Cold Start SF participants will receive the Cold Start evaluation queries that involve only SF predicates, and need only produce those entities and relations that would be found by the queries. A TAC slot filling system can easily be applied to this task by running it first from each evaluation query entry point, then recursively applying it to the identified slot fillers. The justification spans for a single justification must come from the same document. Unlike CSKB systems, however, SF systems must return only a single (highest-confidence) justification for each subject-predicate-object triple.
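The recursive application of a slot filling system can be sketched as follows. The function `fill_slot` is a hypothetical stand-in for a real single-hop slot filler, and the lookup table behind it is invented for illustration; a real system would extract fillers from the document collection and attach a justification to each.

```python
# Stand-in for a slot filling system: (entity, slot) -> fillers.
# The table and its contents are invented for illustration.
TOY_FILLERS = {
    ("Bart Simpson", "per:siblings"): ["Lisa Simpson", "Maggie Simpson"],
    ("Lisa Simpson", "per:age"): ["8"],
    ("Maggie Simpson", "per:age"): ["1"],
}


def fill_slot(entity, slot):
    """Hypothetical single-hop slot filler."""
    return TOY_FILLERS.get((entity, slot), [])


def answer_multi_hop(entry_point, slots):
    """Apply the filler at the entry point, then recursively to its fillers."""
    if not slots:
        return [entry_point]
    answers = []
    for filler in fill_slot(entry_point, slots[0]):
        answers.extend(answer_multi_hop(filler, slots[1:]))
    return answers


ages = answer_multi_hop("Bart Simpson", ["per:siblings", "per:age"])
```

Only the entities reachable from the entry point are ever processed, which is why this variant does not require processing the whole collection.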

The 2017 Cold Start KB Construction task differs from previous Cold Start tasks in the following significant ways:

Entities in the Cold Start KB should be linked to the same reference KB as in the EDL track, using a "link" predicate.
To support the evaluation of sentiment under the evaluation framework of the component BeSt track, the Cold Start KB is allowed to include pronominal_mention for entities, since a pronominal_mention of a sentiment target is sometimes what is required as provenance for a sentiment; pronominal_mention is not included in the EDL output file projected from the KB, and is not evaluated in the EDL component evaluation.
In addition to Entity nodes, the 2017 Cold Start KB will have Event nodes and String nodes.

An Event node contains all event mentions (nuggets) that refer to the same event; the event node is the same as a Rich ERE event hopper, except that event nuggets must be coreferenced across documents, and not just within a single document.
A String node allows the KB to group together different strings that represent the same predicate argument (e.g., "cardiac arrest" and "heart attack"). String-valued arguments of predicates must be specified as a String node rather than just a quoted string.

In addition to SF predicates, the 2017 Cold Start KB will have event predicates and sentiment predicates.
The Cold Start KB should attempt to include more than one justification (if available in the Evaluation Source Corpus) for each triple in the KB when the triple involves an SF, event, or sentiment predicate.

For each justification, all justification spans must come from a single document. Credit for a correct triple will be proportional to the number of different documents in the set of correct justifications returned for that triple.

Mean Average Precision (MAP) will be the primary metric for the composite KB evaluation.
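For readers unfamiliar with the metric, the sketch below computes the textbook form of MAP over ranked, assessed responses. It is an assumption that this matches the track's scorer in detail; the official scoring may differ, for example in how multiple justifications per triple or multiple hops are credited.

```python
def average_precision(ranked_correct, num_relevant):
    """AP for one query: ranked_correct is a list of booleans, best first."""
    if num_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_correct in enumerate(ranked_correct, start=1):
        if is_correct:
            hits += 1
            precision_sum += hits / rank  # precision at this correct response
    return precision_sum / num_relevant


def mean_average_precision(queries):
    """queries: list of (ranked_correct, num_relevant) pairs."""
    if not queries:
        return 0.0
    return sum(average_precision(r, n) for r, n in queries) / len(queries)


example = [
    ([True, False, True], 2),  # AP = (1/1 + 2/3) / 2
    ([False, True], 1),        # AP = (1/2) / 1
]
score = mean_average_precision(example)
```

Because precision is accumulated at the rank of each correct response, a system is rewarded for placing correct responses ahead of incorrect ones, which is where the confidence values attached to assertions come into play.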

Track Coordinators

Hoa Dang (National Institute of Standards and Technology)
Shahzad Rajput (National Institute of Standards and Technology)
