Streaming Multimedia Knowledge Base Population (SM-KBP) 2018

Evaluation: February-November, 2018
Workshop: November 13-14, 2018

Conducted by:
U.S. National Institute of Standards and Technology (NIST)

With support from:
U.S. Department of Defense

Background

In scenarios such as natural disasters or international conflicts, analysts and the public are often confronted with a variety of information coming through multiple media sources. There is a need for technologies to analyze and extract knowledge from multimedia to develop and maintain an understanding of events, situations, and trends as they unfold around the world.

The goal of DARPA's Active Interpretation of Disparate Alternatives (AIDA) Program is to develop a multi-hypothesis semantic engine that generates explicit alternative interpretations of events, situations, and trends from a variety of unstructured sources, for use in noisy, conflicting, and potentially deceptive information environments. This engine must be capable of mapping knowledge elements (KEs) automatically derived from multiple media sources into a common semantic representation, aggregating information derived from those sources, and generating and exploring multiple hypotheses about the events, situations, and trends of interest. It must also establish confidence measures for the derived knowledge and hypotheses, based on the accuracy of the analysis and the coherence of the semantic representation of each hypothesis.

The streaming multimedia KBP track will assess the performance of systems that have been developed in support of AIDA program goals. Systems will be asked to extract knowledge elements from a stream of heterogeneous documents containing multilingual multimedia sources including text, speech, images, videos, and PDF files; aggregate the knowledge elements from multiple documents without access to the raw documents themselves (maintaining multiple interpretations and confidence values for KEs extracted or inferred from the documents); and develop semantically coherent hypotheses, each of which represents an interpretation of the document stream.

The SM-KBP tasks will be run at TAC/TRECVID 2018 as pilot evaluations whose goals are to test evaluation protocols and metrics and to learn lessons that can inform how subsequent evaluations will be structured. The purpose of the pilot is to exercise the evaluation infrastructure, not to test system performance. The pilot is therefore intended to be flexible while still following the protocol of the official evaluation. The SM-KBP track is expected to run for three evaluation cycles after the initial pilot evaluation:

Pilot Evaluation: September-October 2018
Evaluation 1 (short cycle): March-April 2019
Evaluation 2 (18-month cycle): August-September 2020
Evaluation 3 (18-month cycle): March-April 2022

Overview

Each SM-KBP evaluation covers a small set of topics within a single scenario, and each evaluation cycle will have a new scenario and associated set of languages. For the 2018 pilot, the scenario is the Russian/Ukrainian conflict (2014-2015) and the scenario languages are English, Russian, and Ukrainian. Early in the evaluation cycle, all task participants will receive an ontology of entities, events, event arguments, relations, and SEC (sentiment, emotion, and cognitive state), defining the KEs that are in scope for the evaluation tasks. For the 2018 pilot, the ontology will be an extension of the DEFT Rich ERE entity, relation, and event types; a different (expanded) ontology is expected for subsequent evaluation cycles.

The SM-KBP track has three main evaluation tasks:

Task 1: Extraction of KEs and KE mentions from a stream of multimedia documents, including linking of mentions of the same KE within each document to produce a document-level knowledge graph for each document. Extraction and linking will be conditioned on two kinds of context:

a) generic background context
b) generic background context plus a "what if" hypothesis

Task 2: Construction of a KB by aggregating and linking document-level knowledge graphs produced by one or more Task 1 teams.
Task 3: Generation of hypotheses from KBs produced by one or more Task 2 teams.

Tasks 1a and 2 are open to all researchers who find the evaluation tasks of interest. Tasks 1b and 3 are limited to teams that are part of DARPA's AIDA program.

The source corpus for the pilot will comprise approximately 90K English, Russian, and Ukrainian documents. Systems in Task 1 will operate on the 90K documents in the source corpus; systems in Task 2 will operate on the output of one or more systems from Task 1a and will not have access to the source documents; systems in Task 3 will operate on the output of one or more systems from Task 2 and also will not have access to the source documents. There are many use cases in which analytic engines cannot access the original documents; for example, provenance for an assertion may never have been recorded in the first place, or provenance may need to be redacted for legal or security reasons.

Novel characteristics of the open evaluation tasks (Task 1a and Task 2) include:

Task 1: Multimodal multilingual extraction and linking of information within a document
Tasks 1 and 2: Processing of streaming input
Tasks 1 and 2: Confidence estimation and maintenance of multiple possible interpretations
Task 2: Cross-document aggregation and linking of information without access to original documents

Novel characteristics of the AIDA program-internal evaluation tasks (Task 1b and Task 3) include:

Document-level extraction and linking conditioned on "feedback hypotheses" providing context.
Generation of semantically coherent hypotheses, each representing a different interpretation of the document stream.

Schedule

TAC SM-KBP 2018 Schedule (revised October 12, 2018)
July 15	Deadline for registration for track participation
September 10 - September 16	Task 1a Evaluation Window
September 17 - September 30	Task 2 Evaluation Window
September 17 - ~October 15	Queries applied to output of Task 1a
September 28 - ~October 4	Task 1b Evaluation Window
October 5 - ~October 15	Queries applied to output of Task 1b
October 5 - ~October 15	Queries applied to output of Task 2
October 5 - ~October 15	Task 3 Evaluation Window
October 16 - ~October 20	Queries applied to frozen output of Task 3
October 15	Deadline for short system descriptions
October 15	Deadline for workshop presentation proposals
October 20	Notification of acceptance of presentation proposals
November 1	Deadline for system reports (workshop notebook version)
Mid November	Release of partial preliminary evaluated results to participants
November 13-14	TAC 2018 workshop in Gaithersburg, Maryland, USA
March 1, 2019	Deadline for system reports (final proceedings version)

Mailing List

Join the sm-kbp group to subscribe to the [email protected] mailing list (if not already subscribed):


Registering to participate in a track does not automatically add you to the mailing list. If you were previously subscribed to the mailing list, you do not have to re-subscribe (the mailing list is for anyone interested in SM-KBP, rather than specifically for SM-KBP participants, and thus carries over from year to year).

Organizing Committee

Hoa Trang Dang (U.S. National Institute of Standards and Technology)
Oleg Aulov (U.S. National Institute of Standards and Technology)
George Awad (U.S. National Institute of Standards and Technology)
Asad Butt (U.S. National Institute of Standards and Technology)
Shahzad Rajput (U.S. National Institute of Standards and Technology)
Jason Duncan (MITRE)
Boyan Onyshkevych (U.S. Department of Defense)
Stephanie Strassel (Linguistic Data Consortium)
Jennifer Tracey (Linguistic Data Consortium)

Comments to: [email protected]