





Streaming Multimedia Knowledge Base Population (SM-KBP) 2022

Evaluation: January 2022 - January 2023
Workshop: February 3, 2023

Conducted by:
U.S. National Institute of Standards and Technology (NIST)

With support from:
U.S. Department of Defense

Background

In scenarios such as natural disasters or international conflicts, analysts and the public are often confronted with a variety of information coming through multiple media sources. There is a need for technologies to analyze and extract knowledge from multimedia to develop and maintain an understanding of events, situations, and trends as they unfold around the world.

The goal of DARPA's Active Interpretation of Disparate Alternatives (AIDA) Program is to develop a multi-hypothesis semantic engine that generates explicit alternative interpretations of events, situations, and trends from a variety of unstructured sources, for use in noisy, conflicting, and potentially deceptive information environments. This engine must be able to map knowledge elements (KEs) automatically derived from multiple media sources into a common semantic representation, aggregate the information derived from those sources, and generate and explore multiple hypotheses about the events, situations, and trends of interest.

The streaming multimedia KBP track evaluates the performance of systems that have been developed in support of AIDA program goals. Following a pilot at TAC/TRECVID 2018, the SM-KBP track has evaluated AIDA systems over three phases of the program:

  • Phase 1 Evaluation: June-August 2019
  • Phase 2 Evaluation: August 2020 - January 2021
  • Phase 3 Evaluation: January 2022 - January 2023

Task Overview

The SM-KBP track has three evaluation tasks:

  • Task 1: Extract mentions of knowledge elements (KEs) from a stream of multimedia documents (including text and images) and cluster together mentions of the same KE within each document, producing a document-level knowledge graph for each document.
  • Task 2: Aggregate and link the document-level knowledge graphs from Task 1 to construct a knowledge base (KB) of the entire document stream, without access to the raw documents themselves.
  • Task 3: Generate hypotheses from the knowledge graph produced in Task 2, such that each hypothesis represents a semantically coherent interpretation of the document stream.

While Tasks 2 and 3 are limited to teams that are part of DARPA's AIDA program, Task 1 is also open to non-AIDA researchers who are interested in multilingual multimedia information extraction.

Ontology: Teams will receive an ontology that defines the entities, relations, events, and event and relation argument roles that systems are expected to extract. The ontology for SM-KBP 2022 is the DARPA Wikidata (DWD), an enhanced version of Wikidata that defines entities, relations, events, and event/relation argument roles. Entity, relation, and event KEs in the submitted knowledge graphs must be limited to the types specified in DWD, and edge KEs in the submitted knowledge graphs must be labeled with argument role labels defined in DWD. Wikidata is rich in Qnodes for entity classes and instances but has an impoverished representation of event and relation classes; DWD therefore adds an overlay of event and relation types to Wikidata, including argument roles and selectional preferences for those arguments.
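The type and role constraints above amount to a conformance check on each submitted graph. The following is a minimal sketch of such a check, assuming a simplified in-memory graph (nodes with a kind and a type, edges labeled with a role). The `Ontology` and `KnowledgeGraph` classes and the labels `PER`, `GPE`, `Attack`, `Attacker`, `Place`, and `Target` are hypothetical placeholders, not actual DWD identifiers or the official submission format.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Hypothetical in-memory view of the allowed types and argument roles."""
    entity_types: set = field(default_factory=set)
    event_types: set = field(default_factory=set)
    relation_types: set = field(default_factory=set)
    # Maps an event/relation type to the argument role labels it permits.
    roles: dict = field(default_factory=dict)


@dataclass
class KnowledgeGraph:
    # node id -> (kind, type), where kind is "entity", "event", or "relation"
    nodes: dict = field(default_factory=dict)
    # (source node id, role label, target node id)
    edges: list = field(default_factory=list)


def validate(kg, onto):
    """Return a list of violations; an empty list means the graph conforms."""
    allowed = {
        "entity": onto.entity_types,
        "event": onto.event_types,
        "relation": onto.relation_types,
    }
    errors = []
    # Every node's type must be one the ontology allows for its kind.
    for node_id, (kind, node_type) in kg.nodes.items():
        if node_type not in allowed.get(kind, set()):
            errors.append(f"{node_id}: type {node_type!r} not allowed for {kind}")
    # Every edge must use a role defined for the type of its source node.
    for src, role, dst in kg.edges:
        if src not in kg.nodes or dst not in kg.nodes:
            errors.append(f"edge {src}->{dst}: unknown endpoint")
            continue
        src_type = kg.nodes[src][1]
        if role not in onto.roles.get(src_type, set()):
            errors.append(f"edge {src}->{dst}: role {role!r} not defined for {src_type!r}")
    return errors


onto = Ontology(
    entity_types={"PER", "GPE"},
    event_types={"Attack"},
    roles={"Attack": {"Attacker", "Place"}},
)
kg = KnowledgeGraph(
    nodes={"e1": ("entity", "PER"), "e2": ("entity", "GPE"), "ev1": ("event", "Attack")},
    edges=[("ev1", "Attacker", "e1"), ("ev1", "Target", "e2")],
)
print(validate(kg, onto))
# ["edge ev1->e2: role 'Target' not defined for 'Attack'"]
```

Returning a list of violations rather than raising on the first one makes it easier to see every problem in a graph at once. Selectional preferences on argument fillers are not modeled here.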

Documents: Task 1 systems will process a set of approximately 2,000 documents in English, Spanish, and Russian, and output a document-level knowledge graph for each document. A document may contain multiple document elements in multiple modalities (text and image); therefore, cross-lingual and cross-modal entity, relation, and event coreference are required. For each document, systems must extract all mentions of entities, relations, and events, and identify all arguments and temporal information for each event and relation.
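Clustering mentions of the same KE across languages and modalities can be framed as taking the transitive closure of pairwise coreference decisions. The sketch below assumes some upstream component has already judged which mention pairs corefer, and groups them with a union-find (disjoint-set) structure. The mention ID strings and the `cluster_mentions` helper are hypothetical illustrations, not part of the track's tooling.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure used to merge coreferent mentions."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_mentions(mention_ids, coref_links):
    """Group mention IDs into KE clusters given pairwise coreference links.

    mention_ids: mention identifiers (text spans, image regions, ...)
    coref_links: (mention_a, mention_b) pairs judged to be coreferent
    Returns a sorted list of clusters, each a sorted list of mention IDs.
    """
    mention_ids = list(mention_ids)
    uf = UnionFind()
    for m in mention_ids:
        uf.find(m)  # register singletons so unlinked mentions form their own KE
    for a, b in coref_links:
        uf.union(a, b)
    groups = defaultdict(list)
    for m in mention_ids:
        groups[uf.find(m)].append(m)
    return sorted(sorted(g) for g in groups.values())


mentions = ["en_text:m1", "es_text:m2", "img:box3", "en_text:m4"]
links = [("en_text:m1", "es_text:m2"), ("es_text:m2", "img:box3")]
print(cluster_mentions(mentions, links))
# [['en_text:m1', 'es_text:m2', 'img:box3'], ['en_text:m4']]
```

Here an English text mention, a Spanish text mention, and an image region end up in one KE because the links chain through the Spanish mention, even though the English mention and the image were never compared directly.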

Evaluation: System output will be scored by comparing against gold standard annotations for a subset of the documents.
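As a rough illustration of scoring against gold annotations for only a subset of documents, the sketch below computes exact-match precision, recall, and F1 over hypothetical `(doc_id, start_offset, end_offset, type)` tuples, after discarding items from documents that have no gold annotations. This is a generic set-based metric chosen for illustration; it is not the official SM-KBP scorer, whose matching and metrics are defined in the track guidelines.

```python
def prf1(system, gold):
    """Exact-match precision, recall, and F1 between two collections of hashable items."""
    system, gold = set(system), set(gold)
    tp = len(system & gold)
    precision = tp / len(system) if system else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def score_on_annotated_subset(system, gold, annotated_docs):
    """Score only items from documents that have gold annotations."""
    system = {item for item in system if item[0] in annotated_docs}
    gold = {item for item in gold if item[0] in annotated_docs}
    return prf1(system, gold)


gold = {("doc1", 0, 5, "PER"), ("doc1", 10, 18, "GPE"), ("doc1", 30, 36, "Attack")}
system = {("doc1", 0, 5, "PER"), ("doc1", 10, 18, "ORG"), ("doc2", 0, 4, "PER")}
scores = score_on_annotated_subset(system, gold, annotated_docs={"doc1"})
print(tuple(round(x, 3) for x in scores))  # (0.5, 0.333, 0.4)
```

Filtering to the annotated subset first matters: without it, output for unannotated documents such as `doc2` would count as false positives and unfairly lower precision.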

Organizing Committee

Hoa Trang Dang (U.S. National Institute of Standards and Technology)
George Awad (U.S. National Institute of Standards and Technology)
Shahzad Rajput (U.S. National Institute of Standards and Technology)
Wil Corvey (U.S. Department of Defense)
Jason Duncan (MITRE)
Lisa Ferro (MITRE)
Boyan Onyshkevych (U.S. Department of Defense)
Stephanie Strassel (Linguistic Data Consortium)
Jennifer Tracey (Linguistic Data Consortium)


NIST is an agency of the
U.S. Department of Commerce


Last updated: Wednesday, 14-Dec-2022 12:44:50 MST
Comments to: tac-web@nist.gov