
Streaming Multimedia Knowledge Base Population (SM-KBP) 2022
Evaluation: January 2022 - January 2023
Workshop: February 3, 2023
Conducted by:
U.S. National Institute of Standards and Technology (NIST)
With support from:
U.S. Department of Defense
Background
In scenarios such as natural disasters or international conflicts,
analysts and the public are often confronted with a variety of
information coming through multiple media sources. There is a need for
technologies to analyze and extract knowledge from multimedia to
develop and maintain an understanding of events, situations, and
trends as they unfold around the world.
The goal of DARPA's Active Interpretation of Disparate Alternatives
(AIDA) Program is to develop a multi-hypothesis semantic engine that
generates explicit alternative interpretations of events, situations,
and trends from a variety of unstructured sources, for use in noisy,
conflicting, and potentially deceptive information environments. This
engine must be capable of mapping knowledge elements (KE)
automatically derived from multiple media sources into a common
semantic representation, aggregating information derived from those
sources, and generating and exploring multiple hypotheses about the
events, situations, and trends of interest.
The streaming multimedia KBP track evaluates the performance of
systems that have been developed in support of AIDA program goals.
Following a pilot at TAC/TRECVID 2018, the SM-KBP track has
evaluated AIDA systems over three phases of the program:
- Phase 1 Evaluation: June-August 2019
- Phase 2 Evaluation: August 2020 - January 2021
- Phase 3 Evaluation: January 2022 - January 2023
Task Overview
The SM-KBP track has three evaluation tasks:
- Task 1: Extract mentions of Knowledge Elements from a stream of
multimedia documents (including text and image) and cluster together
mentions of the same KE in each document to produce a document-level
knowledge graph for each document.
- Task 2: Aggregate and link the document-level knowledge graphs
from Task 1 to construct a KB of the entire document stream without
access to the raw documents themselves.
- Task 3: Generate hypotheses from a knowledge graph from Task 2,
such that each hypothesis represents a semantically coherent
interpretation of the document stream.
While Tasks 2 and 3 are limited to teams that are part of
DARPA's AIDA program, Task 1 is also open to non-AIDA researchers who are
interested in multilingual multimedia information extraction.
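As a rough illustration of the Task 1 output described above, the sketch below groups mentions by document and clusters mentions of the same KE within each document. It is a minimal sketch, not the track's submission format: the `Mention` and `KnowledgeElement` classes, the `span` locator, and the `ke_key` coreference key (assumed to come from some upstream coreference resolver) are all hypothetical names introduced for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Mention:
    doc_id: str    # document the mention was found in
    modality: str  # "text" or "image"
    span: str      # hypothetical locator (character offsets, image region, ...)
    ke_key: str    # hypothetical coreference key assigned by an upstream resolver


@dataclass
class KnowledgeElement:
    ke_id: str
    mentions: list = field(default_factory=list)


def build_document_graphs(mentions):
    """Return {doc_id: {ke_key: KnowledgeElement}}, one graph per document.

    Mentions are clustered only within their own document, so the same
    ke_key in two documents yields two separate KEs.
    """
    graphs = defaultdict(dict)
    for m in mentions:
        doc_graph = graphs[m.doc_id]
        ke = doc_graph.setdefault(
            m.ke_key, KnowledgeElement(ke_id=f"{m.doc_id}:{m.ke_key}")
        )
        ke.mentions.append(m)
    return dict(graphs)
```

In this sketch, linking KEs across documents is deliberately left out, since that aggregation step belongs to Task 2.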
Ontology: Teams will receive an ontology that defines the
entities, relations, events, and event and relation argument roles
that are in scope for extraction.
The ontology for SM-KBP 2022 is the DARPA Wikidata (DWD),
which is an enhanced version of Wikidata that defines entities,
relations, events, and event/relation argument roles. Entity,
relation, and event KEs in the submitted knowledge graphs must be
limited to the types specified in DWD, and edge KEs in the submitted
knowledge graph must be labeled with argument role labels defined in
DWD. Wikidata is rich in Qnodes for entity classes and instances but
has an impoverished representation of event and relation classes.
Therefore, DWD adds an overlay of event and relation types to
Wikidata, including argument roles and selectional preferences for
those arguments.
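The ontology constraint above (node types and edge role labels must come from DWD) can be pictured as a simple pre-submission check. The sketch below is only an assumption about how such a check might look: the function name, the graph representation, and the placeholder labels in the usage example are hypothetical and are not real DWD identifiers or part of any official tool.

```python
def validate_graph(nodes, edges, allowed_types, allowed_roles):
    """Return a list of violations; an empty list means the graph passes.

    nodes: dict mapping node id -> type label
    edges: iterable of (source_id, role_label, target_id) triples
    allowed_types / allowed_roles: sets of labels taken from the ontology
    """
    errors = []
    for node_id, node_type in nodes.items():
        if node_type not in allowed_types:
            errors.append(f"node {node_id}: type {node_type!r} not in ontology")
    for source, role, target in edges:
        if role not in allowed_roles:
            errors.append(f"edge {source}->{target}: role {role!r} not in ontology")
        for endpoint in (source, target):
            if endpoint not in nodes:
                errors.append(f"edge {source}->{target}: unknown node {endpoint!r}")
    return errors


# Usage with placeholder labels (NOT real DWD types or roles).
nodes = {"n1": "EVENT_TYPE_A", "n2": "ENTITY_TYPE_B", "n3": "UNLISTED_TYPE"}
edges = [("n1", "ROLE_X", "n2"), ("n1", "ROLE_UNKNOWN", "n3")]
problems = validate_graph(
    nodes,
    edges,
    allowed_types={"EVENT_TYPE_A", "ENTITY_TYPE_B"},
    allowed_roles={"ROLE_X"},
)
```

Here `problems` would flag the out-of-ontology node type and the undefined role label, while the first edge passes.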
Documents: Task 1 systems will process a set of
approximately 2000 documents in English, Spanish, and Russian, and
output a document-level knowledge graph for each document. A document
may contain multiple document elements in multiple modalities (text
and image); therefore, cross-lingual and cross-modal entity,
relation, and event coreference are required. For each document,
systems must extract all mentions of entities, relations, and events
and identify all arguments and temporal information for each event and
relation.
Evaluation: System output will be
scored by comparing against gold standard annotations for a subset of
the documents.
Organizing Committee
Hoa Trang Dang (U.S. National Institute of Standards and Technology)
George Awad (U.S. National Institute of Standards and Technology)
Shahzad Rajput (U.S. National Institute of Standards and Technology)
Wil Corvey (U.S. Department of Defense)
Jason Duncan (MITRE)
Lisa Ferro (MITRE)
Boyan Onyshkevych (U.S. Department of Defense)
Stephanie Strassel (Linguistic Data Consortium)
Jennifer Tracey (Linguistic Data Consortium)