CRUX 2023 Data

Data that are required for the TAC 2023 tracks are distributed at no cost to track participants. Whenever possible, data are distributed by NIST or the Linguistic Data Consortium via Web download; data are mailed as physical disks only if they cannot be made available for download.

Access to TAC 2023 data is restricted to registered TAC 2023 participants who have submitted all the required User Agreement forms. Each participating team will provide a TAC 2023 Team ID and will receive a Team Password upon registration. Teams that have participated in past TAC evaluations must register and obtain a new TAC 2023 Team ID and password for 2023. Teams must also submit new (current) User Agreement forms for 2023. Once the required User Agreement forms have been received for that team, NIST will activate the TAC 2023 Team ID and Password to give access to password-protected TAC 2023 resources that are distributed by NIST (a different username and password are required for data that are downloaded from LDC's intranet).

Data resources available for CRUX 2023 system development are listed in the 2023 Text Analysis Conference (TAC) KBP Evaluation License Agreement and must be requested by name and catalog number from the LDC upon registration. When additional data become available for the CRUX track, they will automatically be distributed to TAC 2023 teams registered for the track.

The following development data and evaluation data are currently available to CRUX 2023 teams:

Development Data:

LDC2021E11: AIDA Phase 3 Practice Topic Source Data V2.0 (Download from LDC)
LDC2021E16: AIDA Phase 3 TA3 Practice Topic Annotation V5.1 (Download from LDC)
DWD snapshots of Wikidata:

wikidata-20210215-dwd-v2: https://drive.google.com/drive/folders/1OIZegxxrs_Hv2ZhDsSO-zLVARCR60P01?usp=sharing (DWD V2 is the version on which the KGTK similarity service is built)
wikidata-20220623-dwd-v6: https://drive.google.com/drive/folders/1a6cUI1UEWRTNbvqtLAfJU0wEJ4ssTqdz?usp=sharing

DWD overlay (defining event types and relation types, along with their argument roles and selectional preferences):

xpo_v5.1a.json (released July 20, 2023, based on Qnodes and Pnodes in the live Wikidata; most similar DWD version would be DWD V6)

Evaluation Data:

LDC2023E10: SMKBP 2023 Claim Frame Evaluation Source Data (Download from LDC)
CRUX 2023 Evaluation Topics: A tab-separated file with the topic, subtopic, and claim_template, for 3 new evaluation topics (evaluation topics are not included in LDC2021E16).
CRUX 2023 Task 1 Evaluation Document IDs (root_uids): 250 root_uids (one uid per line) for source documents in LDC2023E10, from which Task 1 systems should extract claim frames for the 3 evaluation topics in CRUX 2023 Evaluation Topics.
CRUX 2023 Task 1 Evaluation KBs: 250 comma-separated files (one file per CRUX 2023 Task 1 Evaluation Document root_uid) with the fields "entID" and "entity_mention". Each claim frame extracted from a source document must populate the X Variable, Claimer, Claimer Affiliation, Claim Location, and Claim Medium fields using an entity_mention from the KB for that document. Each KB has a default entity, "EntID_200,default: document author", which is the author of the source document and can be useful for populating the Claimer field of a claim frame.
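
As a rough illustration of how a Task 1 system might consume one of these KB files, the Python sketch below loads a KB into a dictionary and checks whether a candidate field value is an entity_mention from that KB. It assumes each file starts with a header row naming the "entID" and "entity_mention" columns, which is not confirmed here. The function names and the sample rows (other than the default entity) are hypothetical.

```python
import csv
import io


def load_kb(handle):
    """Return a dict mapping entID -> entity_mention for one per-document KB.

    Assumption: the file has a header row with the columns
    "entID" and "entity_mention".
    """
    reader = csv.DictReader(handle)
    return {row["entID"]: row["entity_mention"] for row in reader}


def is_kb_mention(kb, value):
    """True if `value` is exactly one of the entity mentions in the KB."""
    return value in set(kb.values())


# Hypothetical KB content; only the EntID_200 row is taken from the
# description above, the other row is made up for illustration.
sample_kb_file = io.StringIO(
    "entID,entity_mention\n"
    "EntID_1,World Health Organization\n"
    "EntID_200,default: document author\n"
)

kb = load_kb(sample_kb_file)

# The default entity can serve as a fallback Claimer when no other
# claimer is identified in the document.
default_claimer = kb.get("EntID_200")
```

In practice the same check would be run for each of the five KB-backed fields before a claim frame is written out, with `open(path, newline="", encoding="utf-8")` in place of the in-memory sample.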

CRUX 2023 Task 2 Evaluation claim frames (request data from track coordinators after you are done submitting all of your Task 1 runs): Gold standard claim frames for the 3 evaluation topics and 250 evaluation documents, with one claim frame per line. Each claim frame has the following columns from the gold standard claim_frame.tabs annotation file: root_uid, claim_id, topic, subtopic, claim_template, x_variable, claimer, epistemic_status, claimer_affiliation, sentiment_status, claim_datetime, claim_location, claim_medium. For each pair of claim frames (claim_id_1, claim_id_2) on the same topic, your system should output one of the 4 claim relation values: identical, refuted_by, supported_by, related.
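
The pairing step described above can be sketched in Python as follows. This is only an illustration under several assumptions: the file is tab-separated with no header row, the columns appear in the order listed, and unordered pairs are enough. Because refuted_by and supported_by are directional, whether both orderings of a pair must be reported should be checked against the track guidelines. The `predict_relation` function is a hypothetical placeholder for a real classifier, and the sample rows are invented.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations

# Column order as listed in the description above (assumed, no header row).
COLUMNS = [
    "root_uid", "claim_id", "topic", "subtopic", "claim_template",
    "x_variable", "claimer", "epistemic_status", "claimer_affiliation",
    "sentiment_status", "claim_datetime", "claim_location", "claim_medium",
]

# The four allowed claim relation values.
RELATIONS = {"identical", "refuted_by", "supported_by", "related"}


def read_claim_frames(handle):
    """Parse one claim frame per line into a list of dicts."""
    reader = csv.DictReader(
        handle, fieldnames=COLUMNS, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return list(reader)


def same_topic_pairs(frames):
    """Yield every unordered pair of claim frames that share a topic."""
    by_topic = defaultdict(list)
    for frame in frames:
        by_topic[frame["topic"]].append(frame)
    for group in by_topic.values():
        yield from combinations(group, 2)


def predict_relation(frame_1, frame_2):
    """Hypothetical placeholder: a real system would compare the frames."""
    return "related"


def make_row(root_uid, claim_id, topic):
    """Build an invented 13-column sample line for illustration only."""
    return "\t".join([root_uid, claim_id, topic] + ["-"] * 10)


sample = io.StringIO(
    "\n".join([
        make_row("DOC_A", "claim_1", "Topic 1"),
        make_row("DOC_B", "claim_2", "Topic 1"),
        make_row("DOC_C", "claim_3", "Topic 2"),
    ]) + "\n"
)

frames = read_claim_frames(sample)
results = [
    (f1["claim_id"], f2["claim_id"], predict_relation(f1, f2))
    for f1, f2 in same_topic_pairs(frames)
]
```

Grouping by topic before pairing avoids comparing claim frames across topics, which the task does not ask for, and keeps the number of pairs well below the full cross product.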


Last updated: Saturday, 04-Nov-2023 01:03:53 UTC