CRUX 2023 Data
Data that are required for the TAC 2023 tracks are distributed
at no cost to track participants. Whenever possible, data are
distributed by NIST or the Linguistic Data Consortium via Web
download; data are mailed as physical disks only if they cannot be
made available for download.
Access to TAC 2023 data is restricted to registered TAC 2023
participants who have submitted all the required User Agreement forms.
Each participating team will provide a TAC 2023 Team ID and will receive a Team
Password upon registration.
Teams that have participated in past TAC evaluations must register again and
obtain a new TAC 2023 Team ID and password. Teams must also submit new (current)
User Agreement forms for 2023.
Once the required User Agreement forms have been received for a team, NIST
will activate that team's TAC 2023 Team ID and Password, giving access to the
password-protected TAC 2023 resources distributed by NIST. A different username
and password are required for data downloaded from LDC's intranet.
Data resources available for CRUX 2023 system development are listed in the 2023 Text Analysis Conference (TAC) KBP Evaluation License Agreement and must be requested by name and catalog number from the LDC upon registration. When additional data become available for the CRUX track, they will automatically be distributed to TAC 2023 teams registered for the track.
The following development data and evaluation data are currently available to CRUX 2023 teams:
Development Data:
- LDC2021E11: AIDA Phase 3 Practice Topic Source Data V2.0 (Download from LDC)
- LDC2021E16: AIDA Phase 3 TA3 Practice Topic Annotation V5.1 (Download from LDC)
- DWD snapshots of Wikidata:
- DWD overlay (defining event types and relation types, along with their argument roles and selectional preferences):
  - xpo_v5.1a.json (released July 20, 2023, based on Qnodes and Pnodes in the live Wikidata; most similar DWD version would be DWD V6)
Evaluation Data:
- LDC2023E10: SMKBP 2023 Claim Frame Evaluation Source Data (Download from LDC)
- CRUX 2023 Evaluation Topics: A tab-separated file with the topic, subtopic, and claim_template, for 3 new evaluation topics (evaluation topics are not included in LDC2021E16).
- CRUX 2023 Task 1 Evaluation Document IDs (root_uids): 250 root_uids (one uid per line) for source documents in LDC2023E10, from which Task 1 systems should extract claim frames for the 3 evaluation topics in CRUX 2023 Evaluation Topics.
- CRUX 2023 Task 1 Evaluation KBs: 250 comma-separated files (one file per CRUX 2023 Task 1 Evaluation Document root uid) with the fields "entID" and "entity_mention". The claim frames extracted from a source document must populate the X Variable, Claimer, Claimer Affiliation, Claim Location, and Claim Medium fields using an entity_mention from the KB for that document. Each KB has a default entity, "EntID_200,default: document author", which is the author of the source document and can be useful for populating the Claimer field of a claim frame.
- CRUX 2023 Task 2 Evaluation claim frames (request data from track coordinators after you are done submitting all of your Task 1 runs): Gold standard claim frames for the 3 evaluation topics and 250 evaluation documents, with one claim frame per line. Each claim frame has the following columns from the gold standard claim_frame.tabs annotation file: root_uid, claim_id, topic, subtopic, claim_template, x_variable, claimer, epistemic_status, claimer_affiliation, sentiment_status, claim_datetime, claim_location, claim_medium. For each pair of claim frames (claim_id_1, claim_id_2) on the same topic, your system should output one of the 4 claim relation values: identical, refuted_by, supported_by, related.
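As a rough illustration of how the evaluation files described above might be consumed, the following Python sketch reads one Task 1 KB and enumerates the same-topic claim frame pairs that a Task 2 system would label. It rests on several assumptions not confirmed by this page: that each KB file starts with a header row naming the fields entID and entity_mention, that claim frames have already been parsed into dictionaries keyed by the column names listed above, and that each unordered pair is labeled once. The function names load_kb and same_topic_pairs are hypothetical and are not part of any official CRUX tooling.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations

# The four claim relation values named in the Task 2 description.
CLAIM_RELATIONS = {"identical", "refuted_by", "supported_by", "related"}


def load_kb(csv_text):
    """Map entID -> entity_mention for one per-document KB.

    Assumption: the comma-separated file has a header row with the
    fields entID and entity_mention.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["entID"]: row["entity_mention"] for row in reader}


def same_topic_pairs(claim_frames):
    """Yield (claim_id_1, claim_id_2) for claim frames sharing a topic.

    Assumption: each unordered pair is emitted once, in input order.
    If both directions of a pair need a label, itertools.permutations
    could be used instead of combinations.
    """
    by_topic = defaultdict(list)
    for frame in claim_frames:
        by_topic[frame["topic"]].append(frame["claim_id"])
    for topic in sorted(by_topic):
        yield from combinations(by_topic[topic], 2)
```

A system could look up the default entity with load_kb(text)["EntID_200"] when populating the Claimer field, and check each predicted relation against CLAIM_RELATIONS before writing its output.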