Systematic Review Information Extraction (SRIE) 2018

Evaluation: February-November, 2018
Workshop: November 13-14, 2018


The National Toxicology Program (NTP), an interagency program headquartered at the National Institute of Environmental Health Sciences (NIEHS, part of the National Institutes of Health), and the Environmental Protection Agency (EPA) conduct systematic reviews of environmental agents to identify potential human health hazards. These reviews collect toxicity or health effects information on different chemicals from the published scientific literature, including study details such as experimental protocols, animal models, and results. Because this information can vary widely from study to study, the systematic review serves a critical purpose by providing a transparent, standardized, multistep approach to identify, select, assess, and synthesize information for developing objective, evidence-based conclusions about potential chemical hazards. Furthermore, because research studies done at different points in time may reflect different standards in experimental protocols or reporting procedures, the systematic review approach serves to promote transparency and facilitate reproducibility of literature-based evaluations of environmental agents.

Some elements of information extraction in these systematic reviews are straightforward, such as identifying the species or sex of the experimental models. Others are more complex, because publications may report multiple experiments with various exposures and doses and evaluate multiple endpoints. Authors may report experimental details using different units, different chemical names, and other variations in terminology. In addition, this information may be located in the text of the publication, in a table, in a figure caption, or in the figure itself. Currently, the information extracted in a systematic review is collected through a labor-intensive manual process that is slow and often costly. NTP and EPA are interested in adopting automated processes for information extraction in systematic reviews of environmental chemicals. The aim of this task is to develop automated tools that could improve the efficiency of systematic review information extraction, reducing completion time and labor costs while maintaining quality and reproducibility. The results of this task will inform future NTP and EPA efforts aimed at systematic review automation, including subsequent challenges.


The purpose of the Systematic Review Information Extraction (SRIE) track is to develop and evaluate Information Extraction (IE) approaches that can assist in systematic reviews of environmental agents. This track focuses on IE of experimental design factors found in the Materials and Methods section ("methods section") of published studies of experimental animals exposed to environmental chemicals. The first goal of the track is to identify and annotate the experimental design factors. The second goal of the track is to identify relations between different experimental design factors.


The SRIE track has two tasks:

  • Task 1: Experimental design factors for the categories of exposure, animal group, dose group, and endpoint should be identified and the appropriate annotation tag applied.
  • Task 2: Relations between experimental design factors from Task 1 should be identified and the appropriate annotation tag applied.

Task 2 builds on Task 1. Participants may choose to participate in only Task 1, or in both Task 1 and Task 2.

As training data for Task 1, text for 100 methods sections will be released with annotations for mentions of experimental design factors. As training data for Task 2, relations among experimental design factors will be annotated for a subset of the 100 methods sections.

For the test set, participants will be provided ~300 methods sections as text documents to test their IE approach for identifying experimental design factors and their relations. Participants will be evaluated on a subset of the test set; which documents belong to that subset will not be disclosed. Systems will be scored by precision, recall, and F1-measure (P/R/F1) on mentions of factors and relations.
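
As a rough illustration of the P/R/F1 scoring mentioned above, the following Python sketch computes precision, recall, and F1 over sets of mentions. It is not the official scorer: it assumes exact matching on document, character span, and tag, and the Mention fields and the tag names ("AnimalGroup", "DoseGroup", "Endpoint") are hypothetical placeholders rather than the track's actual annotation scheme.

```python
from typing import NamedTuple, Set, Tuple


class Mention(NamedTuple):
    """Hypothetical mention record; field names are illustrative only."""
    doc_id: str
    start: int   # character offset where the mention begins
    end: int     # character offset where the mention ends
    label: str   # annotation tag (hypothetical tag names below)


def precision_recall_f1(gold: Set[Mention],
                        predicted: Set[Mention]) -> Tuple[float, float, float]:
    """Score predictions against gold mentions using exact matching.

    Assumption: a prediction counts as correct only if document, span,
    and label all match a gold mention exactly.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: three gold mentions, two predictions, one exact match.
gold = {
    Mention("doc1", 0, 4, "AnimalGroup"),
    Mention("doc1", 10, 18, "DoseGroup"),
    Mention("doc1", 25, 30, "Endpoint"),
}
predicted = {
    Mention("doc1", 0, 4, "AnimalGroup"),   # correct span and label
    Mention("doc1", 10, 18, "Endpoint"),    # correct span, wrong label
}

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

The same function could be applied to relations by representing each relation as a hashable tuple (for example, a pair of mentions plus a relation tag). Whether partial span overlaps earn credit is a scoring decision that this sketch does not address.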


    Preliminary TAC SRIE 2018 Schedule
    June 1               Release of Task 1 training data
    July 15              Release of Task 2 training data
    July 15              Deadline for registration for track participation
    August 15            Release of Tasks 1 and 2 test data
    September 15         Deadline for submission of system results
    October 1            Release of individual evaluated results to participants
    October 15           Deadline for short system descriptions
    October 15           Deadline for workshop presentation proposals
    October 20           Notification of acceptance of presentation proposals
    November 1           Deadline for system reports (workshop notebook version)
    November 13-14       TAC 2018 workshop in Gaithersburg, Maryland, USA
    February 15, 2019    Deadline for system reports (final proceedings version)

Organizing Committee

Charles Schmitt (charles.schmitt@nih.gov), Co-Track coordinator
Mary Wolfe (wolfe@niehs.nih.gov), Co-Track coordinator
Michelle Angrish (Angrish.Michelle@epa.gov)
Dina Demner-Fushman (consultant, ddemner@mail.nih.gov)
Kristan Markey (markey.kristan@epa.gov)
Andrew Rooney (rooney@niehs.nih.gov)
Michele Taylor (taylor.michelem@epa.gov)
Vickie Walker (vickie.walker@nih.gov)
Byron Wallace (consultant, byron@ccs.neu.edu)

