=====================================================================
TAC KBP 2016 EVENT ARGUMENT EXTRACTION AND LINKING EVALUATION RESULTS
=====================================================================

Team ID: CMU_CS_Event
Organization: Carnegie Mellon University

Run ID: CMU_CS_Event1
Did the run access the live Web during the evaluation window: No
Did the run perform any cross-sentence reasoning: No
Did the run return meaningful confidence values: No

Run ID: CMU_CS_Event2
Did the run access the live Web during the evaluation window: No
Did the run perform any cross-sentence reasoning: No
Did the run return meaningful confidence values: No

Run ID: CMU_CS_Event3
Did the run access the live Web during the evaluation window: No
Did the run perform any cross-sentence reasoning: No
Did the run return meaningful confidence values: No

Run ID: CMU_CS_Event4
Did the run access the live Web during the evaluation window: No
Did the run perform any cross-sentence reasoning: No
Did the run return meaningful confidence values: No

Run ID: CMU_CS_Event5
Did the run access the live Web during the evaluation window: No
Did the run perform any cross-sentence reasoning: No
Did the run return meaningful confidence values: No

*************************************************************
*************************************************************

Language: English
Number of participating teams: 7

This report first contains a summary of the scores of your submissions
compared to those of the other systems, based on the official metric for
each sub-task. This is followed by more detailed information about your
system's performance. In the charts below, Max is the best scoring
submission across all participants. If there were at least three
submissions for this language, the median score over the best submissions
from each team is given as well. All summary scores are given as
percentiles based on bootstrap resampling.

Document Level Argument Summary Score:

System          5%    50%   95%
CMU_CS_Event3   1.7   2.0   2.3
CMU_CS_Event5   1.8   2.3   2.6
CMU_CS_Event2   1.6   2.0   2.3
CMU_CS_Event1   2.4   2.9   3.4
CMU_CS_Event4   1.8   2.2   2.6
Max             8.4   9.5   10.6
Rank 4          2.4   2.9   3.4

The argument score is described in section 7.1 of the task guidelines.

Document Level Linking Summary Score:

System          5%    50%   95%
CMU_CS_Event3   0.1   0.2   0.4
CMU_CS_Event5   0.6   0.7   0.9
CMU_CS_Event2   0.2   0.3   0.4
CMU_CS_Event1   1.0   1.2   1.5
CMU_CS_Event4   0.6   0.8   1.1
Max             7.6   8.5   9.4
Rank 4          1.2   1.5   1.9

The linking score is described in section 7.1 of the task guidelines.

Score details:
TP        = # of true positive document-level arguments found
FP        = # of false positive document-level arguments found
FN        = # of false negative document-level arguments
ArgP      = precision of finding document-level arguments
ArgR      = recall of finding document-level arguments
ArgF1     = F1 measure of finding document-level arguments
ArgScore  = official document-level argument finding score
LinkScore = official document-level linking score
All scores scaled 0-100

System          TP      FP      FN       ArgP   ArgR   ArgF1   ArgScore   LinkScore
CMU_CS_Event3   147.0   142.0   6488.0   50.9   2.2    4.2     2.0        0.2
CMU_CS_Event5   250.0   628.0   6376.0   28.5   3.8    6.7     2.3        0.7
CMU_CS_Event2   142.0   145.0   6493.0   49.5   2.1    4.1     2.0        0.3
CMU_CS_Event1   317.0   704.0   6308.0   31.0   4.8    8.3     2.9        1.2
CMU_CS_Event4   252.0   635.0   6373.0   28.4   3.8    6.7     2.2        0.8
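For reference, the derived columns in the detailed table above can be
reproduced from the count columns. The sketch below assumes ArgP, ArgR,
and ArgF1 are the standard precision, recall, and F1 measures computed
from TP/FP/FN and scaled 0-100; the official ArgScore and LinkScore use
the weighted scoring defined in section 7.1 of the task guidelines and
are not recomputed here.

    def arg_prf(tp: float, fp: float, fn: float) -> tuple[float, float, float]:
        """Return (ArgP, ArgR, ArgF1) on a 0-100 scale."""
        precision = tp / (tp + fp) if tp + fp > 0 else 0.0
        recall = tp / (tp + fn) if tp + fn > 0 else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall > 0 else 0.0)
        return 100 * precision, 100 * recall, 100 * f1

    # Example: the English CMU_CS_Event1 row (TP=317, FP=704, FN=6308)
    # gives roughly ArgP=31.0, ArgR=4.8, ArgF1=8.3, matching the table.
    print(arg_prf(317, 704, 6308))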
*************************************************************

Language: Chinese
Number of participating teams: 1

This report first contains a summary of the scores of your submissions
compared to those of the other systems, based on the official metric for
each sub-task. This is followed by more detailed information about your
system's performance. In the charts below, Max is the best scoring
submission across all participants. If there were at least three
submissions for this language, the median score over the best submissions
from each team is given as well. All summary scores are given as
percentiles based on bootstrap resampling.

Document Level Argument Summary Score:

System          5%    50%   95%
CMU_CS_Event5   3.4   4.2   5.0
CMU_CS_Event4   3.6   4.5   5.2
CMU_CS_Event3   1.4   1.8   2.2
CMU_CS_Event2   1.3   1.6   2.0
CMU_CS_Event1   3.9   4.7   5.5
Max             3.9   4.7   5.5

The argument score is described in section 7.1 of the task guidelines.

Document Level Linking Summary Score:

System          5%    50%   95%
CMU_CS_Event5   0.9   1.3   1.7
CMU_CS_Event4   1.0   1.4   1.8
CMU_CS_Event3   0.3   0.5   0.7
CMU_CS_Event2   0.3   0.5   0.7
CMU_CS_Event1   1.3   1.7   2.2
Max             1.3   1.7   2.2

The linking score is described in section 7.1 of the task guidelines.

Score details:
TP        = # of true positive document-level arguments found
FP        = # of false positive document-level arguments found
FN        = # of false negative document-level arguments
ArgP      = precision of finding document-level arguments
ArgR      = recall of finding document-level arguments
ArgF1     = F1 measure of finding document-level arguments
ArgScore  = official document-level argument finding score
LinkScore = official document-level linking score
All scores scaled 0-100

System          TP      FP      FN       ArgP   ArgR   ArgF1   ArgScore   LinkScore
CMU_CS_Event5   163.0   211.0   3285.0   43.6   4.7    8.5     4.2        1.3
CMU_CS_Event4   164.0   187.0   3284.0   46.7   4.8    8.6     4.5        1.4
CMU_CS_Event3   51.0    59.0    3399.0   46.4   1.5    2.9     1.8        0.5
CMU_CS_Event2   48.0    54.0    3402.0   47.1   1.4    2.7     1.6        0.5
CMU_CS_Event1   183.0   219.0   3265.0   45.5   5.3    9.5     4.7        1.7

*************************************************************

Language: Spanish
Number of participating teams: 2

This report first contains a summary of the scores of your submissions
compared to those of the other systems, based on the official metric for
each sub-task. This is followed by more detailed information about your
system's performance. In the charts below, Max is the best scoring
submission across all participants. If there were at least three
submissions for this language, the median score over the best submissions
from each team is given as well. All summary scores are given as
percentiles based on bootstrap resampling.

Document Level Argument Summary Score:

System          5%    50%   95%
CMU_CS_Event5   0.0   0.2   0.3
CMU_CS_Event4   0.0   0.3   0.4
CMU_CS_Event1   1.3   1.6   1.9
CMU_CS_Event2   1.3   1.6   1.9
CMU_CS_Event3   2.0   2.4   2.8
Max             2.0   2.4   2.8

The argument score is described in section 7.1 of the task guidelines.

Document Level Linking Summary Score:

System          5%    50%   95%
CMU_CS_Event5   0.0   0.0   0.1
CMU_CS_Event4   0.0   0.1   0.2
CMU_CS_Event1   0.3   0.5   0.7
CMU_CS_Event2   0.3   0.4   0.7
CMU_CS_Event3   0.3   0.5   0.8
Max             0.3   0.5   0.8

The linking score is described in section 7.1 of the task guidelines.
Score details:
TP        = # of true positive document-level arguments found
FP        = # of false positive document-level arguments found
FN        = # of false negative document-level arguments
ArgP      = precision of finding document-level arguments
ArgR      = recall of finding document-level arguments
ArgF1     = F1 measure of finding document-level arguments
ArgScore  = official document-level argument finding score
LinkScore = official document-level linking score
All scores scaled 0-100

System          TP     FP     FN       ArgP   ArgR   ArgF1   ArgScore   LinkScore
CMU_CS_Event5   4.0    15.0   3503.0   21.1   0.1    0.2     0.2        0.0
CMU_CS_Event4   4.0    8.0    3503.0   33.3   0.1    0.2     0.3        0.1
CMU_CS_Event1   51.0   55.0   3456.0   48.1   1.5    2.8     1.6        0.5
CMU_CS_Event2   50.0   47.0   3457.0   51.5   1.4    2.8     1.6        0.4
CMU_CS_Event3   75.0   76.0   3432.0   49.7   2.1    4.1     2.4        0.5
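The 5% / 50% / 95% columns in the summary tables above are percentiles
obtained by bootstrap resampling over the evaluation documents. The exact
procedure is defined by the official scorer; the sketch below only
illustrates the general idea, with score_over_docs standing in as a
hypothetical function that computes the official metric over a set of
per-document results.

    import random

    def bootstrap_percentiles(per_doc_results, score_over_docs,
                              n_resamples=1000, seed=0):
        """Illustrative bootstrap: resample documents with replacement,
        rescore each resample, and report the 5th/50th/95th percentiles.
        score_over_docs is a hypothetical stand-in for the official
        ArgScore/LinkScore computation defined in the task guidelines."""
        rng = random.Random(seed)
        scores = []
        for _ in range(n_resamples):
            sample = [rng.choice(per_doc_results) for _ in per_doc_results]
            scores.append(score_over_docs(sample))
        scores.sort()
        pick = lambda q: scores[min(int(q * n_resamples), n_resamples - 1)]
        return pick(0.05), pick(0.50), pick(0.95)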