=====================================================================
TAC KBP 2016 EVENT ARGUMENT EXTRACTION AND LINKING EVALUATION RESULTS
=====================================================================

Team ID: CMU_CS_Event
Organization: Carnegie Mellon University

Run ID: CMU_CS_Event1
Did the run access the live Web during the evaluation window: No
Did the run perform any cross-sentence reasoning: No
Did the run return meaningful confidence values: No

Run ID: CMU_CS_Event2
Did the run access the live Web during the evaluation window: No
Did the run perform any cross-sentence reasoning: No
Did the run return meaningful confidence values: No

Run ID: CMU_CS_Event3
Did the run access the live Web during the evaluation window: No
Did the run perform any cross-sentence reasoning: No
Did the run return meaningful confidence values: No

Run ID: CMU_CS_Event4
Did the run access the live Web during the evaluation window: No
Did the run perform any cross-sentence reasoning: No
Did the run return meaningful confidence values: No

Run ID: CMU_CS_Event5
Did the run access the live Web during the evaluation window: No
Did the run perform any cross-sentence reasoning: No
Did the run return meaningful confidence values: No

*************************************************************
*************************************************************

Language: English
Number of participating teams: 7

This report first contains a summary of the scores of your submissions
compared to those of the other systems, based on the official metric for
each sub-task. This is followed by more detailed information about your
system's performance. In the charts below, Max is the best scoring
submission across all participants. If there were at least three
submissions for this language, the median score over the best submissions
from each team is given as well. All summary scores are given as
percentiles based on bootstrap resampling.

Document Level Argument Summary Score:

System          5%    50%   95%
CMU_CS_Event3   1.7   2.0   2.3
CMU_CS_Event5   1.8   2.3   2.6
CMU_CS_Event2   1.6   2.0   2.3
CMU_CS_Event1   2.4   2.9   3.4
CMU_CS_Event4   1.8   2.2   2.6
Max             8.4   9.5   10.6
Rank 4          2.4   2.9   3.4

The argument score is described in section 7.1 of the task guidelines.

Document Level Linking Summary Score:

System          5%    50%   95%
CMU_CS_Event3   0.1   0.2   0.4
CMU_CS_Event5   0.6   0.7   0.9
CMU_CS_Event2   0.2   0.3   0.4
CMU_CS_Event1   1.0   1.2   1.5
CMU_CS_Event4   0.6   0.8   1.1
Max             7.6   8.5   9.4
Rank 4          1.2   1.5   1.9

The linking score is described in section 7.1 of the task guidelines.

Score details:
TP        = # of true positive document-level arguments found
FP        = # of false positive document-level arguments found
FN        = # of false negative document-level arguments
ArgP      = precision of finding document-level arguments
ArgR      = recall of finding document-level arguments
ArgF1     = F1 measure of finding document-level arguments
ArgScore  = official document-level argument finding score
LinkScore = official document-level linking score
All scores scaled 0-100

System          TP      FP      FN       ArgP   ArgR   ArgF1   ArgScore   LinkScore
CMU_CS_Event3   147.0   142.0   6488.0   50.9   2.2    4.2     2.0        0.2
CMU_CS_Event5   250.0   628.0   6376.0   28.5   3.8    6.7     2.3        0.7
CMU_CS_Event2   142.0   145.0   6493.0   49.5   2.1    4.1     2.0        0.3
CMU_CS_Event1   317.0   704.0   6308.0   31.0   4.8    8.3     2.9        1.2
CMU_CS_Event4   252.0   635.0   6373.0   28.4   3.8    6.7     2.2        0.8
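For reference, the derived columns in the detailed table above can be
reproduced from the count columns. The sketch below assumes ArgP, ArgR,
and ArgF1 are the standard precision, recall, and F1 measures computed
from TP/FP/FN and scaled 0-100; the official ArgScore and LinkScore use
the weighted scoring defined in section 7.1 of the task guidelines and
are not recomputed here.

    def arg_prf(tp: float, fp: float, fn: float) -> tuple[float, float, float]:
        """Return (ArgP, ArgR, ArgF1) on a 0-100 scale."""
        precision = tp / (tp + fp) if tp + fp > 0 else 0.0
        recall = tp / (tp + fn) if tp + fn > 0 else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall > 0 else 0.0)
        return 100 * precision, 100 * recall, 100 * f1

    # Example: the English CMU_CS_Event1 row (TP=317, FP=704, FN=6308)
    # gives roughly ArgP=31.0, ArgR=4.8, ArgF1=8.3, matching the table.
    print(arg_prf(317, 704, 6308))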
*************************************************************

Language: Chinese
Number of participating teams: 1

This report first contains a summary of the scores of your submissions
compared to those of the other systems, based on the official metric for
each sub-task. This is followed by more detailed information about your
system's performance. In the charts below, Max is the best scoring
submission across all participants. If there were at least three
submissions for this language, the median score over the best submissions
from each team is given as well. All summary scores are given as
percentiles based on bootstrap resampling.

Document Level Argument Summary Score:

System          5%    50%   95%
CMU_CS_Event5   3.4   4.2   5.0
CMU_CS_Event4   3.6   4.5   5.2
CMU_CS_Event3   1.4   1.8   2.2
CMU_CS_Event2   1.3   1.6   2.0
CMU_CS_Event1   3.9   4.7   5.5
Max             3.9   4.7   5.5

The argument score is described in section 7.1 of the task guidelines.

Document Level Linking Summary Score:

System          5%    50%   95%
CMU_CS_Event5   0.9   1.3   1.7
CMU_CS_Event4   1.0   1.4   1.8
CMU_CS_Event3   0.3   0.5   0.7
CMU_CS_Event2   0.3   0.5   0.7
CMU_CS_Event1   1.3   1.7   2.2
Max             1.3   1.7   2.2

The linking score is described in section 7.1 of the task guidelines.

Score details:
TP        = # of true positive document-level arguments found
FP        = # of false positive document-level arguments found
FN        = # of false negative document-level arguments
ArgP      = precision of finding document-level arguments
ArgR      = recall of finding document-level arguments
ArgF1     = F1 measure of finding document-level arguments
ArgScore  = official document-level argument finding score
LinkScore = official document-level linking score
All scores scaled 0-100

System          TP      FP      FN       ArgP   ArgR   ArgF1   ArgScore   LinkScore
CMU_CS_Event5   163.0   211.0   3285.0   43.6   4.7    8.5     4.2        1.3
CMU_CS_Event4   164.0   187.0   3284.0   46.7   4.8    8.6     4.5        1.4
CMU_CS_Event3   51.0    59.0    3399.0   46.4   1.5    2.9     1.8        0.5
CMU_CS_Event2   48.0    54.0    3402.0   47.1   1.4    2.7     1.6        0.5
CMU_CS_Event1   183.0   219.0   3265.0   45.5   5.3    9.5     4.7        1.7

*************************************************************

Language: Spanish
Number of participating teams: 2

This report first contains a summary of the scores of your submissions
compared to those of the other systems, based on the official metric for
each sub-task. This is followed by more detailed information about your
system's performance. In the charts below, Max is the best scoring
submission across all participants. If there were at least three
submissions for this language, the median score over the best submissions
from each team is given as well. All summary scores are given as
percentiles based on bootstrap resampling.

Document Level Argument Summary Score:

System          5%    50%   95%
CMU_CS_Event5   0.0   0.2   0.3
CMU_CS_Event4   0.0   0.3   0.4
CMU_CS_Event1   1.3   1.6   1.9
CMU_CS_Event2   1.3   1.6   1.9
CMU_CS_Event3   2.0   2.4   2.8
Max             2.0   2.4   2.8

The argument score is described in section 7.1 of the task guidelines.

Document Level Linking Summary Score:

System          5%    50%   95%
CMU_CS_Event5   0.0   0.0   0.1
CMU_CS_Event4   0.0   0.1   0.2
CMU_CS_Event1   0.3   0.5   0.7
CMU_CS_Event2   0.3   0.4   0.7
CMU_CS_Event3   0.3   0.5   0.8
Max             0.3   0.5   0.8

The linking score is described in section 7.1 of the task guidelines.
Score details:
TP        = # of true positive document-level arguments found
FP        = # of false positive document-level arguments found
FN        = # of false negative document-level arguments
ArgP      = precision of finding document-level arguments
ArgR      = recall of finding document-level arguments
ArgF1     = F1 measure of finding document-level arguments
ArgScore  = official document-level argument finding score
LinkScore = official document-level linking score
All scores scaled 0-100

System          TP     FP     FN       ArgP   ArgR   ArgF1   ArgScore   LinkScore
CMU_CS_Event5   4.0    15.0   3503.0   21.1   0.1    0.2     0.2        0.0
CMU_CS_Event4   4.0    8.0    3503.0   33.3   0.1    0.2     0.3        0.1
CMU_CS_Event1   51.0   55.0   3456.0   48.1   1.5    2.8     1.6        0.5
CMU_CS_Event2   50.0   47.0   3457.0   51.5   1.4    2.8     1.6        0.4
CMU_CS_Event3   75.0   76.0   3432.0   49.7   2.1    4.1     2.4        0.5
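The 5% / 50% / 95% columns in the summary tables above are percentiles
obtained by bootstrap resampling over the evaluation documents. The exact
procedure is defined by the official scorer; the sketch below only
illustrates the general idea, with score_over_docs standing in as a
hypothetical function that computes the official metric over a set of
per-document results.

    import random

    def bootstrap_percentiles(per_doc_results, score_over_docs,
                              n_resamples=1000, seed=0):
        """Illustrative bootstrap: resample documents with replacement,
        rescore each resample, and report the 5th/50th/95th percentiles.
        score_over_docs is a hypothetical stand-in for the official
        ArgScore/LinkScore computation defined in the task guidelines."""
        rng = random.Random(seed)
        scores = []
        for _ in range(n_resamples):
            sample = [rng.choice(per_doc_results) for _ in per_doc_results]
            scores.append(score_over_docs(sample))
        scores.sort()
        pick = lambda q: scores[min(int(q * n_resamples), n_resamples - 1)]
        return pick(0.05), pick(0.50), pick(0.95)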