=======================================================================
TAC KBP 2015 EVENT ARGUMENT VERIFICATION AND LINKING EVALUATION RESULTS
=======================================================================

Team ID: CMU_CS_event
Organization: CMU-CS

Run ID: CMU_CS_event1
Did the run access the live Web during the evaluation window: No
Did the run use the source text: No
Did the run use all of the Event Argument Extraction submissions: Yes
Did the run use the EA system-provided confidence values: No
Did the run use any distributed representations (e.g., of words): No

Run ID: CMU_CS_event2
Did the run access the live Web during the evaluation window: No
Did the run use the source text: No
Did the run use all of the Event Argument Extraction submissions: Yes
Did the run use the EA system-provided confidence values: No
Did the run use any distributed representations (e.g., of words): No

Run ID: CMU_CS_event3
Did the run access the live Web during the evaluation window: No
Did the run use the source text: No
Did the run use all of the Event Argument Extraction submissions: Yes
Did the run use the EA system-provided confidence values: No
Did the run use any distributed representations (e.g., of words): No

Run ID: CMU_CS_event4
Did the run access the live Web during the evaluation window: No
Did the run use the source text: No
Did the run use all of the Event Argument Extraction submissions: Yes
Did the run use the EA system-provided confidence values: No
Did the run use any distributed representations (e.g., of words): No

Run ID: CMU_CS_event5
Did the run access the live Web during the evaluation window: No
Did the run use the source text: No
Did the run use all of the Event Argument Extraction submissions: Yes
Did the run use the EA system-provided confidence values: No
Did the run use any distributed representations (e.g., of words): No

*************************************************************
### The following are scores from the TAC 2015 Event Argument and Linking Evaluation.
### For all scoring breakdowns, the summaries report: Precision, Recall, F1, EAArg score, EALink score, and Overall score.
### Details of the scoring and the scoring software can be found on the TAC 2015 EAL webpage.
###
### Scores are reported on the full data set (all_genre) and broken down by genre: discussion forum only (df) and
### newswire only (nw).
###
### The official score (withRealis) incorporates the correctness of the realis (ACTUAL, GENERIC, and OTHER) distinction
### and the correctness of canonical argument string resolution. As a diagnostic, we also report (a) a score
### that ignores the realis distinction (neutralizeRealis) and (b) a score that ignores both the realis distinction
### and canonical argument string resolution (neutralizeRealisCoref).
###
### Scores are reported over two data sets. Dataset1 (all_event_types) consists of 81 documents assessed for the
### full TAC EAL event taxonomy as specified in the 2015 evaluation plan. Dataset2 (restricted_event_types)
### consists of 201 documents assessed for only 6 event types (assertions outside of the 6 were ignored). Dataset2
### includes the documents in Dataset1. Dataset2 was assessed to allow a more in-depth evaluation of event-specific
### performance (and variance in performance across event types).
### The 6 event types included in Dataset2 are:
###   - Transaction.Transfer-Money
###   - Movement.Transport-Artifact
###   - Life.Marry
###   - Contact.Meet
###   - Conflict.Demonstrate
###   - Conflict.Attack
###
### One participant (ZJU) submitted a submission with an offset error. This system output was fixed automatically
### by BBN (the organizer) and also by ZJU (the participant). Because the modifications were different, both sets
### of numbers are reported.
###
### One participant (ver-CMU) participated in the "verification" version of the task. This system took as its input
### all other system submissions. Its input included the ZJU submission with the broken offsets and did not include
### either BBN's fix or ZJU's fix. Thus it is not comparable to the other systems in the task performed.
###
### The LDC submission was produced by an LDC annotator spending 45-60 minutes on the task of extracting arguments
### and grouping them. The low recall of the LDC submission is due at least in part to the time limitation.
###
### While all scores provide interesting diagnostic information, the "official" evaluation metric is Dataset1
### (all_event_types) on both genres (all_genre) using the official (withRealis) metric.

####################################
###### All Event Types ######

####### Genre: all_genre #######

##### Scoring Configuration: withRealis #####
submission           P     R     F1   EAArg  EALink  Overall
ver-CMU_CS_event1  15.0  47.8  22.8   5.6  21.1  13.4
ver-CMU_CS_event2  15.0  47.8  22.8   5.5  22.2  13.9
ver-CMU_CS_event3  31.5  38.2  34.5  19.5  16.1  17.8
ver-CMU_CS_event4  31.5  38.2  34.5  19.5  16.4  17.9
ver-CMU_CS_event5  14.3  51.0  22.3   5.0  26.4  15.7

##### Scoring Configuration: neutralizeRealisCoref #####
submission           P     R     F1   EAArg  EALink  Overall
ver-CMU_CS_event1  24.6  63.9  35.5  20.5  30.8  25.7
ver-CMU_CS_event2  24.6  63.9  35.5  20.6  31.5  26.1
ver-CMU_CS_event3  45.6  51.2  48.2  36.5  22.1  29.3
ver-CMU_CS_event4  45.7  51.1  48.2  36.5  22.4  29.4
ver-CMU_CS_event5  23.4  68.7  34.9  19.4  37.2  28.3

##### Scoring Configuration: neutralizeRealis #####
submission           P     R     F1   EAArg  EALink  Overall
ver-CMU_CS_event1  21.0  57.6  30.8  12.8  25.8  19.3
ver-CMU_CS_event2  21.0  57.7  30.8  12.8  26.1  19.5
ver-CMU_CS_event3  38.7  45.9  42.0  28.6  18.8  23.7
ver-CMU_CS_event4  38.7  45.8  42.0  28.5  19.0  23.8
ver-CMU_CS_event5  20.1  62.5  30.4  12.0  31.9  22.0

####### Genre: df #######

##### Scoring Configuration: withRealis #####
submission           P     R     F1   EAArg  EALink  Overall
ver-CMU_CS_event1  13.4  47.4  20.9   7.3  21.5  14.4
ver-CMU_CS_event2  13.4  47.6  20.9   7.3  22.9  15.1
ver-CMU_CS_event3  29.4  37.8  33.1  18.4  17.8  18.1
ver-CMU_CS_event4  29.4  37.9  33.1  18.6  17.5  18.0
ver-CMU_CS_event5  13.0  51.4  20.8   7.1  25.9  16.5

##### Scoring Configuration: neutralizeRealisCoref #####
submission           P     R     F1   EAArg  EALink  Overall
ver-CMU_CS_event1  22.5  62.4  33.1  20.2  33.2  26.7
ver-CMU_CS_event2  22.6  62.5  33.2  20.4  33.4  26.9
ver-CMU_CS_event3  42.9  50.2  46.3  34.9  25.5  30.2
ver-CMU_CS_event4  42.9  50.2  46.3  34.9  25.6  30.3
ver-CMU_CS_event5  21.7  67.7  32.9  19.8  37.6  28.7

##### Scoring Configuration: neutralizeRealis #####
submission           P     R     F1   EAArg  EALink  Overall
ver-CMU_CS_event1  19.1  55.8  28.5  14.0  26.8  20.4
ver-CMU_CS_event2  19.2  56.1  28.6  14.3  27.4  20.9
ver-CMU_CS_event3  35.7  44.3  39.5  26.4  20.6  23.5
ver-CMU_CS_event4  35.8  44.4  39.6  26.6  20.5  23.5
ver-CMU_CS_event5  18.6  61.5  28.6  13.1  32.2  22.7

####### Genre: nw #######

##### Scoring Configuration: withRealis #####
submission           P     R     F1   EAArg  EALink  Overall
ver-CMU_CS_event1  16.2  48.0  24.2   4.5  20.9  12.7
ver-CMU_CS_event2  16.2  47.9  24.2   4.4  21.8  13.1
ver-CMU_CS_event3  33.0  38.5  35.5  20.1  15.1  17.6
ver-CMU_CS_event4  32.9  38.3  35.4  20.0  15.8  17.9
ver-CMU_CS_event5  15.3  50.8  23.5   3.7  26.7  15.2

##### Scoring Configuration: neutralizeRealisCoref #####
submission           P     R     F1   EAArg  EALink  Overall
ver-CMU_CS_event1  26.1  64.8  37.2  20.7  29.4  25.1
ver-CMU_CS_event2  26.0  64.8  37.1  20.7  30.4  25.6
ver-CMU_CS_event3  47.4  51.8  49.5  37.5  20.0  28.7
ver-CMU_CS_event4  47.5  51.8  49.6  37.5  20.4  28.9
ver-CMU_CS_event5  24.6  69.3  36.3  19.1  37.0  28.1

##### Scoring Configuration: neutralizeRealis #####
submission           P     R     F1   EAArg  EALink  Overall
ver-CMU_CS_event1  22.3  58.8  32.3  12.1  25.1  18.6
ver-CMU_CS_event2  22.3  58.7  32.3  11.9  25.3  18.6
ver-CMU_CS_event3  40.7  46.9  43.6  29.9  17.8  23.8
ver-CMU_CS_event4  40.6  46.7  43.4  29.8  18.1  23.9
ver-CMU_CS_event5  21.1  63.1  31.6  11.4  31.8  21.6

####################################
###### Restricted Event Types ######

####### Genre: all_genre #######

##### Scoring Configuration: withRealis #####
submission           P     R     F1   EAArg  EALink  Overall
ver-CMU_CS_event1  12.6  48.2  20.0   8.6  22.2  15.4
ver-CMU_CS_event2  12.6  48.4  20.0   8.7  23.0  15.9
ver-CMU_CS_event3  29.3  37.4  32.9  21.9  14.7  18.3
ver-CMU_CS_event4  29.3  37.4  32.9  21.9  15.3  18.6
ver-CMU_CS_event5  12.2  52.2  19.8   8.9  27.5  18.2

##### Scoring Configuration: neutralizeRealisCoref #####
submission           P     R     F1   EAArg  EALink  Overall
ver-CMU_CS_event1  20.9  64.7  31.6  22.6  31.6  27.1
ver-CMU_CS_event2  20.9  64.7  31.6  22.6  32.6  27.6
ver-CMU_CS_event3  42.5  49.4  45.7  35.7  21.6  28.6
ver-CMU_CS_event4  42.4  49.4  45.6  35.6  21.9  28.8
ver-CMU_CS_event5  20.1  70.1  31.2  22.3  38.6  30.5

##### Scoring Configuration: neutralizeRealis #####
submission           P     R     F1   EAArg  EALink  Overall
ver-CMU_CS_event1  17.9  57.6  27.3  15.6  26.4  21.0
ver-CMU_CS_event2  17.9  57.7  27.3  15.7  26.8  21.2
ver-CMU_CS_event3  36.7  44.5  40.2  29.2  17.6  23.4
ver-CMU_CS_event4  36.6  44.5  40.2  29.2  18.2  23.7
ver-CMU_CS_event5  17.3  63.1  27.2  15.5  33.1  24.3

####### Genre: df #######

##### Scoring Configuration: withRealis #####
submission           P     R     F1   EAArg  EALink  Overall
ver-CMU_CS_event1  11.0  43.4  17.6   7.4  20.4  13.9
ver-CMU_CS_event2  11.0  43.5  17.6   7.4  20.7  14.1
ver-CMU_CS_event3  23.2  31.2  26.6  15.6  12.9  14.3
ver-CMU_CS_event4  23.4  31.5  26.9  15.8  13.2  14.5
ver-CMU_CS_event5  10.9  48.3  17.8   8.2  25.2  16.7

##### Scoring Configuration: neutralizeRealisCoref #####
submission           P     R     F1   EAArg  EALink  Overall
ver-CMU_CS_event1  20.2  60.8  30.3  22.1  32.5  27.3
ver-CMU_CS_event2  20.2  60.8  30.3  22.1  33.0  27.6
ver-CMU_CS_event3  36.9  44.8  40.5  30.0  21.6  25.8
ver-CMU_CS_event4  36.9  44.9  40.5  30.1  21.1  25.6
ver-CMU_CS_event5  19.7  67.4  30.5  22.3  39.8  31.1

##### Scoring Configuration: neutralizeRealis #####
submission           P     R     F1   EAArg  EALink  Overall
ver-CMU_CS_event1  16.8  52.5  25.5  14.9  25.0  19.9
ver-CMU_CS_event2  16.8  52.6  25.5  15.0  24.8  19.9
ver-CMU_CS_event3  30.5  38.5  34.0  22.5  15.4  18.9
ver-CMU_CS_event4  30.6  38.7  34.2  22.8  16.1  19.4
ver-CMU_CS_event5  16.6  59.3  25.9  15.3  31.4  23.4

####### Genre: nw #######

##### Scoring Configuration: withRealis #####
submission           P     R     F1   EAArg  EALink  Overall
ver-CMU_CS_event1  13.9  52.4  22.0   9.7  23.6  16.6
ver-CMU_CS_event2  14.0  52.5  22.1   9.8  24.8  17.3
ver-CMU_CS_event3  35.0  42.6  38.4  27.3  16.1  21.7
ver-CMU_CS_event4  34.9  42.4  38.3  27.0  16.9  22.0
ver-CMU_CS_event5  13.4  55.5  21.6   9.5  29.4  19.4

##### Scoring Configuration: neutralizeRealisCoref #####
submission           P     R     F1   EAArg  EALink  Overall
ver-CMU_CS_event1  21.4  68.0  32.6  22.9  30.8  26.9
ver-CMU_CS_event2  21.4  68.1  32.6  23.1  32.3  27.7
ver-CMU_CS_event3  47.7  53.4  50.4  40.6  21.6  31.1
ver-CMU_CS_event4  47.6  53.3  50.3  40.4  22.6  31.5
ver-CMU_CS_event5  20.5  72.5  32.0  22.2  37.7  30.0

##### Scoring Configuration: neutralizeRealis #####
submission           P     R     F1   EAArg  EALink  Overall
ver-CMU_CS_event1  18.8  62.1  28.9  16.1  27.6  21.9
ver-CMU_CS_event2  18.8  62.2  28.9  16.3  28.4  22.3
ver-CMU_CS_event3  42.5  49.8  45.9  35.0  19.5  27.2
ver-CMU_CS_event4  42.3  49.5  45.6  34.7  20.0  27.3
ver-CMU_CS_event5  18.0  66.4  28.3  15.7  34.5  25.1
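
### The short Python sketch below is not part of the official release. It only illustrates the arithmetic
### relationships that the reported columns appear to follow in these tables (F1 as the harmonic mean of P and R,
### and Overall as the mean of the EAArg and EALink scores); the authoritative definitions are those of the
### TAC 2015 EAL scoring software. The function names are illustrative only.

# Sanity-check sketch (assumptions as noted above; published values are rounded,
# so recomputed numbers can differ in the last digit for some rows).

def f1(p, r):
    """Harmonic mean of precision and recall, both on a 0-100 scale."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

def overall(ea_arg, ea_link):
    """Arithmetic mean of the EAArg and EALink scores."""
    return (ea_arg + ea_link) / 2

# Example row: All Event Types / all_genre / withRealis / ver-CMU_CS_event3
p, r, ea_arg, ea_link = 31.5, 38.2, 19.5, 16.1
print(round(f1(p, r), 1))                   # 34.5, matching the reported F1
print(round(overall(ea_arg, ea_link), 1))   # 17.8, matching the reported Overall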