=====================================================================
TAC KBP 2015 EVENT ARGUMENT EXTRACTION AND LINKING EVALUATION RESULTS
=====================================================================
Team ID: LDC
Organization: Linguistic Data Consortium
Run ID: LDC
Did the run access the live Web during the evaluation window: No
Did the run perform any cross-sentence reasoning: Yes
Did the run use any distributed representations (e.g., of words): Yes
Did the run return meaningful confidence values: No
*************************************************************
### The following are scores from the TAC 2015 Event Argument Extraction and Linking Evaluation.
### For all scoring breakdowns, the summaries report: Precision, Recall, F1, EAArg score, EALink score, and Overall score.
### Details of the scoring and the scoring software can be found on the TAC 2015 EAL webpage.
###
### Scores are reported on the full data set (all_genre) and broken down by genre: discussion forum only (df) and
### newswire only (nw).
###
### The official score (withRealis) incorporates the correctness of the realis (ACTUAL, GENERIC, and OTHER) distinction
### and the correctness of canonical argument string resolution. As a diagnostic, we also report (a) a score
### that ignores the realis distinction (neutralizeRealis) and (b) a score that ignores both the realis distinction
### and canonical argument string resolution (neutralizeRealisCoref).
###
### Scores are reported over two data sets. Dataset1 (all_event_types) consists of 81 documents assessed for the
### full TAC EAL event taxonomy as specified in the 2015 evaluation plan. Dataset2 (restricted_event_types)
### consists of 201 documents assessed for only 6 event types (assertions outside the 6 were ignored). Dataset2
### includes the documents in Dataset1. Dataset2 was assessed to allow a more in-depth evaluation of event-specific
### performance (and variance in performance across event types).
### The 6 event types included in Dataset2 are:
###   - Transaction.Transfer-Money
###   - Movement.Transport-Artifact
###   - Life.Marry
###   - Contact.Meet
###   - Conflict.Demonstrate
###   - Conflict.Attack
###
### One participant (ZJU) submitted a submission with an offset error. This system output was fixed both by BBN
### (the organizer), automatically, and by ZJU (the participant). Because the modifications were different, both
### sets of numbers are reported.
###
### One participant (ver-CMU) participated in a "verification" version of the task. This system took as its input all
### other system submissions. This input included the ZJU submission with the broken offsets and
### did not include either BBN's fix or ZJU's fix. Thus the task it performed is not comparable to that of the other systems.
###
### The LDC submission was produced by an LDC annotator spending 45-60 minutes on the task of extracting arguments
### and grouping them. The low recall of the LDC submission is due at least in part to this time limitation.
###
### While all scores provide interesting diagnostic information, the "official" evaluation result is Dataset1 (all_event_types) on
### both genres (all_genre) using the official (withRealis) metric.
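### The summary columns in the tables below are related to one another in a simple way. As a sketch (an
### inference from the reported numbers, not the official scorer's definition): F1 is the standard harmonic
### mean of Precision and Recall, and the Overall column is consistent with the arithmetic mean of the EAArg
### and EALink scores.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def overall(ea_arg: float, ea_link: float) -> float:
    """Arithmetic mean of the EAArg and EALink scores (assumed form,
    consistent with the reported tables)."""
    return (ea_arg + ea_link) / 2


# Example: the All Event Types / all_genre / withRealis row for LDC.
print(round(f1(75.5, 40.0), 1))       # 52.3
print(round(overall(36.7, 33.7), 1))  # 35.2
```

### Small rounding differences (e.g. 26.9 vs. 27.0) can arise because the official scorer rounds from
### unrounded intermediate values rather than from the printed columns.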
####################################
######    All Event Types     ######
####################################

#######   Genre: all_genre   #######

##### Scoring Configuration: withRealis #####
submission    P      R      F1     EAArg  EALink Overall
LDC           75.5   40.0   52.3   36.7   33.7   35.2

##### Scoring Configuration: neutralizeRealisCoref #####
submission    P      R      F1     EAArg  EALink Overall
LDC           84.5   42.4   56.5   40.5   36.0   38.2

##### Scoring Configuration: neutralizeRealis #####
submission    P      R      F1     EAArg  EALink Overall
LDC           81.8   41.6   55.2   39.3   35.5   37.4

#######      Genre: df       #######

##### Scoring Configuration: withRealis #####
submission    P      R      F1     EAArg  EALink Overall
LDC           74.2   41.9   53.6   38.2   37.3   37.8

##### Scoring Configuration: neutralizeRealisCoref #####
submission    P      R      F1     EAArg  EALink Overall
LDC           81.6   43.0   56.3   40.6   39.0   39.8

##### Scoring Configuration: neutralizeRealis #####
submission    P      R      F1     EAArg  EALink Overall
LDC           78.9   42.3   55.1   39.5   38.5   39.0

#######      Genre: nw       #######

##### Scoring Configuration: withRealis #####
submission    P      R      F1     EAArg  EALink Overall
LDC           76.5   38.8   51.5   35.8   31.6   33.7

##### Scoring Configuration: neutralizeRealisCoref #####
submission    P      R      F1     EAArg  EALink Overall
LDC           86.4   42.0   56.5   40.4   34.1   37.3

##### Scoring Configuration: neutralizeRealis #####
submission    P      R      F1     EAArg  EALink Overall
LDC           83.7   41.1   55.1   39.1   33.6   36.4

####################################
###### Restricted Event Types ######
####################################

#######   Genre: all_genre   #######

##### Scoring Configuration: withRealis #####
submission    P      R      F1     EAArg  EALink Overall
LDC           73.5   31.0   43.6   28.9   25.0   26.9

##### Scoring Configuration: neutralizeRealisCoref #####
submission    P      R      F1     EAArg  EALink Overall
LDC           89.0   35.0   50.2   34.0   28.4   31.2

##### Scoring Configuration: neutralizeRealis #####
submission    P      R      F1     EAArg  EALink Overall
LDC           86.0   34.4   49.1   33.1   27.7   30.4

#######      Genre: df       #######

##### Scoring Configuration: withRealis #####
submission    P      R      F1     EAArg  EALink Overall
LDC           72.6   34.5   46.8   32.0   29.3   30.6

##### Scoring Configuration: neutralizeRealisCoref #####
submission    P      R      F1     EAArg  EALink Overall
LDC           88.6   39.0   54.2   38.0   33.3   35.7

##### Scoring Configuration: neutralizeRealis #####
submission    P      R      F1     EAArg  EALink Overall
LDC           86.2   38.5   53.2   37.2   32.8   35.0

#######      Genre: nw       #######

##### Scoring Configuration: withRealis #####
submission    P      R      F1     EAArg  EALink Overall
LDC           74.5   28.1   40.8   26.2   21.7   23.9

##### Scoring Configuration: neutralizeRealisCoref #####
submission    P      R      F1     EAArg  EALink Overall
LDC           89.4   31.5   46.6   30.6   24.3   27.4

##### Scoring Configuration: neutralizeRealis #####
submission    P      R      F1     EAArg  EALink Overall
LDC           85.9   30.8   45.3   29.5   23.6   26.6