=====================================================================
TAC KBP 2015 EVENT ARGUMENT EXTRACTION AND LINKING EVALUATION RESULTS
=====================================================================

Team ID: BBN
Organization: BBN

Run ID: BBN1
Did the run access the live Web during the evaluation window: No
Did the run perform any cross-sentence reasoning: Yes
Did the run use any distributed representations (e.g., of words): Yes
Did the run return meaningful confidence values: Yes

Run ID: BBN2
Did the run access the live Web during the evaluation window: No
Did the run perform any cross-sentence reasoning: Yes
Did the run use any distributed representations (e.g., of words): Yes
Did the run return meaningful confidence values: Yes

Run ID: BBN3
Did the run access the live Web during the evaluation window: No
Did the run perform any cross-sentence reasoning: Yes
Did the run use any distributed representations (e.g., of words): Yes
Did the run return meaningful confidence values: Yes

Run ID: BBN4
Did the run access the live Web during the evaluation window: No
Did the run perform any cross-sentence reasoning: Yes
Did the run use any distributed representations (e.g., of words): Yes
Did the run return meaningful confidence values: Yes

Run ID: BBN5
Did the run access the live Web during the evaluation window: No
Did the run perform any cross-sentence reasoning: Yes
Did the run use any distributed representations (e.g., of words): Yes
Did the run return meaningful confidence values: Yes

*************************************************************

### The following are scores from the TAC 2015 Event Argument and Linking Evaluation.
### For all scoring breakdowns, the summaries report: Precision, Recall, F1, EAArg score, EALink score, and Overall score.
### Details of the scoring and the scoring software can be found on the TAC 2015 EAL webpage.
###
### Scores are reported on the full data set (all_genre) and broken down by genre:
### discussion forum only (df) and newswire only (nw).
###
### The official score (withRealis) incorporates the correctness of the realis (ACTUAL, GENERIC, and OTHER) distinction
### and the correctness of canonical argument string resolution. As a diagnostic, we also report (a) a score
### that ignores the realis distinction (neutralizeRealis) and (b) a score that ignores both the realis distinction
### and canonical argument string resolution (neutralizeRealisCoref).
###
### Scores are reported over two data sets. Dataset1 (all_event_types) consists of 81 documents assessed for the
### full TAC EAL event taxonomy as specified in the 2015 evaluation plan. Dataset2 (restricted_event_types)
### consists of 201 documents assessed for only 6 event types (assertions outside of the 6 were ignored). Dataset2
### includes the documents in Dataset1. Dataset2 was assessed to allow a more in-depth evaluation of event-specific
### performance (and of variance in performance across event types). The 6 event types included in Dataset2 are:
###   - Transaction.Transfer-Money
###   - Movement.Transport-Artifact
###   - Life.Marry
###   - Contact.Meet
###   - Conflict.Demonstrate
###   - Conflict.Attack
###
### One participant (ZJU) submitted a submission with an offset error. This system output was fixed both by BBN
### (the organizer) and by ZJU (the participant). Because the two fixes differed, both sets of numbers are reported.
###
### One participant (ver-CMU) participated in a "verification" version of the task. This system took as its input all
### other system submissions. Its input included the ZJU submission with the broken offsets, and included neither
### BBN's fix nor ZJU's fix. It is therefore not directly comparable to the other systems in the task.
###
### The LDC submission was produced by an LDC annotator spending 45-60 minutes on the task of extracting arguments
### and grouping them.
### The low recall of the LDC submission is due at least in part to this time limitation.
###
### While all scores provide interesting diagnostic information, the "official" evaluation metric is Dataset1 (all_event_types)
### on both genres (all_genre) using the official (withRealis) metric.

####################################
######    All Event Types     ######
####################################

#######  Genre: all_genre  #######

#####  Scoring Configuration: withRealis  #####
submission   P     R     F1    EAArg  EALink  Overall
BBN1         36.8  39.2  38.0  23.6   23.3    23.5
BBN2         34.2  36.8  35.5  21.1   22.6    21.9
BBN3         36.9  35.8  36.3  21.5   20.3    20.9
BBN4         36.9  38.8  37.8  23.6   23.3    23.4
BBN5         46.1  29.2  35.8  21.3   16.2    18.7

#####  Scoring Configuration: neutralizeRealisCoref  #####
submission   P     R     F1    EAArg  EALink  Overall
BBN1         52.0  49.9  50.9  38.6   30.0    34.3
BBN2         49.5  48.0  48.7  36.1   29.0    32.5
BBN3         52.5  45.6  48.8  35.4   26.0    30.7
BBN4         52.3  49.6  50.9  38.5   30.0    34.3
BBN5         64.5  36.9  46.9  31.9   20.5    26.2

#####  Scoring Configuration: neutralizeRealis  #####
submission   P     R     F1    EAArg  EALink  Overall
BBN1         45.4  45.5  45.4  32.3   26.6    29.4
BBN2         42.5  43.1  42.8  29.1   25.9    27.5
BBN3         45.7  41.7  43.6  29.5   23.1    26.3
BBN4         45.7  45.4  45.5  32.2   26.6    29.4
BBN5         55.3  33.4  41.6  26.8   18.1    22.4

#######  Genre: df  #######

#####  Scoring Configuration: withRealis  #####
submission   P     R     F1    EAArg  EALink  Overall
BBN1         39.0  38.5  38.7  25.6   24.9    25.2
BBN2         35.4  35.0  35.2  22.0   23.5    22.8
BBN3         38.5  34.4  36.3  22.6   21.2    21.9
BBN4         38.6  37.9  38.2  24.9   24.6    24.7
BBN5         49.1  29.2  36.6  22.6   17.7    20.1

#####  Scoring Configuration: neutralizeRealisCoref  #####
submission   P     R     F1    EAArg  EALink  Overall
BBN1         54.0  46.8  50.1  37.4   31.2    34.3
BBN2         51.3  44.2  47.5  34.4   28.8    31.6
BBN3         56.2  43.6  49.1  35.4   26.2    30.8
BBN4         53.7  46.6  49.9  37.1   30.7    33.9
BBN5         68.5  36.5  47.6  32.5   21.6    27.1

#####  Scoring Configuration: neutralizeRealis  #####
submission   P     R     F1    EAArg  EALink  Overall
BBN1         47.6  42.9  45.1  32.1   28.0    30.1
BBN2         44.1  39.8  41.8  28.5   26.3    27.4
BBN3         48.9  39.7  43.8  29.9   23.4    26.6
BBN4         46.8  42.5  44.5  31.3   27.4    29.3
BBN5         59.5  33.0  42.5  27.8   19.6    23.7

#######  Genre: nw  #######

#####  Scoring Configuration: withRealis  #####
submission   P     R     F1    EAArg  EALink  Overall
BBN1         35.5  39.7  37.5  22.4   22.4    22.4
BBN2         33.6  37.9  35.6  20.6   22.1    21.3
BBN3         36.0  36.6  36.3  20.9   19.8    20.3
BBN4         36.0  39.4  37.6  22.8   22.6    22.7
BBN5         44.5  29.2  35.3  20.5   15.3    17.9

#####  Scoring Configuration: neutralizeRealisCoref  #####
submission   P     R     F1    EAArg  EALink  Overall
BBN1         51.0  51.8  51.4  39.4   29.3    34.3
BBN2         48.6  50.4  49.5  37.1   29.1    33.1
BBN3         50.7  46.8  48.7  35.4   25.9    30.7
BBN4         51.5  51.5  51.5  39.4   29.6    34.5
BBN5         62.3  37.2  46.6  31.6   19.9    25.7

#####  Scoring Configuration: neutralizeRealis  #####
submission   P     R     F1    EAArg  EALink  Overall
BBN1         44.3  47.2  45.7  32.4   25.7    29.1
BBN2         41.6  45.2  43.3  29.5   25.7    27.6
BBN3         44.0  42.9  43.4  29.3   23.0    26.2
BBN4         45.0  47.2  46.1  32.8   26.2    29.5
BBN5         53.0  33.6  41.1  26.2   17.1    21.6

####################################
######  Restricted Event Types #####
####################################

#######  Genre: all_genre  #######

#####  Scoring Configuration: withRealis  #####
submission   P     R     F1    EAArg  EALink  Overall
BBN1         36.7  40.7  38.6  27.3   23.5    25.4
BBN2         34.6  39.3  36.8  25.6   23.8    24.7
BBN3         37.5  38.3  37.9  26.1   21.0    23.6
BBN4         36.8  40.2  38.4  26.8   23.0    24.9
BBN5         45.2  28.8  35.2  22.3   15.6    18.9

#####  Scoring Configuration: neutralizeRealisCoref  #####
submission   P     R     F1    EAArg  EALink  Overall
BBN1         54.1  52.2  53.1  42.5   30.4    36.5
BBN2         50.7  50.5  50.6  40.1   30.3    35.2
BBN3         55.4  48.7  51.8  40.2   27.7    33.9
BBN4         54.5  51.3  52.9  41.8   29.8    35.8
BBN5         66.2  37.2  47.6  33.0   20.3    26.7

#####  Scoring Configuration: neutralizeRealis  #####
submission   P     R     F1    EAArg  EALink  Overall
BBN1         47.6  47.8  47.7  36.3   27.1    31.7
BBN2         44.1  45.9  45.0  33.8   27.3    30.5
BBN3         48.4  44.4  46.3  34.1   24.4    29.3
BBN4         47.8  47.0  47.4  35.8   26.6    31.2
BBN5         57.1  33.4  42.1  28.0   17.8    22.9

#######  Genre: df  #######

#####  Scoring Configuration: withRealis  #####
submission   P     R     F1    EAArg  EALink  Overall
BBN1         33.1  33.6  33.3  22.0   20.5    21.2
BBN2         33.0  33.8  33.4  22.1   21.0    21.6
BBN3         32.5  31.1  31.8  19.7   17.9    18.8
BBN4         33.1  33.1  33.1  21.2   19.2    20.2
BBN5         38.3  22.4  28.3  16.5   12.5    14.5

#####  Scoring Configuration: neutralizeRealisCoref  #####
submission   P     R     F1    EAArg  EALink  Overall
BBN1         52.5  44.9  48.4  36.8   27.3    32.0
BBN2         50.6  43.5  46.8  34.8   27.5    31.2
BBN3         53.2  42.0  46.9  34.6   25.1    29.9
BBN4         52.5  43.9  47.8  35.8   26.3    31.1
BBN5         61.8  30.5  40.8  26.9   16.6    21.7

#####  Scoring Configuration: neutralizeRealis  #####
submission   P     R     F1    EAArg  EALink  Overall
BBN1         45.1  40.0  42.4  30.7   23.2    26.9
BBN2         44.0  39.6  41.7  30.0   24.3    27.2
BBN3         46.1  37.8  41.5  28.9   21.4    25.1
BBN4         45.0  39.2  41.9  29.8   22.0    25.9
BBN5         52.3  26.7  35.4  22.1   13.8    17.9

#######  Genre: nw  #######

#####  Scoring Configuration: withRealis  #####
submission   P     R     F1    EAArg  EALink  Overall
BBN1         39.3  46.7  42.7  31.7   26.0    28.9
BBN2         35.7  44.0  39.4  28.5   26.1    27.3
BBN3         41.2  44.5  42.8  31.6   23.5    27.5
BBN4         39.5  46.2  42.6  31.6   26.0    28.8
BBN5         50.3  34.1  40.6  27.1   18.0    22.6

#####  Scoring Configuration: neutralizeRealisCoref  #####
submission   P     R     F1    EAArg  EALink  Overall
BBN1         55.3  58.6  56.9  47.4   32.9    40.2
BBN2         50.8  56.5  53.5  44.7   32.5    38.6
BBN3         56.9  54.5  55.7  45.0   29.8    37.4
BBN4         55.9  57.7  56.8  47.0   32.6    39.8
BBN5         69.3  43.0  53.1  38.4   23.4    30.9

#####  Scoring Configuration: neutralizeRealis  #####
submission   P     R     F1    EAArg  EALink  Overall
BBN1         49.3  54.5  51.8  41.2   30.4    35.8
BBN2         44.2  51.3  47.5  37.1   29.6    33.4
BBN3         50.1  50.1  50.1  38.6   26.9    32.8
BBN4         49.7  53.9  51.7  40.9   30.4    35.6
BBN5         60.4  39.3  47.6  33.0   21.1    27.0
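### Note on the reported columns: the official scorer (see the TAC 2015 EAL webpage) defines EAArg and
### EALink; this sketch is NOT that scorer. It only illustrates, under stated assumptions, how the derived
### columns relate to the others: F1 is the standard harmonic mean of Precision and Recall, and the Overall
### column is consistent with the arithmetic mean of EAArg and EALink (checked against the BBN1
### all_event_types / all_genre / withRealis row; small deviations in the tables come from rounding).

```python
def f1(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def overall(ea_arg: float, ea_link: float) -> float:
    """Assumed combination: arithmetic mean of the EAArg and EALink scores."""
    return (ea_arg + ea_link) / 2

# BBN1, All Event Types, all_genre, withRealis:
# P=36.8, R=39.2, F1=38.0, EAArg=23.6, EALink=23.3, Overall=23.5
print(f"{f1(36.8, 39.2):.1f}")       # prints 38.0, matching the reported F1
print(f"{overall(23.6, 23.3):.2f}")  # prints 23.45, reported (rounded) as 23.5
```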