=======================================================
TAC KBP 2016 ENGLISH KB CONSTRUCTION EVALUATION RESULTS
=======================================================

Team ID: Stanford
Organization: Stanford University

*************************************************************
Run ID: Stanford_KB_ENG_1
Did the run access the live Web during the evaluation window: No
Did the run extract relations from the Cold Start source corpus: Yes
Did the run generate meaningful confidence values: Yes

Entity Discovery Evaluation:

ONLY English documents:
Prec    Recall  F1      Metric
0.733   0.538   0.620   strong_mention_match
0.664   0.487   0.562   strong_typed_mention_match
0.000   0.000   0.000   entity_match
0.632   0.344   0.445   b_cubed
0.647   0.475   0.548   mention_ceaf
0.600   0.441   0.508   typed_mention_ceaf

Slot Filling Evaluation:

Metric              RunID              Hop  Prec    Recall  F1
SF-ALL-Micro        Stanford_KB_ENG_1  0    0.3551  0.1835  0.2420
SF-ALL-Micro        Stanford_KB_ENG_1  1    0.1853  0.1054  0.1344
SF-ALL-Micro        Stanford_KB_ENG_1  ALL  0.2941  0.1572  0.2049
SF-ALL-Macro        Stanford_KB_ENG_1  0    0.2589  0.2517  0.2310
SF-ALL-Macro        Stanford_KB_ENG_1  1    0.1297  0.1347  0.1243
SF-ALL-Macro        Stanford_KB_ENG_1  ALL  0.2098  0.2073  0.1905
LDC-MAX-ALL-Micro   Stanford_KB_ENG_1  0    0.3544  0.1830  0.2414
LDC-MAX-ALL-Micro   Stanford_KB_ENG_1  1    0.1965  0.1104  0.1414
LDC-MAX-ALL-Micro   Stanford_KB_ENG_1  ALL  0.2986  0.1587  0.2072
LDC-MAX-ALL-Macro   Stanford_KB_ENG_1  0    0.2791  0.2582  0.2420
LDC-MAX-ALL-Macro   Stanford_KB_ENG_1  1    0.1238  0.1322  0.1205
LDC-MAX-ALL-Macro   Stanford_KB_ENG_1  ALL  0.2181  0.2087  0.1943
LDC-MEAN-ALL-Macro  Stanford_KB_ENG_1  0    0.2562  0.2375  0.2217
LDC-MEAN-ALL-Macro  Stanford_KB_ENG_1  1    0.1171  0.1256  0.1139
LDC-MEAN-ALL-Macro  Stanford_KB_ENG_1  ALL  0.2016  0.1936  0.1794

*ALL-Macro Prec, Recall and F1 refer to mean-precision, mean-recall and mean-F1.
NIL-DETECTION P/R/F1: 0.2995 0.9350 0.4537

*************************************************************
Run ID: Stanford_KB_ENG_2
Did the run access the live Web during the evaluation window: No
Did the run extract relations from the Cold Start source corpus: Yes
Did the run generate meaningful confidence values: Yes

Entity Discovery Evaluation:

ONLY English documents:
Prec    Recall  F1      Metric
0.733   0.538   0.620   strong_mention_match
0.664   0.487   0.562   strong_typed_mention_match
0.000   0.000   0.000   entity_match
0.632   0.344   0.445   b_cubed
0.647   0.475   0.548   mention_ceaf
0.600   0.441   0.508   typed_mention_ceaf

Slot Filling Evaluation:

Metric              RunID              Hop  Prec    Recall  F1
SF-ALL-Micro        Stanford_KB_ENG_2  0    0.4757  0.1099  0.1785
SF-ALL-Micro        Stanford_KB_ENG_2  1    0.4510  0.0564  0.1002
SF-ALL-Micro        Stanford_KB_ENG_2  ALL  0.4703  0.0918  0.1536
SF-ALL-Macro        Stanford_KB_ENG_2  0    0.1718  0.1514  0.1450
SF-ALL-Macro        Stanford_KB_ENG_2  1    0.0673  0.0731  0.0678
SF-ALL-Macro        Stanford_KB_ENG_2  ALL  0.1321  0.1216  0.1157
LDC-MAX-ALL-Micro   Stanford_KB_ENG_2  0    0.5036  0.1127  0.1842
LDC-MAX-ALL-Micro   Stanford_KB_ENG_2  1    0.4762  0.0649  0.1143
LDC-MAX-ALL-Micro   Stanford_KB_ENG_2  ALL  0.4972  0.0967  0.1620
LDC-MAX-ALL-Macro   Stanford_KB_ENG_2  0    0.1879  0.1621  0.1581
LDC-MAX-ALL-Macro   Stanford_KB_ENG_2  1    0.0735  0.0779  0.0730
LDC-MAX-ALL-Macro   Stanford_KB_ENG_2  ALL  0.1430  0.1291  0.1247
LDC-MEAN-ALL-Macro  Stanford_KB_ENG_2  0    0.1742  0.1470  0.1441
LDC-MEAN-ALL-Macro  Stanford_KB_ENG_2  1    0.0735  0.0779  0.0730
LDC-MEAN-ALL-Macro  Stanford_KB_ENG_2  ALL  0.1347  0.1199  0.1162

*ALL-Macro Prec, Recall and F1 refer to mean-precision, mean-recall and mean-F1.
NIL-DETECTION P/R/F1: 0.3103 0.9837 0.4718

*************************************************************
Run ID: Stanford_KB_ENG_3
Did the run access the live Web during the evaluation window: No
Did the run extract relations from the Cold Start source corpus: Yes
Did the run generate meaningful confidence values: Yes

Entity Discovery Evaluation:

ONLY English documents:
Prec    Recall  F1      Metric
0.732   0.538   0.620   strong_mention_match
0.664   0.488   0.563   strong_typed_mention_match
0.000   0.000   0.000   entity_match
0.631   0.344   0.445   b_cubed
0.647   0.475   0.548   mention_ceaf
0.601   0.441   0.508   typed_mention_ceaf

Slot Filling Evaluation:

Metric              RunID              Hop  Prec    Recall  F1
SF-ALL-Micro        Stanford_KB_ENG_3  0    0.2625  0.2697  0.2660
SF-ALL-Micro        Stanford_KB_ENG_3  1    0.0913  0.1716  0.1191
SF-ALL-Micro        Stanford_KB_ENG_3  ALL  0.1799  0.2366  0.2044
SF-ALL-Macro        Stanford_KB_ENG_3  0    0.2493  0.3055  0.2518
SF-ALL-Macro        Stanford_KB_ENG_3  1    0.1882  0.2132  0.1890
SF-ALL-Macro        Stanford_KB_ENG_3  ALL  0.2261  0.2705  0.2279
LDC-MAX-ALL-Micro   Stanford_KB_ENG_3  0    0.2739  0.2810  0.2774
LDC-MAX-ALL-Micro   Stanford_KB_ENG_3  1    0.0858  0.1786  0.1159
LDC-MAX-ALL-Micro   Stanford_KB_ENG_3  ALL  0.1789  0.2467  0.2074
LDC-MAX-ALL-Macro   Stanford_KB_ENG_3  0    0.2754  0.3240  0.2722
LDC-MAX-ALL-Macro   Stanford_KB_ENG_3  1    0.1773  0.1991  0.1769
LDC-MAX-ALL-Macro   Stanford_KB_ENG_3  ALL  0.2369  0.2749  0.2348
LDC-MEAN-ALL-Macro  Stanford_KB_ENG_3  0    0.2543  0.3022  0.2523
LDC-MEAN-ALL-Macro  Stanford_KB_ENG_3  1    0.1706  0.1924  0.1702
LDC-MEAN-ALL-Macro  Stanford_KB_ENG_3  ALL  0.2214  0.2591  0.2201

*ALL-Macro Prec, Recall and F1 refer to mean-precision, mean-recall and mean-F1.
NIL-DETECTION P/R/F1: 0.2902 0.8943 0.4382
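
Note on reading the Micro and Macro rows: the footnote says the Macro columns are means of per-query precision, recall and F1, so a Macro F1 is generally not the harmonic mean of the Macro Prec and Recall printed beside it, whereas a Micro F1 is. The sketch below illustrates this under stated assumptions and is not the official scorer: the function names, the toy per-query counts, and the convention of scoring precision as 0 when a query has no submitted fillers are all hypothetical.

```python
from statistics import mean


def f1(p, r):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def prf(correct, submitted, gold):
    """Precision, recall and F1 from raw counts.

    Assumption: a zero denominator yields a score of 0.0.
    """
    p = correct / submitted if submitted else 0.0
    r = correct / gold if gold else 0.0
    return p, r, f1(p, r)


def micro_prf(queries):
    """Pool the counts over all queries, then score once."""
    correct = sum(q[0] for q in queries)
    submitted = sum(q[1] for q in queries)
    gold = sum(q[2] for q in queries)
    return prf(correct, submitted, gold)


def macro_prf(queries):
    """Score each query separately, then average each column."""
    rows = [prf(*q) for q in queries]
    return tuple(mean(col) for col in zip(*rows))


# Hypothetical per-query counts: (correct, submitted, gold).
TOY_QUERIES = [(2, 4, 4), (0, 1, 2), (3, 3, 10)]

print("micro:", micro_prf(TOY_QUERIES))
print("macro:", macro_prf(TOY_QUERIES))

# Reported values from Stanford_KB_ENG_1, hop 0:
print("micro check:", round(f1(0.3551, 0.1835), 4))  # compare with 0.2420
print("macro check:", round(f1(0.2589, 0.2517), 4))  # compare with 0.2310
```

The micro check reproduces the reported SF-ALL-Micro F1, while the macro check does not reproduce the reported SF-ALL-Macro F1, which is consistent with the footnote's definition of the Macro columns.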