=============================================================
TAC KBP 2016 CROSS-LINGUAL KB CONSTRUCTION EVALUATION RESULTS
=============================================================

Team ID: Stanford
Organization: Stanford University

*************************************************************
Run ID: Stanford_KB_XLING_1
Did the run access the live Web during the evaluation window: No
Did the run extract relations from the Cold Start source corpus: Yes
Did the run generate meaningful confidence values: Yes
Run number of the English KB system that is most closely configured to the English component of this run: 1
Run number of the Spanish KB system that is most closely configured to the Spanish component of this run: NA
Run number of the Chinese KB system that is most closely configured to the Chinese component of this run: 1

Entity Discovery Evaluation:

ALL English, Chinese, and Spanish documents:
Prec    Recall  F1      Metric
0.750   0.350   0.478   strong_mention_match
0.677   0.316   0.431   strong_typed_mention_match
0.000   0.000   0.000   entity_match
0.667   0.173   0.275   b_cubed
0.663   0.310   0.422   mention_ceaf
0.607   0.283   0.386   typed_mention_ceaf

ONLY English documents:
Prec    Recall  F1      Metric
0.733   0.538   0.620   strong_mention_match
0.664   0.487   0.562   strong_typed_mention_match
0.000   0.000   0.000   entity_match
0.632   0.344   0.445   b_cubed
0.647   0.475   0.548   mention_ceaf
0.600   0.441   0.508   typed_mention_ceaf

ONLY Chinese documents:
Prec    Recall  F1      Metric
0.774   0.431   0.554   strong_mention_match
0.694   0.386   0.496   strong_typed_mention_match
0.000   0.000   0.000   entity_match
0.728   0.273   0.397   b_cubed
0.734   0.408   0.524   mention_ceaf
0.661   0.368   0.472   typed_mention_ceaf

ONLY Spanish documents:
Prec    Recall  F1      Metric
0.000   0.000   0.000   strong_mention_match
0.000   0.000   0.000   strong_typed_mention_match
0.000   0.000   0.000   entity_match
0.000   0.000   0.000   b_cubed
0.000   0.000   0.000   mention_ceaf
0.000   0.000   0.000   typed_mention_ceaf

Slot Filling Evaluation:

Metric              RunID                Hop  Prec    Recall  F1
SF-ALL-Micro        Stanford_KB_XLING_1  0    0.4017  0.0684  0.1169
SF-ALL-Micro        Stanford_KB_XLING_1  1    0.1890  0.0351  0.0592
SF-ALL-Micro        Stanford_KB_XLING_1  ALL  0.3286  0.0576  0.0980
SF-ALL-Macro        Stanford_KB_XLING_1  0    0.1317  0.1065  0.1030
SF-ALL-Macro        Stanford_KB_XLING_1  1    0.0394  0.0383  0.0374
SF-ALL-Macro        Stanford_KB_XLING_1  ALL  0.0845  0.0716  0.0694
LDC-MAX-ALL-Micro   Stanford_KB_XLING_1  0    0.4626  0.1332  0.2068
LDC-MAX-ALL-Micro   Stanford_KB_XLING_1  1    0.2323  0.0754  0.1139
LDC-MAX-ALL-Micro   Stanford_KB_XLING_1  ALL  0.3791  0.1138  0.1751
LDC-MAX-ALL-Macro   Stanford_KB_XLING_1  0    0.2444  0.1922  0.1912
LDC-MAX-ALL-Macro   Stanford_KB_XLING_1  1    0.0848  0.0835  0.0816
LDC-MAX-ALL-Macro   Stanford_KB_XLING_1  ALL  0.1654  0.1384  0.1370
LDC-MEAN-ALL-Macro  Stanford_KB_XLING_1  0    0.1371  0.1149  0.1116
LDC-MEAN-ALL-Macro  Stanford_KB_XLING_1  1    0.0478  0.0455  0.0446
LDC-MEAN-ALL-Macro  Stanford_KB_XLING_1  ALL  0.0929  0.0806  0.0785

*ALL-Macro Prec, Recall and F1 refer to mean-precision, mean-recall and mean-F1.

NIL-DETECTION P/R/F1: 0.2995 0.9350 0.4537

*************************************************************
Run ID: Stanford_KB_XLING_2
Did the run access the live Web during the evaluation window: No
Did the run extract relations from the Cold Start source corpus: Yes
Did the run generate meaningful confidence values: Yes
Run number of the English KB system that is most closely configured to the English component of this run: 3
Run number of the Spanish KB system that is most closely configured to the Spanish component of this run: NA
Run number of the Chinese KB system that is most closely configured to the Chinese component of this run: 2

Entity Discovery Evaluation:

ALL English, Chinese, and Spanish documents:
Prec    Recall  F1      Metric
0.751   0.350   0.478   strong_mention_match
0.677   0.316   0.431   strong_typed_mention_match
0.000   0.000   0.000   entity_match
0.667   0.173   0.275   b_cubed
0.663   0.310   0.422   mention_ceaf
0.607   0.283   0.387   typed_mention_ceaf

ONLY English documents:
Prec    Recall  F1      Metric
0.733   0.538   0.620   strong_mention_match
0.664   0.487   0.562   strong_typed_mention_match
0.000   0.000   0.000   entity_match
0.632   0.344   0.445   b_cubed
0.647   0.475   0.548   mention_ceaf
0.600   0.441   0.508   typed_mention_ceaf

ONLY Chinese documents:
Prec    Recall  F1      Metric
0.775   0.431   0.554   strong_mention_match
0.695   0.386   0.497   strong_typed_mention_match
0.000   0.000   0.000   entity_match
0.729   0.273   0.397   b_cubed
0.734   0.408   0.525   mention_ceaf
0.661   0.368   0.473   typed_mention_ceaf

ONLY Spanish documents:
Prec    Recall  F1      Metric
0.000   0.000   0.000   strong_mention_match
0.000   0.000   0.000   strong_typed_mention_match
0.000   0.000   0.000   entity_match
0.000   0.000   0.000   b_cubed
0.000   0.000   0.000   mention_ceaf
0.000   0.000   0.000   typed_mention_ceaf

Slot Filling Evaluation:

Metric              RunID                Hop  Prec    Recall  F1
SF-ALL-Micro        Stanford_KB_XLING_2  0    0.1855  0.1302  0.1530
SF-ALL-Micro        Stanford_KB_XLING_2  1    0.0297  0.0672  0.0412
SF-ALL-Micro        Stanford_KB_XLING_2  ALL  0.0909  0.1098  0.0995
SF-ALL-Macro        Stanford_KB_XLING_2  0    0.1122  0.1224  0.1030
SF-ALL-Macro        Stanford_KB_XLING_2  1    0.0630  0.0772  0.0647
SF-ALL-Macro        Stanford_KB_XLING_2  ALL  0.0870  0.0992  0.0834
LDC-MAX-ALL-Micro   Stanford_KB_XLING_2  0    0.2076  0.2440  0.2243
LDC-MAX-ALL-Micro   Stanford_KB_XLING_2  1    0.0411  0.1311  0.0626
LDC-MAX-ALL-Micro   Stanford_KB_XLING_2  ALL  0.1113  0.2062  0.1446
LDC-MAX-ALL-Macro   Stanford_KB_XLING_2  0    0.2132  0.2235  0.1930
LDC-MAX-ALL-Macro   Stanford_KB_XLING_2  1    0.1291  0.1559  0.1328
LDC-MAX-ALL-Macro   Stanford_KB_XLING_2  ALL  0.1716  0.1901  0.1632
LDC-MEAN-ALL-Macro  Stanford_KB_XLING_2  0    0.1224  0.1333  0.1144
LDC-MEAN-ALL-Macro  Stanford_KB_XLING_2  1    0.0726  0.0888  0.0738
LDC-MEAN-ALL-Macro  Stanford_KB_XLING_2  ALL  0.0977  0.1113  0.0944

*ALL-Macro Prec, Recall and F1 refer to mean-precision, mean-recall and mean-F1.

NIL-DETECTION P/R/F1: 0.2690 0.8049 0.4032

*************************************************************
Run ID: Stanford_KB_XLING_3
Did the run access the live Web during the evaluation window: No
Did the run extract relations from the Cold Start source corpus: Yes
Did the run generate meaningful confidence values: Yes
Run number of the English KB system that is most closely configured to the English component of this run: 2
Run number of the Spanish KB system that is most closely configured to the Spanish component of this run: NA
Run number of the Chinese KB system that is most closely configured to the Chinese component of this run: 1

Entity Discovery Evaluation:

ALL English, Chinese, and Spanish documents:
Prec    Recall  F1      Metric
0.750   0.350   0.478   strong_mention_match
0.677   0.316   0.431   strong_typed_mention_match
0.000   0.000   0.000   entity_match
0.667   0.173   0.275   b_cubed
0.663   0.310   0.422   mention_ceaf
0.607   0.283   0.386   typed_mention_ceaf

ONLY English documents:
Prec    Recall  F1      Metric
0.733   0.538   0.620   strong_mention_match
0.664   0.487   0.562   strong_typed_mention_match
0.000   0.000   0.000   entity_match
0.632   0.344   0.445   b_cubed
0.647   0.475   0.548   mention_ceaf
0.600   0.441   0.508   typed_mention_ceaf

ONLY Chinese documents:
Prec    Recall  F1      Metric
0.774   0.431   0.554   strong_mention_match
0.694   0.386   0.496   strong_typed_mention_match
0.000   0.000   0.000   entity_match
0.728   0.273   0.397   b_cubed
0.734   0.408   0.524   mention_ceaf
0.661   0.368   0.472   typed_mention_ceaf

ONLY Spanish documents:
Prec    Recall  F1      Metric
0.000   0.000   0.000   strong_mention_match
0.000   0.000   0.000   strong_typed_mention_match
0.000   0.000   0.000   entity_match
0.000   0.000   0.000   b_cubed
0.000   0.000   0.000   mention_ceaf
0.000   0.000   0.000   typed_mention_ceaf

Slot Filling Evaluation:

Metric              RunID                Hop  Prec    Recall  F1
SF-ALL-Micro        Stanford_KB_XLING_3  0    0.5277  0.0489  0.0894
SF-ALL-Micro        Stanford_KB_XLING_3  1    0.4000  0.0163  0.0313
SF-ALL-Micro        Stanford_KB_XLING_3  ALL  0.5054  0.0383  0.0712
SF-ALL-Macro        Stanford_KB_XLING_3  0    0.0979  0.0707  0.0720
SF-ALL-Macro        Stanford_KB_XLING_3  1    0.0215  0.0224  0.0215
SF-ALL-Macro        Stanford_KB_XLING_3  ALL  0.0588  0.0460  0.0462
LDC-MAX-ALL-Micro   Stanford_KB_XLING_3  0    0.5616  0.0943  0.1615
LDC-MAX-ALL-Micro   Stanford_KB_XLING_3  1    0.4800  0.0393  0.0727
LDC-MAX-ALL-Micro   Stanford_KB_XLING_3  ALL  0.5455  0.0759  0.1332
LDC-MAX-ALL-Macro   Stanford_KB_XLING_3  0    0.1809  0.1325  0.1352
LDC-MAX-ALL-Macro   Stanford_KB_XLING_3  1    0.0520  0.0522  0.0511
LDC-MAX-ALL-Macro   Stanford_KB_XLING_3  ALL  0.1171  0.0928  0.0936
LDC-MEAN-ALL-Macro  Stanford_KB_XLING_3  0    0.1016  0.0775  0.0785
LDC-MEAN-ALL-Macro  Stanford_KB_XLING_3  1    0.0281  0.0274  0.0269
LDC-MEAN-ALL-Macro  Stanford_KB_XLING_3  ALL  0.0652  0.0527  0.0530

*ALL-Macro Prec, Recall and F1 refer to mean-precision, mean-recall and mean-F1.

NIL-DETECTION P/R/F1: 0.3085 0.9756 0.4688

*************************************************************
Run ID: Stanford_KB_XLING_4
Did the run access the live Web during the evaluation window: No
Did the run extract relations from the Cold Start source corpus: Yes
Did the run generate meaningful confidence values: Yes
Run number of the English KB system that is most closely configured to the English component of this run: 1
Run number of the Spanish KB system that is most closely configured to the Spanish component of this run: NA
Run number of the Chinese KB system that is most closely configured to the Chinese component of this run: 2

Entity Discovery Evaluation:

ALL English, Chinese, and Spanish documents:
Prec    Recall  F1      Metric
0.750   0.350   0.478   strong_mention_match
0.677   0.316   0.431   strong_typed_mention_match
0.000   0.000   0.000   entity_match
0.667   0.173   0.275   b_cubed
0.663   0.310   0.422   mention_ceaf
0.607   0.284   0.387   typed_mention_ceaf

ONLY English documents:
Prec    Recall  F1      Metric
0.732   0.538   0.620   strong_mention_match
0.664   0.488   0.563   strong_typed_mention_match
0.000   0.000   0.000   entity_match
0.631   0.344   0.445   b_cubed
0.647   0.475   0.548   mention_ceaf
0.601   0.441   0.508   typed_mention_ceaf

ONLY Chinese documents:
Prec    Recall  F1      Metric
0.775   0.431   0.554   strong_mention_match
0.695   0.386   0.497   strong_typed_mention_match
0.000   0.000   0.000   entity_match
0.729   0.273   0.397   b_cubed
0.734   0.408   0.525   mention_ceaf
0.661   0.368   0.473   typed_mention_ceaf

ONLY Spanish documents:
Prec    Recall  F1      Metric
0.000   0.000   0.000   strong_mention_match
0.000   0.000   0.000   strong_typed_mention_match
0.000   0.000   0.000   entity_match
0.000   0.000   0.000   b_cubed
0.000   0.000   0.000   mention_ceaf
0.000   0.000   0.000   typed_mention_ceaf

Slot Filling Evaluation:

Metric              RunID                Hop  Prec    Recall  F1
SF-ALL-Micro        Stanford_KB_XLING_4  0    0.1934  0.1622  0.1764
SF-ALL-Micro        Stanford_KB_XLING_4  1    0.0311  0.0891  0.0461
SF-ALL-Micro        Stanford_KB_XLING_4  ALL  0.0927  0.1385  0.1110
SF-ALL-Macro        Stanford_KB_XLING_4  0    0.1164  0.1492  0.1167
SF-ALL-Macro        Stanford_KB_XLING_4  1    0.0826  0.1049  0.0867
SF-ALL-Macro        Stanford_KB_XLING_4  ALL  0.0991  0.1265  0.1014
LDC-MAX-ALL-Micro   Stanford_KB_XLING_4  0    0.2024  0.3036  0.2429
LDC-MAX-ALL-Micro   Stanford_KB_XLING_4  1    0.0271  0.1738  0.0470
LDC-MAX-ALL-Micro   Stanford_KB_XLING_4  ALL  0.0827  0.2600  0.1255
LDC-MAX-ALL-Macro   Stanford_KB_XLING_4  0    0.2189  0.2719  0.2179
LDC-MAX-ALL-Macro   Stanford_KB_XLING_4  1    0.1593  0.2009  0.1669
LDC-MAX-ALL-Macro   Stanford_KB_XLING_4  ALL  0.1894  0.2368  0.1927
LDC-MEAN-ALL-Macro  Stanford_KB_XLING_4  0    0.1287  0.1614  0.1296
LDC-MEAN-ALL-Macro  Stanford_KB_XLING_4  1    0.0890  0.1151  0.0932
LDC-MEAN-ALL-Macro  Stanford_KB_XLING_4  ALL  0.1091  0.1385  0.1116

*ALL-Macro Prec, Recall and F1 refer to mean-precision, mean-recall and mean-F1.

NIL-DETECTION P/R/F1: 0.2590 0.7642 0.3869
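
Note on reading the Micro and Macro rows: the footnote says the Macro columns are mean-precision, mean-recall and mean-F1, so a Macro F1 need not equal the harmonic mean of the Macro Prec and Recall printed beside it. The sketch below illustrates this with values from Stanford_KB_XLING_1. It assumes the Micro and NIL-DETECTION rows use the standard F1 = 2PR/(P+R); the helper name f1 is hypothetical and is not part of the official scorer.

```python
def f1(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# SF-ALL-Micro, Stanford_KB_XLING_1, hop 0: reported F1 is 0.1169.
micro_f1 = f1(0.4017, 0.0684)

# NIL-DETECTION, Stanford_KB_XLING_1: reported F1 is 0.4537.
nil_f1 = f1(0.2995, 0.9350)

# SF-ALL-Macro, Stanford_KB_XLING_1, hop 0: reported mean-F1 is 0.1030.
# The harmonic mean of mean-P and mean-R is a different quantity,
# so it is not expected to reproduce the reported value.
macro_harmonic = f1(0.1317, 0.1065)

print(f"micro F1 from P/R:       {micro_f1:.4f}")
print(f"NIL F1 from P/R:         {nil_f1:.4f}")
print(f"harmonic(meanP, meanR):  {macro_harmonic:.4f} (reported mean-F1: 0.1030)")
```

The Micro and NIL values recomputed this way agree with the reported figures to four decimals, while the Macro value does not, which is consistent with the footnote.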