====================================================
TAC KBP 2017 ENGLISH SLOT FILLING EVALUATION RESULTS
====================================================

Team ID: STANFORD
Organization: Stanford University

*************************************************************

Run ID: STANFORD_SF_ENG_1
Did the run access the live Web during the evaluation window: No
Did the run extract relations from the Cold Start source corpus: Yes
Did the run generate meaningful confidence values: Yes

Slot Filling Evaluation (Queries involve ONLY SF slots):

Scores based on P/R/F1; only k=1 justification allowed:

Metric              RunID              Hop  Prec    Recall  F1
SF-ALL-Micro        STANFORD_SF_ENG_1  0    0.3203  0.2686  0.2922
SF-ALL-Micro        STANFORD_SF_ENG_1  1    0.0688  0.1858  0.1004
SF-ALL-Micro        STANFORD_SF_ENG_1  ALL  0.1935  0.2487  0.2176
SF-ALL-Macro        STANFORD_SF_ENG_1  0    0.2366  0.2913  0.2348
SF-ALL-Macro        STANFORD_SF_ENG_1  1    0.1730  0.1924  0.1668
SF-ALL-Macro        STANFORD_SF_ENG_1  ALL  0.2114  0.2521  0.2079
LDC-MAX-ALL-Micro   STANFORD_SF_ENG_1  0    0.3517  0.2920  0.3191
LDC-MAX-ALL-Micro   STANFORD_SF_ENG_1  1    0.0795  0.2089  0.1152
LDC-MAX-ALL-Micro   STANFORD_SF_ENG_1  ALL  0.2188  0.2727  0.2428
LDC-MAX-ALL-Macro   STANFORD_SF_ENG_1  0    0.2868  0.3451  0.2847
LDC-MAX-ALL-Macro   STANFORD_SF_ENG_1  1    0.1916  0.2162  0.1868
LDC-MAX-ALL-Macro   STANFORD_SF_ENG_1  ALL  0.2498  0.2950  0.2467
LDC-MEAN-ALL-Macro  STANFORD_SF_ENG_1  0    0.2570  0.3182  0.2573
LDC-MEAN-ALL-Macro  STANFORD_SF_ENG_1  1    0.1802  0.2061  0.1774
LDC-MEAN-ALL-Macro  STANFORD_SF_ENG_1  ALL  0.2272  0.2747  0.2262

*ALL-Macro Prec, Recall and F1 refer to mean-precision, mean-recall and mean-F1.

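The footnote above matters when reading the F1 column. For the Micro rows, F1 can be recovered from the reported Prec and Recall as their harmonic mean (up to rounding). For the Macro rows it cannot, because the reported F1 is a mean of per-query F1 values. The following is a minimal sketch of that check, not the official scorer; the function name `f1` and the rounding tolerance are assumptions introduced here for illustration.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# SF-ALL-Micro rows for STANFORD_SF_ENG_1: (hop, Prec, Recall, reported F1).
micro_rows = [
    ("0", 0.3203, 0.2686, 0.2922),
    ("1", 0.0688, 0.1858, 0.1004),
    ("ALL", 0.1935, 0.2487, 0.2176),
]

# Assumed tolerance: the table values are rounded to four decimals, so the
# recomputed F1 may differ slightly from the reported one.
TOLERANCE = 5e-4

for hop, prec, recall, reported in micro_rows:
    recomputed = f1(prec, recall)
    status = "consistent" if abs(recomputed - reported) < TOLERANCE else "differs"
    print(f"micro hop={hop}: recomputed={recomputed:.4f} reported={reported:.4f} {status}")

# SF-ALL-Macro, hop 0: the harmonic mean of mean-precision and mean-recall
# does not match the reported mean-F1, as the footnote implies.
macro_gap = f1(0.2366, 0.2913) - 0.2348
print(f"macro hop=0: gap between harmonic mean and reported mean-F1 = {macro_gap:.4f}")
```

The same check can be applied to the Micro rows of the other runs by swapping in their Prec, Recall and F1 values.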
Scores based on confidence values and Average Precision (AP); only k=1 justification allowed:

Metric              RunID              Hop  AP
SF-ALL-Macro        STANFORD_SF_ENG_1  0    0.2500
SF-ALL-Macro        STANFORD_SF_ENG_1  1    0.0685
SF-ALL-Macro        STANFORD_SF_ENG_1  ALL  0.1952
LDC-MEAN-ALL-Macro  STANFORD_SF_ENG_1  0    0.2712
LDC-MEAN-ALL-Macro  STANFORD_SF_ENG_1  1    0.0797
LDC-MEAN-ALL-Macro  STANFORD_SF_ENG_1  ALL  0.2157  (PRIMARY METRIC)

*ALL-Macro AP refer to mean of corresponding AP values.

*************************************************************

Run ID: STANFORD_SF_ENG_2
Did the run access the live Web during the evaluation window: No
Did the run extract relations from the Cold Start source corpus: Yes
Did the run generate meaningful confidence values: Yes

Slot Filling Evaluation (Queries involve ONLY SF slots):

Scores based on P/R/F1; only k=1 justification allowed:

Metric              RunID              Hop  Prec    Recall  F1
SF-ALL-Micro        STANFORD_SF_ENG_2  0    0.3571  0.2288  0.2789
SF-ALL-Micro        STANFORD_SF_ENG_2  1    0.1291  0.1393  0.1340
SF-ALL-Micro        STANFORD_SF_ENG_2  ALL  0.2779  0.2073  0.2375
SF-ALL-Macro        STANFORD_SF_ENG_2  0    0.2217  0.2397  0.2097
SF-ALL-Macro        STANFORD_SF_ENG_2  1    0.1485  0.1485  0.1401
SF-ALL-Macro        STANFORD_SF_ENG_2  ALL  0.1927  0.2036  0.1821
LDC-MAX-ALL-Micro   STANFORD_SF_ENG_2  0    0.3957  0.2462  0.3035
LDC-MAX-ALL-Micro   STANFORD_SF_ENG_2  1    0.1491  0.1519  0.1505
LDC-MAX-ALL-Micro   STANFORD_SF_ENG_2  ALL  0.3142  0.2243  0.2618
LDC-MAX-ALL-Macro   STANFORD_SF_ENG_2  0    0.2642  0.2816  0.2492
LDC-MAX-ALL-Macro   STANFORD_SF_ENG_2  1    0.1591  0.1596  0.1511
LDC-MAX-ALL-Macro   STANFORD_SF_ENG_2  ALL  0.2234  0.2342  0.2111
LDC-MEAN-ALL-Macro  STANFORD_SF_ENG_2  0    0.2397  0.2637  0.2296
LDC-MEAN-ALL-Macro  STANFORD_SF_ENG_2  1    0.1463  0.1479  0.1399
LDC-MEAN-ALL-Macro  STANFORD_SF_ENG_2  ALL  0.2034  0.2188  0.1947

*ALL-Macro Prec, Recall and F1 refer to mean-precision, mean-recall and mean-F1.

Scores based on confidence values and Average Precision (AP); only k=1 justification allowed:

Metric              RunID              Hop  AP
SF-ALL-Macro        STANFORD_SF_ENG_2  0    0.2142
SF-ALL-Macro        STANFORD_SF_ENG_2  1    0.0726
SF-ALL-Macro        STANFORD_SF_ENG_2  ALL  0.1715
LDC-MEAN-ALL-Macro  STANFORD_SF_ENG_2  0    0.2348
LDC-MEAN-ALL-Macro  STANFORD_SF_ENG_2  1    0.0693
LDC-MEAN-ALL-Macro  STANFORD_SF_ENG_2  ALL  0.1903  (PRIMARY METRIC)

*ALL-Macro AP refer to mean of corresponding AP values.

*************************************************************

Run ID: STANFORD_SF_ENG_3
Did the run access the live Web during the evaluation window: No
Did the run extract relations from the Cold Start source corpus: Yes
Did the run generate meaningful confidence values: Yes

Slot Filling Evaluation (Queries involve ONLY SF slots):

Scores based on P/R/F1; only k=1 justification allowed:

Metric              RunID              Hop  Prec    Recall  F1
SF-ALL-Micro        STANFORD_SF_ENG_3  0    0.5087  0.2012  0.2884
SF-ALL-Micro        STANFORD_SF_ENG_3  1    0.5161  0.1311  0.2092
SF-ALL-Micro        STANFORD_SF_ENG_3  ALL  0.5100  0.1844  0.2708
SF-ALL-Macro        STANFORD_SF_ENG_3  0    0.2256  0.1979  0.1908
SF-ALL-Macro        STANFORD_SF_ENG_3  1    0.1336  0.1333  0.1284
SF-ALL-Macro        STANFORD_SF_ENG_3  ALL  0.1891  0.1723  0.1661
LDC-MAX-ALL-Micro   STANFORD_SF_ENG_3  0    0.5275  0.2195  0.3100
LDC-MAX-ALL-Micro   STANFORD_SF_ENG_3  1    0.5250  0.1329  0.2121
LDC-MAX-ALL-Micro   STANFORD_SF_ENG_3  ALL  0.5271  0.1994  0.2894
LDC-MAX-ALL-Macro   STANFORD_SF_ENG_3  0    0.2588  0.2306  0.2246
LDC-MAX-ALL-Macro   STANFORD_SF_ENG_3  1    0.1384  0.1313  0.1294
LDC-MAX-ALL-Macro   STANFORD_SF_ENG_3  ALL  0.2120  0.1921  0.1877
LDC-MEAN-ALL-Macro  STANFORD_SF_ENG_3  0    0.2430  0.2176  0.2097
LDC-MEAN-ALL-Macro  STANFORD_SF_ENG_3  1    0.1302  0.1243  0.1229
LDC-MEAN-ALL-Macro  STANFORD_SF_ENG_3  ALL  0.1992  0.1814  0.1760

*ALL-Macro Prec, Recall and F1 refer to mean-precision, mean-recall and mean-F1.

Scores based on confidence values and Average Precision (AP); only k=1 justification allowed:

Metric              RunID              Hop  AP
SF-ALL-Macro        STANFORD_SF_ENG_3  0    0.1827
SF-ALL-Macro        STANFORD_SF_ENG_3  1    0.0665
SF-ALL-Macro        STANFORD_SF_ENG_3  ALL  0.1493
LDC-MEAN-ALL-Macro  STANFORD_SF_ENG_3  0    0.1999
LDC-MEAN-ALL-Macro  STANFORD_SF_ENG_3  1    0.0573
LDC-MEAN-ALL-Macro  STANFORD_SF_ENG_3  ALL  0.1638  (PRIMARY METRIC)

*ALL-Macro AP refer to mean of corresponding AP values.

*************************************************************

Run ID: STANFORD_SF_ENG_4
Did the run access the live Web during the evaluation window: No
Did the run extract relations from the Cold Start source corpus: Yes
Did the run generate meaningful confidence values: Yes

Slot Filling Evaluation (Queries involve ONLY SF slots):

Scores based on P/R/F1; only k=1 justification allowed:

Metric              RunID              Hop  Prec    Recall  F1
SF-ALL-Micro        STANFORD_SF_ENG_4  0    0.3505  0.2522  0.2933
SF-ALL-Micro        STANFORD_SF_ENG_4  1    0.1115  0.1721  0.1353
SF-ALL-Micro        STANFORD_SF_ENG_4  ALL  0.2539  0.2329  0.2430
SF-ALL-Macro        STANFORD_SF_ENG_4  0    0.2490  0.2817  0.2403
SF-ALL-Macro        STANFORD_SF_ENG_4  1    0.1858  0.1800  0.1734
SF-ALL-Macro        STANFORD_SF_ENG_4  ALL  0.2240  0.2414  0.2137
LDC-MAX-ALL-Micro   STANFORD_SF_ENG_4  0    0.3840  0.2748  0.3204
LDC-MAX-ALL-Micro   STANFORD_SF_ENG_4  1    0.1111  0.1962  0.1419
LDC-MAX-ALL-Micro   STANFORD_SF_ENG_4  ALL  0.2676  0.2566  0.2620
LDC-MAX-ALL-Macro   STANFORD_SF_ENG_4  0    0.3007  0.3367  0.2905
LDC-MAX-ALL-Macro   STANFORD_SF_ENG_4  1    0.2050  0.2091  0.1919
LDC-MAX-ALL-Macro   STANFORD_SF_ENG_4  ALL  0.2635  0.2871  0.2522
LDC-MEAN-ALL-Macro  STANFORD_SF_ENG_4  0    0.2701  0.3098  0.2634
LDC-MEAN-ALL-Macro  STANFORD_SF_ENG_4  1    0.1924  0.1927  0.1807
LDC-MEAN-ALL-Macro  STANFORD_SF_ENG_4  ALL  0.2399  0.2643  0.2313

*ALL-Macro Prec, Recall and F1 refer to mean-precision, mean-recall and mean-F1.

Scores based on confidence values and Average Precision (AP); only k=1 justification allowed:

Metric              RunID              Hop  AP
SF-ALL-Macro        STANFORD_SF_ENG_4  0    0.2512
SF-ALL-Macro        STANFORD_SF_ENG_4  1    0.0760
SF-ALL-Macro        STANFORD_SF_ENG_4  ALL  0.1970
LDC-MEAN-ALL-Macro  STANFORD_SF_ENG_4  0    0.2738
LDC-MEAN-ALL-Macro  STANFORD_SF_ENG_4  1    0.0934
LDC-MEAN-ALL-Macro  STANFORD_SF_ENG_4  ALL  0.2186  (PRIMARY METRIC)

*ALL-Macro AP refer to mean of corresponding AP values.
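
The AP tables above report a mean of per-query Average Precision values computed from confidence-ranked responses. As a rough illustration only, the sketch below uses the textbook definition of AP (precision at each correct rank, averaged over the number of gold answers) and then averages across queries. The official scorer may treat ties, hops, and equivalent answers differently; the function names and the toy data are hypothetical and do not come from the evaluation.

```python
def average_precision(responses: list[tuple[float, bool]], num_gold: int) -> float:
    """Textbook AP for one query.

    responses: (confidence, is_correct) pairs for the run's answers.
    num_gold:  number of correct answers known for the query.
    """
    if num_gold == 0:
        return 0.0
    # Rank responses by confidence, highest first.
    ranked = sorted(responses, key=lambda r: r[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        if is_correct:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / num_gold


def mean_average_precision(queries: list[tuple[list[tuple[float, bool]], int]]) -> float:
    """Mean of per-query AP values, mirroring the '*ALL-Macro AP' footnote."""
    if not queries:
        return 0.0
    return sum(average_precision(resp, gold) for resp, gold in queries) / len(queries)


# Toy data (hypothetical): two queries with made-up confidences and judgments.
toy_queries = [
    ([(0.9, True), (0.8, False), (0.4, True)], 3),
    ([(0.7, False), (0.6, True)], 1),
]
print(f"toy mean AP = {mean_average_precision(toy_queries):.4f}")
```

Under this definition a run is rewarded for placing correct fills at high confidence, which is why the report asks whether each run generated meaningful confidence values.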