=========================================================================
TAC KBP 2016 SLOT FILLER VALIDATION ENGLISH ENSEMBLING EVALUATION RESULTS
=========================================================================

Team ID: IRTSX
Organization: IRT-SystemX & LIMSI

*************************************************************
Run ID: IRTSX1

Did the run access the live Web during the evaluation window: No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: No
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: No
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: Yes
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: Yes
Did this run make use of the system profiles for the slot filling runs: Yes
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: No

CSSF micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
IRTSX1.ENG.ensemble                     0.2469  0.3205  0.2789  0.1489  0.1437  0.1463  0.2202  0.2613  0.2390
BBN_KB_ENG_4 (best hop0 F1 input)       0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703
BBN_KB_ENG_1 (best hop1 F1 input)       0.4657  0.2408  0.3174  0.2542  0.1320  0.1737  0.3947  0.2043  0.2693
BBN_KB_ENG_4 (best ALL F1 input)        0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703

MAX micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
IRTSX1.ENG.ensemble                     0.2659  0.3282  0.2938  0.1402  0.1467  0.1434  0.2287  0.2679  0.2468
BBN_KB_ENG_4 (best hop0 F1 input)       0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791
BBN_KB_ENG_1 (best hop1 F1 input)       0.4838  0.2572  0.3358  0.2252  0.1313  0.1659  0.3925  0.2154  0.2781
BBN_KB_ENG_4 (best ALL F1 input)        0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791

MEAN macro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
IRTSX1.ENG.ensemble                     0.2618  0.3661  0.2695  0.1306  0.1665  0.1377  0.2100  0.2873  0.2174
Stanford_SF_ENG_3 (best hop0 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best hop1 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best ALL F1 input)   0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244

*************************************************************
Run ID: IRTSX2

Did the run access the live Web during the evaluation window: No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: No
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: No
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: Yes
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: Yes
Did this run make use of the system profiles for the slot filling runs: Yes
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: No

CSSF micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
IRTSX2.ENG.ensemble                     0.2644  0.3397  0.2973  0.1348  0.1466  0.1404  0.2256  0.2750  0.2479
BBN_KB_ENG_4 (best hop0 F1 input)       0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703
BBN_KB_ENG_1 (best hop1 F1 input)       0.4657  0.2408  0.3174  0.2542  0.1320  0.1737  0.3947  0.2043  0.2693
BBN_KB_ENG_4 (best ALL F1 input)        0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703

MAX micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
IRTSX2.ENG.ensemble                     0.2736  0.3397  0.3031  0.1284  0.1467  0.1369  0.2280  0.2756  0.2496
BBN_KB_ENG_4 (best hop0 F1 input)       0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791
BBN_KB_ENG_1 (best hop1 F1 input)       0.4838  0.2572  0.3358  0.2252  0.1313  0.1659  0.3925  0.2154  0.2781
BBN_KB_ENG_4 (best ALL F1 input)        0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791

MEAN macro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
IRTSX2.ENG.ensemble                     0.3214  0.4234  0.3314  0.1380  0.1703  0.1398  0.2489  0.3234  0.2557
Stanford_SF_ENG_3 (best hop0 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best hop1 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best ALL F1 input)   0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244

*************************************************************
Run ID: IRTSX3

Did the run access the live Web during the evaluation window: No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: No
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: No
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: Yes
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: Yes
Did this run make use of the system profiles for the slot filling runs: Yes
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: No

CSSF micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
IRTSX3.ENG.ensemble                     0.1996  0.2747  0.2312  0.1082  0.1085  0.1083  0.1750  0.2191  0.1946
BBN_KB_ENG_4 (best hop0 F1 input)       0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703
BBN_KB_ENG_1 (best hop1 F1 input)       0.4657  0.2408  0.3174  0.2542  0.1320  0.1737  0.3947  0.2043  0.2693
BBN_KB_ENG_4 (best ALL F1 input)        0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703

MAX micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
IRTSX3.ENG.ensemble                     0.2149  0.2821  0.2440  0.1042  0.1158  0.1097  0.1821  0.2269  0.2021
BBN_KB_ENG_4 (best hop0 F1 input)       0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791
BBN_KB_ENG_1 (best hop1 F1 input)       0.4838  0.2572  0.3358  0.2252  0.1313  0.1659  0.3925  0.2154  0.2781
BBN_KB_ENG_4 (best ALL F1 input)        0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791

MEAN macro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
IRTSX3.ENG.ensemble                     0.2518  0.3514  0.2568  0.1112  0.1270  0.1111  0.1962  0.2627  0.1992
Stanford_SF_ENG_3 (best hop0 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best hop1 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best ALL F1 input)   0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244

*************************************************************
Run ID: IRTSX4

Did the run access the live Web during the evaluation window: No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: No
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: No
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: Yes
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: Yes
Did this run make use of the system profiles for the slot filling runs: Yes
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: No

CSSF micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
IRTSX4.ENG.ensemble                     0.2047  0.3840  0.2671  0.0523  0.1261  0.0739  0.1448  0.2976  0.1949
BBN_KB_ENG_4 (best hop0 F1 input)       0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703
BBN_KB_ENG_1 (best hop1 F1 input)       0.4657  0.2408  0.3174  0.2542  0.1320  0.1737  0.3947  0.2043  0.2693
BBN_KB_ENG_4 (best ALL F1 input)        0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703

MAX micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
IRTSX4.ENG.ensemble                     0.2173  0.3896  0.2790  0.0466  0.1236  0.0677  0.1450  0.3013  0.1958
BBN_KB_ENG_4 (best hop0 F1 input)       0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791
BBN_KB_ENG_1 (best hop1 F1 input)       0.4838  0.2572  0.3358  0.2252  0.1313  0.1659  0.3925  0.2154  0.2781
BBN_KB_ENG_4 (best ALL F1 input)        0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791

MEAN macro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
IRTSX4.ENG.ensemble                     0.3750  0.4796  0.3856  0.1482  0.1646  0.1483  0.2854  0.3551  0.2918
Stanford_SF_ENG_3 (best hop0 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best hop1 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best ALL F1 input)   0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
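
Note on reading the micro-average rows: the reported F1 values appear consistent with the harmonic mean of the precision and recall printed on the same row. The sketch below is not part of the official scorer; the function name f1_from_pr is hypothetical, and the sample values are copied from the IRTSX1 and BBN_KB_ENG_4 CSSF micro-average rows. The same check does not apply to the MEAN macro-average rows, where the printed F1 is not the harmonic mean of the printed P and R (presumably because each metric is averaged separately before reporting).

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Values copied from the CSSF micro-average table (hop0 and ALL columns).
rows = [
    ("IRTSX1.ENG.ensemble hop0", 0.2469, 0.3205, 0.2789),
    ("IRTSX1.ENG.ensemble ALL", 0.2202, 0.2613, 0.2390),
    ("BBN_KB_ENG_4 hop0", 0.4609, 0.2437, 0.3188),
]

for label, p, r, reported_f1 in rows:
    recomputed = f1_from_pr(p, r)
    # P and R are themselves rounded to 4 places, so allow a small tolerance.
    status = "ok" if abs(recomputed - reported_f1) < 5e-4 else "mismatch"
    print(f"{label}: recomputed {recomputed:.4f}, reported {reported_f1:.4f} ({status})")
```

A tolerance is used instead of exact equality because the inputs are already rounded to four decimal places.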