=========================================================================
TAC KBP 2016 SLOT FILLER VALIDATION ENGLISH ENSEMBLING EVALUATION RESULTS
=========================================================================

Team ID: SAFT_ISI
Organization: USC Information Sciences Institute

*************************************************************
Run ID: SAFT_ISI1

Did the run access the live Web during the evaluation window: No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: No
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: No
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: No
Did this run make use of the system profiles for the slot filling runs: No
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: Yes

CSSF micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
SAFT_ISI1.ENG.ensemble                  0.3952  0.3146  0.3503  0.1849  0.1290  0.1520  0.3308  0.2525  0.2864
BBN_KB_ENG_4 (best hop0 F1 input)       0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703
BBN_KB_ENG_1 (best hop1 F1 input)       0.4657  0.2408  0.3174  0.2542  0.1320  0.1737  0.3947  0.2043  0.2693
BBN_KB_ENG_4 (best ALL F1 input)        0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703

MAX micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
SAFT_ISI1.ENG.ensemble                  0.4060  0.3109  0.3522  0.1675  0.1236  0.1422  0.3288  0.2487  0.2832
BBN_KB_ENG_4 (best hop0 F1 input)       0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791
BBN_KB_ENG_1 (best hop1 F1 input)       0.4838  0.2572  0.3358  0.2252  0.1313  0.1659  0.3925  0.2154  0.2781
BBN_KB_ENG_4 (best ALL F1 input)        0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791

MEAN macro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
SAFT_ISI1.ENG.ensemble                  0.3045  0.3259  0.2880  0.0721  0.1142  0.0833  0.2127  0.2422  0.2071
Stanford_SF_ENG_3 (best hop0 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best hop1 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best ALL F1 input)   0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244

*************************************************************
Run ID: SAFT_ISI2

Did the run access the live Web during the evaluation window: No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: No
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: No
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: No
Did this run make use of the system profiles for the slot filling runs: No
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: Yes

CSSF micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
SAFT_ISI2.ENG.ensemble                  0.3952  0.3146  0.3503  0.1849  0.1290  0.1520  0.3308  0.2525  0.2864
BBN_KB_ENG_4 (best hop0 F1 input)       0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703
BBN_KB_ENG_1 (best hop1 F1 input)       0.4657  0.2408  0.3174  0.2542  0.1320  0.1737  0.3947  0.2043  0.2693
BBN_KB_ENG_4 (best ALL F1 input)        0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703

MAX micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
SAFT_ISI2.ENG.ensemble                  0.4060  0.3109  0.3522  0.1675  0.1236  0.1422  0.3288  0.2487  0.2832
BBN_KB_ENG_4 (best hop0 F1 input)       0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791
BBN_KB_ENG_1 (best hop1 F1 input)       0.4838  0.2572  0.3358  0.2252  0.1313  0.1659  0.3925  0.2154  0.2781
BBN_KB_ENG_4 (best ALL F1 input)        0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791

MEAN macro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
SAFT_ISI2.ENG.ensemble                  0.3045  0.3259  0.2880  0.0721  0.1142  0.0833  0.2127  0.2422  0.2071
Stanford_SF_ENG_3 (best hop0 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best hop1 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best ALL F1 input)   0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244

*************************************************************
Run ID: SAFT_ISI3

Did the run access the live Web during the evaluation window: No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: No
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: No
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: No
Did this run make use of the system profiles for the slot filling runs: No
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: Yes

CSSF micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
SAFT_ISI3.ENG.ensemble                  0.2677  0.3752  0.3124  0.1753  0.1496  0.1614  0.2460  0.2996  0.2702
BBN_KB_ENG_4 (best hop0 F1 input)       0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703
BBN_KB_ENG_1 (best hop1 F1 input)       0.4657  0.2408  0.3174  0.2542  0.1320  0.1737  0.3947  0.2043  0.2693
BBN_KB_ENG_4 (best ALL F1 input)        0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703

MAX micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
SAFT_ISI3.ENG.ensemble                  0.2832  0.3685  0.3203  0.1631  0.1467  0.1545  0.2525  0.2949  0.2720
BBN_KB_ENG_4 (best hop0 F1 input)       0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791
BBN_KB_ENG_1 (best hop1 F1 input)       0.4838  0.2572  0.3358  0.2252  0.1313  0.1659  0.3925  0.2154  0.2781
BBN_KB_ENG_4 (best ALL F1 input)        0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791

MEAN macro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
SAFT_ISI3.ENG.ensemble                  0.2996  0.4219  0.3101  0.0811  0.1460  0.0974  0.2133  0.3129  0.2261
Stanford_SF_ENG_3 (best hop0 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best hop1 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best ALL F1 input)   0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
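
Note on reading the F1 columns: the sketch below is not part of the official scorer. It assumes the micro-average F1 values are the harmonic mean of the reported precision and recall, and that the macro-average F1 is instead averaged over per-query scores (so it need not match the harmonic mean of the averaged P and R). The helper names harmonic_f1 and is_consistent and the 5e-4 rounding tolerance are illustrative choices, not taken from the evaluation tooling. The figures are copied from the tables above.

```python
def harmonic_f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 0.0 if total == 0 else 2 * precision * recall / total


def is_consistent(precision: float, recall: float, reported_f1: float,
                  tol: float = 5e-4) -> bool:
    """True if reported_f1 matches harmonic_f1(P, R) up to rounding (assumed tol)."""
    return abs(harmonic_f1(precision, recall) - reported_f1) <= tol


# (P, R, F) triples copied from the tables in this report.
micro_rows = {
    "SAFT_ISI1 CSSF micro hop0": (0.3952, 0.3146, 0.3503),
    "SAFT_ISI1 CSSF micro hop1": (0.1849, 0.1290, 0.1520),
    "SAFT_ISI1 CSSF micro ALL": (0.3308, 0.2525, 0.2864),
    "SAFT_ISI3 CSSF micro hop0": (0.2677, 0.3752, 0.3124),
    "SAFT_ISI3 CSSF micro ALL": (0.2460, 0.2996, 0.2702),
}
macro_row = ("SAFT_ISI1 MEAN macro hop0", (0.3045, 0.3259, 0.2880))

if __name__ == "__main__":
    for label, (p, r, f) in micro_rows.items():
        print(f"{label}: recomputed={harmonic_f1(p, r):.4f} reported={f:.4f} "
              f"consistent={is_consistent(p, r, f)}")
    label, (p, r, f) = macro_row
    print(f"{label}: recomputed={harmonic_f1(p, r):.4f} reported={f:.4f} "
          f"consistent={is_consistent(p, r, f)}")
```

Under these assumptions the micro-average rows reproduce the reported F1 to four decimal places, while the macro-average row does not, which is the expected behaviour if macro F1 is averaged per query rather than derived from the averaged precision and recall.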