=========================================================================
TAC KBP 2016 SLOT FILLER VALIDATION ENGLISH ENSEMBLING EVALUATION RESULTS
=========================================================================

Team ID: gator_dsr
Organization: University of Florida

*************************************************************
Run ID: gator_dsr1

Did the run access the live Web during the evaluation window: No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: No
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: No
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: No
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: Yes
Did this run make use of the system profiles for the slot filling runs: No
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: No

CSSF micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
gator_dsr1.ENG.ensemble                 0.3716  0.3589  0.3651  0.1307  0.2786  0.1779  0.2448  0.3320  0.2818
BBN_KB_ENG_4 (best hop0 F1 input)       0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703
BBN_KB_ENG_1 (best hop1 F1 input)       0.4657  0.2408  0.3174  0.2542  0.1320  0.1737  0.3947  0.2043  0.2693
BBN_KB_ENG_4 (best ALL F1 input)        0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703

MAX micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
gator_dsr1.ENG.ensemble                 0.3840  0.3589  0.3710  0.1130  0.2780  0.1607  0.2304  0.3321  0.2721
BBN_KB_ENG_4 (best hop0 F1 input)       0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791
BBN_KB_ENG_1 (best hop1 F1 input)       0.4838  0.2572  0.3358  0.2252  0.1313  0.1659  0.3925  0.2154  0.2781
BBN_KB_ENG_4 (best ALL F1 input)        0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791

MEAN macro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
gator_dsr1.ENG.ensemble                 0.3667  0.4239  0.3633  0.2217  0.2898  0.2340  0.3094  0.3709  0.3122
Stanford_SF_ENG_3 (best hop0 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best hop1 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best ALL F1 input)   0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244

*************************************************************
Run ID: gator_dsr2

Did the run access the live Web during the evaluation window: No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: No
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: No
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: No
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: Yes
Did this run make use of the system profiles for the slot filling runs: No
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: No

CSSF micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
gator_dsr2.ENG.ensemble                 0.3831  0.3338  0.3567  0.1279  0.2757  0.1747  0.2415  0.3143  0.2732
BBN_KB_ENG_4 (best hop0 F1 input)       0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703
BBN_KB_ENG_1 (best hop1 F1 input)       0.4657  0.2408  0.3174  0.2542  0.1320  0.1737  0.3947  0.2043  0.2693
BBN_KB_ENG_4 (best ALL F1 input)        0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703

MAX micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
gator_dsr2.ENG.ensemble                 0.4000  0.3340  0.3640  0.1116  0.2780  0.1593  0.2278  0.3154  0.2645
BBN_KB_ENG_4 (best hop0 F1 input)       0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791
BBN_KB_ENG_1 (best hop1 F1 input)       0.4838  0.2572  0.3358  0.2252  0.1313  0.1659  0.3925  0.2154  0.2781
BBN_KB_ENG_4 (best ALL F1 input)        0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791

MEAN macro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
gator_dsr2.ENG.ensemble                 0.3694  0.4132  0.3575  0.1998  0.2683  0.2122  0.3024  0.3560  0.3001
Stanford_SF_ENG_3 (best hop0 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best hop1 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best ALL F1 input)   0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244

*************************************************************
Run ID: gator_dsr4

Did the run access the live Web during the evaluation window: No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: No
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: No
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: No
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: Yes
Did this run make use of the system profiles for the slot filling runs: No
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: No

CSSF micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
gator_dsr4.ENG.ensemble                 0.3689  0.3368  0.3521  0.4150  0.1789  0.2500  0.3778  0.2839  0.3242
BBN_KB_ENG_4 (best hop0 F1 input)       0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703
BBN_KB_ENG_1 (best hop1 F1 input)       0.4657  0.2408  0.3174  0.2542  0.1320  0.1737  0.3947  0.2043  0.2693
BBN_KB_ENG_4 (best ALL F1 input)        0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703

MAX micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
gator_dsr4.ENG.ensemble                 0.3879  0.3455  0.3655  0.4057  0.1660  0.2356  0.3912  0.2859  0.3304
BBN_KB_ENG_4 (best hop0 F1 input)       0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791
BBN_KB_ENG_1 (best hop1 F1 input)       0.4838  0.2572  0.3358  0.2252  0.1313  0.1659  0.3925  0.2154  0.2781
BBN_KB_ENG_4 (best ALL F1 input)        0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791

MEAN macro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
gator_dsr4.ENG.ensemble                 0.3065  0.3640  0.3044  0.2282  0.2319  0.2197  0.2756  0.3118  0.2710
Stanford_SF_ENG_3 (best hop0 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best hop1 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best ALL F1 input)   0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
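
Note on reading the F1 columns: in the micro-average tables, each reported F1 appears to be the harmonic mean of the precision and recall printed beside it. The sketch below checks this for the gator_dsr1 CSSF micro-average row. It is an illustration only, not the official scorer; the function name f1 and the tolerance are assumptions introduced here. The MEAN macro-average rows are not expected to satisfy the same identity, presumably because those figures average per-query scores.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) copied from the gator_dsr1 CSSF micro-average row.
GATOR_DSR1_CSSF = {
    "hop0": (0.3716, 0.3589, 0.3651),
    "hop1": (0.1307, 0.2786, 0.1779),
    "ALL": (0.2448, 0.3320, 0.2818),
}

# Assumed tolerance: the inputs are rounded to four decimals, so the
# recomputed F1 can differ slightly from the printed value.
TOLERANCE = 5e-4


def check_rows(rows: dict) -> dict:
    """Return, per hop level, whether the recomputed F1 matches the report."""
    return {
        level: abs(f1(p, r) - reported) < TOLERANCE
        for level, (p, r, reported) in rows.items()
    }


if __name__ == "__main__":
    for level, ok in check_rows(GATOR_DSR1_CSSF).items():
        print(level, "consistent" if ok else "inconsistent")
```

The same check can be pointed at any other micro-average row by replacing the tuples with that row's values.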