=========================================================================
TAC KBP 2016 SLOT FILLER VALIDATION ENGLISH ENSEMBLING EVALUATION RESULTS
=========================================================================

Team ID: UTAustin
Organization: University of Texas at Austin

*************************************************************

Run ID: UTAustin1

Did the run access the live Web during the evaluation window: No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: Yes
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: Yes
Did this run make use of the system profiles for the slot filling runs: No
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: Yes

CSSF micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin1.ENG.ensemble                  0.1346  0.4077  0.2023  0.0769  0.0176  0.0286  0.1325  0.2770  0.1792
BBN_KB_ENG_4 (best hop0 F1 input)       0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703
BBN_KB_ENG_1 (best hop1 F1 input)       0.4657  0.2408  0.3174  0.2542  0.1320  0.1737  0.3947  0.2043  0.2693
BBN_KB_ENG_4 (best ALL F1 input)        0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703

MAX micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin1.ENG.ensemble                  0.1477  0.4107  0.2173  0.0820  0.0193  0.0312  0.1450  0.2808  0.1913
BBN_KB_ENG_4 (best hop0 F1 input)       0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791
BBN_KB_ENG_1 (best hop1 F1 input)       0.4838  0.2572  0.3358  0.2252  0.1313  0.1659  0.3925  0.2154  0.2781
BBN_KB_ENG_4 (best ALL F1 input)        0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791

MEAN macro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin1.ENG.ensemble                  0.2114  0.4254  0.2523  0.0166  0.0221  0.0183  0.1345  0.2661  0.1599
Stanford_SF_ENG_3 (best hop0 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best hop1 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best ALL F1 input)   0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244

*************************************************************

Run ID: UTAustin2

Did the run access the live Web during the evaluation window: No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: Yes
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: Yes
Did this run make use of the system profiles for the slot filling runs: No
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: Yes

CSSF micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin2.ENG.ensemble                  0.1421  0.4535  0.2164  0.1000  0.0205  0.0341  0.1408  0.3084  0.1933
BBN_KB_ENG_4 (best hop0 F1 input)       0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703
BBN_KB_ENG_1 (best hop1 F1 input)       0.4657  0.2408  0.3174  0.2542  0.1320  0.1737  0.3947  0.2043  0.2693
BBN_KB_ENG_4 (best ALL F1 input)        0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703

MAX micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin2.ENG.ensemble                  0.1541  0.4568  0.2305  0.1017  0.0232  0.0377  0.1522  0.3128  0.2048
BBN_KB_ENG_4 (best hop0 F1 input)       0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791
BBN_KB_ENG_1 (best hop1 F1 input)       0.4838  0.2572  0.3358  0.2252  0.1313  0.1659  0.3925  0.2154  0.2781
BBN_KB_ENG_4 (best ALL F1 input)        0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791

MEAN macro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin2.ENG.ensemble                  0.2279  0.4500  0.2675  0.0283  0.0299  0.0288  0.1491  0.2841  0.1732
Stanford_SF_ENG_3 (best hop0 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best hop1 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best ALL F1 input)   0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244

*************************************************************

Run ID: UTAustin3

Did the run access the live Web during the evaluation window: No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: Yes
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: Yes
Did this run make use of the system profiles for the slot filling runs: No
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: Yes

CSSF micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin3.ENG.ensemble                  0.1308  0.4815  0.2057  0.1239  0.0411  0.0617  0.1305  0.3340  0.1876
BBN_KB_ENG_4 (best hop0 F1 input)       0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703
BBN_KB_ENG_1 (best hop1 F1 input)       0.4657  0.2408  0.3174  0.2542  0.1320  0.1737  0.3947  0.2043  0.2693
BBN_KB_ENG_4 (best ALL F1 input)        0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703

MAX micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin3.ENG.ensemble                  0.1404  0.4779  0.2170  0.1316  0.0386  0.0597  0.1400  0.3321  0.1970
BBN_KB_ENG_4 (best hop0 F1 input)       0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791
BBN_KB_ENG_1 (best hop1 F1 input)       0.4838  0.2572  0.3358  0.2252  0.1313  0.1659  0.3925  0.2154  0.2781
BBN_KB_ENG_4 (best ALL F1 input)        0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791

MEAN macro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin3.ENG.ensemble                  0.2183  0.4979  0.2690  0.0525  0.0527  0.0481  0.1528  0.3221  0.1817
Stanford_SF_ENG_3 (best hop0 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best hop1 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best ALL F1 input)   0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244

*************************************************************

Run ID: UTAustin4

Did the run access the live Web during the evaluation window: No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: Yes
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: Yes
Did this run make use of the system profiles for the slot filling runs: No
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: Yes

CSSF micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin4.ENG.ensemble                  0.2480  0.4195  0.3117  0.2022  0.1056  0.1387  0.2419  0.3143  0.2734
BBN_KB_ENG_4 (best hop0 F1 input)       0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703
BBN_KB_ENG_1 (best hop1 F1 input)       0.4657  0.2408  0.3174  0.2542  0.1320  0.1737  0.3947  0.2043  0.2693
BBN_KB_ENG_4 (best ALL F1 input)        0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703

MAX micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin4.ENG.ensemble                  0.2613  0.4088  0.3189  0.1879  0.1081  0.1373  0.2500  0.3090  0.2764
BBN_KB_ENG_4 (best hop0 F1 input)       0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791
BBN_KB_ENG_1 (best hop1 F1 input)       0.4838  0.2572  0.3358  0.2252  0.1313  0.1659  0.3925  0.2154  0.2781
BBN_KB_ENG_4 (best ALL F1 input)        0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791

MEAN macro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin4.ENG.ensemble                  0.2634  0.4299  0.2960  0.1005  0.1160  0.1031  0.1990  0.3059  0.2198
Stanford_SF_ENG_3 (best hop0 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best hop1 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best ALL F1 input)   0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244

*************************************************************

Run ID: UTAustin5

Did the run access the live Web during the evaluation window: No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: Yes
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: Yes
Did this run make use of the system profiles for the slot filling runs: No
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: Yes

CSSF micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin5.ENG.ensemble                  0.2696  0.3767  0.3142  0.2222  0.0117  0.0223  0.2687  0.2544  0.2614
BBN_KB_ENG_4 (best hop0 F1 input)       0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703
BBN_KB_ENG_1 (best hop1 F1 input)       0.4657  0.2408  0.3174  0.2542  0.1320  0.1737  0.3947  0.2043  0.2693
BBN_KB_ENG_4 (best ALL F1 input)        0.4609  0.2437  0.3188  0.2528  0.1320  0.1734  0.3918  0.2063  0.2703

MAX micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin5.ENG.ensemble                  0.2859  0.3743  0.3242  0.2353  0.0154  0.0290  0.2847  0.2551  0.2691
BBN_KB_ENG_4 (best hop0 F1 input)       0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791
BBN_KB_ENG_1 (best hop1 F1 input)       0.4838  0.2572  0.3358  0.2252  0.1313  0.1659  0.3925  0.2154  0.2781
BBN_KB_ENG_4 (best ALL F1 input)        0.4839  0.2591  0.3375  0.2237  0.1313  0.1655  0.3921  0.2167  0.2791

MEAN macro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin5.ENG.ensemble                  0.2813  0.3907  0.3008  0.0234  0.0215  0.0221  0.1794  0.2448  0.1907
Stanford_SF_ENG_3 (best hop0 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best hop1 F1 input)  0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
Stanford_SF_ENG_3 (best ALL F1 input)   0.2716  0.3016  0.2640  0.1567  0.1977  0.1638  0.2262  0.2605  0.2244
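
Note on reading the F1 columns: in the micro-average tables, each reported F1 appears to be the harmonic mean of the precision and recall printed beside it. The sketch below is an illustration only and is not the official scorer; the helper name f1 and the choice of rows are assumptions made for this example. It recomputes F1 for three rows copied from the CSSF micro-average tables. The same identity is not expected to hold for the MEAN macro-average rows, which are presumably averaged per query before being reported.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the CSSF micro-average tables.
rows = {
    "UTAustin4.ENG.ensemble hop0": (0.2480, 0.4195, 0.3117),
    "UTAustin4.ENG.ensemble ALL": (0.2419, 0.3143, 0.2734),
    "BBN_KB_ENG_4 ALL": (0.3918, 0.2063, 0.2703),
}

for name, (p, r, reported) in rows.items():
    # Inputs are rounded to four decimals, so small differences are expected.
    print(f"{name}: computed {f1(p, r):.4f}, reported {reported:.4f}")
```

Because the published precision and recall are themselves rounded to four decimals, a recomputed value can differ from the reported one in the last digit.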