=========================================================================
TAC KBP 2016 SLOT FILLER VALIDATION CHINESE ENSEMBLING EVALUATION RESULTS
=========================================================================

Team ID: UTAustin
Organization: University of Texas at Austin

*************************************************************

Run ID: UTAustin1

Did the run access the live Web during the evaluation window: No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: Yes
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: Yes
Did this run make use of the system profiles for the slot filling runs: No
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: Yes

CSSF micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin1.CMN.ensemble                  0.1640  0.2025  0.1812  0.0613  0.1415  0.0855  0.1260  0.1879  0.1508
hltcoe_KB_CMN_4 (best hop0 F1 input)    0.5200  0.1994  0.2882  0.2905  0.2537  0.2708  0.4242  0.2124  0.2830
hltcoe_KB_CMN_3 (best hop1 F1 input)    0.5246  0.1963  0.2857  0.2989  0.2537  0.2744  0.4306  0.2100  0.2824
hltcoe_KB_CMN_4 (best ALL F1 input)     0.5200  0.1994  0.2882  0.2905  0.2537  0.2708  0.4242  0.2124  0.2830

MAX micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin1.CMN.ensemble                  0.1674  0.2521  0.2012  0.0671  0.1758  0.0972  0.1295  0.2323  0.1663
hltcoe_KB_CMN_4 (best hop0 F1 input)    0.5378  0.2564  0.3472  0.2905  0.3152  0.3023  0.4282  0.2716  0.3324
hltcoe_KB_CMN_1 (best hop1 F1 input)    0.5459  0.2521  0.3449  0.3023  0.3152  0.3086  0.4385  0.2684  0.3330
hltcoe_KB_CMN_1 (best ALL F1 input)     0.5459  0.2521  0.3449  0.3023  0.3152  0.3086  0.4385  0.2684  0.3330

MEAN macro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin1.CMN.ensemble                  0.1940  0.2498  0.1931  0.1317  0.1441  0.1345  0.1653  0.2012  0.1661
hltcoe_KB_CMN_4 (best hop0 F1 input)    0.1856  0.1796  0.1752  0.1327  0.1465  0.1348  0.1613  0.1644  0.1566
Stanford_KB_CMN_2 (best hop1 F1 input)  0.1206  0.2031  0.1324  0.1242  0.1804  0.1391  0.1223  0.1927  0.1355
hltcoe_KB_CMN_4 (best ALL F1 input)     0.1856  0.1796  0.1752  0.1327  0.1465  0.1348  0.1613  0.1644  0.1566

*************************************************************

Run ID: UTAustin2

Did the run access the live Web during the evaluation window: No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: Yes
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: Yes
Did this run make use of the system profiles for the slot filling runs: No
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: Yes

CSSF micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin2.CMN.ensemble                  0.3353  0.1748  0.2298  1.0000  0.0049  0.0097  0.3372  0.1342  0.1920
hltcoe_KB_CMN_4 (best hop0 F1 input)    0.5200  0.1994  0.2882  0.2905  0.2537  0.2708  0.4242  0.2124  0.2830
hltcoe_KB_CMN_3 (best hop1 F1 input)    0.5246  0.1963  0.2857  0.2989  0.2537  0.2744  0.4306  0.2100  0.2824
hltcoe_KB_CMN_4 (best ALL F1 input)     0.5200  0.1994  0.2882  0.2905  0.2537  0.2708  0.4242  0.2124  0.2830

MAX micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin2.CMN.ensemble                  0.3469  0.1992  0.2530  1.0000  0.0061  0.0120  0.3493  0.1491  0.2090
hltcoe_KB_CMN_4 (best hop0 F1 input)    0.5378  0.2564  0.3472  0.2905  0.3152  0.3023  0.4282  0.2716  0.3324
hltcoe_KB_CMN_1 (best hop1 F1 input)    0.5459  0.2521  0.3449  0.3023  0.3152  0.3086  0.4385  0.2684  0.3330
hltcoe_KB_CMN_1 (best ALL F1 input)     0.5459  0.2521  0.3449  0.3023  0.3152  0.3086  0.4385  0.2684  0.3330

MEAN macro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin2.CMN.ensemble                  0.1288  0.1396  0.1225  0.0103  0.0103  0.0103  0.0743  0.0802  0.0710
hltcoe_KB_CMN_4 (best hop0 F1 input)    0.1856  0.1796  0.1752  0.1327  0.1465  0.1348  0.1613  0.1644  0.1566
Stanford_KB_CMN_2 (best hop1 F1 input)  0.1206  0.2031  0.1324  0.1242  0.1804  0.1391  0.1223  0.1927  0.1355
hltcoe_KB_CMN_4 (best ALL F1 input)     0.1856  0.1796  0.1752  0.1327  0.1465  0.1348  0.1613  0.1644  0.1566

*************************************************************

Run ID: UTAustin3

Did the run access the live Web during the evaluation window: No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: Yes
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: Yes
Did this run make use of the system profiles for the slot filling runs: No
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: Yes

CSSF micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin3.CMN.ensemble                  0.1732  0.3635  0.2347  0.0256  0.2780  0.0469  0.0818  0.3431  0.1321
hltcoe_KB_CMN_4 (best hop0 F1 input)    0.5200  0.1994  0.2882  0.2905  0.2537  0.2708  0.4242  0.2124  0.2830
hltcoe_KB_CMN_3 (best hop1 F1 input)    0.5246  0.1963  0.2857  0.2989  0.2537  0.2744  0.4306  0.2100  0.2824
hltcoe_KB_CMN_4 (best ALL F1 input)     0.5200  0.1994  0.2882  0.2905  0.2537  0.2708  0.4242  0.2124  0.2830

MAX micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin3.CMN.ensemble                  0.1827  0.4640  0.2621  0.0249  0.3152  0.0462  0.0825  0.4254  0.1382
hltcoe_KB_CMN_4 (best hop0 F1 input)    0.5378  0.2564  0.3472  0.2905  0.3152  0.3023  0.4282  0.2716  0.3324
hltcoe_KB_CMN_1 (best hop1 F1 input)    0.5459  0.2521  0.3449  0.3023  0.3152  0.3086  0.4385  0.2684  0.3330
hltcoe_KB_CMN_1 (best ALL F1 input)     0.5459  0.2521  0.3449  0.3023  0.3152  0.3086  0.4385  0.2684  0.3330

MEAN macro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin3.CMN.ensemble                  0.2208  0.3328  0.2343  0.2544  0.3213  0.2706  0.2363  0.3275  0.2510
hltcoe_KB_CMN_4 (best hop0 F1 input)    0.1856  0.1796  0.1752  0.1327  0.1465  0.1348  0.1613  0.1644  0.1566
Stanford_KB_CMN_2 (best hop1 F1 input)  0.1206  0.2031  0.1324  0.1242  0.1804  0.1391  0.1223  0.1927  0.1355
hltcoe_KB_CMN_4 (best ALL F1 input)     0.1856  0.1796  0.1752  0.1327  0.1465  0.1348  0.1613  0.1644  0.1566

*************************************************************

Run ID: UTAustin4

Did the run access the live Web during the evaluation window: No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: Yes
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: Yes
Did this run make use of the system profiles for the slot filling runs: No
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: Yes

CSSF micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin4.CMN.ensemble                  0.2990  0.2316  0.2610  0.0535  0.0976  0.0691  0.1945  0.1995  0.1970
hltcoe_KB_CMN_4 (best hop0 F1 input)    0.5200  0.1994  0.2882  0.2905  0.2537  0.2708  0.4242  0.2124  0.2830
hltcoe_KB_CMN_3 (best hop1 F1 input)    0.5246  0.1963  0.2857  0.2989  0.2537  0.2744  0.4306  0.2100  0.2824
hltcoe_KB_CMN_4 (best ALL F1 input)     0.5200  0.1994  0.2882  0.2905  0.2537  0.2708  0.4242  0.2124  0.2830

MAX micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin4.CMN.ensemble                  0.3122  0.2818  0.2962  0.0542  0.1212  0.0749  0.1925  0.2402  0.2137
hltcoe_KB_CMN_4 (best hop0 F1 input)    0.5378  0.2564  0.3472  0.2905  0.3152  0.3023  0.4282  0.2716  0.3324
hltcoe_KB_CMN_1 (best hop1 F1 input)    0.5459  0.2521  0.3449  0.3023  0.3152  0.3086  0.4385  0.2684  0.3330
hltcoe_KB_CMN_1 (best ALL F1 input)     0.5459  0.2521  0.3449  0.3023  0.3152  0.3086  0.4385  0.2684  0.3330

MEAN macro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin4.CMN.ensemble                  0.1159  0.1443  0.1167  0.0936  0.1237  0.1033  0.1056  0.1348  0.1105
hltcoe_KB_CMN_4 (best hop0 F1 input)    0.1856  0.1796  0.1752  0.1327  0.1465  0.1348  0.1613  0.1644  0.1566
Stanford_KB_CMN_2 (best hop1 F1 input)  0.1206  0.2031  0.1324  0.1242  0.1804  0.1391  0.1223  0.1927  0.1355
hltcoe_KB_CMN_4 (best ALL F1 input)     0.1856  0.1796  0.1752  0.1327  0.1465  0.1348  0.1613  0.1644  0.1566

*************************************************************

Run ID: UTAustin5

Did the run access the live Web during the evaluation window: No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: Yes
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: Yes
Did this run make use of the system profiles for the slot filling runs: No
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: Yes

CSSF micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin5.CMN.ensemble                  0.1664  0.3175  0.2184  0.0520  0.3317  0.0899  0.1078  0.3209  0.1614
hltcoe_KB_CMN_4 (best hop0 F1 input)    0.5200  0.1994  0.2882  0.2905  0.2537  0.2708  0.4242  0.2124  0.2830
hltcoe_KB_CMN_3 (best hop1 F1 input)    0.5246  0.1963  0.2857  0.2989  0.2537  0.2744  0.4306  0.2100  0.2824
hltcoe_KB_CMN_4 (best ALL F1 input)     0.5200  0.1994  0.2882  0.2905  0.2537  0.2708  0.4242  0.2124  0.2830

MAX micro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin5.CMN.ensemble                  0.1650  0.3856  0.2311  0.0510  0.3818  0.0900  0.1048  0.3846  0.1647
hltcoe_KB_CMN_4 (best hop0 F1 input)    0.5378  0.2564  0.3472  0.2905  0.3152  0.3023  0.4282  0.2716  0.3324
hltcoe_KB_CMN_1 (best hop1 F1 input)    0.5459  0.2521  0.3449  0.3023  0.3152  0.3086  0.4385  0.2684  0.3330
hltcoe_KB_CMN_1 (best ALL F1 input)     0.5459  0.2521  0.3449  0.3023  0.3152  0.3086  0.4385  0.2684  0.3330

MEAN macro-average Precision, Recall, and F1, at each hop level:
                                        hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin5.CMN.ensemble                  0.2345  0.4312  0.2644  0.1826  0.2341  0.1945  0.2106  0.3406  0.2322
hltcoe_KB_CMN_4 (best hop0 F1 input)    0.1856  0.1796  0.1752  0.1327  0.1465  0.1348  0.1613  0.1644  0.1566
Stanford_KB_CMN_2 (best hop1 F1 input)  0.1206  0.2031  0.1324  0.1242  0.1804  0.1391  0.1223  0.1927  0.1355
hltcoe_KB_CMN_4 (best ALL F1 input)     0.1856  0.1796  0.1752  0.1327  0.1465  0.1348  0.1613  0.1644  0.1566
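
Note on the F columns: in the CSSF and MAX micro-average tables, each F value appears to be the harmonic mean of the P and R values beside it, up to four-decimal rounding. The MEAN macro-average F values do not follow from the macro P and R shown, presumably because they are averaged per query. A minimal Python sketch for spot-checking a micro-average row follows; the function name f1 and the tolerance are illustrative assumptions, not part of the official scorer.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# UTAustin1.CMN.ensemble, CSSF micro-average, hop0 (values copied from the table).
reported_p, reported_r, reported_f = 0.1640, 0.2025, 0.1812

# P and R are already rounded to four decimals, so recomputed F1 can differ
# in the last digit; the tolerance below is an assumed, illustrative value.
TOLERANCE = 1e-3
print(abs(f1(reported_p, reported_r) - reported_f) < TOLERANCE)
```

The same check can be run on any other micro-average row by substituting its P, R, and F values.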