=========================================================================
TAC KBP 2016 SLOT FILLER VALIDATION SPANISH ENSEMBLING EVALUATION RESULTS
=========================================================================

Team ID: UTAustin
Organization: University of Texas at Austin

*************************************************************

Run ID: UTAustin1

Did the run access the live Web during the evaluation window: No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: Yes
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: Yes
Did this run make use of the system profiles for the slot filling runs: No
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: Yes

CSSF micro-average Precision, Recall, and F1, at each hop level:
                                          hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin1.SPA.ensemble                    0.0754  0.3299  0.1227  0.0323  0.0193  0.0242  0.0716  0.2016  0.1056
UMass_IESL_SF_SPA_4 (best hop0 F1 input)  0.1458  0.2653  0.1882  0.0000  0.0000  0.0000  0.1458  0.1557  0.1506
UMass_IESL_KB_SPA_3 (best hop1 F1 input)  0.2991  0.1190  0.1703  0.0513  0.0290  0.0370  0.1752  0.0818  0.1116
UMass_IESL_SF_SPA_4 (best ALL F1 input)   0.1458  0.2653  0.1882  0.0000  0.0000  0.0000  0.1458  0.1557  0.1506

MAX micro-average Precision, Recall, and F1, at each hop level:
                                          hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin1.SPA.ensemble                    0.0718  0.3073  0.1164  0.0000  0.0000  0.0000  0.0699  0.1982  0.1034
UMass_IESL_SF_SPA_4 (best hop0 F1 input)  0.1335  0.2615  0.1767  0.0000  0.0000  0.0000  0.1335  0.1686  0.1490
UMass_IESL_KB_SPA_3 (best hop1 F1 input)  0.2842  0.1239  0.1725  0.0600  0.0500  0.0545  0.1692  0.0976  0.1238
UMass_IESL_SF_SPA_4 (best ALL F1 input)   0.1335  0.2615  0.1767  0.0000  0.0000  0.0000  0.1335  0.1686  0.1490

MEAN macro-average Precision, Recall, and F1, at each hop level:
                                          hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin1.SPA.ensemble                    0.1593  0.2907  0.1715  0.0062  0.0181  0.0081  0.1072  0.1978  0.1158
UMass_IESL_SF_SPA_4 (best hop0 F1 input)  0.1652  0.2610  0.1815  0.0000  0.0000  0.0000  0.1089  0.1721  0.1196
UMass_IESL_KB_SPA_2 (best hop1 F1 input)  0.1031  0.1414  0.1031  0.0168  0.0316  0.0196  0.0737  0.1040  0.0746
UMass_IESL_SF_SPA_4 (best ALL F1 input)   0.1652  0.2610  0.1815  0.0000  0.0000  0.0000  0.1089  0.1721  0.1196

*************************************************************

Run ID: UTAustin2

Did the run access the live Web during the evaluation window: No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: Yes
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: Yes
Did this run make use of the system profiles for the slot filling runs: No
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: Yes

CSSF micro-average Precision, Recall, and F1, at each hop level:
                                          hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin2.SPA.ensemble                    0.0740  0.3027  0.1189  0.0252  0.0193  0.0219  0.0683  0.1856  0.0998
UMass_IESL_SF_SPA_4 (best hop0 F1 input)  0.1458  0.2653  0.1882  0.0000  0.0000  0.0000  0.1458  0.1557  0.1506
UMass_IESL_KB_SPA_3 (best hop1 F1 input)  0.2991  0.1190  0.1703  0.0513  0.0290  0.0370  0.1752  0.0818  0.1116
UMass_IESL_SF_SPA_4 (best ALL F1 input)   0.1458  0.2653  0.1882  0.0000  0.0000  0.0000  0.1458  0.1557  0.1506

MAX micro-average Precision, Recall, and F1, at each hop level:
                                          hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin2.SPA.ensemble                    0.0716  0.2706  0.1132  0.0000  0.0000  0.0000  0.0650  0.1746  0.0947
UMass_IESL_SF_SPA_4 (best hop0 F1 input)  0.1335  0.2615  0.1767  0.0000  0.0000  0.0000  0.1335  0.1686  0.1490
UMass_IESL_KB_SPA_3 (best hop1 F1 input)  0.2842  0.1239  0.1725  0.0600  0.0500  0.0545  0.1692  0.0976  0.1238
UMass_IESL_SF_SPA_4 (best ALL F1 input)   0.1335  0.2615  0.1767  0.0000  0.0000  0.0000  0.1335  0.1686  0.1490

MEAN macro-average Precision, Recall, and F1, at each hop level:
                                          hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin2.SPA.ensemble                    0.1544  0.3014  0.1696  0.0062  0.0181  0.0081  0.1040  0.2049  0.1146
UMass_IESL_SF_SPA_4 (best hop0 F1 input)  0.1652  0.2610  0.1815  0.0000  0.0000  0.0000  0.1089  0.1721  0.1196
UMass_IESL_KB_SPA_2 (best hop1 F1 input)  0.1031  0.1414  0.1031  0.0168  0.0316  0.0196  0.0737  0.1040  0.0746
UMass_IESL_SF_SPA_4 (best ALL F1 input)   0.1652  0.2610  0.1815  0.0000  0.0000  0.0000  0.1089  0.1721  0.1196

*************************************************************

Run ID: UTAustin3

Did the run access the live Web during the evaluation window: No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: Yes
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: Yes
Did this run make use of the system profiles for the slot filling runs: No
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: Yes

CSSF micro-average Precision, Recall, and F1, at each hop level:
                                          hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin3.SPA.ensemble                    0.0509  0.4116  0.0905  0.0455  0.0290  0.0354  0.0506  0.2535  0.0843
UMass_IESL_SF_SPA_4 (best hop0 F1 input)  0.1458  0.2653  0.1882  0.0000  0.0000  0.0000  0.1458  0.1557  0.1506
UMass_IESL_KB_SPA_3 (best hop1 F1 input)  0.2991  0.1190  0.1703  0.0513  0.0290  0.0370  0.1752  0.0818  0.1116
UMass_IESL_SF_SPA_4 (best ALL F1 input)   0.1458  0.2653  0.1882  0.0000  0.0000  0.0000  0.1458  0.1557  0.1506

MAX micro-average Precision, Recall, and F1, at each hop level:
                                          hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin3.SPA.ensemble                    0.0552  0.3945  0.0968  0.0606  0.0167  0.0261  0.0553  0.2604  0.0912
UMass_IESL_SF_SPA_4 (best hop0 F1 input)  0.1335  0.2615  0.1767  0.0000  0.0000  0.0000  0.1335  0.1686  0.1490
UMass_IESL_KB_SPA_3 (best hop1 F1 input)  0.2842  0.1239  0.1725  0.0600  0.0500  0.0545  0.1692  0.0976  0.1238
UMass_IESL_SF_SPA_4 (best ALL F1 input)   0.1335  0.2615  0.1767  0.0000  0.0000  0.0000  0.1335  0.1686  0.1490

MEAN macro-average Precision, Recall, and F1, at each hop level:
                                          hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin3.SPA.ensemble                    0.1831  0.3512  0.1983  0.0095  0.0235  0.0121  0.1239  0.2396  0.1349
UMass_IESL_SF_SPA_4 (best hop0 F1 input)  0.1652  0.2610  0.1815  0.0000  0.0000  0.0000  0.1089  0.1721  0.1196
UMass_IESL_KB_SPA_2 (best hop1 F1 input)  0.1031  0.1414  0.1031  0.0168  0.0316  0.0196  0.0737  0.1040  0.0746
UMass_IESL_SF_SPA_4 (best ALL F1 input)   0.1652  0.2610  0.1815  0.0000  0.0000  0.0000  0.1089  0.1721  0.1196

*************************************************************

Run ID: UTAustin4

Did the run access the live Web during the evaluation window: No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: Yes
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: Yes
Did this run make use of the system profiles for the slot filling runs: No
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: Yes

CSSF micro-average Precision, Recall, and F1, at each hop level:
                                          hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin4.SPA.ensemble                    0.0469  0.3844  0.0836  0.0370  0.0193  0.0254  0.0465  0.2335  0.0776
UMass_IESL_SF_SPA_4 (best hop0 F1 input)  0.1458  0.2653  0.1882  0.0000  0.0000  0.0000  0.1458  0.1557  0.1506
UMass_IESL_KB_SPA_3 (best hop1 F1 input)  0.2991  0.1190  0.1703  0.0513  0.0290  0.0370  0.1752  0.0818  0.1116
UMass_IESL_SF_SPA_4 (best ALL F1 input)   0.1458  0.2653  0.1882  0.0000  0.0000  0.0000  0.1458  0.1557  0.1506

MAX micro-average Precision, Recall, and F1, at each hop level:
                                          hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin4.SPA.ensemble                    0.0504  0.3624  0.0885  0.0444  0.0167  0.0242  0.0502  0.2396  0.0831
UMass_IESL_SF_SPA_4 (best hop0 F1 input)  0.1335  0.2615  0.1767  0.0000  0.0000  0.0000  0.1335  0.1686  0.1490
UMass_IESL_KB_SPA_3 (best hop1 F1 input)  0.2842  0.1239  0.1725  0.0600  0.0500  0.0545  0.1692  0.0976  0.1238
UMass_IESL_SF_SPA_4 (best ALL F1 input)   0.1335  0.2615  0.1767  0.0000  0.0000  0.0000  0.1335  0.1686  0.1490

MEAN macro-average Precision, Recall, and F1, at each hop level:
                                          hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin4.SPA.ensemble                    0.1707  0.3407  0.1854  0.0050  0.0074  0.0057  0.1143  0.2271  0.1242
UMass_IESL_SF_SPA_4 (best hop0 F1 input)  0.1652  0.2610  0.1815  0.0000  0.0000  0.0000  0.1089  0.1721  0.1196
UMass_IESL_KB_SPA_2 (best hop1 F1 input)  0.1031  0.1414  0.1031  0.0168  0.0316  0.0196  0.0737  0.1040  0.0746
UMass_IESL_SF_SPA_4 (best ALL F1 input)   0.1652  0.2610  0.1815  0.0000  0.0000  0.0000  0.1089  0.1721  0.1196

*************************************************************

Run ID: UTAustin5

Did the run access the live Web during the evaluation window: No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: Yes
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: Yes
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: Yes
Did this run make use of the system profiles for the slot filling runs: No
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: Yes

CSSF micro-average Precision, Recall, and F1, at each hop level:
                                          hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin5.SPA.ensemble                    0.1357  0.3333  0.1929  0.0274  0.0193  0.0227  0.1175  0.2036  0.1490
UMass_IESL_SF_SPA_4 (best hop0 F1 input)  0.1458  0.2653  0.1882  0.0000  0.0000  0.0000  0.1458  0.1557  0.1506
UMass_IESL_KB_SPA_3 (best hop1 F1 input)  0.2991  0.1190  0.1703  0.0513  0.0290  0.0370  0.1752  0.0818  0.1116
UMass_IESL_SF_SPA_4 (best ALL F1 input)   0.1458  0.2653  0.1882  0.0000  0.0000  0.0000  0.1458  0.1557  0.1506

MAX micro-average Precision, Recall, and F1, at each hop level:
                                          hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin5.SPA.ensemble                    0.1314  0.3165  0.1857  0.0000  0.0000  0.0000  0.1206  0.2041  0.1516
UMass_IESL_SF_SPA_4 (best hop0 F1 input)  0.1335  0.2615  0.1767  0.0000  0.0000  0.0000  0.1335  0.1686  0.1490
UMass_IESL_KB_SPA_3 (best hop1 F1 input)  0.2842  0.1239  0.1725  0.0600  0.0500  0.0545  0.1692  0.0976  0.1238
UMass_IESL_SF_SPA_4 (best ALL F1 input)   0.1335  0.2615  0.1767  0.0000  0.0000  0.0000  0.1335  0.1686  0.1490

MEAN macro-average Precision, Recall, and F1, at each hop level:
                                          hop0_P  hop0_R  hop0_F  hop1_P  hop1_R  hop1_F  ALL_P   ALL_R   ALL_F
UTAustin5.SPA.ensemble                    0.1763  0.3272  0.1936  0.0062  0.0181  0.0081  0.1183  0.2219  0.1304
UMass_IESL_SF_SPA_4 (best hop0 F1 input)  0.1652  0.2610  0.1815  0.0000  0.0000  0.0000  0.1089  0.1721  0.1196
UMass_IESL_KB_SPA_2 (best hop1 F1 input)  0.1031  0.1414  0.1031  0.0168  0.0316  0.0196  0.0737  0.1040  0.0746
UMass_IESL_SF_SPA_4 (best ALL F1 input)   0.1652  0.2610  0.1815  0.0000  0.0000  0.0000  0.1089  0.1721  0.1196
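
Note on reading the F1 columns: the micro-average F1 values reported above appear consistent with the usual harmonic mean of the precision and recall printed on the same row. The sketch below spot-checks that on two reported rows. The helper name `f1` and the rounding tolerance are illustrative assumptions, not part of the official scorer. The MEAN macro-average rows are not expected to satisfy this identity, presumably because macro F1 is averaged per query rather than derived from the averaged P and R.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# (label, P, R, reported F1) copied from the CSSF micro-average hop0 columns.
rows = [
    ("UTAustin1.SPA.ensemble", 0.0754, 0.3299, 0.1227),
    ("UTAustin5.SPA.ensemble", 0.1357, 0.3333, 0.1929),
]

for label, p, r, reported in rows:
    recomputed = f1(p, r)
    # Inputs are rounded to four decimals, so allow a small tolerance.
    status = "ok" if abs(recomputed - reported) < 5e-4 else "mismatch"
    print(f"{label}: recomputed={recomputed:.4f} reported={reported:.4f} {status}")
```

Running the same check on a MEAN macro-average row (for example P = 0.1593, R = 0.2907, reported F1 = 0.1715) yields a noticeably different harmonic mean, which is why those rows should be compared on the reported F1 only.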