=========================================================================
TAC KBP 2016 SLOT FILLER VALIDATION ENGLISH ENSEMBLING EVALUATION RESULTS
=========================================================================


Team ID:  IRTSX
Organization:  IRT-SystemX & LIMSI


*************************************************************

Run ID:  IRTSX1
Did the run access the live Web during the evaluation window:  No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: No
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: No
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: Yes
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: Yes
Did this run make use of the system profiles for the slot filling runs: Yes
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: No

CSSF micro-average Precision, Recall, and F1, at each hop level:
					hop0_P	hop0_R	hop0_F		hop1_P	hop1_R	hop1_F		ALL_P	ALL_R	ALL_F
IRTSX1.ENG.ensemble			0.2469	0.3205	0.2789		0.1489	0.1437	0.1463		0.2202	0.2613	0.2390
BBN_KB_ENG_4 (best hop0 F1 input)	0.4609	0.2437	0.3188		0.2528	0.1320	0.1734		0.3918	0.2063	0.2703
BBN_KB_ENG_1 (best hop1 F1 input)	0.4657	0.2408	0.3174		0.2542	0.1320	0.1737		0.3947	0.2043	0.2693
BBN_KB_ENG_4 (best ALL F1 input)	0.4609	0.2437	0.3188		0.2528	0.1320	0.1734		0.3918	0.2063	0.2703
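
Note on reading the micro-average columns: the reported F values are consistent with F1 being the harmonic mean of the precision and recall in the same hop group. The sketch below is only a sanity check under that assumption, not the official scorer; the function name f1 is hypothetical. The identity is not expected to hold for the MEAN macro-average tables, where F1 is presumably averaged per query rather than derived from the averaged P and R.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# hop0 CSSF micro-average P and R for IRTSX1.ENG.ensemble, copied from the table above.
hop0_f = f1(0.2469, 0.3205)
print(round(hop0_f, 4))  # compare against the reported hop0_F of 0.2789
```

Because the published P and R are already rounded to four decimals, a recomputed F may differ from the reported one in the last digit.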

MAX micro-average Precision, Recall, and F1, at each hop level:
					hop0_P	hop0_R	hop0_F		hop1_P	hop1_R	hop1_F		ALL_P	ALL_R	ALL_F
IRTSX1.ENG.ensemble			0.2659	0.3282	0.2938		0.1402	0.1467	0.1434		0.2287	0.2679	0.2468
BBN_KB_ENG_4 (best hop0 F1 input)	0.4839	0.2591	0.3375		0.2237	0.1313	0.1655		0.3921	0.2167	0.2791
BBN_KB_ENG_1 (best hop1 F1 input)	0.4838	0.2572	0.3358		0.2252	0.1313	0.1659		0.3925	0.2154	0.2781
BBN_KB_ENG_4 (best ALL F1 input)	0.4839	0.2591	0.3375		0.2237	0.1313	0.1655		0.3921	0.2167	0.2791

MEAN macro-average Precision, Recall, and F1, at each hop level:
					hop0_P	hop0_R	hop0_F		hop1_P	hop1_R	hop1_F		ALL_P	ALL_R	ALL_F
IRTSX1.ENG.ensemble			0.2618	0.3661	0.2695		0.1306	0.1665	0.1377		0.2100	0.2873	0.2174
Stanford_SF_ENG_3 (best hop0 F1 input)	0.2716	0.3016	0.2640		0.1567	0.1977	0.1638		0.2262	0.2605	0.2244
Stanford_SF_ENG_3 (best hop1 F1 input)	0.2716	0.3016	0.2640		0.1567	0.1977	0.1638		0.2262	0.2605	0.2244
Stanford_SF_ENG_3 (best ALL F1 input)	0.2716	0.3016	0.2640		0.1567	0.1977	0.1638		0.2262	0.2605	0.2244


*************************************************************

Run ID:  IRTSX2
Did the run access the live Web during the evaluation window:  No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: No
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: No
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: Yes
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: Yes
Did this run make use of the system profiles for the slot filling runs: Yes
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: No

CSSF micro-average Precision, Recall, and F1, at each hop level:
					hop0_P	hop0_R	hop0_F		hop1_P	hop1_R	hop1_F		ALL_P	ALL_R	ALL_F
IRTSX2.ENG.ensemble			0.2644	0.3397	0.2973		0.1348	0.1466	0.1404		0.2256	0.2750	0.2479
BBN_KB_ENG_4 (best hop0 F1 input)	0.4609	0.2437	0.3188		0.2528	0.1320	0.1734		0.3918	0.2063	0.2703
BBN_KB_ENG_1 (best hop1 F1 input)	0.4657	0.2408	0.3174		0.2542	0.1320	0.1737		0.3947	0.2043	0.2693
BBN_KB_ENG_4 (best ALL F1 input)	0.4609	0.2437	0.3188		0.2528	0.1320	0.1734		0.3918	0.2063	0.2703

MAX micro-average Precision, Recall, and F1, at each hop level:
					hop0_P	hop0_R	hop0_F		hop1_P	hop1_R	hop1_F		ALL_P	ALL_R	ALL_F
IRTSX2.ENG.ensemble			0.2736	0.3397	0.3031		0.1284	0.1467	0.1369		0.2280	0.2756	0.2496
BBN_KB_ENG_4 (best hop0 F1 input)	0.4839	0.2591	0.3375		0.2237	0.1313	0.1655		0.3921	0.2167	0.2791
BBN_KB_ENG_1 (best hop1 F1 input)	0.4838	0.2572	0.3358		0.2252	0.1313	0.1659		0.3925	0.2154	0.2781
BBN_KB_ENG_4 (best ALL F1 input)	0.4839	0.2591	0.3375		0.2237	0.1313	0.1655		0.3921	0.2167	0.2791

MEAN macro-average Precision, Recall, and F1, at each hop level:
					hop0_P	hop0_R	hop0_F		hop1_P	hop1_R	hop1_F		ALL_P	ALL_R	ALL_F
IRTSX2.ENG.ensemble			0.3214	0.4234	0.3314		0.1380	0.1703	0.1398		0.2489	0.3234	0.2557
Stanford_SF_ENG_3 (best hop0 F1 input)	0.2716	0.3016	0.2640		0.1567	0.1977	0.1638		0.2262	0.2605	0.2244
Stanford_SF_ENG_3 (best hop1 F1 input)	0.2716	0.3016	0.2640		0.1567	0.1977	0.1638		0.2262	0.2605	0.2244
Stanford_SF_ENG_3 (best ALL F1 input)	0.2716	0.3016	0.2640		0.1567	0.1977	0.1638		0.2262	0.2605	0.2244


*************************************************************

Run ID:  IRTSX3
Did the run access the live Web during the evaluation window:  No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: No
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: No
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: Yes
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: Yes
Did this run make use of the system profiles for the slot filling runs: Yes
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: No

CSSF micro-average Precision, Recall, and F1, at each hop level:
					hop0_P	hop0_R	hop0_F		hop1_P	hop1_R	hop1_F		ALL_P	ALL_R	ALL_F
IRTSX3.ENG.ensemble			0.1996	0.2747	0.2312		0.1082	0.1085	0.1083		0.1750	0.2191	0.1946
BBN_KB_ENG_4 (best hop0 F1 input)	0.4609	0.2437	0.3188		0.2528	0.1320	0.1734		0.3918	0.2063	0.2703
BBN_KB_ENG_1 (best hop1 F1 input)	0.4657	0.2408	0.3174		0.2542	0.1320	0.1737		0.3947	0.2043	0.2693
BBN_KB_ENG_4 (best ALL F1 input)	0.4609	0.2437	0.3188		0.2528	0.1320	0.1734		0.3918	0.2063	0.2703

MAX micro-average Precision, Recall, and F1, at each hop level:
					hop0_P	hop0_R	hop0_F		hop1_P	hop1_R	hop1_F		ALL_P	ALL_R	ALL_F
IRTSX3.ENG.ensemble			0.2149	0.2821	0.2440		0.1042	0.1158	0.1097		0.1821	0.2269	0.2021
BBN_KB_ENG_4 (best hop0 F1 input)	0.4839	0.2591	0.3375		0.2237	0.1313	0.1655		0.3921	0.2167	0.2791
BBN_KB_ENG_1 (best hop1 F1 input)	0.4838	0.2572	0.3358		0.2252	0.1313	0.1659		0.3925	0.2154	0.2781
BBN_KB_ENG_4 (best ALL F1 input)	0.4839	0.2591	0.3375		0.2237	0.1313	0.1655		0.3921	0.2167	0.2791

MEAN macro-average Precision, Recall, and F1, at each hop level:
					hop0_P	hop0_R	hop0_F		hop1_P	hop1_R	hop1_F		ALL_P	ALL_R	ALL_F
IRTSX3.ENG.ensemble			0.2518	0.3514	0.2568		0.1112	0.1270	0.1111		0.1962	0.2627	0.1992
Stanford_SF_ENG_3 (best hop0 F1 input)	0.2716	0.3016	0.2640		0.1567	0.1977	0.1638		0.2262	0.2605	0.2244
Stanford_SF_ENG_3 (best hop1 F1 input)	0.2716	0.3016	0.2640		0.1567	0.1977	0.1638		0.2262	0.2605	0.2244
Stanford_SF_ENG_3 (best ALL F1 input)	0.2716	0.3016	0.2640		0.1567	0.1977	0.1638		0.2262	0.2605	0.2244


*************************************************************

Run ID:  IRTSX4
Did the run access the live Web during the evaluation window:  No
Did this run judge each candidate slot filler independently of all other candidate slot fillers in the evaluation dataset: No
Did this run judge candidate slot fillers for each slot-filling run independently of all other slot-filling runs in the evaluation dataset: No
Did this run judge candidate slot fillers for each slot-filling team independently of all other slot-filling teams in the evaluation dataset: Yes
Did this run make use of the slot filler or justification offsets provided for each candidate slot filler: Yes
Did this run make use of the confidence values provided for each candidate slot filler: Yes
Did this run make use of the system profiles for the slot filling runs: Yes
Did this run make use of the preliminary assessments provided for some of the slot filler candidates: No

CSSF micro-average Precision, Recall, and F1, at each hop level:
					hop0_P	hop0_R	hop0_F		hop1_P	hop1_R	hop1_F		ALL_P	ALL_R	ALL_F
IRTSX4.ENG.ensemble			0.2047	0.3840	0.2671		0.0523	0.1261	0.0739		0.1448	0.2976	0.1949
BBN_KB_ENG_4 (best hop0 F1 input)	0.4609	0.2437	0.3188		0.2528	0.1320	0.1734		0.3918	0.2063	0.2703
BBN_KB_ENG_1 (best hop1 F1 input)	0.4657	0.2408	0.3174		0.2542	0.1320	0.1737		0.3947	0.2043	0.2693
BBN_KB_ENG_4 (best ALL F1 input)	0.4609	0.2437	0.3188		0.2528	0.1320	0.1734		0.3918	0.2063	0.2703

MAX micro-average Precision, Recall, and F1, at each hop level:
					hop0_P	hop0_R	hop0_F		hop1_P	hop1_R	hop1_F		ALL_P	ALL_R	ALL_F
IRTSX4.ENG.ensemble			0.2173	0.3896	0.2790		0.0466	0.1236	0.0677		0.1450	0.3013	0.1958
BBN_KB_ENG_4 (best hop0 F1 input)	0.4839	0.2591	0.3375		0.2237	0.1313	0.1655		0.3921	0.2167	0.2791
BBN_KB_ENG_1 (best hop1 F1 input)	0.4838	0.2572	0.3358		0.2252	0.1313	0.1659		0.3925	0.2154	0.2781
BBN_KB_ENG_4 (best ALL F1 input)	0.4839	0.2591	0.3375		0.2237	0.1313	0.1655		0.3921	0.2167	0.2791

MEAN macro-average Precision, Recall, and F1, at each hop level:
					hop0_P	hop0_R	hop0_F		hop1_P	hop1_R	hop1_F		ALL_P	ALL_R	ALL_F
IRTSX4.ENG.ensemble			0.3750	0.4796	0.3856		0.1482	0.1646	0.1483		0.2854	0.3551	0.2918
Stanford_SF_ENG_3 (best hop0 F1 input)	0.2716	0.3016	0.2640		0.1567	0.1977	0.1638		0.2262	0.2605	0.2244
Stanford_SF_ENG_3 (best hop1 F1 input)	0.2716	0.3016	0.2640		0.1567	0.1977	0.1638		0.2262	0.2605	0.2244
Stanford_SF_ENG_3 (best ALL F1 input)	0.2716	0.3016	0.2640		0.1567	0.1977	0.1638		0.2262	0.2605	0.2244