=============================================================
TAC KBP 2016 CROSS-LINGUAL KB CONSTRUCTION EVALUATION RESULTS
=============================================================


Team ID:  Stanford
Organization:  Stanford University


*************************************************************

Run ID:  Stanford_KB_XLING_1
Did the run access the live Web during the evaluation window:  No
Did the run extract relations from the Cold Start source corpus: Yes
Did the run generate meaningful confidence values: Yes
Run number of the English KB system that is most closely configured to the English component of this run: 1
Run number of the Spanish KB system that is most closely configured to the Spanish component of this run: NA
Run number of the Chinese KB system that is most closely configured to the Chinese component of this run: 1

Entity Discovery Evaluation:

ALL English, Chinese, and Spanish documents:
Prec	Recall	F1	Metric
0.750	0.350	0.478	strong_mention_match
0.677	0.316	0.431	strong_typed_mention_match
0.000	0.000	0.000	entity_match
0.667	0.173	0.275	b_cubed
0.663	0.310	0.422	mention_ceaf
0.607	0.283	0.386	typed_mention_ceaf

ONLY English documents:
Prec	Recall	F1	Metric
0.733	0.538	0.620	strong_mention_match
0.664	0.487	0.562	strong_typed_mention_match
0.000	0.000	0.000	entity_match
0.632	0.344	0.445	b_cubed
0.647	0.475	0.548	mention_ceaf
0.600	0.441	0.508	typed_mention_ceaf

ONLY Chinese documents:
Prec	Recall	F1	Metric
0.774	0.431	0.554	strong_mention_match
0.694	0.386	0.496	strong_typed_mention_match
0.000	0.000	0.000	entity_match
0.728	0.273	0.397	b_cubed
0.734	0.408	0.524	mention_ceaf
0.661	0.368	0.472	typed_mention_ceaf

ONLY Spanish documents:
Prec	Recall	F1	Metric
0.000	0.000	0.000	strong_mention_match
0.000	0.000	0.000	strong_typed_mention_match
0.000	0.000	0.000	entity_match
0.000	0.000	0.000	b_cubed
0.000	0.000	0.000	mention_ceaf
0.000	0.000	0.000	typed_mention_ceaf


Slot Filling Evaluation:

Metric                   RunID               Hop Prec   Recall F1    
SF-ALL-Micro             Stanford_KB_XLING_1 0   0.4017 0.0684 0.1169
SF-ALL-Micro             Stanford_KB_XLING_1 1   0.1890 0.0351 0.0592
SF-ALL-Micro             Stanford_KB_XLING_1 ALL 0.3286 0.0576 0.0980
SF-ALL-Macro             Stanford_KB_XLING_1 0   0.1317 0.1065 0.1030
SF-ALL-Macro             Stanford_KB_XLING_1 1   0.0394 0.0383 0.0374
SF-ALL-Macro             Stanford_KB_XLING_1 ALL 0.0845 0.0716 0.0694
LDC-MAX-ALL-Micro        Stanford_KB_XLING_1 0   0.4626 0.1332 0.2068
LDC-MAX-ALL-Micro        Stanford_KB_XLING_1 1   0.2323 0.0754 0.1139
LDC-MAX-ALL-Micro        Stanford_KB_XLING_1 ALL 0.3791 0.1138 0.1751
LDC-MAX-ALL-Macro        Stanford_KB_XLING_1 0   0.2444 0.1922 0.1912
LDC-MAX-ALL-Macro        Stanford_KB_XLING_1 1   0.0848 0.0835 0.0816
LDC-MAX-ALL-Macro        Stanford_KB_XLING_1 ALL 0.1654 0.1384 0.1370
LDC-MEAN-ALL-Macro       Stanford_KB_XLING_1 0   0.1371 0.1149 0.1116
LDC-MEAN-ALL-Macro       Stanford_KB_XLING_1 1   0.0478 0.0455 0.0446
LDC-MEAN-ALL-Macro       Stanford_KB_XLING_1 ALL 0.0929 0.0806 0.0785

*ALL-Macro Prec, Recall and F1 refer to mean-precision, mean-recall and mean-F1.
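
The note above matters when reading the F1 column. In the Micro rows, F1 is consistent with the harmonic mean of the Prec and Recall shown on the same row. In the Macro rows, F1 is a mean of F1 values, so it generally cannot be recomputed from the mean Prec and mean Recall on that row. The following is a minimal sketch of that check, using figures copied from the Stanford_KB_XLING_1 hop-0 rows. The helper name f1 is hypothetical and is not part of the official scorer, and the unit over which the Macro means are taken is not stated in this report.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# SF-ALL-Micro, hop 0: the reported F1 (0.1169) matches the harmonic mean
# of the reported Prec and Recall.
micro_f1 = f1(0.4017, 0.0684)
print(round(micro_f1, 4))  # 0.1169

# SF-ALL-Macro, hop 0: the harmonic mean of mean-precision and mean-recall
# is NOT the reported mean-F1 (0.1030), because mean-F1 averages F1 values
# rather than combining the two means.
macro_from_means = f1(0.1317, 0.1065)
reported_macro_f1 = 0.1030
print(round(macro_from_means, 4), reported_macro_f1)  # 0.1178 0.103
```

The same helper reproduces the NIL-DETECTION F1 from its P and R (0.2995 and 0.9350 give 0.4537), which suggests that line is also a plain harmonic mean.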

NIL-DETECTION P/R/F1:				0.2995 0.9350 0.4537


*************************************************************

Run ID:  Stanford_KB_XLING_2
Did the run access the live Web during the evaluation window:  No
Did the run extract relations from the Cold Start source corpus: Yes
Did the run generate meaningful confidence values: Yes
Run number of the English KB system that is most closely configured to the English component of this run: 3
Run number of the Spanish KB system that is most closely configured to the Spanish component of this run: NA
Run number of the Chinese KB system that is most closely configured to the Chinese component of this run: 2

Entity Discovery Evaluation:

ALL English, Chinese, and Spanish documents:
Prec	Recall	F1	Metric
0.751	0.350	0.478	strong_mention_match
0.677	0.316	0.431	strong_typed_mention_match
0.000	0.000	0.000	entity_match
0.667	0.173	0.275	b_cubed
0.663	0.310	0.422	mention_ceaf
0.607	0.283	0.387	typed_mention_ceaf

ONLY English documents:
Prec	Recall	F1	Metric
0.733	0.538	0.620	strong_mention_match
0.664	0.487	0.562	strong_typed_mention_match
0.000	0.000	0.000	entity_match
0.632	0.344	0.445	b_cubed
0.647	0.475	0.548	mention_ceaf
0.600	0.441	0.508	typed_mention_ceaf

ONLY Chinese documents:
Prec	Recall	F1	Metric
0.775	0.431	0.554	strong_mention_match
0.695	0.386	0.497	strong_typed_mention_match
0.000	0.000	0.000	entity_match
0.729	0.273	0.397	b_cubed
0.734	0.408	0.525	mention_ceaf
0.661	0.368	0.473	typed_mention_ceaf

ONLY Spanish documents:
Prec	Recall	F1	Metric
0.000	0.000	0.000	strong_mention_match
0.000	0.000	0.000	strong_typed_mention_match
0.000	0.000	0.000	entity_match
0.000	0.000	0.000	b_cubed
0.000	0.000	0.000	mention_ceaf
0.000	0.000	0.000	typed_mention_ceaf


Slot Filling Evaluation:

Metric                   RunID               Hop Prec   Recall F1    
SF-ALL-Micro             Stanford_KB_XLING_2 0   0.1855 0.1302 0.1530
SF-ALL-Micro             Stanford_KB_XLING_2 1   0.0297 0.0672 0.0412
SF-ALL-Micro             Stanford_KB_XLING_2 ALL 0.0909 0.1098 0.0995
SF-ALL-Macro             Stanford_KB_XLING_2 0   0.1122 0.1224 0.1030
SF-ALL-Macro             Stanford_KB_XLING_2 1   0.0630 0.0772 0.0647
SF-ALL-Macro             Stanford_KB_XLING_2 ALL 0.0870 0.0992 0.0834
LDC-MAX-ALL-Micro        Stanford_KB_XLING_2 0   0.2076 0.2440 0.2243
LDC-MAX-ALL-Micro        Stanford_KB_XLING_2 1   0.0411 0.1311 0.0626
LDC-MAX-ALL-Micro        Stanford_KB_XLING_2 ALL 0.1113 0.2062 0.1446
LDC-MAX-ALL-Macro        Stanford_KB_XLING_2 0   0.2132 0.2235 0.1930
LDC-MAX-ALL-Macro        Stanford_KB_XLING_2 1   0.1291 0.1559 0.1328
LDC-MAX-ALL-Macro        Stanford_KB_XLING_2 ALL 0.1716 0.1901 0.1632
LDC-MEAN-ALL-Macro       Stanford_KB_XLING_2 0   0.1224 0.1333 0.1144
LDC-MEAN-ALL-Macro       Stanford_KB_XLING_2 1   0.0726 0.0888 0.0738
LDC-MEAN-ALL-Macro       Stanford_KB_XLING_2 ALL 0.0977 0.1113 0.0944

*ALL-Macro Prec, Recall and F1 refer to mean-precision, mean-recall and mean-F1.

NIL-DETECTION P/R/F1:				0.2690 0.8049 0.4032


*************************************************************

Run ID:  Stanford_KB_XLING_3
Did the run access the live Web during the evaluation window:  No
Did the run extract relations from the Cold Start source corpus: Yes
Did the run generate meaningful confidence values: Yes
Run number of the English KB system that is most closely configured to the English component of this run: 2
Run number of the Spanish KB system that is most closely configured to the Spanish component of this run: NA
Run number of the Chinese KB system that is most closely configured to the Chinese component of this run: 1

Entity Discovery Evaluation:

ALL English, Chinese, and Spanish documents:
Prec	Recall	F1	Metric
0.750	0.350	0.478	strong_mention_match
0.677	0.316	0.431	strong_typed_mention_match
0.000	0.000	0.000	entity_match
0.667	0.173	0.275	b_cubed
0.663	0.310	0.422	mention_ceaf
0.607	0.283	0.386	typed_mention_ceaf

ONLY English documents:
Prec	Recall	F1	Metric
0.733	0.538	0.620	strong_mention_match
0.664	0.487	0.562	strong_typed_mention_match
0.000	0.000	0.000	entity_match
0.632	0.344	0.445	b_cubed
0.647	0.475	0.548	mention_ceaf
0.600	0.441	0.508	typed_mention_ceaf

ONLY Chinese documents:
Prec	Recall	F1	Metric
0.774	0.431	0.554	strong_mention_match
0.694	0.386	0.496	strong_typed_mention_match
0.000	0.000	0.000	entity_match
0.728	0.273	0.397	b_cubed
0.734	0.408	0.524	mention_ceaf
0.661	0.368	0.472	typed_mention_ceaf

ONLY Spanish documents:
Prec	Recall	F1	Metric
0.000	0.000	0.000	strong_mention_match
0.000	0.000	0.000	strong_typed_mention_match
0.000	0.000	0.000	entity_match
0.000	0.000	0.000	b_cubed
0.000	0.000	0.000	mention_ceaf
0.000	0.000	0.000	typed_mention_ceaf


Slot Filling Evaluation:

Metric                   RunID               Hop Prec   Recall F1    
SF-ALL-Micro             Stanford_KB_XLING_3 0   0.5277 0.0489 0.0894
SF-ALL-Micro             Stanford_KB_XLING_3 1   0.4000 0.0163 0.0313
SF-ALL-Micro             Stanford_KB_XLING_3 ALL 0.5054 0.0383 0.0712
SF-ALL-Macro             Stanford_KB_XLING_3 0   0.0979 0.0707 0.0720
SF-ALL-Macro             Stanford_KB_XLING_3 1   0.0215 0.0224 0.0215
SF-ALL-Macro             Stanford_KB_XLING_3 ALL 0.0588 0.0460 0.0462
LDC-MAX-ALL-Micro        Stanford_KB_XLING_3 0   0.5616 0.0943 0.1615
LDC-MAX-ALL-Micro        Stanford_KB_XLING_3 1   0.4800 0.0393 0.0727
LDC-MAX-ALL-Micro        Stanford_KB_XLING_3 ALL 0.5455 0.0759 0.1332
LDC-MAX-ALL-Macro        Stanford_KB_XLING_3 0   0.1809 0.1325 0.1352
LDC-MAX-ALL-Macro        Stanford_KB_XLING_3 1   0.0520 0.0522 0.0511
LDC-MAX-ALL-Macro        Stanford_KB_XLING_3 ALL 0.1171 0.0928 0.0936
LDC-MEAN-ALL-Macro       Stanford_KB_XLING_3 0   0.1016 0.0775 0.0785
LDC-MEAN-ALL-Macro       Stanford_KB_XLING_3 1   0.0281 0.0274 0.0269
LDC-MEAN-ALL-Macro       Stanford_KB_XLING_3 ALL 0.0652 0.0527 0.0530

*ALL-Macro Prec, Recall and F1 refer to mean-precision, mean-recall and mean-F1.

NIL-DETECTION P/R/F1:				0.3085 0.9756 0.4688


*************************************************************

Run ID:  Stanford_KB_XLING_4
Did the run access the live Web during the evaluation window:  No
Did the run extract relations from the Cold Start source corpus: Yes
Did the run generate meaningful confidence values: Yes
Run number of the English KB system that is most closely configured to the English component of this run: 1
Run number of the Spanish KB system that is most closely configured to the Spanish component of this run: NA
Run number of the Chinese KB system that is most closely configured to the Chinese component of this run: 2

Entity Discovery Evaluation:

ALL English, Chinese, and Spanish documents:
Prec	Recall	F1	Metric
0.750	0.350	0.478	strong_mention_match
0.677	0.316	0.431	strong_typed_mention_match
0.000	0.000	0.000	entity_match
0.667	0.173	0.275	b_cubed
0.663	0.310	0.422	mention_ceaf
0.607	0.284	0.387	typed_mention_ceaf

ONLY English documents:
Prec	Recall	F1	Metric
0.732	0.538	0.620	strong_mention_match
0.664	0.488	0.563	strong_typed_mention_match
0.000	0.000	0.000	entity_match
0.631	0.344	0.445	b_cubed
0.647	0.475	0.548	mention_ceaf
0.601	0.441	0.508	typed_mention_ceaf

ONLY Chinese documents:
Prec	Recall	F1	Metric
0.775	0.431	0.554	strong_mention_match
0.695	0.386	0.497	strong_typed_mention_match
0.000	0.000	0.000	entity_match
0.729	0.273	0.397	b_cubed
0.734	0.408	0.525	mention_ceaf
0.661	0.368	0.473	typed_mention_ceaf

ONLY Spanish documents:
Prec	Recall	F1	Metric
0.000	0.000	0.000	strong_mention_match
0.000	0.000	0.000	strong_typed_mention_match
0.000	0.000	0.000	entity_match
0.000	0.000	0.000	b_cubed
0.000	0.000	0.000	mention_ceaf
0.000	0.000	0.000	typed_mention_ceaf


Slot Filling Evaluation:

Metric                   RunID               Hop Prec   Recall F1    
SF-ALL-Micro             Stanford_KB_XLING_4 0   0.1934 0.1622 0.1764
SF-ALL-Micro             Stanford_KB_XLING_4 1   0.0311 0.0891 0.0461
SF-ALL-Micro             Stanford_KB_XLING_4 ALL 0.0927 0.1385 0.1110
SF-ALL-Macro             Stanford_KB_XLING_4 0   0.1164 0.1492 0.1167
SF-ALL-Macro             Stanford_KB_XLING_4 1   0.0826 0.1049 0.0867
SF-ALL-Macro             Stanford_KB_XLING_4 ALL 0.0991 0.1265 0.1014
LDC-MAX-ALL-Micro        Stanford_KB_XLING_4 0   0.2024 0.3036 0.2429
LDC-MAX-ALL-Micro        Stanford_KB_XLING_4 1   0.0271 0.1738 0.0470
LDC-MAX-ALL-Micro        Stanford_KB_XLING_4 ALL 0.0827 0.2600 0.1255
LDC-MAX-ALL-Macro        Stanford_KB_XLING_4 0   0.2189 0.2719 0.2179
LDC-MAX-ALL-Macro        Stanford_KB_XLING_4 1   0.1593 0.2009 0.1669
LDC-MAX-ALL-Macro        Stanford_KB_XLING_4 ALL 0.1894 0.2368 0.1927
LDC-MEAN-ALL-Macro       Stanford_KB_XLING_4 0   0.1287 0.1614 0.1296
LDC-MEAN-ALL-Macro       Stanford_KB_XLING_4 1   0.0890 0.1151 0.0932
LDC-MEAN-ALL-Macro       Stanford_KB_XLING_4 ALL 0.1091 0.1385 0.1116

*ALL-Macro Prec, Recall and F1 refer to mean-precision, mean-recall and mean-F1.

NIL-DETECTION P/R/F1:				0.2590 0.7642 0.3869