==========================================================
TAC KBP 2016 CROSS-LINGUAL SLOT FILLING EVALUATION RESULTS
==========================================================


Team ID:  Stanford
Organization:  Stanford University


*************************************************************

Run ID:  Stanford_SF_XLING_1
Did the run access the live Web during the evaluation window:  No
Did the run extract relations from the Cold Start source corpus: Yes
Did the run generate meaningful confidence values: Yes
Run number of the English SF system that is most closely configured to the English component of this run: 1
Run number of the Spanish SF system that is most closely configured to the Spanish component of this run: NA
Run number of the Chinese SF system that is most closely configured to the Chinese component of this run: 1

Slot Filling Evaluation:

Metric                   RunID               Hop Prec   Recall F1    
SF-ALL-Micro             Stanford_SF_XLING_1 0   0.4313 0.0552 0.0979
SF-ALL-Micro             Stanford_SF_XLING_1 1   0.1616 0.0188 0.0337
SF-ALL-Micro             Stanford_SF_XLING_1 ALL 0.3493 0.0434 0.0772
SF-ALL-Macro             Stanford_SF_XLING_1 0   0.1034 0.0819 0.0807
SF-ALL-Macro             Stanford_SF_XLING_1 1   0.0235 0.0235 0.0228
SF-ALL-Macro             Stanford_SF_XLING_1 ALL 0.0626 0.0521 0.0511
LDC-MAX-ALL-Micro        Stanford_SF_XLING_1 0   0.5000 0.1216 0.1956
LDC-MAX-ALL-Micro        Stanford_SF_XLING_1 1   0.1543 0.0410 0.0648
LDC-MAX-ALL-Micro        Stanford_SF_XLING_1 ALL 0.3772 0.0946 0.1512
LDC-MAX-ALL-Macro        Stanford_SF_XLING_1 0   0.2188 0.1754 0.1747
LDC-MAX-ALL-Macro        Stanford_SF_XLING_1 1   0.0508 0.0497 0.0487
LDC-MAX-ALL-Macro        Stanford_SF_XLING_1 ALL 0.1357 0.1132 0.1124
LDC-MEAN-ALL-Macro       Stanford_SF_XLING_1 0   0.1083 0.0898 0.0877
LDC-MEAN-ALL-Macro       Stanford_SF_XLING_1 1   0.0291 0.0276 0.0273
LDC-MEAN-ALL-Macro       Stanford_SF_XLING_1 ALL 0.0691 0.0590 0.0578

*ALL-Macro Prec, Recall and F1 refer to mean-precision, mean-recall and mean-F1.

NIL-DETECTION P/R/F1:				0.2940 0.9106 0.4445


*************************************************************

Run ID:  Stanford_SF_XLING_2
Did the run access the live Web during the evaluation window:  No
Did the run extract relations from the Cold Start source corpus: Yes
Did the run generate meaningful confidence values: Yes
Run number of the English SF system that is most closely configured to the English component of this run: 1
Run number of the Spanish SF system that is most closely configured to the Spanish component of this run: NA
Run number of the Chinese SF system that is most closely configured to the Chinese component of this run: 1

Slot Filling Evaluation:

Metric                   RunID               Hop Prec   Recall F1    
SF-ALL-Micro             Stanford_SF_XLING_2 0   0.4099 0.0584 0.1022
SF-ALL-Micro             Stanford_SF_XLING_2 1   0.0947 0.0254 0.0401
SF-ALL-Micro             Stanford_SF_XLING_2 ALL 0.2601 0.0477 0.0806
SF-ALL-Macro             Stanford_SF_XLING_2 0   0.1235 0.0902 0.0936
SF-ALL-Macro             Stanford_SF_XLING_2 1   0.0301 0.0298 0.0291
SF-ALL-Macro             Stanford_SF_XLING_2 ALL 0.0757 0.0593 0.0606
LDC-MAX-ALL-Micro        Stanford_SF_XLING_2 0   0.4697 0.1282 0.2014
LDC-MAX-ALL-Micro        Stanford_SF_XLING_2 1   0.1020 0.0590 0.0748
LDC-MAX-ALL-Micro        Stanford_SF_XLING_2 ALL 0.2796 0.1050 0.1527
LDC-MAX-ALL-Macro        Stanford_SF_XLING_2 0   0.2659 0.1981 0.2054
LDC-MAX-ALL-Macro        Stanford_SF_XLING_2 1   0.0661 0.0637 0.0629
LDC-MAX-ALL-Macro        Stanford_SF_XLING_2 ALL 0.1670 0.1316 0.1349
LDC-MEAN-ALL-Macro       Stanford_SF_XLING_2 0   0.1331 0.1034 0.1061
LDC-MEAN-ALL-Macro       Stanford_SF_XLING_2 1   0.0381 0.0363 0.0360
LDC-MEAN-ALL-Macro       Stanford_SF_XLING_2 ALL 0.0861 0.0702 0.0714

*ALL-Macro Prec, Recall and F1 refer to mean-precision, mean-recall and mean-F1.

NIL-DETECTION P/R/F1:				0.2995 0.9350 0.4537


*************************************************************

Run ID:  Stanford_SF_XLING_3
Did the run access the live Web during the evaluation window:  No
Did the run extract relations from the Cold Start source corpus: Yes
Did the run generate meaningful confidence values: Yes
Run number of the English SF system that is most closely configured to the English component of this run: 1
Run number of the Spanish SF system that is most closely configured to the Spanish component of this run: NA
Run number of the Chinese SF system that is most closely configured to the Chinese component of this run: 1

Slot Filling Evaluation:

Metric                   RunID               Hop Prec   Recall F1    
SF-ALL-Micro             Stanford_SF_XLING_3 0   0.2010 0.0889 0.1233
SF-ALL-Micro             Stanford_SF_XLING_3 1   0.0348 0.0443 0.0390
SF-ALL-Micro             Stanford_SF_XLING_3 ALL 0.1047 0.0744 0.0870
SF-ALL-Macro             Stanford_SF_XLING_3 0   0.1242 0.1218 0.1091
SF-ALL-Macro             Stanford_SF_XLING_3 1   0.0418 0.0534 0.0442
SF-ALL-Macro             Stanford_SF_XLING_3 ALL 0.0821 0.0868 0.0759
LDC-MAX-ALL-Micro        Stanford_SF_XLING_3 0   0.2837 0.1985 0.2336
LDC-MAX-ALL-Micro        Stanford_SF_XLING_3 1   0.0441 0.0951 0.0603
LDC-MAX-ALL-Micro        Stanford_SF_XLING_3 ALL 0.1379 0.1638 0.1497
LDC-MAX-ALL-Macro        Stanford_SF_XLING_3 0   0.2445 0.2485 0.2249
LDC-MAX-ALL-Macro        Stanford_SF_XLING_3 1   0.0925 0.1169 0.0975
LDC-MAX-ALL-Macro        Stanford_SF_XLING_3 ALL 0.1693 0.1834 0.1619
LDC-MEAN-ALL-Macro       Stanford_SF_XLING_3 0   0.1321 0.1317 0.1186
LDC-MEAN-ALL-Macro       Stanford_SF_XLING_3 1   0.0451 0.0567 0.0469
LDC-MEAN-ALL-Macro       Stanford_SF_XLING_3 ALL 0.0891 0.0946 0.0832

*ALL-Macro Prec, Recall and F1 refer to mean-precision, mean-recall and mean-F1.

NIL-DETECTION P/R/F1:				0.2690 0.8049 0.4032


*************************************************************

Run ID:  Stanford_SF_XLING_4
Did the run access the live Web during the evaluation window:  No
Did the run extract relations from the Cold Start source corpus: Yes
Did the run generate meaningful confidence values: Yes
Run number of the English SF system that is most closely configured to the English component of this run: 4
Run number of the Spanish SF system that is most closely configured to the Spanish component of this run: NA
Run number of the Chinese SF system that is most closely configured to the Chinese component of this run: 4

Slot Filling Evaluation:

Metric                   RunID               Hop Prec   Recall F1    
SF-ALL-Micro             Stanford_SF_XLING_4 0   0.5756 0.0437 0.0813
SF-ALL-Micro             Stanford_SF_XLING_4 1   0.2544 0.0148 0.0279
SF-ALL-Micro             Stanford_SF_XLING_4 ALL 0.4894 0.0343 0.0642
SF-ALL-Macro             Stanford_SF_XLING_4 0   0.0990 0.0708 0.0740
SF-ALL-Macro             Stanford_SF_XLING_4 1   0.0189 0.0174 0.0174
SF-ALL-Macro             Stanford_SF_XLING_4 ALL 0.0580 0.0435 0.0451
LDC-MAX-ALL-Micro        Stanford_SF_XLING_4 0   0.6333 0.0943 0.1641
LDC-MAX-ALL-Micro        Stanford_SF_XLING_4 1   0.2958 0.0344 0.0617
LDC-MAX-ALL-Micro        Stanford_SF_XLING_4 ALL 0.5378 0.0742 0.1304
LDC-MAX-ALL-Macro        Stanford_SF_XLING_4 0   0.2109 0.1557 0.1615
LDC-MAX-ALL-Macro        Stanford_SF_XLING_4 1   0.0440 0.0378 0.0387
LDC-MAX-ALL-Macro        Stanford_SF_XLING_4 ALL 0.1283 0.0974 0.1007
LDC-MEAN-ALL-Macro       Stanford_SF_XLING_4 0   0.1081 0.0846 0.0866
LDC-MEAN-ALL-Macro       Stanford_SF_XLING_4 1   0.0265 0.0230 0.0235
LDC-MEAN-ALL-Macro       Stanford_SF_XLING_4 ALL 0.0677 0.0541 0.0554

*ALL-Macro Prec, Recall and F1 refer to mean-precision, mean-recall and mean-F1.

NIL-DETECTION P/R/F1:				0.3085 0.9756 0.4688
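

*************************************************************

Note on reading the F1 columns. The footnote under each table says the Macro rows report mean precision, mean recall and mean F1. The sketch below is only a reader-side consistency check, not part of the official scorer. It assumes the Micro and NIL-DETECTION F1 values are the harmonic mean of the reported precision and recall, and it uses the already rounded Run 1 figures from the tables above. The helper name `f1` is hypothetical.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Run 1, SF-ALL-Micro, hop 0: reported P=0.4313, R=0.0552, F1=0.0979
micro_hop0 = f1(0.4313, 0.0552)

# Run 1, NIL-DETECTION: reported P=0.2940, R=0.9106, F1=0.4445
nil_detection = f1(0.2940, 0.9106)

# Run 1, SF-ALL-Macro, hop 0: reported mean-P=0.1034, mean-R=0.0819, mean-F1=0.0807.
# The harmonic mean of the two means is NOT expected to match the reported
# value, because mean-F1 is an average of F1 scores rather than an F1 of averages.
macro_from_means = f1(0.1034, 0.0819)

print(f"micro hop 0 F1:         {micro_hop0:.4f}")
print(f"NIL detection F1:       {nil_detection:.4f}")
print(f"F1 of macro P/R means:  {macro_from_means:.4f} (table reports 0.0807)")
```

Under these assumptions the Micro and NIL-DETECTION values reproduce the reported F1 to four decimals, while the Macro row does not, which is consistent with the footnote: a Macro F1 cannot be recomputed from the Macro precision and recall columns alone.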