Incorrect Test Set Evaluation in Line 291: F1 Score Discrepancy between thre_loader and test_loader #62

mojtaba-nafez · 2023-12-06T23:55:14Z

Hi there,

Excellent work on this! However, I've identified a potential issue in the testing code. I'm currently working with the MSL dataset, and while reviewing your code, specifically line 291 (right after the comment "# (3) evaluation on the test set"), I noticed that the model is evaluated on thre_loader instead of test_loader. Since thre_loader contains only 1% of the test data, the 93.59% F1 score reported in the paper appears to be computed on that subset. After correcting this by using test_loader instead of thre_loader, the final F1 score drops to 86.49%.
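One way the "1%" figure can arise is sketched below. This assumes, without confirmation from this thread, that thre_loader cuts the test series into non-overlapping windows (step equal to the window size) while test_loader slides by one step; the helper name and the sizes are hypothetical and chosen only for illustration.

```python
def num_windows(series_len: int, win_size: int, step: int) -> int:
    """Count the windows a segment loader would yield (hypothetical helper)."""
    if series_len < win_size:
        return 0
    return (series_len - win_size) // step + 1


# Hypothetical sizes for illustration only; not taken from the repository.
WIN_SIZE = 100
TEST_LEN = 10_000

sliding = num_windows(TEST_LEN, WIN_SIZE, step=1)              # test_loader-style (assumed)
non_overlapping = num_windows(TEST_LEN, WIN_SIZE, step=WIN_SIZE)  # thre_loader-style (assumed)

# Under these assumptions, non-overlapping windows are roughly 1/WIN_SIZE as many.
print(sliding, non_overlapping, f"{non_overlapping / sliding:.2%}")
```

With a window size of 100, the non-overlapping loader yields about 1% as many windows as the step-1 loader, which is one plausible reading of the ratio described above.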

I look forward to hearing your thoughts on this potential bug.

elwoodwgd · 2023-12-14T10:41:50Z

I also have this question. Why use thre_loader?

BITGJW · 2023-12-18T15:31:13Z

Same question.

lzz19980125 · 2024-03-02T06:57:10Z

Same question.

DarkFT · 2024-09-05T03:46:21Z

Same question. I think test_loader should be used to find the threshold. It's unfair to compute the threshold from these:
train energy (5821800,) (train_loader)
test energy (73700,) (thre_loader)
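For context, a minimal sketch of a percentile-based threshold step is shown below. It assumes, without confirmation from this thread, that the threshold is the (100 - anomaly_ratio) percentile of the pooled train and test energies, as in Anomaly Transformer-style solvers; `anomaly_ratio`, `evaluate`, and the other names are hypothetical. The test energies can come from either thre_loader or test_loader, which is exactly the choice being questioned here. Any post-processing the repository may apply (for example point adjustment) is omitted.

```python
from typing import List, Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile, the same rule as NumPy's default method."""
    data = sorted(values)
    if not data:
        raise ValueError("percentile of empty input")
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def f1_score(pred: Sequence[int], gt: Sequence[int]) -> float:
    """Binary F1 score on point-wise predictions and labels."""
    tp = sum(1 for p, g in zip(pred, gt) if p and g)
    fp = sum(1 for p, g in zip(pred, gt) if p and not g)
    fn = sum(1 for p, g in zip(pred, gt) if not p and g)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evaluate(
    train_energy: Sequence[float],
    test_energy: Sequence[float],
    test_labels: Sequence[int],
    anomaly_ratio: float,  # hypothetical: expected % of anomalous points
) -> Tuple[float, float]:
    # Assumed procedure: threshold from the pooled train + test energies.
    pooled: List[float] = list(train_energy) + list(test_energy)
    thresh = percentile(pooled, 100 - anomaly_ratio)
    # Points whose energy exceeds the threshold are flagged as anomalies.
    pred = [1 if e > thresh else 0 for e in test_energy]
    return thresh, f1_score(pred, test_labels)


if __name__ == "__main__":
    # Toy energies for illustration only.
    train = [0.1] * 8 + [0.2, 0.3]
    test = [0.1, 0.1, 0.9, 0.8]
    labels = [0, 0, 1, 1]
    print(evaluate(train, test, labels, anomaly_ratio=20))
```

Because the threshold depends on which test energies are pooled, computing it from thre_loader rather than test_loader can change both the threshold and the resulting F1 score.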
