Incorrect Test Set Evaluation in Line 291: F1 Score Discrepancy between thre_loader and test_loader #62

Open
mojtaba-nafez opened this issue Dec 6, 2023 · 4 comments

Comments

@mojtaba-nafez

mojtaba-nafez commented Dec 6, 2023

Hi there,

Excellent job on this! However, I've identified a potential issue in the testing code. I'm working with the MSL dataset, and at line 291 (after the comment # (3) evaluation on the test set) the model is evaluated on thre_loader instead of test_loader. thre_loader contains only about 1% of the test data, and evaluating on it gives the F1 score reported in the paper, 93.59%. After correcting the code to use test_loader instead of thre_loader, the final F1 score drops to 86.49%.

I look forward to hearing your thoughts on this potential bug.
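To make the point concrete, here is a minimal sketch (not the repository's code) of scoring a fixed threshold. The helper `f1_at_threshold` and the toy arrays are hypothetical; the sketch only illustrates that precision, recall and F1 depend on which split the energies and labels are collected from, so a number meant to describe the test set has to be computed from the energies gathered over the full test split (test_loader), not a subset of it.

```python
import numpy as np


def f1_at_threshold(scores, labels, threshold):
    """Return (precision, recall, f1) for the rule 'anomaly if score > threshold'.

    Hypothetical helper: `scores` are per-point anomaly energies and `labels`
    are 0/1 ground truth, both collected from the SAME loader.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pred = (scores > threshold).astype(int)
    tp = int(np.sum((pred == 1) & (labels == 1)))
    fp = int(np.sum((pred == 1) & (labels == 0)))
    fn = int(np.sum((pred == 0) & (labels == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data (made up): the same threshold scored on the full split vs. a slice of it.
scores = [0.1, 0.2, 0.9, 0.4, 0.8, 0.7]
labels = [0, 0, 1, 1, 0, 1]
print(f1_at_threshold(scores, labels, 0.5))            # full split
print(f1_at_threshold(scores[2:5], labels[2:5], 0.5))  # subset only
```

On the full toy split the F1 is 2/3, while the three-point slice gives 0.5: same model scores, same threshold, different reported number.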

@elwoodwgd

I also have this question. Why use thre_loader?

@BITGJW

BITGJW commented Dec 18, 2023

Same question.

@lzz19980125

Same question.

@DarkFT

DarkFT commented Sep 5, 2024

Same question. I think test_loader should be used to find the threshold. It is unfair to determine the threshold from these energies:
- train energy: shape (5821800,), from train_loader
- test energy: shape (73700,), from thre_loader
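A minimal sketch of the kind of threshold selection these shapes point to, assuming (this is not confirmed anywhere in the thread) that the threshold is a percentile of the concatenated train and evaluation energies. The function name `percentile_threshold`, the parameter `anomaly_ratio`, and the toy energies are all hypothetical.

```python
import numpy as np


def percentile_threshold(train_energy, eval_energy, anomaly_ratio):
    """Hypothetical sketch: choose the threshold so that roughly `anomaly_ratio`
    percent of the pooled train + evaluation energies lie above it."""
    combined = np.concatenate([np.asarray(train_energy), np.asarray(eval_energy)], axis=0)
    return float(np.percentile(combined, 100 - anomaly_ratio))


# Toy energies (made up): the pooled values are 0, 1, ..., 100.
train_energy = np.arange(50, dtype=float)
eval_energy = np.arange(50, 101, dtype=float)
threshold = percentile_threshold(train_energy, eval_energy, anomaly_ratio=1.0)
print(threshold)  # the 99th percentile of the pooled values
```

Under this assumption, whether `eval_energy` comes from thre_loader or from test_loader changes the pooled distribution and therefore the threshold, which is the substance of the question raised here.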


5 participants