
When I train the network, the GPU memory used keeps going up #1

Open

leoozy opened this issue Jun 29, 2018 · 4 comments

leoozy commented Jun 29, 2018

Hi,
Thanks for your nice code; it is a really beautiful project. When I try to train on the refcoco dataset, the GPU memory used keeps going up. Initially it is about 4.8 GB; after about 1000 iterations it has grown to 9.2 GB. Reading the code carefully, it looks like you feed the network one image per batch (I am not sure). But if the batch size is constant, why does the GPU memory usage keep growing?
My GPU is a GTX 1080 Ti with 11 GB.
Is this your final code, and have you ever met this error?
Thank you!
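
For reference, a minimal sketch of how GPU allocation could be capped while diagnosing this, assuming the TensorFlow 1.x API this project appears to use; the session setup below is illustrative, not the repository's code:

```python
# Illustrative only: enable on-demand GPU allocation in TensorFlow 1.x so
# the reported memory reflects what the model actually needs per step.
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
# Alternatively, hard-cap the fraction of GPU memory TensorFlow may use:
# config.gpu_options.per_process_gpu_memory_fraction = 0.8

with tf.Session(config=config) as sess:
    pass  # build the graph and run training under this session
```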

yuleiniu (Owner) commented

Hi, the problem is that the number of bounding boxes and referring expressions (sentences) varies across images. You can try feeding only one sentence per iteration, which reduces GPU memory usage but slightly hurts performance. Note that the batch size in our code is defined over sentences rather than images.
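
A minimal sketch of the one-sentence-per-iteration idea, assuming a hypothetical data loader and feed-dict helper (these names are not from the repository):

```python
# Hypothetical names throughout; this only illustrates iterating over
# sentences one at a time so memory no longer depends on how many
# expressions an image has.
def train_one_sentence_per_step(sess, train_op, data_loader, build_feed_dict):
    """Run one training step per referring expression instead of
    batching all of an image's sentences together."""
    for image_feats, boxes, sentences in data_loader:
        for sent in sentences:  # one sentence per sess.run call
            feed = build_feed_dict(image_feats, boxes, [sent])
            sess.run(train_op, feed_dict=feed)
```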


leoozy commented Jun 30, 2018

Thank you, I fixed it by using two 1080 Ti GPUs.


leoozy commented Jul 2, 2018

Hi, I am running into a problem when training in the unsupervised setting. Everything is fine in the supervised setting, but in the unsupervised setting the result goes wrong:
[screenshot: training log where scores_val is printed as NaN]
I trained the model following your command on the refcoco dataset. The printed [Nan ...] is the value of scores_val. Could you please tell me how to fix it?


yuleiniu commented Jul 4, 2018

I think the problem might be computing log(0) in the unsupervised setting. I have modified vc_model.py to prevent this.
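
The usual guard for this kind of NaN is to keep the argument of the log away from zero; a minimal sketch, with illustrative names rather than the exact code in vc_model.py:

```python
import tensorflow as tf

def safe_log(probs, eps=1e-8):
    """Clip probabilities into [eps, 1] before taking the log so the
    unsupervised loss cannot produce NaN from log(0)."""
    return tf.log(tf.clip_by_value(probs, eps, 1.0))
```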
