U_net_graph_dict key, Ising model #11

Closed
mengxi7 opened this issue Mar 16, 2025 · 11 comments

Comments

@mengxi7

mengxi7 commented Mar 16, 2025

It seems that U_net needs the value of "U_net_graph_dict" in lattice data, but during the generation of data, there is no computation of "U_net_graph_dict".

@sanokows
Collaborator

Can you give me some more details about what you have tried?

@mengxi7
Author

mengxi7 commented Mar 16, 2025

I ran python prepare_datasets.py to generate the Ising model data, with:
--datasets parameter: ["NxNLattice_4x4", "NxNLattice_8x8", "NxNLattice_16x16"]
--problems parameter: IsingModel = ["IsingModel"]

After generating the IsingModel data, I ran the command:

python argparse_ray_main.py --lrs 0.002 --GPUs 5 --MCMC_steps 400 --temps 0.6 --IsingMode NxNLattice_4x4 --EnergyFunction IsingModel --N_anneal 2000 --n_diffusion_steps 300 --batch_size 20 --n_basis_states 10 --noise_potential bernoulli --project_name IsingRun --seed 123 --graph_mode U_net

There are two errors.
The first problem is in the initialization of the Base class in BaseTrainer.py: self.beta = 1 / self.T_target fails because self.T_target = 0. I changed the default to parser.add_argument('--T_target', default=1e-5, type = float, help='Define target temperature'), but I don't know whether that is right or wrong.
The second problem:

Traceback (most recent call last):
  File "/DiffUCO/argparse_ray_main.py", line 348, in <module>
    meanfield_run()
  File "/DiffUCO/argparse_ray_main.py", line 136, in meanfield_run
    detect_and_run_for_loops()
  File "/DiffUCO/argparse_ray_main.py", line 250, in detect_and_run_for_loops
    run(flexible_config=flexible_config, overwrite=True)
  File "/DiffUCO/argparse_ray_main.py", line 337, in run
    train = TrainMeanField(config)
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/DiffUCO/train.py", line 264, in __init__
    self.__init_optimizer_and_params()
  File "/DiffUCO/train.py", line 362, in __init_optimizer_and_params
    self.__init_params()
  File "/DiffUCO/train.py", line 477, in __init_params
    input_graph_list, energy_graphs = self._prepare_graphs(jraph_graph_dict)
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/DiffUCO/train.py", line 1061, in _prepare_graphs
    input_graph_dict = pmap_batch_U_net_graph_dict_and_pad(batch_dict["U_net_graph_dict"], k = self.pad_k)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/DiffUCO/jraph_utils.py", line 400, in pmap_batch_U_net_graph_dict_and_pad
    keys = U_net_graph_dict_list[0].keys()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'keys'

It seems that "U_net_graph_dict" is required, but it is set to None ("U_net_graph_dict = None") by the SolutionDataset_InMemory.getitem() function.
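
The traceback ends in a bare AttributeError because a None entry reaches .keys(). A minimal sketch of a guard that fails earlier with a clearer message follows. `require_graph_dicts` is a hypothetical helper, not part of the DiffUCO code, and the shape of `batch_dict` (a list of dicts under the key) is assumed from the traceback alone:

```python
from typing import Any, Dict, List


def require_graph_dicts(batch_dict: Dict[str, Any],
                        key: str = "U_net_graph_dict") -> List[dict]:
    """Return batch_dict[key] as a list of dicts, or raise a descriptive error.

    Hypothetical helper: assumes batch_dict[key] should be a non-empty
    sequence of dicts, as the call in _prepare_graphs suggests.
    """
    entries = batch_dict.get(key)
    if entries is None or len(entries) == 0:
        raise ValueError(f"'{key}' is missing or empty in the batch.")
    # Collect the positions of entries the dataset left as None.
    missing = [i for i, entry in enumerate(entries) if entry is None]
    if missing:
        raise ValueError(
            f"'{key}' is None for batch entries {missing}; the dataset may have "
            "been generated without this field, or the graph mode may not match."
        )
    return list(entries)
```

Calling such a check before batching would turn the AttributeError into a message that names the missing field.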

@sanokows
Collaborator

Do you have the latest version of the code?
In the latest version there is no '--graph_mode U_net'; there is only '--graph_mode Unet'.

The default for --T_target is 0 because this is the target temperature for CO problems. But for the Ising model, a target temperature of 0 is not supported. Usually, one considers the Ising model at a target temperature T >> 0, so you should set --T_target to a higher temperature.
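
A minimal sketch of how the constraint described here could be checked right after argument parsing. The flags --T_target and --EnergyFunction come from this thread; the helper `check_target_temperature`, the default for --EnergyFunction, and the value 2.0 are assumptions made for illustration, not the project's code:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Default of 0 as stated in the thread (target temperature for CO problems).
    parser.add_argument("--T_target", default=0.0, type=float,
                        help="Define target temperature")
    # The default here is an assumption for this sketch.
    parser.add_argument("--EnergyFunction", default="MaxCut", type=str)
    return parser


def check_target_temperature(args: argparse.Namespace) -> None:
    """Reject T_target <= 0 for the Ising model, where 1 / T_target is computed."""
    if args.EnergyFunction == "IsingModel" and args.T_target <= 0.0:
        raise ValueError(
            "IsingModel needs --T_target > 0 because beta = 1 / T_target is computed."
        )


args = build_parser().parse_args(
    ["--EnergyFunction", "IsingModel", "--T_target", "2.0"]
)
check_target_temperature(args)
beta = 1.0 / args.T_target  # inverse temperature, safe after the check
print(f"beta = {beta}")
```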

@mengxi7
Author

mengxi7 commented Mar 16, 2025

Because the check in train.py tests for U_net, I changed the parameter:
Image

As for the temperature: the corresponding distribution $p(x) \propto \exp(-\frac{1}{T} H(x))$ is the same for both CO and the Ising model, so why is the temperature schedule different? Is it because the CO problems are all posed as maximization problems, while the Ising model is a minimization problem?

@sanokows
Collaborator

The U_net graph mode is a graph U-Net that I tried out; it is deprecated because it did not work well. The Unet that is supported is a standard conv-net-based U-Net.

The temperature schedule is the same for Ising and for CO. But I think that unbiased sampling is not possible with T = 0 as the target temperature because of the computation of 1/T, so it is not supported for the Ising model. In CO we never have terms that compute 1/T, so it is not a problem there.
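
To illustrate the role of 1/T, here is a minimal sketch that enumerates the Boltzmann distribution $p(x) \propto \exp(-\frac{1}{T} H(x))$ exactly for a tiny system. The open 3-spin chain, the coupling J = 1, and the function names are illustrative assumptions and are unrelated to the lattices or code in the repository:

```python
import itertools
import math


def ising_energy(spins, coupling=1.0):
    """H(x) = -J * sum_i x_i * x_{i+1} for an open 1D chain (illustrative choice)."""
    return -coupling * sum(a * b for a, b in zip(spins, spins[1:]))


def boltzmann_probs(n_spins, temperature):
    """Exact p(x) proportional to exp(-H(x) / T) by enumeration; needs T > 0."""
    if temperature <= 0.0:
        raise ValueError("temperature must be positive: beta = 1 / T diverges at T = 0")
    beta = 1.0 / temperature
    states = list(itertools.product((-1, 1), repeat=n_spins))
    energies = [ising_energy(s) for s in states]
    e_min = min(energies)  # shift energies so the largest weight is exp(0)
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)  # partition function (up to the constant shift)
    return {s: w / z for s, w in zip(states, weights)}


for T in (5.0, 1.0, 0.1):
    probs = boltzmann_probs(3, T)
    ground = probs[(1, 1, 1)] + probs[(-1, -1, -1)]
    print(f"T={T}: probability mass on the two ground states = {ground:.4f}")
```

As T decreases, the mass concentrates on the ground states, while T = 0 itself is rejected because beta = 1/T is undefined there.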

@mengxi7
Author

mengxi7 commented Mar 17, 2025

The problem is solved!
Thank you for your detailed response!

@mengxi7
Author

mengxi7 commented Mar 17, 2025

I ran into another problem when I run:

python argparse_ray_main.py --train_mode PPO --lrs 0.0005 --temps 0.5 --GPUs 0 --minib_diff_steps 7 --n_diffusion_steps 14 --batch_size 140 --n_basis_states 4 --minib_basis_states 4 --relaxed --n_GNN_layers 8 --N_anneal 2000 --IsingMode BA_small --EnergyFunction MaxCut --mode Diffusion --beta_factor 1. --noise_potential bernoulli --multi_gpu --project_name final_runs --mov_average 0.09 --n_rand_nodes 3 --seed 123 --graph_mode normal --inner_loop_steps 1 --diff_schedule exp

It seems I hit a CUDA memory error:

Image

Is there a way to use multiple GPUs?

@sanokows
Collaborator

You can use multiple GPUs by passing '--GPUs 0 1 2 3', etc.
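
A minimal sketch of how a space-separated list such as '--GPUs 0 1 2 3' can be parsed and turned into a device mask. It shows a common pattern (argparse with nargs="+" plus the CUDA_VISIBLE_DEVICES environment variable) and is an assumption, not a description of how argparse_ray_main.py handles the flag internally:

```python
import argparse
import os

parser = argparse.ArgumentParser()
# nargs="+" collects one or more space-separated values into a list.
parser.add_argument("--GPUs", default=["0"], type=str, nargs="+",
                    help="GPU indices to use")
args = parser.parse_args(["--GPUs", "0", "1", "2", "3"])

# Must be set before the GPU framework (e.g. JAX) initializes its backend.
visible_devices = ",".join(args.GPUs)
os.environ["CUDA_VISIBLE_DEVICES"] = visible_devices
print(f"{len(args.GPUs)} GPUs requested: {visible_devices}")
```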

@mengxi7
Author

mengxi7 commented Mar 17, 2025

Sorry for asking yet another question: is it possible to resume training from a checkpoint?

@sanokows
Collaborator

sanokows commented Mar 17, 2025

Yes, you can resume training with the script continue_training.py. You just need to specify the GPUs and the wandb ID.

For models trained with PPO there is a small bug when training is resumed: PPO uses a moving average of the mean and std of the reward, and I forgot to implement storing and loading those values. So for PPO, resuming training does not work perfectly, but in most cases this should not be a problem.
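
A minimal sketch of what storing and loading such reward statistics could look like. The class `RunningRewardStats`, its update rule, and the value of `alpha` are hypothetical and chosen for illustration; they do not reproduce the project's PPO code or its checkpoint format:

```python
import json
import math


class RunningRewardStats:
    """Exponential moving average of the reward mean and variance (illustrative)."""

    def __init__(self, alpha: float = 0.1):
        self.alpha = alpha  # update rate, arbitrary value for this sketch
        self.mean = 0.0
        self.var = 1.0
        self.initialized = False

    def update(self, batch_mean: float, batch_std: float) -> None:
        if not self.initialized:
            # The first batch sets the statistics directly.
            self.mean, self.var, self.initialized = batch_mean, batch_std ** 2, True
            return
        self.mean = (1 - self.alpha) * self.mean + self.alpha * batch_mean
        self.var = (1 - self.alpha) * self.var + self.alpha * batch_std ** 2

    def normalize(self, reward: float, eps: float = 1e-8) -> float:
        return (reward - self.mean) / (math.sqrt(self.var) + eps)

    def state_dict(self) -> dict:
        return {"alpha": self.alpha, "mean": self.mean,
                "var": self.var, "initialized": self.initialized}

    @classmethod
    def from_state_dict(cls, state: dict) -> "RunningRewardStats":
        stats = cls(alpha=state["alpha"])
        stats.mean, stats.var = state["mean"], state["var"]
        stats.initialized = state["initialized"]
        return stats


stats = RunningRewardStats()
stats.update(batch_mean=-3.0, batch_std=2.0)
stats.update(batch_mean=-2.0, batch_std=1.0)

# Save alongside the model checkpoint, then restore when training resumes.
saved = json.dumps(stats.state_dict())
restored = RunningRewardStats.from_state_dict(json.loads(saved))
print(restored.normalize(-1.0) == stats.normalize(-1.0))
```

Without the restore step, the statistics would restart from their initial values, which is the kind of mismatch described above.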

@mengxi7
Author

mengxi7 commented Mar 18, 2025

OK, and thank you for the detailed and patient explanation!
