For my research.
You can train or test MobileNet/MobileNetV2/ResNet/VGG on CIFAR10/CIFAR100/ImageNet.
In particular, you can train or test on any device (CPU / single GPU / multi-GPU) and resume on whatever device environment is available.
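For example (using flags documented in the usage section below; the checkpoint name follows the saving pattern shown there), a run started on two GPUs can later be resumed on CPU simply by omitting the CUDA flags:

$ python main.py cifar10 -a mobilenet -C -g 0 1 -b 256
$ python main.py cifar10 -a mobilenet -b 256 -R --ckpt ckpt_epoch_50.pth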
- python 3.5+
- pytorch 1.0+
- torchvision 0.4+
- numpy
- tqdm
- rich (for beautiful log on console) (not yet)
- requests (for downloading pretrained checkpoints and the ImageNet dataset)
- sacred (for logging on omniboard)
- pymongo (for logging on omniboard)
- cv2 (opencv-python) (for guided image filter loss) (not yet)
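Assuming a standard pip environment, the dependencies above can be installed along these lines (`cv2` is provided by the `opencv-python` package):

$ pip install torch torchvision numpy tqdm rich requests sacred pymongo opencv-python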
- v1: just using $k$
- v2
- v2q ($\alpha$, $\beta$ quantization and adding epsilon in denom of $\alpha$)
- v2qq (quantization ver with $\alpha$, $\beta$ quantization and adding epsilon in denom of $\alpha$)
- v2f (fixed index $k$ during retraining time with v2qq)
- v2nb (no $\beta$ with v2)
- v2qnb (no $\beta$ with v2q)
- v2qqnb (no $\beta$ with v2qq)
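As a rough illustration of the quantized variants, here is a minimal linear-quantization sketch for per-kernel scalars such as $\alpha$ and $\beta$ (my own sketch, not the repository's quantize.py; the function name and rounding scheme are assumptions):

```python
import torch

def quantize_scalars(x, num_bits=8):
    """Uniformly quantize the values of x to 2**num_bits discrete levels
    spanning [x.min(), x.max()], then map back to floats."""
    qmax = 2 ** num_bits - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / qmax + 1e-12        # tiny constant avoids zero division
    levels = torch.round((x - lo) / scale)  # integer level index in [0, qmax]
    return levels * scale + lo

alphas_q = quantize_scalars(torch.randn(128), num_bits=8)  # cf. --qba 8 below
betas_q = quantize_scalars(torch.randn(128), num_bits=8)   # cf. --qbb 8 below
```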
- CIFAR
  - initial lr: 0.1
  - wd: 5e-4
  - epochs: 200
- ImageNet
  - initial lr: 0.1
  - wd: 4e-5? 1e-5?
  - epochs: 90
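For instance, the CIFAR settings above correspond to main.py flags like these (illustrative only; the flags are documented in the usage section below):

$ python main.py cifar10 -a mobilenet -C -g 0 --lr 0.1 --wd 5e-4 --epochs 200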
- `check_model_params.py`: optional file for calculating the number of parameters
- `config.py`: set configuration
- `data.py`: data loading
- `down_ckpt.py`: download checkpoints of pretrained models
- `down_ckpt_all.sh`: shell script for downloading all checkpoints of pretrained models
- `down_imagenet.py`: download the ImageNet dataset (ILSVRC2012 ver.)
- `find_similar_kernel.py`: find similar kernel (see the sketch after this list)
- `main.py`: main python file for training or testing
- `models/`
  - `__init__.py`
  - `mobilenet.py`
  - `mobilenetv2.py`
  - `resnet.py`
  - `vgg.py`
- `quantize.py`
- `sh_mobile_imagenet_v2qq.sh`
- `sh_resnet_imagenet_v2qq.sh`
- `sh_v2nb_cifar_test.sh`
- `sh_v2qq_cifar_run.sh`
- `sh_v2qq_cifar_test.sh`
- `sh_vgg16_cifar_v2qq.sh`
- `torchvision_to_ours.py`: convert checkpoint files of torchvision pretrained models to our state format
- `utils.py`
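As referenced above, a minimal sketch of what "find similar kernel" could look like under the v2 scheme $w \approx \alpha w_k + \beta$ (my own reading; the function name, tensor shapes, and the epsilon placement are assumptions, not the repository's exact code):

```python
import torch

def find_similar_kernel(weights, refs, eps=1e-8):
    """For each kernel w in `weights` (n, d), pick the reference kernel w_k in
    `refs` (m, d) minimizing ||w - (alpha * w_k + beta)||^2, where alpha and
    beta come from a least-squares fit. Returns a list of (k, alpha, beta)."""
    results = []
    for w in weights:
        best = None
        for k, wk in enumerate(refs):
            wm, wkm = w.mean(), wk.mean()
            # least-squares slope; the epsilon in the denominator of alpha
            # echoes the "adding epsilon in denom of alpha" of v2q/v2qq
            alpha = ((w - wm) * (wk - wkm)).sum() / (((wk - wkm) ** 2).sum() + eps)
            beta = wm - alpha * wkm
            err = ((w - (alpha * wk + beta)) ** 2).sum().item()
            if best is None or err < best[0]:
                best = (err, k, alpha.item(), beta.item())
        results.append(best[1:])
    return results

# e.g. 3x3 kernels flattened to length-9 vectors (shapes are illustrative)
kernels = torch.randn(16, 9)
references = torch.randn(4, 9)
pairs = find_similar_kernel(kernels, references)
```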
usage: down_imagenet.py [-h] [--datapath PATH]
optional arguments:
-h, --help show this help message and exit
--datapath PATH Where you want to save ImageNet? (default: ../data)
$ python3 down_imagenet.py
Please check the datapath. Make sure it matches the `--datapath` argument used by `main.py`.
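For example, keeping both at the default location:

$ python3 down_imagenet.py --datapath ../data
$ python main.py imagenet -a mobilenet -C -g 0 --datapath ../data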
The pretrained VGG models trained on ImageNet are not available yet.
All the checkpoint files of ResNets trained on ImageNet come from the official torchvision models.
So, if you use those checkpoints, you can't resume from the exact training state.
But you can retrain or apply the MESS using those checkpoints.
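For reference, a minimal sketch of this kind of conversion (my own illustration, not the actual torchvision_to_ours.py; the 'model' key is an assumed checkpoint layout, not the repo's format):

```python
import torch
import torchvision.models as models

# load the official torchvision ResNet-50 ImageNet weights
state_dict = models.resnet50(pretrained=True).state_dict()

# wrap them in a checkpoint-style dict; 'model' is an assumed key name
torch.save({'model': state_dict}, 'resnet50_converted.pth')
```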
$ python down_ckpt.py imagenet -a mobilenet -o pretrained_model.pth
For downloading all checkpoints:
$ ./down_ckpt_all.sh
usage: main.py [-h] [-a ARCH] [-j N] [--epochs N] [-b N] [--lr LR]
[--momentum M] [--wd W] [--layers N] [--bn] [--width-mult WM]
[--groups N] [-p N] [--ckpt PATH] [-R] [-E] [-C] [-T]
[-g GPUIDS [GPUIDS ...]] [--datapath PATH] [-v V] [-d N]
[-pwd N] [-N] [-s N] [--nl] [--nls NLS] [--pl] [--pls PLS] [-Q]
[--np] [--qb N] [--qba N] [--qbb N]
DATA
positional arguments:
DATA dataset: cifar10 | cifar100 | imagenet (default:
cifar10)
optional arguments:
-h, --help show this help message and exit
-a ARCH, --arch ARCH model architecture: mobilenet | mobilenetv2 | resnet |
vgg (default: mobilenet)
-j N, --workers N number of data loading workers (default: 8)
--epochs N number of total epochs to run (default: 200)
-b N, --batch-size N mini-batch size (default: 128), this is the total
batch size of all GPUs on the current node when using
Data Parallel
--lr LR, --learning-rate LR
initial learning rate (default: 0.1)
--momentum M momentum (default: 0.9)
--wd W, --weight-decay W
weight decay (default: 5e-4)
--layers N number of layers in VGG/ResNet/WideResNet
(default: 16)
--bn, --batch-norm Use batch norm in VGG?
--width-mult WM width multiplier to thin a network uniformly at each
layer (default: 1.0)
-p N, --print-freq N print frequency (default: 100)
--ckpt PATH Path of checkpoint for resuming/testing or retraining
model (Default: none)
-R, --resume Resume model?
-E, --evaluate Test model?
-C, --cuda Use cuda?
-T, --retrain Retraining?
-g GPUIDS [GPUIDS ...], --gpuids GPUIDS [GPUIDS ...]
GPU IDs to use (Default: 0)
--datapath PATH where you want to load/save your dataset? (default:
../data)
-v V, --version V version: v2 | v2q | v2qq | v2f | v2nb
(find kernel version (default: none))
-d N, --bind-size N the number of binding channels in convolution
(subvector size) on version 2.5 (v2.5) (default: 2)
-pwd N, --pw-bind-size N
the number of binding channels in pointwise
convolution (subvector size) (default: 8)
-N, --new new method?
-s N, --save-epoch N number of epochs to save checkpoint and to apply new
method
--nl, --nuc-loss nuclear norm loss?
--nls NLS, --nl-scale NLS
scale factor of nuc_loss
--pl, --pcc-loss pearson correlation coefficient loss?
--pls PLS, --pl-scale PLS
scale factor of pcc_loss
-Q, --quant use quantization?
--np no v2-like method in pointwise convolutional layer?
--qb N, --quant-bit N
number of bits for quantization (Default: 8)
--qba N, --quant_bit_a N
number of bits for quantizing alphas (Default: 8)
--qbb N, --quant_bit_b N
number of bits for quantizing betas (Default: 8)
$ python main.py cifar10 -a mobilenet -C -g 0 1 2 3 -b 256
$ python main.py cifar10 -a mobilenet -C -g 0 1 2 3 -b 256 -R --ckpt ckpt_epoch_50.pth
$ python main.py cifar10 -a mobilenet -C -g 0 1 2 3 -b 256 -E --ckpt ckpt_best.pth
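A hypothetical v2qq retraining run combining the version and quantization options (my guess at a plausible combination; see the provided sh_*_v2qq.sh scripts for the authors' exact settings):

$ python main.py cifar10 -a vgg --layers 16 -C -g 0 -T --ckpt ckpt_best.pth -v v2qq -Q --qb 8 --qba 8 --qbb 8

To delete all per-epoch checkpoints saved during training: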
$ rm -f checkpoint/*/*/ckpt_epoch_*.pth