python - Training ResNet v1 from scratch using TensorFlow Slim


Although it is stated that the Slim model script train_image_classifier.py can be used to train models from scratch, I found this hard to do in practice. In my case, I am trying to train ResNet from scratch on a local machine with 6 K80s. I used this:

DATASET_DIR=/nv/hmart1/ashaban6/scratch/data/imagenet_rf_record
TRAIN_DIR=/nv/hmart1/ashaban6/scratch/train_dir
DEPTH=50
NUM_CLONES=8

CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7,8" python train_image_classifier.py \
    --train_dir=${TRAIN_DIR} \
    --dataset_name=imagenet \
    --model_name=resnet_v1_${DEPTH} \
    --max_number_of_steps=100000000 \
    --batch_size=32 \
    --learning_rate=0.1 \
    --learning_rate_decay_type=exponential \
    --dataset_split_name=train \
    --dataset_dir=${DATASET_DIR} \
    --optimizer=momentum \
    --momentum=0.9 \
    --learning_rate_decay_factor=0.1 \
    --num_epochs_per_decay=30 \
    --weight_decay=0.0001 \
    --num_readers=12 \
    --num_clones=${NUM_CLONES}
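My understanding of how the learning-rate flags above combine is roughly the following (a simplified Python sketch with my own variable names, not the actual code in train_image_classifier.py; 1.2e6 is my approximate count of ImageNet training images):

import tensorflow as tf

# Values copied from the flags above; the variable names are mine.
learning_rate = 0.1          # --learning_rate
decay_factor = 0.1           # --learning_rate_decay_factor
num_epochs_per_decay = 30    # --num_epochs_per_decay
batch_size = 32              # --batch_size (per clone)
num_clones = 8               # --num_clones
num_train_images = int(1.2e6)

# Decay by 10x every 30 epochs, expressed in training steps.
steps_per_epoch = num_train_images // (batch_size * num_clones)
decay_steps = num_epochs_per_decay * steps_per_epoch

global_step = tf.train.get_or_create_global_step()
lr = tf.train.exponential_decay(learning_rate, global_step,
                                decay_steps, decay_factor,
                                staircase=True)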

I followed the same settings suggested in the paper. With 8 GPUs on a local machine and a per-GPU batch_size of 32, the effective batch size is 32x8=256. The learning rate is set to 0.1 and decayed by a factor of 10 every 30 epochs. After 70k steps (70000x256/1.2e6 ~ 15 epochs), the top-1 performance on the validation set is as low as ~14%, while it should be around 50% after that many iterations. I used this command to get the top-1 performance:

DATASET_DIR=/nv/hmart1/ashaban6/scratch/data/imagenet_rf_record
CHECKPOINT_FILE=/nv/hmart1/ashaban6/scratch/train_dir/
DEPTH=50

CUDA_VISIBLE_DEVICES="10" python eval_image_classifier.py \
    --alsologtostderr \
    --checkpoint_path=${CHECKPOINT_FILE} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=imagenet \
    --dataset_split_name=validation \
    --model_name=resnet_v1_${DEPTH}
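For reference, the "~15 epochs" figure above is just this back-of-the-envelope calculation (plain Python; 1.2e6 is my approximate number of training images):

steps = 70000
effective_batch_size = 32 * 8        # per-clone batch size x num_clones
num_train_images = 1.2e6
epochs = steps * effective_batch_size / num_train_images
print(round(epochs, 1))              # ~14.9, i.e. roughly 15 epochs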

With the lack of working examples it is hard to tell whether there is a bug in the Slim training code or a problem in my script. Is anything wrong in my script? Has anyone trained ResNet from scratch with Slim?

