Add PyTorch Benchmarks #121

Open
chetkhatri opened this issue Feb 17, 2017 · 5 comments

Comments

@chetkhatri

No description provided.

@HarshaVardhanP

My device specs: NVIDIA GTX 1060, CUDA 8.0, cuDNN v5

I downloaded some folders from ImageNet, loaded a pre-trained AlexNet model, and passed images through the network with a batch size of 128. I use the built-in torch DataLoader to load the JPEG images.

I am getting a forward pass time of 7 ms, which seems very fast. I am not sure that what I am measuring is the correct forward pass time. Please help!

Please look at the code and logs here: https://github.com/HarshaVardhanP/Random

@HarshaVardhanP

HarshaVardhanP commented Feb 20, 2017

Using the PyTorch example (https://github.com/pytorch/examples/blob/master/imagenet/main.py) with pretrained AlexNet in evaluate mode on the same data (batch size 128), the logs look like this:

=> using pre-trained model 'alexnet'
Test: [0/82] Time 1.680 (1.680) Loss 13.5178 (13.5178) Prec@1 0.000 (0.000) Prec@5 0.000 (0.000)
Test: [10/82] Time 0.279 (0.288) Loss 13.4128 (13.2985) Prec@1 0.000 (0.000) Prec@5 0.000 (0.000)
Test: [20/82] Time 0.048 (0.248) Loss 11.2954 (12.6275) Prec@1 0.000 (0.000) Prec@5 0.000 (0.000)
Test: [30/82] Time 0.048 (0.235) Loss 11.2725 (12.2873) Prec@1 0.000 (0.000) Prec@5 0.781 (0.101)
Test: [40/82] Time 0.136 (0.233) Loss 13.2035 (12.2026) Prec@1 0.000 (0.019) Prec@5 0.000 (0.171)
Test: [50/82] Time 0.457 (0.239) Loss 13.4102 (12.4278) Prec@1 0.000 (0.015) Prec@5 0.000 (0.138)
Test: [60/82] Time 0.067 (0.230) Loss 14.2645 (12.7405) Prec@1 0.000 (0.013) Prec@5 0.000 (0.115)
Test: [70/82] Time 0.334 (0.228) Loss 20.0007 (13.6347) Prec@1 0.000 (0.011) Prec@5 0.000 (0.099)
Test: [80/82] Time 0.236 (0.229) Loss 15.3086 (13.8212) Prec@1 0.000 (0.010) Prec@5 0.000 (0.106)

  • Prec@1 0.010 Prec@5 0.115
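
A quick check on the log above, as a rough sketch. It assumes the two Time columns are the latest batch time followed by the running average in parentheses (that reading of the format is an assumption, not something stated in the thread). Removing the slow first batch from the running average barely changes it, so warm-up alone would not explain roughly 0.23 s per batch.

```python
# Values copied from the log above. The column interpretation
# (latest value, then running average in parentheses) is an assumption.
first_batch_time = 1.680   # Time at Test: [0/82]
running_avg = 0.229        # running average at Test: [80/82]
batches_seen = 81          # batches 0..80 inclusive

# Undo the running average, drop the first batch, and re-average.
total_time = running_avg * batches_seen
avg_without_first = (total_time - first_batch_time) / (batches_seen - 1)

print(f"average excluding first batch: {avg_without_first:.3f} s")  # about 0.211 s
```

The remaining ~0.21 s per batch is far above the fastest batches in the log (0.048 s), which suggests the time is dominated by something other than the first-batch start-up cost.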

As far as I can tell, this code includes data loading time when calculating the batch time. That gives about 200 ms per batch, which is slower than TensorFlow. How can I estimate the correct forward pass time in PyTorch?
Thanks
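
One way to see how much of the batch time is data loading is to time the wait for each batch separately from the work done on it. The sketch below is only an illustration: `split_batch_time`, `slow_loader`, and the `sync` hook are hypothetical names, and the loader and model are replaced by sleeps so it runs without a GPU or a dataset. With PyTorch on a GPU, one would presumably pass the DataLoader as `loader`, the forward pass as `step_fn`, and `torch.cuda.synchronize` as `sync`.

```python
import time


def split_batch_time(loader, step_fn, sync=None, max_batches=20):
    """Return (load_times, compute_times) in seconds, one entry per batch.

    loader:  any iterable of batches (for example a DataLoader)
    step_fn: callable run on each batch (for example the forward pass)
    sync:    optional callable that blocks until queued work has finished
    """
    load_times, compute_times = [], []
    it = iter(loader)
    for _ in range(max_batches):
        t0 = time.perf_counter()
        try:
            batch = next(it)      # time spent waiting for data
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)            # time spent in the model
        if sync is not None:
            sync()                # wait for asynchronous work to finish
        t2 = time.perf_counter()
        load_times.append(t1 - t0)
        compute_times.append(t2 - t1)
    return load_times, compute_times


if __name__ == "__main__":
    # Stand-ins so the sketch runs anywhere.
    def slow_loader(n):
        for i in range(n):
            time.sleep(0.02)      # pretend JPEG decoding
            yield i

    load, compute = split_batch_time(slow_loader(5), lambda b: time.sleep(0.005))
    print(f"load    {sum(load) / len(load) * 1e3:.1f} ms/batch")
    print(f"compute {sum(compute) / len(compute) * 1e3:.1f} ms/batch")
```

If the load column dominates, more loader workers or synthetic input data will change the number far more than anything done to the model.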

@lolz0r

lolz0r commented Feb 23, 2017

@HarshaVardhanP I am running a Titan X Pascal. Running the script you linked to (https://github.com/HarshaVardhanP/Random), I see an average forward time of around 88 ms on PyTorch + AlexNet. I did increase num_workers to 12 and let it run for more than 20 steps.

The second script you linked to also has a backprop + optimization step... that is probably why you see an increased step time.

@HarshaVardhanP

@lolz0r Thanks for your comment.

  1. Yes, I get an average forward pass of 58 ms when I set up 12 workers. See https://github.com/HarshaVardhanP/Random/blob/master/output2.log. You can see that the first batch takes a lot of time and the rest are very fast. Would you please post your logs?
    By the way, isn't 88 ms very slow for AlexNet on a Titan X Pascal with 12 workers?

  2. I am running that script with a pretrained model in eval mode. I don't think there is any backprop in the test step. What do you think?

@soumith
Owner

soumith commented Feb 23, 2017

You need to add two lines near the top of your script:

import torch.backends.cudnn as cudnn
cudnn.benchmark = True

That will turn on the cuDNN autotuner, which selects efficient algorithms.

Secondly, convnet-benchmarks itself is based on synthetic data, so if you want to simulate that, you can simply use dummy data. Here is your script, partly modified to do so.

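import time

import torch
import torchvision.models as models
from torch.autograd import Variable

def main():
    t = time.time()
    global args
    args = parser.parse_args()  # parser is defined in the original script
    # dummy batch: random images and arbitrary class labels
    data_ = torch.randn(128, 3, 224, 224).cuda()
    target_ = torch.range(1, 128).long().cuda()
    net = models.alexnet()  # no need to load pre-trained weights for dummy data
    net.cuda()
    net.eval()
    print(net)

    # passing images thru network
    n = 1
    batch_avgtime = 0
    for data, target in val_loader:  # val_loader (from the original script) only drives the loop
        td = time.time()
        data, target = Variable(data_, volatile=True), Variable(target_)
        p = net(data)
        batch_time = time.time() - td
        batch_avgtime = batch_avgtime + batch_time
        if n == 20:
            break
        n = n + 1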
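
A note on timing a loop like the one above: CUDA kernels are launched asynchronously, so wall-clock time around a forward call can reflect only the launch cost unless the device is synchronized before the clock is read. The first iterations also include one-off costs such as autotuning. The helper below is a minimal sketch of that idea and is not part of the thread: `benchmark`, `forward_fn`, and `sync` are hypothetical names, and the demo uses a CPU stand-in so it runs without a GPU. With PyTorch, one would presumably pass something like `lambda: net(data)` as `forward_fn` and `torch.cuda.synchronize` as `sync`.

```python
import time


def benchmark(forward_fn, n_iters=20, warmup=5, sync=None):
    """Average seconds per call of forward_fn, ignoring warm-up calls."""
    def wait():
        if sync is not None:
            sync()

    for _ in range(warmup):       # autotuning, allocation, lazy initialization
        forward_fn()
    wait()                        # drain warm-up work before starting the clock

    start = time.perf_counter()
    for _ in range(n_iters):
        forward_fn()
    wait()                        # make sure the timed work has actually finished
    return (time.perf_counter() - start) / n_iters


if __name__ == "__main__":
    # CPU stand-in for the forward pass, only to show the call shape.
    avg = benchmark(lambda: sum(i * i for i in range(10000)))
    print(f"avg forward time: {avg * 1e3:.3f} ms")
```

Synchronizing once before and once after the timed loop, rather than after every call, keeps the synchronization overhead out of the per-iteration average.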
