This is an implementation of C3D using PyTorch on the UCF101 dataset.
I tried both training from scratch and fine-tuning the model pre-trained on Sports-1M provided by C3D-caffe. However, the results are not very good: both reach only about 30% top-5 accuracy, which is far below the results reported in the paper "Learning Spatiotemporal Features with 3D Convolutional Networks". I would appreciate any advice on how to improve the accuracy.
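Because the results above are quoted as top-5 accuracy, here is a minimal sketch of how that metric is usually computed: a sample counts as correct if its true class is among the five highest-scoring classes. This is plain Python for illustration only. The function name `topk_accuracy` is hypothetical and is not part of this repository. In PyTorch the same check is commonly written with `torch.topk`.

```python
from typing import Sequence


def topk_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 5) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes.

    Hypothetical helper for illustration; not taken from this repository.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        return 0.0
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; the sort is stable,
        # so ties keep the lower class index first.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


if __name__ == "__main__":
    scores = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
    labels = [2, 2]
    print(topk_accuracy(scores, labels, k=2))  # 0.5
```

With `k=1` this reduces to ordinary top-1 accuracy, so the same helper can report both numbers.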
I trained the model on UCF101.
For convenience, I sampled frames from the raw videos at 10 FPS using ./raw/video2img.sh.
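The contents of ./raw/video2img.sh are not reproduced here. As a hedged sketch of the idea, downsampling to 10 FPS amounts to keeping roughly one source frame every 0.1 s. The helper below computes which frame indices to keep. Its name, `sample_frame_indices`, is hypothetical. It assumes the source frame rate is known and that taking the nearest earlier frame is acceptable. The actual script may behave differently, for example if it relies on ffmpeg's `fps` filter.

```python
from __future__ import annotations


def sample_frame_indices(num_frames: int, src_fps: float, target_fps: float = 10.0) -> list[int]:
    """Indices of source frames to keep so the output runs at about target_fps.

    Hypothetical helper for illustration; not the logic of video2img.sh.
    """
    if src_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    if target_fps >= src_fps:
        # Cannot upsample by dropping frames, so keep every frame.
        return list(range(num_frames))
    # Number of output frames that fit in the clip's duration.
    n_out = int(num_frames * target_fps / src_fps)
    # Output frame j sits at time j / target_fps; pick the nearest earlier
    # source frame and clamp to stay inside the clip.
    return [min(int(j * src_fps / target_fps), num_frames - 1) for j in range(n_out)]


if __name__ == "__main__":
    # A 10-second clip recorded at 25 FPS yields 100 frames at 10 FPS.
    idx = sample_frame_indices(250, 25.0, 10.0)
    print(len(idx), idx[:5])  # 100 [0, 2, 5, 7, 10]
```

Selecting indices instead of decoding video keeps the sketch dependency-free. The same index list could then drive whatever frame reader the data loader uses.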