State of the art

State-of-the-art video classification

Method	UCF-101 Accuracy (%, 3-Fold)	Notes
LRCN+CNN (Donahue)	82.92	Weighted average of RGB (1/3) and Flow (2/3) networks. LRCN placed after the first fully connected CNN layer.
Two-stream CNN (Simonyan)	88.0	Temporal + Spatial ConvNets. Fusion using SVM. Multi-task learning for the temporal ConvNet. Spatial ConvNet pre-trained on ILSVRC-2012, with only the last layer fine-tuned.
LSTM + 30 Frame Unroll (Yue-Hei Ng)	88.6	Optical Flow + Image Frames
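
The weighted average noted in the LRCN row (RGB 1/3, Flow 2/3) is a form of late fusion: each network produces per-class scores, and the scores are combined before picking a label. Below is a minimal sketch of that idea, not the authors' code. The function names `fuse_scores` and `predict` are hypothetical, and the score values are made up for illustration.

```python
def fuse_scores(rgb_scores, flow_scores, rgb_weight=1 / 3, flow_weight=2 / 3):
    """Weighted average of per-class scores from two networks (late fusion)."""
    if len(rgb_scores) != len(flow_scores):
        raise ValueError("score vectors must have the same length")
    return [
        rgb_weight * r + flow_weight * f
        for r, f in zip(rgb_scores, flow_scores)
    ]


def predict(scores):
    """Return the index of the highest-scoring class."""
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up softmax outputs for a 3-class problem (assumption, not real data).
rgb = [0.6, 0.3, 0.1]   # RGB network favours class 0
flow = [0.2, 0.7, 0.1]  # Flow network favours class 1

fused = fuse_scores(rgb, flow)
label = predict(fused)
```

Because the flow network carries twice the weight, the fused prediction here follows the flow network when the two disagree.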