Tom Bocklisch edited this page Apr 24, 2015 · 10 revisions

State-of-the-art video classification

| Method (UCF-101) | Accuracy (%, 3-fold) | Notes |
|---|---|---|
| LRCN + CNN (Donahue) | 82.92 | Weighted average of the RGB (1/3) and optical-flow (2/3) networks. LRCN placed after the first fully connected CNN layer. |
| Two-stream CNN (Simonyan) | 88.0 | Temporal + spatial ConvNets, fused with an SVM. Multi-task learning for the temporal ConvNet. Spatial ConvNet pre-trained on ILSVRC-2012, with fine-tuning of the last layer only. |
| LSTM + 30-frame unroll (Yue-Hei Ng) | 88.6 | Optical flow + image frames. |
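
The LRCN row combines its RGB and optical-flow networks by a weighted average of their class scores. Below is a minimal sketch of that kind of late fusion, assuming each stream outputs one score per class (e.g. a softmax probability). The function names and toy scores are hypothetical; this is not Donahue et al.'s code. The two-stream row fuses with an SVM instead, which is not shown here.

```python
from typing import List, Sequence


def weighted_late_fusion(rgb: Sequence[float], flow: Sequence[float],
                         w_rgb: float = 1 / 3, w_flow: float = 2 / 3) -> List[float]:
    """Weighted average of per-class scores from the RGB and flow streams.

    Default weights follow the 1/3 (RGB) and 2/3 (flow) split noted in the table.
    """
    if len(rgb) != len(flow):
        raise ValueError("both streams must score the same set of classes")
    return [w_rgb * r + w_flow * f for r, f in zip(rgb, flow)]


def predict(rgb: Sequence[float], flow: Sequence[float]) -> int:
    """Return the index of the class with the highest fused score."""
    fused = weighted_late_fusion(rgb, flow)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Hypothetical scores for a 3-class clip: RGB prefers class 0, flow prefers class 1.
    rgb_scores = [0.7, 0.2, 0.1]
    flow_scores = [0.2, 0.5, 0.3]
    print(weighted_late_fusion(rgb_scores, flow_scores))
    print(predict(rgb_scores, flow_scores))
```

Because the flow stream carries twice the weight, it wins the toy example (class 1) even though the RGB stream is more confident about class 0. Late fusion like this keeps the two networks fully independent, so each can be trained separately and only the weights need tuning.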
