
zoeyliu1999/DL-paper-implementation


Knowledge gained from papers always feels shallow; to truly understand something, you have to code it.

Purpose

  1. Minimal Practice
  2. Project Notes
  3. Optimization
  4. Algorithm Competition

Basic

1. CNN

Model | Paper
ResNet | Deep Residual Learning for Image Recognition
InceptionV3 | Rethinking the Inception Architecture for Computer Vision
InceptionV4 | Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
MobileNet | MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
EfficientNet | EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Residual Attention Network | Residual Attention Network for Image Classification
Non-deep Networks | Non-deep Networks
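ResNet's central idea is the residual connection, which Inception-ResNet also reuses. A block learns a correction F(x) and adds it back to its input, so the output is y = F(x) + x. The sketch below is a minimal plain-Python version. It assumes a toy two-layer dense F in place of the convolution and batch-norm layers ResNet actually uses. `linear`, `two_layer` and `residual_block` are hypothetical names, not code from this repository.

```python
# Illustrative sketch; helper names are hypothetical, not this repository's code.
from typing import Callable, List

Vector = List[float]


def relu(v: Vector) -> Vector:
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in v]


def linear(weights: List[Vector], bias: Vector) -> Callable[[Vector], Vector]:
    """Dense layer computing W @ x + b, with W given row by row."""
    def apply(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(weights, bias)]
    return apply


def two_layer(w1, b1, w2, b2) -> Callable[[Vector], Vector]:
    """Toy residual branch F(x) = W2 relu(W1 x + b1) + b2 (stands in for conv layers)."""
    l1, l2 = linear(w1, b1), linear(w2, b2)
    return lambda x: l2(relu(l1(x)))


def residual_block(x: Vector, f: Callable[[Vector], Vector]) -> Vector:
    """ResNet-style block with an identity shortcut: relu(F(x) + x)."""
    fx = f(x)
    if len(fx) != len(x):
        raise ValueError("identity shortcut requires F(x) and x to have the same size")
    return relu([a + b for a, b in zip(fx, x)])


zeros = [[0.0] * 3 for _ in range(3)]
no_bias = [0.0] * 3
print(residual_block([1.0, -2.0, 3.0], two_layer(zeros, no_bias, zeros, no_bias)))
# [1.0, 0.0, 3.0]
```

When all weights are zero, F(x) = 0 and the input goes straight to the final ReLU. An extra block can therefore start out close to identity instead of having to learn it, which is why stacking more blocks does not make the network harder to fit.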

2. RNN

Model | Paper
LSTM | Long Short-Term Memory
BiLSTM | Bidirectional Recurrent Neural Networks
GRU | Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
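An LSTM step uses gates to control what is written to and read from a memory cell c_t. The sketch below is a minimal scalar version. It assumes the now-standard formulation with a forget gate, which was added after the original 1997 paper. `GateParams` and `lstm_step` are hypothetical names, and real implementations use vector states and weight matrices.

```python
# Illustrative scalar LSTM step; names are hypothetical, not this repository's code.
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class GateParams:
    w: float  # weight on the current input x_t
    u: float  # weight on the previous hidden state h_{t-1}
    b: float  # bias


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lstm_step(x: float, h_prev: float, c_prev: float,
              p: Dict[str, GateParams]) -> Tuple[float, float]:
    """One LSTM time step with scalar state; p has keys 'i', 'f', 'o', 'g'."""
    pre = {k: g.w * x + g.u * h_prev + g.b for k, g in p.items()}
    i = sigmoid(pre["i"])    # input gate: how much new content to write
    f = sigmoid(pre["f"])    # forget gate (assumed modern variant): how much old memory to keep
    o = sigmoid(pre["o"])    # output gate: how much memory to expose
    g = math.tanh(pre["g"])  # candidate cell content
    c = f * c_prev + i * g   # additive cell update
    h = o * math.tanh(c)
    return h, c


params = {k: GateParams(w=0.5, u=0.5, b=0.0) for k in "ifog"}
h, c = 0.0, 0.0
for x in [1.0, 0.5, -1.0]:
    h, c = lstm_step(x, h, c, params)
print(round(h, 4), round(c, 4))
```

The additive update `c = f * c_prev + i * g` is what lets information and gradients survive across many steps. A BiLSTM runs one such recurrence forward and another backward over the sequence, then concatenates their hidden states.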

3. Transformer

Model | Paper
Transformer | Attention Is All You Need
BERT | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
GPT-3 | Language Models are Few-Shot Learners
ViT | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
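All four Transformer papers above share scaled dot-product attention, defined as Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The sketch below is dependency-free and covers a single head without masking, using lists of rows as matrices. The function names are hypothetical.

```python
# Illustrative single-head attention; names are hypothetical, not this repository's code.
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Softmax with max-subtraction for numerical stability."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Each query returns a softmax-weighted average of the rows of V."""
    d_k = len(k[0])
    out = []
    for qi in q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[10.0, 0.0], [0.0, 10.0]]
print(scaled_dot_product_attention(q, k, v))
```

In the usage example the query matches the first key more closely, so the output leans toward the first row of V. Dividing by sqrt(d_k) keeps the dot products from growing with dimension and pushing the softmax into saturation.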

4. Generation

Model | Paper
GAN | Generative Adversarial Networks
pix2pix | Image-to-Image Translation with Conditional Adversarial Networks
CycleGAN | Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
VAE | Auto-Encoding Variational Bayes
DDPM | Denoising Diffusion Probabilistic Models
Guided Diffusion | Diffusion Models Beat GANs on Image Synthesis
DALL·E 2 | Hierarchical Text-Conditional Image Generation with CLIP Latents
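DDPM's forward (noising) process has a closed form: x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε, where ᾱ_t = ∏_{s≤t} (1 - β_s) and ε ~ N(0, I). This closed form is what makes training on randomly sampled timesteps cheap. The sketch below assumes the linear β schedule from 1e-4 to 0.02 over T = 1000 steps used in the DDPM paper. Timesteps are 0-indexed here, and the helper names are hypothetical.

```python
# Illustrative DDPM forward process; helper names are hypothetical, not this repository's code.
import math
import random
from typing import List


def linear_beta_schedule(T: int = 1000, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Noise variances beta_t increasing linearly from beta_start to beta_end."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def alpha_bars(betas: List[float]) -> List[float]:
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0: List[float], t: int, abar: List[float],
             noise: List[float]) -> List[float]:
    """Draw x_t ~ q(x_t | x_0) in one step instead of iterating t times (t is 0-indexed)."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]


abar = alpha_bars(linear_beta_schedule())
x0 = [0.5, -0.2, 0.9]
eps = [random.gauss(0.0, 1.0) for _ in x0]
x_mid = q_sample(x0, 500, abar, eps)  # partly noised
x_end = q_sample(x0, 999, abar, eps)  # almost pure noise
```

In the simplified DDPM objective, a network is then trained to predict `eps` from `x_t` and `t`.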

5. Multimodal

Model | Paper
CLIP | Learning Transferable Visual Models From Natural Language Supervision (Connecting Text and Images)
ViLT | ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
SimVLM | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
ALBEF | Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
VLMo | VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
BLIP | BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
CyCLIP | CyCLIP: Cyclic Contrastive Language-Image Pretraining
+MAE | Training Vision-Language Transformers from Captions Alone
VLMixer | VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix
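CLIP trains with a symmetric contrastive loss, and ALBEF and BLIP include a similar image-text contrastive term, while CyCLIP adds extra terms on top of it. In a batch of N matched image-text pairs, each image should score its own caption highest among the N texts, and each text should score its own image highest. The sketch below is dependency-free. It assumes embeddings already produced by some image and text encoders and a fixed temperature of 0.07. CLIP instead learns the temperature, initialized to that value. All names are hypothetical.

```python
# Illustrative symmetric contrastive (InfoNCE) loss; names are hypothetical, not this repository's code.
import math
from typing import List

Matrix = List[List[float]]


def l2_normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy(logits: List[float], target: int) -> float:
    """-log softmax(logits)[target], computed via a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def clip_loss(img: Matrix, txt: Matrix, temperature: float = 0.07) -> float:
    """Row i of img is the positive pair of row i of txt; all other rows are negatives."""
    img = [l2_normalize(v) for v in img]
    txt = [l2_normalize(v) for v in txt]
    n = len(img)
    # logits[i][j] = cosine similarity of image i and text j, divided by the temperature
    logits = [[sum(a * b for a, b in zip(iv, tv)) / temperature for tv in txt] for iv in img]
    image_to_text = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    text_to_image = sum(cross_entropy([logits[i][j] for i in range(n)], j)
                        for j in range(n)) / n
    return (image_to_text + text_to_image) / 2.0


images = [[1.0, 0.1], [0.0, 1.0]]
texts = [[0.9, 0.0], [0.1, 1.0]]
print(clip_loss(images, texts), clip_loss(images, texts[::-1]))
```

Reversing the captions so that each image is paired with the wrong text makes the loss much larger. This gap is the signal that pulls matched pairs together in the shared embedding space.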

Project

1. Object Detection

Model | Paper
R-CNN | Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
Faster R-CNN | Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
YOLOv3 | You Only Look Once: Unified, Real-Time Object Detection
DETR | End-to-End Object Detection with Transformers
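R-CNN, Faster R-CNN and YOLO all produce many overlapping candidate boxes. They prune them with intersection-over-union (IoU) and greedy non-maximum suppression (NMS), while DETR's set-based prediction is designed to make this step unnecessary. The sketch below is minimal. It assumes boxes given as (x1, y1, x2, y2) corner coordinates and uses an illustrative IoU threshold of 0.5. The function names are hypothetical.

```python
# Illustrative IoU and greedy NMS; names are hypothetical, not this repository's code.
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score, drop any that overlap a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with IoU 81/119 (about 0.68), so it is suppressed. The third box does not overlap either of the others and is kept.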

2. Audio-visual

Model | Paper
SyncNet | Out of Time: Automated Lip Sync in the Wild
Wav2Lip | A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild
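SyncNet embeds short video clips of the mouth region and the matching audio into a shared space where in-sync pairs lie close together. Wav2Lip reuses a pretrained SyncNet-style lip-sync expert as its sync discriminator. To estimate an audio-visual offset, the audio features can be slid against the video features, keeping the shift with the smallest mean distance. The sketch below implements that search. It assumes per-frame embeddings from hypothetical pretrained encoders. It also uses the gap between the median and the minimum mean distance as a rough confidence score, which is an assumption of this sketch. `av_offset` is a hypothetical name.

```python
# Illustrative offset search over precomputed embeddings; names and the confidence
# measure are assumptions, not this repository's code.
import math
import statistics
from typing import Sequence, Tuple

Features = Sequence[Sequence[float]]  # one embedding per frame


def av_offset(video: Features, audio: Features, max_shift: int) -> Tuple[int, float]:
    """Return (offset, confidence): the audio shift with the lowest mean distance."""
    mean_dist = {}
    for shift in range(-max_shift, max_shift + 1):
        # Pair video frame t with audio frame t + shift wherever both exist.
        dists = [math.dist(video[t], audio[t + shift])
                 for t in range(len(video)) if 0 <= t + shift < len(audio)]
        if dists:
            mean_dist[shift] = sum(dists) / len(dists)
    best = min(mean_dist, key=mean_dist.get)
    confidence = statistics.median(list(mean_dist.values())) - mean_dist[best]
    return best, confidence


# Toy features: the audio track is the video track delayed by 2 frames.
video = [[float(t * t)] for t in range(10)]
audio = [[float((t - 2) ** 2)] for t in range(10)]
print(av_offset(video, audio, max_shift=3))
```

Here the best shift is 2, meaning each video frame lines up with the audio two frames later, i.e. the audio lags the video.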

About

Implementation of deep learning papers.
