Update documentation

wAnFen1017 · Feb 26, 2022 · e981136
1 parent 608bfec
Showing 6 changed files with 220 additions and 20 deletions.
diff --git a/README.md b/README.md
@@ -1 +1,162 @@
-# dddd_trainer
+# dddd_trainer: 带带弟弟OCR Training Tool
+
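+### The training tool used by 带带弟弟OCR [ddddocr](https://github.com/sml2h3/ddddocr) is officially open source as of today!
+
+### GPU training supports NVIDIA cards only; if you have an AMD or other card, this project is not for you yet
+
+### Built on PyTorch, the project supports training CNN and CRNN models, resuming from checkpoints, and automatic ONNX export, and the resulting models deploy seamlessly with [ddddocr](https://github.com/sml2h3/ddddocr) and [ocr_api_server](https://gitee.com/fkgeek/ocr_api_server)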
+
+### Supported training environments
+
+Windows/Linux
+
+macOS: CPU training only
+
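+## 1. Required deep-learning environment (needed by every GPU deep-learning project, not just this one; not required for CPU training)
+
+### Before starting, check the [PyTorch](https://pytorch.org/get-started/locally/) website for the PyTorch builds your OS and hardware support. For NVIDIA cards older than the 30 series, such as the 2080 Ti, choose a CUDA version below 11 (e.g. CUDA 10.2); 30-series cards support only CUDA 11, so choose CUDA 11 or later (e.g. CUDA 11.3). Then install PyTorch with the command the site generates for your selection. Because PyTorch releases move quickly, many PyPI mirrors cache only the CPU build, so the CUDA build has to be installed from the official site.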
+
+### Install CUDA and cuDNN
+
+Choose the versions that match your GPU model and operating system:
+
+[cuda](https://developer.nvidia.com/cuda-downloads)
+
+[cudnn](https://developer.nvidia.com/zh-cn/cudnn)
+
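+Make sure the cuDNN build matches the CUDA version you installed; different CUDA versions support different GPUs. <b>For 20-series cards, just pick CUDA 10.2; for 30-series cards, just pick CUDA 11.3.</b> If you run into problems here, search online (e.g. Baidu); it is a basic setup issue.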
+
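+## 2. Training
+
+- All variables below are written as {param}: substitute your own value as needed and leave out the braces. For example, for the step that creates a new training project you would simply type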
+
+`python app.py create test_project`
+
+- ### 1. Clone this project locally
+
+`git clone https://github.com/sml2h3/dddd_trainer.git`
+
+- ### 2. Enter the project directory and install its dependencies
+
+`pip install -r requirements.txt -i https://pypi.douban.com/simple`
+
+- ### 3. Create a new training project
+
+`python app.py create {project_name}`
+
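+To create a CNN project instead, add the `--single` flag. CNN projects suit cases where the image width and height are fixed and the recognition result always has the same length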
+
+`python app.py create {project_name} --single`
+
+project_name is the project's name; avoid special characters in it
+
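+- ### 4. Prepare the data
+
+    Two data layouts are supported
+
+    ### A. Import labels from filenames
+
+    All images sit in a single folder and are named as follows, where /root/images_set is the image directory (any path works)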
+
+    ```
+  /root/images_set/
+    |---- abcde_random_hash.jpg
+    |---- sdae_random_hash.jpg
+    |---- 酱闷肘子_random_hash.jpg
+  
+  ```
+
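+    For example, for the image below
+
+    ![image](https://cdn.wenanzhe.com/img/mkGu_000001d00f140741741ed9916240d8d5.jpg)
+
+    the file could be named
+
+    `mkGu_000001d00f140741741ed9916240d8d5.jpg`
+
+    ### To cover every case, dddd_trainer does not normalize letter case. If you want the model to distinguish upper and lower case, label the samples with the correct case yourself, as in the example above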
+
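+    ### B. Import labels from a file
+
+    When the way samples are organized, or special characters in the labels, make filename labels impractical, data can be imported from a txt file instead. The dataset directory must contain a `labels.txt` file and an `images` folder; here /root/images_set is the dataset directory (any path works)
+
+    `labels.txt` lists every image under `/root/images_set/images` by its path relative to `/root/images_set/images`; `/root/images_set/images` may contain subdirectories.
+
+    #### In this mode image filenames are arbitrary and may or may not contain the label, because labels are not read from filenames
+
+    For example:
+
+    a. No subdirectories under `images`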
+
+    ```
+  /root/images_set/
+    |---- labels.txt
+    |---- images
+          |---- random_hash.jpg
+          |---- random_hash.jpg
+          |---- 酱闷肘子_random_hash.jpg
+
+  labels.txt contents (a \t tab character separates the filename from the label on each line):
+  random_hash.jpg\tabcd
+  random_hash.jpg\tsdae
+  酱闷肘子_random_hash.jpg\t酱闷肘子
+  ```
+    b. Subdirectories under `images`
+    ```
+  /root/images_set/
+    |---- labels.txt
+    |---- images
+          |---- aaaa
+                |---- random_hash.jpg
+          |---- 酱闷肘子_random_hash.jpg
+  
+  labels.txt contents (a \t tab character separates the filename from the label on each line):
+  aaaa/random_hash.jpg\tabcd
+  aaaa/random_hash.jpg\tsdae
+  酱闷肘子_random_hash.jpg\t酱闷肘子
+  
+  ```
+
+  ### To help beginners understand this part, the project also provides two basic datasets for testing
+
+- ### 5. Edit the configuration file
+```yaml
+Model:
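+    CharSet: []     # Character set; generated automatically, do not edit
+    ImageChannel: 1 # Number of image channels: 1 trains on grayscale, 3 on color. With 1 and a color dataset, images are converted to grayscale in memory during training; no preprocessing is needed and the files on disk are not modified
+    ImageHeight: 64 # Height in px that images are resized to
+    ImageWidth: -1  # Width in px that images are resized to; -1 adjusts it automatically (CRNN keeps the aspect ratio, CNN uses a square of height x height)
+    Word: false     # Whether this is a CNN model; set by a flag when the project is created, do not edit
+System:
+    Allow_Ext: [jpg, jpeg, png, bmp]  # Accepted image extensions; other files are skipped
+    GPU: true                         # Train on the GPU; requires the environment from section 1
+    GPU_ID: 0                         # GPU device index; 0 is the first card
+    Path: ''                          # Dataset root; filled in automatically by the cache step, change it only if the dataset moves
+    Project: test                     # Project name, i.e. {project_name}
+    Val: 0.03                         # Fraction of images held out for validation (0.03 = 3%); set aside automatically when caching, so re-cache after changing it
+Train:
+    BATCH_SIZE: 32                                    # Training batch size, limited mainly by GPU or system memory; experiment to find what fits, usually a multiple of 16 such as 16, 32, 64, 128
+    CNN: {NAME: ddddocr}                              # Feature-extraction backbone; supported values: ddddocr, effnetv2_l, effnetv2_m, effnetv2_xl, effnetv2_s, mobilenetv2, mobilenetv3_s, mobilenetv3_l
+    DROPOUT: 0.3                                      # Leave unchanged unless you know what you are doing
+    LR: 0.01                                          # Initial learning rate
+    OPTIMIZER: SGD                                    # Optimizer; do not change
+    SAVE_CHECKPOINTS_STEP: 2000                       # Save a checkpoint every N steps
+    TARGET: {Accuracy: 0.97, Cost: 0.05, Epoch: 20}   # Stop condition: once all three are met, training stops and the ONNX model is saved. Accuracy = minimum accuracy, Cost = maximum loss, Epoch = minimum number of epochs
+    TEST_BATCH_SIZE: 32                               # Evaluation batch size; same memory considerations as BATCH_SIZE
+    TEST_STEP: 1000                                   # Run an evaluation every N steps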
+
+
+```
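+The config file is located at `projects/{project_name}/config.yaml` under the project root
+
+- ### 6. Cache the data
+
+`python app.py cache /root/images_set/`
+
+- ### 7. Start or resume training
+
+`python app.py train {project_name}`
+
+- ### 8. Deployment
+
+`Keep training for now; I'm off to add support in ddddocr and ocr_api_server, and I'll continue the docs once that's done`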
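The two dataset layouts described in the README can be made concrete with a small reader. This is a hedged sketch, not the trainer's own cache code: `load_labels` is a hypothetical helper, and it assumes that in layout A the label is the filename text before the first `_`, and that in layout B `labels.txt` is UTF-8 with one `relative_path<TAB>label` entry per line.

```python
import os
from typing import List, Tuple

# Extensions taken from the default Allow_Ext setting in config.yaml.
ALLOWED_EXT = {".jpg", ".jpeg", ".png", ".bmp"}


def load_labels(root: str, from_file: bool = False) -> List[Tuple[str, str]]:
    """Return (image_path, label) pairs for either dataset layout.

    Hypothetical helper for illustration; the trainer's parsing may differ.
    """
    pairs = []
    if from_file:
        # Layout B: root/labels.txt lists paths relative to root/images.
        images_dir = os.path.join(root, "images")
        with open(os.path.join(root, "labels.txt"), encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\r\n")
                if not line:
                    continue
                # Assumption: the first tab separates the path from the label.
                rel_path, label = line.split("\t", 1)
                pairs.append((os.path.join(images_dir, rel_path), label))
    else:
        # Layout A: the label is the filename prefix before the first "_".
        for name in sorted(os.listdir(root)):
            stem, ext = os.path.splitext(name)
            if ext.lower() not in ALLOWED_EXT or "_" not in stem:
                continue  # skip unsupported files and names without a label
            pairs.append((os.path.join(root, name), stem.split("_", 1)[0]))
    return pairs
```

For example, on a layout-A folder `load_labels("/root/images_set/")` would yield pairs such as `("/root/images_set/mkGu_000001d00f140741741ed9916240d8d5.jpg", "mkGu")`. Labels are returned exactly as written, consistent with the README's note that letter case is never normalized.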
diff --git a/nets/__init__.py b/nets/__init__.py
@@ -39,35 +39,40 @@ def __init__(self, conf):
         self.charset_len = len(self.charset)
         self.backbone = self.conf['Train']['CNN']['NAME']
         self.paramters = []
-
+        self.word = self.conf['Model']['Word']
         if self.backbone in self.backbones_list:
             test_cnn = self.backbones_list[self.backbone](nc=1)
-            x = torch.randn(1, 1, 64, 224)
+            x = torch.randn(1, 1, self.resize[1], self.resize[1])
             test_features = test_cnn(x)
             del x
             del test_cnn
-            self.out_size = test_features.size()[1] * test_features.size()[2]
+            if self.word:
+                self.out_size = test_features.size()[1] * test_features.size()[2] * test_features.size()[3]
+            else:
+                self.out_size = test_features.size()[1] * test_features.size()[2]
             self.cnn = self.backbones_list[self.backbone](nc=self.image_channel)
         else:
             raise Exception("{} is not found in backbones! backbone list : {}".format(self.backbone, json.dumps(
                 list(self.backbones_list.keys()))))
         self.paramters.append({'params': self.cnn.parameters()})
 
-        self.word = self.conf['Model']['Word']
+
         if not self.word:
             self.dropout = self.conf['Train']['DROPOUT']
             self.lstm = torch.nn.LSTM(input_size=self.out_size, hidden_size=self.out_size, bidirectional=True,
                                       num_layers=1, dropout=self.dropout)
             self.paramters.append({'params': self.lstm.parameters()})
 
             self.loss = torch.nn.CTCLoss(blank=0, reduction='mean')
+            self.fc = torch.nn.Linear(in_features=self.out_size * 2, out_features=self.charset_len)
+
         else:
             self.lstm = None
             self.loss = torch.nn.CrossEntropyLoss()
+            self.fc = torch.nn.Linear(in_features=self.out_size, out_features=self.charset_len)
 
         self.paramters.append({'params': self.loss.parameters()})
 
-        self.fc = torch.nn.Linear(in_features=self.out_size * 2, out_features=self.charset_len)
         self.paramters.append({'params': self.fc.parameters()})
 
         self.lr = self.conf['Train']['LR']
@@ -104,6 +109,7 @@ def get_features(self, inputs):
             outputs = self.fc(outputs)
             outputs = outputs.view(time_step, batch_size, -1)
         else:
+            outputs = outputs.view(outputs.size(0), -1)
             outputs = self.fc(outputs)
         return outputs
 
@@ -146,7 +152,15 @@ def tester(self, inputs, labels, labels_length):
             raise Exception("origin labels length is {}, but pred labels length is {}".format(
                 len(labels_list), len(pred_decode_labels)))
         for ids in range(len(labels_list)):
-            if labels_list[ids] == pred_decode_labels[ids]:
+            if self.word:
+                label_res = labels_list[ids][0]
+
+                pred_res = pred_decode_labels[ids].item()
+            else:
+                label_res = labels_list[ids]
+
+                pred_res = pred_decode_labels[ids]
+            if label_res == pred_res:
                 correct_list.append(ids)
             else:
                 error_list.append(ids)
@@ -183,10 +197,13 @@ def get_random_tensor(self):
         width = self.resize[0]
         height = self.resize[1]
         if width == -1:
-            w = 240
+            if self.word:
+                w = height
+            else:
+                w = 240
             h = height
         else:
-            w = width
+            w = height
             h = height
         return torch.randn(1, self.image_channel, h, w, device='cpu')
 

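The comparison added to `tester()` depends on the mode: in CNN (`Word`) mode only the first element of each label list is compared against a single predicted index, while in CRNN mode the whole decoded sequence must match. A framework-free sketch of that logic follows; `split_correct` is a hypothetical name, and plain ints stand in for the tensor value the real code reads with `.item()`.

```python
from typing import List, Sequence, Tuple


def split_correct(labels_list: Sequence, preds: Sequence, word: bool) -> Tuple[List[int], List[int]]:
    """Split sample indices into (correct, error) lists, mirroring tester().

    Hypothetical helper: labels and predictions are assumed to be already
    decoded to class indices (plain Python ints/lists, not tensors).
    """
    if len(labels_list) != len(preds):
        raise ValueError("origin labels length is {}, but pred labels length is {}".format(
            len(labels_list), len(preds)))
    correct, error = [], []
    for ids, (label, pred) in enumerate(zip(labels_list, preds)):
        if word:
            # CNN/Word mode: compare the label's first class id with one predicted id.
            matched = label[0] == pred
        else:
            # CRNN mode: the full sequence of class ids must match.
            matched = label == pred
        (correct if matched else error).append(ids)
    return correct, error
```

Accuracy for a test batch is then `len(correct) / len(labels_list)`.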
diff --git a/nets/backbone/ddddocr/ddddocrv1.py b/nets/backbone/ddddocr/ddddocrv1.py
@@ -51,13 +51,9 @@ def forward(self, input):
 
 def test():
     net = DdddOcr(1)
-    x = torch.randn(1, 1, 64, 224)
+    x = torch.randn(1, 1, 128, 128)
     y = net(x)
     print(y.size())
-    y = y.permute(3, 0, 1, 2)
-    w, b, c, h = y.shape
-    y = y.view(w, b, c * h)
-    print(y.size())
 
 if __name__ == '__main__':
     test()
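The backbone `test()` functions now feed a 128x128 input and print the raw feature-map size, which is the same shape `__init__` uses to size the classifier head. The arithmetic from the diff can be sketched without PyTorch; `head_sizes` is a hypothetical helper, and the shapes in the usage note are illustrative, not the real ddddocr output.

```python
from typing import Sequence, Tuple


def head_sizes(feature_shape: Sequence[int], word: bool) -> Tuple[int, int]:
    """Return (out_size, fc_in_features) for a backbone output of shape (N, C, H, W).

    Hypothetical helper mirroring the out_size / fc logic in nets/__init__.py.
    """
    _, c, h, w = feature_shape
    if word:
        # CNN/Word mode: the whole feature map is flattened per image.
        out_size = c * h * w
        fc_in = out_size
    else:
        # CRNN mode: each of the W columns is one time step with C*H features;
        # the bidirectional LSTM (hidden_size=out_size) doubles the fc input.
        out_size = c * h
        fc_in = out_size * 2
    return out_size, fc_in
```

With these made-up shapes, `head_sizes((1, 512, 8, 8), word=True)` gives `(32768, 32768)` and `head_sizes((1, 512, 4, 56), word=False)` gives `(2048, 4096)`. In CRNN mode the width does not enter the size because it becomes the sequence length.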
diff --git a/nets/backbone/effcientnet/efficientnetv2.py b/nets/backbone/effcientnet/efficientnetv2.py
@@ -223,13 +223,10 @@ def effnetv2_xl(**kwargs):
 
 def test():
     net = effnetv2_s(nc=1)
-    x = torch.randn(2, 3, 50, 224)
+    x = torch.randn(1, 1, 128, 128)
     y = net(x)
     print(y.size())
-    y = y.permute(3, 0, 1, 2)
-    w, b, c, h = y.shape
-    y = y.view(w, b, c * h)
-    print(y.size())
+
 
 if __name__ == '__main__':
     test()
diff --git a/projects/hcaptcha/config.yaml b/projects/hcaptcha/config.yaml
@@ -0,0 +1,23 @@
+Model:
+    CharSet: [火车, 船, 自行车, 飞机, 水上飞机, 巴士, 一条船, 摩托车, 卡车]
+    ImageChannel: 1
+    ImageHeight: 128
+    ImageWidth: -1
+    Word: true
+System:
+    Allow_Ext: [jpg, jpeg, png, bmp]
+    GPU: true
+    GPU_ID: 0
+    Path: C:\Users\sml2h3\PycharmProjects\hcaptcha\samples
+    Project: hcaptcha
+    Val: 0.03
+Train:
+    BATCH_SIZE: 32
+    CNN: {NAME: ddddocr}
+    DROPOUT: 0.3
+    LR: 0.01
+    OPTIMIZER: SGD
+    SAVE_CHECKPOINTS_STEP: 2000
+    TARGET: {Accuracy: 0.97, Cost: 0.05, Epoch: 20}
+    TEST_BATCH_SIZE: 32
+    TEST_STEP: 1000
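The `TARGET` entry in this config, as described in the README, ends training only when all three conditions hold. A minimal sketch of that check, assuming inclusive comparisons and a config already parsed into a dict (for example with PyYAML's `yaml.safe_load`); `reached_target` is a hypothetical name, not the trainer's function.

```python
def reached_target(conf: dict, accuracy: float, cost: float, epoch: int) -> bool:
    """Return True once every TARGET condition from config.yaml is met.

    Hypothetical helper; the inclusive >= / <= comparisons are an assumption.
    """
    target = conf["Train"]["TARGET"]
    return (accuracy >= target["Accuracy"]   # minimum accuracy
            and cost <= target["Cost"]       # maximum loss
            and epoch >= target["Epoch"])    # minimum number of epochs


# Same values as the TARGET line in projects/hcaptcha/config.yaml.
conf = {"Train": {"TARGET": {"Accuracy": 0.97, "Cost": 0.05, "Epoch": 20}}}
print(reached_target(conf, accuracy=0.98, cost=0.04, epoch=20))  # True
print(reached_target(conf, accuracy=0.98, cost=0.04, epoch=12))  # False
```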
diff --git a/utils/load_cache.py b/utils/load_cache.py
@@ -60,7 +60,10 @@ def __getitem__(self, idx):
             width = self.resize[0]
             height = self.resize[1]
             if self.resize[0] == -1:
-                image = image.resize((int(image_width * (height / image_height)), height))
+                if self.word:
+                    image = image.resize((height, height))
+                else:
+                    image = image.resize((int(image_width * (height / image_height)), height))
             else:
                 image = image.resize((width, height))
             label = [int(self.charset.index(item)) for item in list(image_label)]
@@ -97,12 +100,15 @@ def __init__(self, project_name: str):
             exit()
 
         self.config = Config(project_name)
+
         self.conf = self.config.load_config()
 
         self.charset = self.conf['Model']['CharSet']
+
         logger.info("\nCharsets is {}".format(json.dumps(self.charset, ensure_ascii=False)))
 
         self.resize = [int(self.conf['Model']['ImageWidth']), int(self.conf['Model']['ImageHeight'])]
+
         logger.info("\nImage Resize is {}".format(json.dumps(self.resize)))
 
         self.ImageChannel = self.conf['Model']['ImageChannel']
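After this commit, `__getitem__` in `utils/load_cache.py` picks the resize target from `ImageWidth`/`ImageHeight` and the `Word` flag. The branch logic can be sketched as a pure function; `target_size` is a hypothetical name, and it only computes the `(width, height)` tuple that the real code passes to `image.resize(...)` (presumably a PIL image).

```python
from typing import Sequence, Tuple


def target_size(image_width: int, image_height: int, resize: Sequence[int], word: bool) -> Tuple[int, int]:
    """Return the (width, height) an image is resized to.

    Hypothetical helper mirroring the branches in load_cache's __getitem__;
    resize is [ImageWidth, ImageHeight] from config.yaml.
    """
    width, height = resize
    if width == -1:
        if word:
            # CNN/Word mode: square input of height x height.
            return height, height
        # CRNN mode: fix the height and keep the aspect ratio.
        return int(image_width * (height / image_height)), height
    # Explicit ImageWidth: use the configured size as-is.
    return width, height
```

With `ImageHeight: 64` and `ImageWidth: -1`, a 100x32 image becomes 200x64 in CRNN mode (`target_size(100, 32, [-1, 64], False)`), while the hcaptcha project (`ImageHeight: 128`, `Word: true`) would resize every image to 128x128.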