Commit
* Add documentation in jupyter-book
* Add Trademark Notice
Jianjie Liu authored Jul 19, 2021
1 parent 6180948, commit 0e982f2
Showing 40 changed files with 1,497 additions and 327 deletions.
@@ -2,7 +2,7 @@

 [![Build Status](https://dev.azure.com/genalog-dev/genalog/_apis/build/status/Nightly-Build?branchName=main)](https://dev.azure.com/genalog-dev/genalog/_build/latest?definitionId=4&branchName=main) ![Azure DevOps tests (compact)](https://img.shields.io/azure-devops/tests/genalog-dev/genalog/4?compact_message) ![Azure DevOps coverage (main)](https://img.shields.io/azure-devops/coverage/genalog-dev/genalog/4/main) ![Python Versions](https://img.shields.io/badge/py-3.6%20%7C%203.7%20%7C%203.8%20-blue) ![Supported OSs](https://img.shields.io/badge/platform-%20linux--64%20-red) ![MIT license](https://img.shields.io/badge/License-MIT-blue.svg)

-Genalog is an open source, cross-platform python package allowing to generate synthetic document images with text data. Tool also allows you to add various text degradations to these images. The purpose of this tool is to provide a fast and efficient way to generate synthetic documents from text data by leveraging layout from templates that you create in simple HTML format.
+`Genalog` is an open source, cross-platform python package for **gen**erating document images with synthetic noise that mimics scanned an**alog** documents (thus the name `genalog`). You can also add various text degradations to these images. The purpose of this tool is to provide a fast and efficient way to generate synthetic documents from text data by leveraging layout from templates that you create in simple HTML format.

 Overview
 -------------------------------------

@@ -85,16 +85,23 @@ If you are running on Windows, MacOS, or other Linux distributions, please see [

 Repo Structure
 -------------------
-Tools-Synthetic-Data-Generator
+genalog
 ├────genalog
 │ ├─── generation # generate text images
 │ ├──── degradation # methods for image degradation
 │ ├──── ocr # running the Azure Search Pipeline
-│ └──── text # methods to Align OCR Output Text with Input Text
-├────examples # Example Jupyter Notebooks for Various Synthetic Data Generation Scenarios
-├────tests # PyTest files
-├────README.md # Main Readme file
-└────LICENSE # License file
+│ └──── text # methods to Align OCR Output Text with
+├────devops # CI/CD pipelines
+├────docs # containing online documentation
+├────examples # example Jupyter Notebooks for Various
+├────tests # tests
+├────tox.ini # CI orchestration and configurations
+├────README.md
+└────LICENSE

+Trademark Notice
+--------------------
+This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party’s policies.
+
 Microsoft Open Source Code of Conduct
 -------------------------------------

@@ -118,7 +125,6 @@ For more information see the [Code of Conduct FAQ](https://opensource.microsoft.
 or contact [[email protected]](mailto:[email protected]) with any additional questions or comments.


-
 Collaborators
 -------------------------------------
 Genalog was originally developed by the [MAIDAP team at Microsoft Cambridge NERD](http://www.microsoftnewengland.com/nerd-ai/) in association with the Text Analytics Team in Redmond.
@@ -1,3 +1,3 @@
-_build/
-_static/
-_templates/
+**/example.txt
+**/_build
+**/data
This file was deleted.
This file was deleted.
This file was deleted.
This file was deleted.
@@ -0,0 +1,46 @@
+title : <h1 style="font-size:2em;text-align:center;color:#FF5733">Genalog</h1>
+author: Jianjie Liu and Amit Gupte
+# logo: 'qe-logo-large.png'
+
+# Short description about the book
+description: >-
+  Guide for end-to-end synthetic analog document generation
+execute:
+  execute_notebooks : off
+
+# Interact link settings
+notebook_interface : "notebook"
+
+# Launch button settings
+repository:
+  url : https://github.com/microsoft/genalog
+  path_to_book : /docs/genalog_docs
+  branch : main
+
+launch_buttons:
+  notebook_interface : classic
+
+# HTML-specific settings
+html:
+  home_page_in_navbar : false
+  use_repository_button : true
+
+# # LaTeX settings
+# bibtex_bibfiles:
+#   - _bibliography/references.bib
+# latex:
+#   latex_engine : "xelatex"
+#   latex_documents:
+#     targetname: book.tex
+
+sphinx:
+  extra_extensions:
+    - sphinx_inline_tabs
+    - sphinx.ext.autodoc
+    - sphinx.ext.napoleon
+    - sphinx.ext.viewcode
+  config:
+    napoleon_google_docstring: True
+    autodoc_member_order: groupwise
+    autoclass_content: both
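The `sphinx` settings above enable `sphinx.ext.autodoc` and `sphinx.ext.napoleon` with `napoleon_google_docstring: True`, so the API pages are presumably built from Google-style docstrings. As a rough illustration of that docstring layout, here is a minimal sketch. The function `degradation_score` and its parameters are hypothetical and are not part of genalog.

```python
def degradation_score(kernel_size: int, passes: int = 1) -> int:
    """Return a rough score for how strongly an image would be degraded.

    Hypothetical helper used only to show the Google docstring layout
    that napoleon parses; it is not a genalog function.

    Args:
        kernel_size: Width of the square kernel, in pixels.
        passes: Number of times the kernel is applied.

    Returns:
        The kernel area multiplied by the number of passes.

    Raises:
        ValueError: If either argument is not positive.
    """
    if kernel_size <= 0 or passes <= 0:
        raise ValueError("kernel_size and passes must be positive")
    return kernel_size * kernel_size * passes
```

With `autoclass_content: both`, autodoc combines a class docstring with its `__init__` docstring, so the same `Args:` section style can be used in either place.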
@@ -0,0 +1,24 @@
+root: index
+format: jb-book
+defaults:
+  numbered: false
+parts:
+  - caption: Getting Started
+    chapters:
+      - file: installation
+      - file: generation_pipeline
+      - file: e2e_dataset_pipeline
+  - caption: Fabricating Document & Noise
+    chapters:
+      - file: doc_generation
+      - file: doc_degradation
+  - caption: Handling Noisy Text
+    chapters:
+      - file: text_alignment
+      - file: ocr_label_propagation
+  - caption: API Documentation
+    chapters:
+      - file: docstring/genalog.degradation
+      - file: docstring/genalog.generation
+      - file: docstring/genalog.ocr
+      - file: docstring/genalog.text
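A reviewer who wants to check that every page named in this table of contents exists could flatten it into a list of entries first. The sketch below is one way to do that, assuming the YAML has already been loaded into a plain dictionary. The `TOC` literal mirrors the hunk above, and `chapter_files` is a hypothetical helper, not part of Jupyter Book or genalog.

```python
# Dictionary form of the table of contents shown in the hunk above.
TOC = {
    "root": "index",
    "format": "jb-book",
    "defaults": {"numbered": False},
    "parts": [
        {"caption": "Getting Started",
         "chapters": [{"file": "installation"},
                      {"file": "generation_pipeline"},
                      {"file": "e2e_dataset_pipeline"}]},
        {"caption": "Fabricating Document & Noise",
         "chapters": [{"file": "doc_generation"},
                      {"file": "doc_degradation"}]},
        {"caption": "Handling Noisy Text",
         "chapters": [{"file": "text_alignment"},
                      {"file": "ocr_label_propagation"}]},
        {"caption": "API Documentation",
         "chapters": [{"file": "docstring/genalog.degradation"},
                      {"file": "docstring/genalog.generation"},
                      {"file": "docstring/genalog.ocr"},
                      {"file": "docstring/genalog.text"}]},
    ],
}


def chapter_files(toc):
    """Return the root page followed by every chapter entry, in order."""
    files = [toc["root"]]
    for part in toc.get("parts", []):
        for chapter in part.get("chapters", []):
            files.append(chapter["file"])
    return files


if __name__ == "__main__":
    for name in chapter_files(TOC):
        print(name)
```

The entries are listed without file extensions, as in the hunk, so a real existence check would need to try each supported extension under the book directory.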