Digital-HTC-Architecture

This is a simple program for optical character recognition (OCR) and machine translation of PDF documents.

The backend uses ImageMagick for image processing, Tesseract for OCR, and DeepL for translation. The frontend is powered by Streamlit.
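The three backend stages can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the repository's actual code: the function names, the 300 DPI setting, and the page file pattern are hypothetical, and it assumes the `magick` (ImageMagick 7) and `tesseract` command-line tools plus the third-party `deepl` Python package are installed.

```python
from __future__ import annotations

import subprocess
import tempfile
from pathlib import Path


def rasterize_command(pdf_path: str, out_pattern: str, dpi: int = 300) -> list[str]:
    """Build an ImageMagick command that renders each PDF page to an image."""
    # -density goes before the input so the PDF is rasterized at that resolution.
    return ["magick", "-density", str(dpi), pdf_path, out_pattern]


def ocr_command(image_path: str, lang: str = "eng") -> list[str]:
    """Build a Tesseract command that prints the recognized text to stdout."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def translate(text: str, auth_key: str, target_lang: str = "EN-US") -> str:
    """Translate text with the `deepl` package (imported lazily, assumed installed)."""
    import deepl  # third-party: pip install deepl

    return deepl.Translator(auth_key).translate_text(text, target_lang=target_lang).text


def run_pipeline(pdf_path: str, auth_key: str, lang: str = "fra") -> str:
    """Rasterize a PDF, OCR every page image, and translate the combined text."""
    with tempfile.TemporaryDirectory() as tmp:
        pattern = str(Path(tmp) / "page-%03d.png")  # hypothetical naming scheme
        subprocess.run(rasterize_command(pdf_path, pattern), check=True)
        page_texts = []
        for page in sorted(Path(tmp).glob("page-*.png")):
            result = subprocess.run(
                ocr_command(str(page), lang),
                check=True, capture_output=True, text=True,
            )
            page_texts.append(result.stdout)
    return translate("\n".join(page_texts), auth_key)
```

Building the commands as lists, rather than shell strings, avoids quoting problems with file names that contain spaces.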

Example

Example images (not shown here): the original document, the OCR text in the original language, and the translated text.

Instructions

DeepL is used for translation, and an authentication key is needed to access their API. You can get a key for free, but registration is required.
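As an illustration of handling the key outside the source code, the sketch below reads it from an environment variable and picks the matching DeepL API host. The variable name `DEEPL_AUTH_KEY` and both function names are assumptions for this example; how the app itself accepts the key may differ. It relies on the fact that free-plan DeepL keys end in `:fx` and are served from a separate host.

```python
import os

FREE_API_URL = "https://api-free.deepl.com"
PRO_API_URL = "https://api.deepl.com"


def load_auth_key(env_var: str = "DEEPL_AUTH_KEY") -> str:
    """Read the DeepL key from the environment (variable name is hypothetical)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your DeepL authentication key."
        )
    return key


def api_base_url(auth_key: str) -> str:
    """Free-plan keys end in ':fx' and use the api-free host."""
    return FREE_API_URL if auth_key.endswith(":fx") else PRO_API_URL
```

Keeping the key in the environment means it never has to be committed alongside the code.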

macOS

Officially, macOS Catalina (10.15) or higher on a 64-bit processor is required. However, I have confirmed that the program also works on macOS Mojave (10.14).

To start, one option is to clone this repo. Open a terminal and run:

git clone https://github.com/christopher-w-murphy/Digital-HTC-Architecture.git

Alternatively, one may download the code by clicking the green Code button and then Download ZIP.

In either case, move into the repo directory and run the installer script:

cd Digital-HTC-Architecture/
bash macos_installer.sh

While still in the Digital-HTC-Architecture directory, start the program by running the following command in a terminal:

bash ocr_and_machine_translation.sh

You can now view the Streamlit app in your browser. To stop the app, press Control+C in the terminal.
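If the browser page does not load, a quick check is whether anything is accepting connections on the app's port. The sketch below assumes Streamlit's default port, 8501 (the same port the Docker command in this README maps); the launcher script may be configured differently, and the function name is hypothetical.

```python
import socket


def app_is_listening(host: str = "localhost", port: int = 8501, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    if app_is_listening():
        print("The app appears to be reachable at http://localhost:8501")
    else:
        print("Nothing is listening on port 8501 yet")
```

This only confirms that a server is listening, not that it is this particular app.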

Docker

Docker users can pull the image from Docker Hub:

docker pull murphycw/digital-htc-architecture

Then run the image as a container to start the program:

docker run -d -p 8501:8501 murphycw/digital-htc-architecture

Note that the Docker image has Tesseract v4 as opposed to v5, and can only OCR English and French documents.
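The language restriction could be guarded against with a small check like the one below. This is a sketch only: the function and constant names are hypothetical, and it assumes the image ships exactly the standard Tesseract language codes `eng` and `fra`, as the note above implies.

```python
# Tesseract language codes assumed to be installed in the Docker image.
DOCKER_IMAGE_LANGS = {"eng": "English", "fra": "French"}


def check_docker_language(lang: str) -> str:
    """Return the normalized language code, or raise if the image lacks it."""
    code = lang.strip().lower()
    if code not in DOCKER_IMAGE_LANGS:
        supported = ", ".join(sorted(DOCKER_IMAGE_LANGS))
        raise ValueError(
            f"'{lang}' is not available in the Docker image; supported codes: {supported}"
        )
    return code
```

Running `tesseract --list-langs` inside the container should show which language packs are actually installed.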

Windows

Programming Historian has instructions for setting up an OCR and translation program on Windows.
