GitHub - mrname5/bardDraw: Screen Painting/Image2Speech and further extensions with Bard/Gemini coupled with Google Quickdraw and OpenCV

Draw and Tell with Gemini

This repo is a dodgy experimental prototype for Google ~~Bard~~Gemini, Google Quickdraw, OpenCV, speech recognition and Python. The idea is that a person/child draws a doodle with their fingers using OpenCV. Gemini tries to guess what the doodle is. Google Quickdraw is used to generate more doodles of what the drawing might be. The person/child can tell Gemini if the guess is correct or wrong via one of the following modes: keyboard input, microphone/speech input, or hand sign. If asked, Gemini will tell a story about the drawings. The end.

Sort of. The other purpose of this prototype is to try and guide Gemini to generate the most helpful and appropriate content for the user. A primitive attempt at this is to include text files that give explicit instructions and context to Gemini. Users may easily modify these but be sure to keep the same formatting.

Requirements

It is recommended to setup a Conda or venv and run pip install -r requirements.txt in the command line with the environment active.

Also, you will need to install Firefox and have a wifi connection. You may need to open a Gemini session in Firefox first so the program can access the cookies. However, in my use this only needs to be done once - afterwards the cookies can be accessed even if the Firefox app is closed.

This repo includes about 10 categories of Google Quickdraw data. If you want all 345 categories, download the bin files here and update the doodle_list.txt file accordingly

Running the code, drawing and interacting with Gemini

python main.py will run the program with default mode of keyboard input. --mode speech and --mode sign are the other options. Once running, raise your right index finger to draw a thin line, and right index and middle finger to draw a thicker line. Right index, middle and ring finger will erase the drawing. In keyboard mode, type in the instructions (see the .txt files with "terms" in the title) you want Gemini to perform. Default .txt terms are "guess" to guess the drawing and "wrong" "try" "again" to ask Gemini to try again. Gemini is annoying and talks too much about itself, so the "instruction" .txt files are designed to try and task Gemini. Sometimes it sort of works.

Speech mode is the same as keyboard mode, except you will need to press "s" to send the speech to Gemini. It should automatically record when clear speech is detected. Sign mode is more limited. Raise your left index finger to have Gemini guess about the drawing. Raise your left index and middle finger to have Gemini guess again. Raise your left index, middle and ring finger to have Gemini tell a story about the drawing (what Gemini guessed).

The other args --smooth and --mode_len control the drawing and number of frames to compute gesture recognition. Smooth must be between 0 and 1 while mode_len can be 1 through any number, but keep in mind we probably have 30 or so frames a second, so 30 might be a reasonable upper limit.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.quickdrawcache		.quickdrawcache
__pycache__		__pycache__
README.md		README.md
cv2draw.py		cv2draw.py
doodle_extension_instruction.txt		doodle_extension_instruction.txt
doodle_extension_terms.txt		doodle_extension_terms.txt
doodle_guess_again_instruction.txt		doodle_guess_again_instruction.txt
doodle_guess_again_terms.txt		doodle_guess_again_terms.txt
doodle_guess_instruction.txt		doodle_guess_instruction.txt
doodle_guess_terms.txt		doodle_guess_terms.txt
doodle_list.txt		doodle_list.txt
doodle_recap_instruction.txt		doodle_recap_instruction.txt
main.py		main.py
quickdrawim.py		quickdrawim.py
requirements.txt		requirements.txt
t2s.py		t2s.py
user_info.txt		user_info.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Draw and Tell with Gemini

Requirements

Running the code, drawing and interacting with Gemini

About

Releases

Packages

Languages

mrname5/bardDraw

Folders and files

Latest commit

History

Repository files navigation

Draw and Tell with Gemini

Requirements

Running the code, drawing and interacting with Gemini

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages