forked from J-Fo-S/bardDraw

Screen Painting/Image2Speech and further extensions with Bard/Gemini coupled with Google Quickdraw and OpenCV

mrname5/bardDraw

Draw and Tell with Gemini

This repo is a dodgy experimental prototype combining Google Bard/Gemini, Google Quickdraw, OpenCV, speech recognition and Python. The idea is that a person/child draws a doodle with their fingers using OpenCV. Gemini tries to guess what the doodle is, and Google Quickdraw is used to generate more doodles of what the drawing might be. The person/child can tell Gemini whether the guess is right or wrong in one of three modes: keyboard input, microphone/speech input, or hand signs. If asked, Gemini will tell a story about the drawings. The end.

Sort of. The other purpose of this prototype is to try to guide Gemini towards generating the most helpful and appropriate content for the user. A primitive attempt at this is to include text files that give Gemini explicit instructions and context. Users can easily modify these files, but be sure to keep the same formatting.
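A minimal sketch of the idea, assuming each instruction file is plain text that is simply prepended to the user's request before it is sent to Gemini. The helper name build_prompt and the file name guess_instruction.txt are hypothetical (not taken from this repo), and the actual call to Gemini is left out.

```python
from pathlib import Path


def build_prompt(instruction_path: Path, user_text: str) -> str:
    """Prepend the instruction/context text file to the user's request.

    Hypothetical helper: the repo's real file layout and prompt wording may differ.
    """
    instructions = instruction_path.read_text(encoding="utf-8").strip()
    return f"{instructions}\n\nUser: {user_text.strip()}"


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "guess_instruction.txt"  # hypothetical file name
        path.write_text("Answer with a single word naming the object drawn.\n", encoding="utf-8")
        print(build_prompt(path, "What is this doodle?"))
```

Keeping the instructions in separate text files means they can be tuned without touching the Python code, which is why their formatting matters.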

Requirements

It is recommended to set up a Conda environment or venv, then run pip install -r requirements.txt from the command line with the environment active.

You will also need to install Firefox and have an internet (Wi-Fi) connection. You may need to open a Gemini session in Firefox first so that the program can access its cookies. In my experience, however, this only needs to be done once; afterwards the cookies can be accessed even when Firefox is closed.

This repo includes about 10 categories of Google Quickdraw data. If you want all 345 categories, download the .bin files here and update the doodle_list.txt file accordingly.
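If you download extra categories, it can help to inspect a .bin file before adding it to doodle_list.txt. The sketch below reads records in the simplified-drawing binary layout documented by the Quick, Draw! dataset (key_id, country code, recognized flag, timestamp, then strokes of uint8 x/y points). It assumes little-endian byte order, and the function names are hypothetical rather than this repo's own loader.

```python
import struct
from typing import BinaryIO, Iterator


def unpack_drawing(f: BinaryIO) -> dict:
    """Read one drawing record from a Quick, Draw! simplified .bin file."""
    key_id, = struct.unpack("<Q", f.read(8))          # unsigned 64-bit id
    country_code, = struct.unpack("<2s", f.read(2))   # two ASCII chars
    recognized, = struct.unpack("<b", f.read(1))      # 8-bit flag
    timestamp, = struct.unpack("<I", f.read(4))       # unsigned 32-bit
    n_strokes, = struct.unpack("<H", f.read(2))       # unsigned 16-bit
    strokes = []
    for _ in range(n_strokes):
        n_points, = struct.unpack("<H", f.read(2))
        fmt = f"<{n_points}B"                         # n_points unsigned bytes
        xs = struct.unpack(fmt, f.read(n_points))
        ys = struct.unpack(fmt, f.read(n_points))
        strokes.append((xs, ys))
    return {
        "key_id": key_id,
        "country_code": country_code,
        "recognized": recognized,
        "timestamp": timestamp,
        "image": strokes,
    }


def unpack_drawings(path: str) -> Iterator[dict]:
    """Yield every drawing in the file; stop when a read comes up short (EOF)."""
    with open(path, "rb") as f:
        while True:
            try:
                yield unpack_drawing(f)
            except struct.error:
                break
```

For example, `sum(1 for _ in unpack_drawings("cat.bin"))` counts the drawings in a downloaded category file.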

Running the code, drawing and interacting with Gemini

Running python main.py starts the program in the default keyboard-input mode; --mode speech and --mode sign select the other two modes. Once it is running, raise your right index finger to draw a thin line, or your right index and middle fingers to draw a thicker line. Raising your right index, middle and ring fingers erases the drawing. In keyboard mode, type the instruction you want Gemini to perform (see the .txt files with "terms" in the title). The default terms are "guess", to have Gemini guess the drawing, and "wrong", "try" and "again", to ask Gemini to try again. Gemini is annoying and talks too much about itself, so the "instruction" .txt files are designed to keep Gemini on task. Sometimes it sort of works.
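One way the typed input could be matched against the "terms" files, sketched under the assumption that each file lists one term per line and that a command fires when any of its terms appears as a whole word. The file format and the names load_terms and match_command are hypothetical; check the repo's .txt files for the actual layout.

```python
from __future__ import annotations

import re
from pathlib import Path


def load_terms(path: Path) -> set[str]:
    """Read one term per line (assumed format), lowercased, skipping blank lines."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def match_command(user_text: str, commands: dict[str, set[str]]) -> str | None:
    """Return the first command whose terms appear as whole words in user_text."""
    words = set(re.findall(r"[a-z']+", user_text.lower()))
    for command, terms in commands.items():  # dict order decides ties
        if words & terms:
            return command
    return None
```

With `{"guess": {"guess"}, "retry": {"wrong", "try", "again"}}`, typing "That's wrong!" selects "retry". Because dict order decides ties, "guess again" would select "guess" here, so the order of the commands matters.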

Speech mode works the same way as keyboard mode, except that you press "s" to send the speech to Gemini. Recording should start automatically when clear speech is detected. Sign mode is more limited: raise your left index finger to have Gemini guess the drawing, your left index and middle fingers to have Gemini guess again, or your left index, middle and ring fingers to have Gemini tell a story about the drawing (based on what Gemini guessed).
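The finger gestures described above amount to a lookup table per hand. A minimal sketch, assuming some hand tracker already reports which of the index, middle and ring fingers are raised on each hand (thumb and pinky are ignored here); the Fingers type, the action names and action_for are hypothetical, not the repo's code.

```python
from __future__ import annotations

from typing import NamedTuple


class Fingers(NamedTuple):
    """Which fingers are raised on one hand (thumb and pinky ignored)."""
    index: bool
    middle: bool
    ring: bool


# Right hand: drawing controls (all modes).
RIGHT_HAND_ACTIONS = {
    Fingers(True, False, False): "draw_thin",
    Fingers(True, True, False): "draw_thick",
    Fingers(True, True, True): "erase",
}

# Left hand: Gemini commands (sign mode only).
LEFT_HAND_ACTIONS = {
    Fingers(True, False, False): "guess",
    Fingers(True, True, False): "guess_again",
    Fingers(True, True, True): "tell_story",
}


def action_for(hand: str, fingers: Fingers) -> str | None:
    """Map a hand ("left" or "right") and its raised fingers to an action, if any."""
    if hand not in ("left", "right"):
        raise ValueError(f"unknown hand: {hand!r}")
    table = RIGHT_HAND_ACTIONS if hand == "right" else LEFT_HAND_ACTIONS
    return table.get(fingers)
```

Any finger pattern not in the table (for example, a closed fist) returns None, so the program can simply do nothing on that frame.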

The other arguments are --smooth, which controls the smoothing of the drawing, and --mode_len, which sets the number of frames used to compute gesture recognition. --smooth must be between 0 and 1, while --mode_len can be any whole number from 1 upward. Keep in mind that the camera probably delivers around 30 frames per second, so 30 is a reasonable upper limit for --mode_len.
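A sketch of how these options could be parsed and used, under stated assumptions: the defaults of 0.5 and 10 are hypothetical, --smooth is read as the weight of the previous point in exponential smoothing, and --mode_len as the window for a majority vote (statistical mode) over recent gesture labels. These are plausible readings of the option names, not the repo's confirmed implementation.

```python
from __future__ import annotations

import argparse
from collections import Counter, deque


def unit_interval(value: str) -> float:
    x = float(value)
    if not 0.0 <= x <= 1.0:
        raise argparse.ArgumentTypeError("--smooth must be between 0 and 1")
    return x


def positive_int(value: str) -> int:
    n = int(value)
    if n < 1:
        raise argparse.ArgumentTypeError("--mode_len must be at least 1")
    return n


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Draw and Tell with Gemini")
    parser.add_argument("--mode", choices=["keyboard", "speech", "sign"], default="keyboard")
    parser.add_argument("--smooth", type=unit_interval, default=0.5)  # hypothetical default
    parser.add_argument("--mode_len", type=positive_int, default=10)  # hypothetical default
    return parser.parse_args(argv)


class GestureStabilizer:
    """Report the most common gesture over the last mode_len frames."""

    def __init__(self, mode_len: int):
        self.history: deque = deque(maxlen=mode_len)

    def update(self, gesture: str) -> str:
        self.history.append(gesture)
        return Counter(self.history).most_common(1)[0][0]


def smooth_point(prev: tuple[float, float] | None, new: tuple[float, float],
                 smooth: float) -> tuple[float, float]:
    """Exponential smoothing: a higher smooth keeps more of the previous point."""
    if prev is None:
        return new
    return (smooth * prev[0] + (1 - smooth) * new[0],
            smooth * prev[1] + (1 - smooth) * new[1])
```

A longer --mode_len makes gestures steadier but slower to register: at roughly 30 frames per second, a window of 30 means a new gesture needs to hold for about half a second before it wins the vote.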

Languages

  • Python 100.0%