Commit bef8212 (1 parent: 447adad)
Showing 1 changed file with 13 additions and 1 deletion.
@@ -1,6 +1,6 @@
# Gemini Voice Chat Demo

-A low-latency bidirectional streaming pure-python application that enables voice conversations with Google's Gemini model using Gradio. This demo showcases real-time audio streaming capabilities, allowing natural conversations where you can speak with Gemini and receive audio responses back.
+A low-latency bidirectional streaming pure-python application that enables voice conversations with Gemini using Gradio. This demo showcases real-time audio streaming capabilities, allowing natural conversations where you can speak with Gemini and receive audio responses back.

## Features

@@ -15,6 +15,7 @@ A low-latency bidirectional streaming pure-python application that enables voice
Before running the application, you need to:

1. Enable Vertex AI in your Google Cloud Project

- Visit [Enable Vertex AI](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com)

2. Set up authentication
@@ -23,31 +24,36 @@ Before running the application, you need to:
## Installation

1. Clone this repository:

```bash
git clone git@github.com:GoogleCloudPlatform/generative-ai.git
cd generative-ai/gemini/multimodal-live-api/gradio-voice
```

2. Install the required dependencies:

```bash
pip install -r requirements.txt
```

The main dependencies are:

- gradio_webrtc>=0.0.27,<1.0
- librosa
- google-genai==0.3.0

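After installing, it can help to confirm which versions of these packages actually ended up in the environment. The sketch below is one way to do that with the standard library; `DEPENDENCIES` and `installed_versions` are hypothetical names introduced for illustration and are not part of the demo.

```python
from importlib import metadata

# Distribution names as they appear in the README's dependency list.
DEPENDENCIES = ("gradio_webrtc", "librosa", "google-genai")


def installed_versions(names=DEPENDENCIES) -> dict:
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions
```

Calling `installed_versions()` returns a dictionary that can be compared by eye against the version constraints listed above; it does not enforce those constraints itself.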
## Usage

1. Run the application:

```bash
python app.py
```

2. Open your web browser and navigate to the local Gradio interface (typically http://localhost:7860)

3. Configure the application:

- Enter your Google Cloud Project ID
- Select your preferred region (default: us-central1)
- Choose a voice for Gemini (options: Puck, Charon, Kore, Fenrir, Aoede)
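
The three settings in step 3 could be collected and validated before a session starts. The following is a minimal sketch, not the demo's actual code: `SessionSettings`, `VOICES`, and `build_live_config` are hypothetical names, and the nested dictionary keys are an assumption about the shape of the Gen AI SDK's speech configuration that should be checked against the SDK version you install.

```python
from dataclasses import dataclass

# Voice names listed in the README.
VOICES = ("Puck", "Charon", "Kore", "Fenrir", "Aoede")


@dataclass(frozen=True)
class SessionSettings:
    """Hypothetical container for the three values entered in the UI."""

    project_id: str
    region: str = "us-central1"  # default region per the README
    voice: str = "Puck"  # default voice per the README

    def __post_init__(self) -> None:
        if not self.project_id.strip():
            raise ValueError("A Google Cloud Project ID is required")
        if not self.region.strip():
            raise ValueError("A region is required")
        if self.voice not in VOICES:
            raise ValueError(f"Unknown voice {self.voice!r}; expected one of {VOICES}")


def build_live_config(settings: SessionSettings) -> dict:
    """Build a config dict requesting audio replies in the chosen voice.

    Assumption: these key names mirror the Gen AI SDK's speech configuration;
    verify them against the installed SDK before relying on this.
    """
    return {
        "response_modalities": ["AUDIO"],
        "speech_config": {
            "voice_config": {
                "prebuilt_voice_config": {"voice_name": settings.voice}
            }
        },
    }
```

Validating the voice name up front surfaces a typo as a clear `ValueError` rather than as a failure once streaming has begun. The region is only checked for being non-empty, since the README says any Google Cloud region is accepted.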
@@ -60,7 +66,9 @@ python app.py
## Configuration Options

### Regions

The application supports any Google Cloud regions, with presets for:

- us-central1 (default)
- us-east5
- us-south1
@@ -72,7 +80,9 @@ The application supports any Google Cloud regions, with presets for:
Additional locations can be found in the [Vertex AI documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/locations#united-states).

### Voice Options

Gemini can respond using different voice personalities:

- Puck (default)
- Charon
- Kore
@@ -82,12 +92,14 @@ Gemini can respond using different voice personalities:
## Technical Details

The application uses:

- Gradio for the web interface
- WebRTC for real-time audio streaming
- Google Gen AI SDK for Gemini integration
- Vertex AI for model hosting and inference

The audio streaming implementation uses:

- 16kHz input sample rate
- 24kHz output sample rate
- PCM audio format
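
To make these audio figures concrete, the sketch below works out chunk sizes and converts floating-point samples to raw PCM bytes. It assumes 16-bit signed little-endian mono samples, which the README does not state; `pcm_bytes_per_chunk` and `float_to_pcm16` are hypothetical helpers, not functions from `app.py`.

```python
import struct

INPUT_SAMPLE_RATE = 16_000   # Hz, input rate from the README
OUTPUT_SAMPLE_RATE = 24_000  # Hz, output rate from the README
BYTES_PER_SAMPLE = 2         # assumption: 16-bit PCM, mono


def pcm_bytes_per_chunk(sample_rate: int, chunk_ms: int) -> int:
    """Return the byte size of one mono 16-bit PCM chunk lasting chunk_ms milliseconds."""
    samples = sample_rate * chunk_ms // 1000
    return samples * BYTES_PER_SAMPLE


def float_to_pcm16(samples: list[float]) -> bytes:
    """Convert floats to little-endian signed 16-bit PCM, clipping to [-1.0, 1.0]."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)
```

Under these assumptions, a 20 ms chunk occupies 640 bytes at the 16 kHz input rate and 960 bytes at the 24 kHz output rate, so playback buffers need to be sized for the larger output rate.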